
DS-363_Example_WF_for_One-Hot_Encoder_Component

Example Workflow for One-Hot Encoder (Biological Sequences) Component
In this example workflow, we demonstrate the usage of the One-Hot Encoder (Biological Sequences) component, which is part of the KNIME Verified Components. After reading FASTA files with another verified component created for this purpose (FASTA Reader), we pass the table containing cDNA sequences to the One-Hot Encoder component, which turns the sequences into one-hot encoded vectors. We use these vectors to train a deep learning network (CNN) created with the KNIME Keras Integration. The data contains cDNA sequences, some of which represent RNAs with a binding preference for the ELAVL1A protein. The model is trained to predict whether or not a sequence is a binding preference for this particular protein.
Workflow diagram: ELAVL1A_train.fa.gz and ELAVL1A_test.fa.gz are each read with a FASTA Reader, encoded with One-Hot Encoder (Biological Sequences) and passed through Preprocess (extract target class & split collection column); Define NW (CNN) defines a Keras deep learning network, which the Keras Network Learner trains and the Keras Network Executor applies; Format Prediction Output and Visualise Model Performance visualise the prediction results (confusion matrix & ROC curve), and the interactive view shows which sequences are correctly classified and which are not.
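The first two steps of the workflow (FASTA Reader, then One-Hot Encoder (Biological Sequences)) can be approximated outside KNIME with the short Python sketch below. It is not the components' implementation. The alphabet `ACGT`, the all-zero rows for other characters such as `N`, and the zero-padding or truncation to a fixed length are assumptions. The function names `parse_fasta`, `read_fasta_gz` and `one_hot` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# Assumed cDNA alphabet; the component's actual alphabet and its handling
# of other characters (e.g. 'N') may differ.
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Header lines start with '>'; the sequence may span several lines.
    Blank lines and any lines before the first header are ignored.
    """
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_gz(path: str) -> List[Tuple[str, str]]:
    """Read a gzip-compressed FASTA file such as ELAVL1A_train.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return list(parse_fasta(handle))


def one_hot(sequence: str, length: int) -> List[List[int]]:
    """Encode a sequence as a `length` x 4 matrix of 0/1 values.

    Each recognised nucleotide sets exactly one 1 in its row. Characters
    outside ALPHABET give all-zero rows; shorter sequences are zero-padded
    and longer ones truncated to `length` (both are assumptions).
    """
    matrix = [[0] * len(ALPHABET) for _ in range(length)]
    for pos, base in enumerate(sequence.upper()[:length]):
        idx = INDEX.get(base)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


if __name__ == "__main__":
    records = list(parse_fasta([">seq1", "ACG", "T", ">seq2", "GGN"]))
    print(records)  # [('seq1', 'ACGT'), ('seq2', 'GGN')]
    print(one_hot("GGN", 4))  # [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Encoding every sequence to a common length gives each one the same (length, 4) shape, which is what batching sequences into a CNN typically requires.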