Group_2-Building_a_Component

KNIME Pros Learnathon - Group 2: Life Sciences

Workshop content for the KNIME Pros Learnathon:

Come to our new Learnathon for Advanced Users of KNIME! Today's topic is... components! Are you ready to learn how to build a component and give it its own configuration window and/or its own composite view? In this session you will learn how to build and share reliable, user-friendly components that act just like standard KNIME nodes.
The learnathon will begin with a detailed introduction to components and related KNIME features, then we will split into three groups. Each group will work on a different category of Verified Components, focusing on use cases.

Current Group: Life Sciences

Temesgen Dadi, Technical Data Scientist in the KNIME Life Sciences team, will teach how to build and share components for the analysis of biological data. You will be guided through the process of creating a shared component, from building the workflow inside it to sharing the component on the KNIME Hub. The component you create will read FASTA files and visualize the length distribution of the biological sequences they contain. You will also learn how to create a configuration option and propagate proper error messages.

Introduction

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. Here is a toy example of what a FASTA file looks like:

    > Sequence ID1
    ATGTGTCCCCGAGCCGCGCGGGCGCCC
    > Sequence ID2
    GCGACGCTACTCCTCGCCCTGGGCGCGGTGCTG
    TGGCCTGCGGCTGGCGCCTG

Note that some of the entries can span multiple lines. Our goal today is to read such files as a KNIME table containing three columns:

    ID | Sequence | SequenceLength

Sequences that span multiple lines need to be concatenated and presented as a single sequence. We will guide you with instructions provided as workflow annotations with a yellow border.

Group 2 - Component for reading biological sequences

1. Create a workflow that reads FASTA files and visualizes the sequence length distribution
2. Encapsulate the workflow into a component
3. Create a configuration option that enables browsing and selecting a FASTA file
4. Check the input and write proper error messages
5. Add the component's description and share it on the KNIME Hub
6. Use your shared component

Detailed instructions for these steps are available inside the metanodes. Each metanode also shows the solution to the tasks in the previous metanode.

Step 1 (Create the workflow): Build the functionality with KNIME nodes.
Step 2 (Create a component): Create a component. Customize its interactive view.
Step 3 (Create a configuration option): Create a configuration option that enables browsing and selecting a FASTA file.
Step 4 (Check the input): Check the input and write proper error messages.
Step 5 (Add description and share): Edit the description. Share the component on the KNIME Hub.
Step 6 (Use your shared component): Drag & drop the component from the KNIME Hub.
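The parsing logic behind steps 1 and 4 can be sketched outside KNIME in plain Python. This is only an illustration of the intended behaviour, not the node-based workflow built in the session: the function name `parse_fasta` and the wording of the error messages are assumptions made for this example. It concatenates multi-line sequences, produces rows shaped like the target table (ID, Sequence, SequenceLength), and raises a readable error when the input does not look like FASTA.

```python
from typing import Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (ID, Sequence, SequenceLength) rows from FASTA-formatted lines.

    Hypothetical helper for illustration; it mirrors the target table
    described in the introduction, not any KNIME API.
    """
    rows: List[Tuple[str, str, int]] = []
    seq_id = None          # ID of the entry currently being read
    chunks: List[str] = []  # sequence lines collected for that entry

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if seq_id is not None:
                seq = "".join(chunks)
                rows.append((seq_id, seq, len(seq)))
            seq_id = line[1:].strip()
            chunks = []
        elif seq_id is None:
            # Sequence data before any header: not a valid FASTA file.
            raise ValueError(
                "Invalid FASTA input: sequence data found before the first '>' header"
            )
        else:
            chunks.append(line)

    if seq_id is None:
        raise ValueError("Invalid FASTA input: no '>' header found")

    # Close the last entry.
    seq = "".join(chunks)
    rows.append((seq_id, seq, len(seq)))
    return rows


# The toy example from the introduction; the second entry spans two lines.
example = """> Sequence ID1
ATGTGTCCCCGAGCCGCGCGGGCGCCC
> Sequence ID2
GCGACGCTACTCCTCGCCCTGGGCGCGGTGCTG
TGGCCTGCGGCTGGCGCCTG
"""

rows = parse_fasta(example.splitlines())
for seq_id, sequence, length in rows:
    print(f"{seq_id} | {sequence} | {length}")
```

The SequenceLength values collected this way are what a histogram of the sequence length distribution would be built from.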
