
00b-KNIME_Introduction_Proteomics

KNIME Introduction Workflow II
This workflow shows a more real-life scenario, using proteomics data.

Tasks
- Filter out data points with p-value > 0.05
- Create a PCA analysis of the data, using the normalized quantitative values. Plot your results in a scatter plot.

1) Reading in the data is often the starting point for a workflow. KNIME provides many ways to do this, but the "CSV Reader" and the "Excel Reader" are the handiest for most kinds of data. This workflow comes with a bundled CSV file, and the node is configured to read in this file. Usually, you would point the node to a file on your hard disk and set the parameters according to your file. To actually run this workflow, you need to connect the "CSV Reader" to the "Math Formula" node by drawing a line from the output port to the input port.

2) KNIME's strength lies in quickly handling any data and allowing reproducible analyses. Therefore, many nodes perform just one simple step, like this node, which performs a mathematical operation on the data.

3) Filtering is another very important aspect. It can be performed at the column and row level.

4) We want to perform an ANOVA analysis of our data. The "One-way ANOVA" in KNIME needs the data of one group (i.e. sample in our case) to be in one row instead of a column. Hence, the data must be transposed (which is why we filtered out all unnecessary data with the column filter beforehand). Furthermore, the group label must be connected to each row. For this, the "Table Creator" was used to create a table with the group information ("a" and "b") for each row. To combine this information with the table, a "Joiner" is used. It combines the rows of both incoming tables on some shared data, in our case the sample name. After running the ANOVA, the data is filtered to contain only the rows with the p-values (the "Between groups" results). Finally, the columns are renamed for convenience.

5) All further nodes are repetitions of already known nodes, or self-explanatory. Another interesting aspect is "meta nodes", which allow you to combine several nodes into one node. To see what is behind a meta node, double-click it. This is great for keeping your workflow clean. Another aspect, not shown here, is Java Snippets: these allow the execution of generic Java code and can thus calculate anything on the row data. There are also nodes for R, Python and Perl snippets, allowing the same for the respective languages, making KNIME workflows even richer. To write out your results and process them with external tools, you can use various CSV, Excel and file writers.

GroupBy: Group all rows by the selected columns and aggregate the rows as given by the aggregation method. This node is very powerful; the corresponding column-wise node is "Column aggregation".

Node annotations
- GroupBy: see the description above
- Scatter Plot: volcano plot
- CSV Reader: read in proteomics quant data
- Math Formula (Multi Column): asinh transform multiple columns
- Transpose: transpose the data
- Column Filter: keep the quant data only
- Table Creator: connection between sample and group
- Joiner: join both incoming tables
- One-way ANOVA: perform the ANOVA
- Row Filter: filter out unnecessary rows
- Joiner: combine the p-values with the quantitative information in a big table
- Color Manager: color by p-value ok
- Sorter: sort by p-value
- Auto-Binner: bin by p-values
- Bar Chart: plot the bins (not a good p-value distribution in this cut example data)
- Math Formula: set p-value threshold
- Column Rename: rename the columns
- do some calculations: this is a meta node, which holds several nodes; double-click to open
- CSV Writer: write out the table into CSV format
- Row Filter: filter out p-value >= 0.05
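For readers who want to see the data reshaping from step 4 outside of KNIME, the following is a minimal Python sketch, not the workflow's actual implementation. The table contents, the sample names ("s1" to "s4"), and the helper names (asinh_transform, transpose, join_groups, f_statistic) are all hypothetical and chosen only for illustration. The sketch applies an asinh transform, transposes the table so that each sample becomes a row, joins the group labels on the sample name, and computes the one-way ANOVA F statistic per protein. It stops at the F statistic: turning it into a p-value requires the F distribution, which is outside the Python standard library.

```python
import math
from statistics import mean

# Hypothetical quantitative table: one row per protein, one column per sample.
quant = {
    "protein_1": {"s1": 10.0, "s2": 12.0, "s3": 30.0, "s4": 33.0},
    "protein_2": {"s1": 5.0, "s2": 6.0, "s3": 5.5, "s4": 6.5},
}

# Hypothetical sample-to-group table (the role of the "Table Creator").
groups = {"s1": "a", "s2": "a", "s3": "b", "s4": "b"}


def asinh_transform(table):
    """Apply asinh to every quantitative value (the multi-column math step)."""
    return {p: {s: math.asinh(v) for s, v in row.items()}
            for p, row in table.items()}


def transpose(table):
    """Turn protein rows / sample columns into sample rows / protein columns."""
    by_sample = {}
    for protein, row in table.items():
        for sample, value in row.items():
            by_sample.setdefault(sample, {})[protein] = value
    return by_sample


def join_groups(by_sample, sample_groups):
    """Inner join on the sample name, attaching the group label to each row."""
    return [{"sample": s, "group": sample_groups[s], **values}
            for s, values in by_sample.items() if s in sample_groups]


def f_statistic(rows, column):
    """One-way ANOVA F statistic for one column, grouped by the 'group' key."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(row[column])
    values = [v for vs in by_group.values() for v in vs]
    grand_mean = mean(values)
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2
                     for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2
                    for vs in by_group.values() for v in vs)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (ss_between / df_between) / (ss_within / df_within)


joined = join_groups(transpose(asinh_transform(quant)), groups)
f_values = {protein: f_statistic(joined, protein) for protein in quant}
```

In this made-up data, protein_1 differs clearly between groups "a" and "b" and therefore gets a much larger F statistic than protein_2. Filtering on a p-value threshold, as the workflow does, would additionally need the F distribution, for example from a statistics library.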
