Finding disease-causal variants among the large number of variants present in a sample remains a major challenge in the analysis of next-generation sequencing data ("Needles in stacks of needles", Cooper 2011).
One of the most frequently used formats for storing variant information is the Variant Call Format (VCF). Because extracting information from the complex genetic variation data encoded in VCF files is not straightforward, several command-line tools exist for filtering and querying VCF files, with the ultimate goal of detecting disease-causal variants.
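To make the structure of a VCF record concrete, here is a minimal Python sketch that reads the eight fixed, tab-separated columns of each data line and splits the INFO column into key/value pairs. It is an illustration only and not part of the workflow, which relies on BCFtools and VCFtools; the function names and the example record (including its rsID) are made up for the example.

```python
from typing import Dict, Iterable, Iterator, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO column into a dict; keys without '=' are flags (True)."""
    fields: Dict[str, Union[str, bool]] = {}
    if info == ".":  # '.' means the INFO column is empty
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def iter_variants(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data line, skipping meta and header lines ('#')."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, var_id, ref, alt, qual, flt, info = line.split("\t")[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),  # several ALT alleles are comma-separated
            "qual": qual,
            "filter": flt,
            "info": parse_info(info),
        }


# Made-up example record, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\trs0000001\tA\tG,T\t50\tPASS\tDP=100;AF=0.5,0.1;DB",
]
variants = list(iter_variants(example))
print(variants[0]["alt"], variants[0]["info"])
```

Real files add per-sample genotype columns, compressed input and many edge cases, which is why dedicated tools are the usual choice.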
This workflow illustrates how to mine your VCF files within KNIME Analytics Platform to find variants associated with a specific disease.
We use three common tools, BCFtools, VCFtools and VEP (via the Ensembl REST API), to filter and annotate the variants. The domain expert can interactively select variants of interest and filter them by allele frequency in the 1000 Genomes Project and gnomAD, or by the predicted deleteriousness of a variant (SIFT score).
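The filtering logic described above can be sketched in a few lines of Python. This is a hedged illustration, not the workflow's actual implementation: the field names (`af_1000g`, `af_gnomad`, `sift_score`), the example records and both thresholds are assumptions chosen for the example. A 1% frequency cutoff is a common choice for rare-disease analyses, and SIFT scores at or below 0.05 are conventionally read as deleterious.

```python
from typing import Dict, Optional

# Assumed thresholds; adjust them to the disease model under study.
MAX_ALLELE_FREQUENCY = 0.01
MAX_SIFT_SCORE = 0.05


def is_rare(frequency: Optional[float], threshold: float) -> bool:
    """A variant absent from a population database (None) is treated as rare."""
    return frequency is None or frequency <= threshold


def keep_variant(
    variant: Dict[str, object],
    max_af: float = MAX_ALLELE_FREQUENCY,
    max_sift: float = MAX_SIFT_SCORE,
) -> bool:
    """Keep variants that are rare in both databases and predicted deleterious."""
    rare = all(
        is_rare(variant.get(key), max_af)  # hypothetical field names
        for key in ("af_1000g", "af_gnomad")
    )
    sift = variant.get("sift_score")
    deleterious = sift is not None and sift <= max_sift
    return rare and deleterious


# Made-up annotated variants, for illustration only.
annotated = [
    {"id": "var1", "af_1000g": 0.002, "af_gnomad": 0.001, "sift_score": 0.01},
    {"id": "var2", "af_1000g": 0.25, "af_gnomad": 0.30, "sift_score": 0.01},
    {"id": "var3", "af_1000g": None, "af_gnomad": 0.0005, "sift_score": 0.40},
    {"id": "var4", "af_1000g": None, "af_gnomad": None, "sift_score": 0.0},
]
candidates = [v["id"] for v in annotated if keep_variant(v)]
print(candidates)
```

Here a variant without a SIFT score is dropped. That is a design choice: a more permissive filter might keep unscored variants for manual review.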
Requirements:
- The ability to run Bash scripts
- tabix, VCFtools and BCFtools installed
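Before running the workflow, it can help to confirm that these tools are reachable on the `PATH`. The following Python sketch does this with the standard library; it assumes the executables carry their usual names (`bash`, `tabix`, `vcftools`, `bcftools`), which may differ on some installations.

```python
import shutil
from typing import List, Sequence

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = ("bash", "tabix", "vcftools", "bcftools")


def find_missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found.")
```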
To use this workflow, download it and open it in KNIME Analytics Platform.