MSNet-Sirius-1.260626

MS-Net is designed to build on LC-MS data processed with MZmine or MS-DIAL and annotated with Sirius-CSI. Work through steps 1 to 5 by double-clicking each node of the workflow. Details of each function can be displayed with the "?" button at the bottom of the component menu.
Because many processes are parallelized, it is highly recommended not to run other tasks on the machine during the whole process.

For NEGATIVE mode
-Tune 1 to 4 and launch.

For POSITIVE mode
-Tune 1 to 4 and launch

For BOTH mode integration
-Tune 1 to 3 in both modes, launch separately
-Tune 5 and 6 then launch

-Samples in pos and neg mode MUST have the same names or share the same index
-If this fails, launch pos and neg modes separately

MS-Net Input: MZmine/MSdial & SIRIUS

Launch Report PDF to write charts in the output result folder

SOME WARNINGS

  • MS-Net uses MS2 data. Features without MS/MS spectra are discarded.

  • A column named "Class" must be present in the input height/area table, defining the sample category/class (Control, WT, QC...).

  • MS-Net uses Pearson correlation among samples in many processing steps. As a consequence, MS-Net cannot process a single sample.

  • MS-Net needs to seed the Mass Spectral Similarity (MSS) network with confident annotations (level 1 or 2). We encourage using mass spectral library matching during data processing (FragHub, GNPS2, MassBank, ...).

  • The size of the input chemical space (number of features × topK hits) drastically influences processing time.

  • If pos and neg modes must be merged, launch nodes 1 to 3 in one mode before the other so as not to overload your processor.
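
The table requirements above can be checked before launching the workflow. The following is a minimal pre-flight sketch, assuming the height/area table is a CSV with one row per sample; the function name `check_input_table` and that layout are illustrative assumptions, not part of MS-Net.

```python
import csv
import io


def check_input_table(csv_text: str) -> list:
    """Return a list of problems found in a height/area table (empty if none).

    Assumption: one row per sample, with a "Class" column giving the
    sample category (Control, WT, QC...).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []
    if "Class" not in (reader.fieldnames or []):
        problems.append('missing "Class" column')
    if len(rows) < 2:
        # Pearson correlation among samples is undefined for a single sample.
        problems.append("at least two samples are required")
    return problems
```

An empty list means both requirements are met; anything else should be fixed in the exported table before running node 1.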

Before You Start

- [ ] KNIME 5.2+ installed with RDKit extension

- [ ] All software sites enabled in KNIME preferences

- [ ] Memory allocation configured (leave only 8GB for system)

- [ ] All other programs closed (MS-Net is memory-intensive)

- [ ] Directory structure organized (separate folders for each export)

- [ ] POS and NEG files have matching names/order (if merging modes)

🔄 Typical Workflow Pipeline

Raw Data (mzML/raw)

MZmine/MS-DIAL (Feature Detection)

Sirius-CSI (In Silico Annotation)

MS-Net (Network-Based Prioritization)

Results + Optional Cytoscape Visualization

📁 Directory Organization

Project_Folder/
├── raws_pos/       # LC-MS raw files (positive mode)
├── raws_neg/       # LC-MS raw files (negative mode)
├── MZmine_pos/     # ⚠️ ONE export per folder!
│   ├── [prefix]_full_feature_table.csv
│   ├── [prefix]_edges_ms2_cosine.csv
│   ├── [prefix]_annotations.csv
│   └── [prefix]_sirius.mgf
├── MZmine_neg/     # Same structure for negative mode
├── Sirius_pos/     # ⚠️ ONE export per folder!
│   └── structure_identifications_all.csv and all other Sirius outputs
├── Sirius_neg/     # Same for negative mode
└── MSNet-POSNEG/   # Results folder (workflow output)
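
The "one export per folder" rule can be verified with a short script. This is a sketch only: it assumes the four file suffixes shown in the tree above, and `check_mzmine_export` is a hypothetical helper name, not something shipped with MS-Net.

```python
from pathlib import Path

# File suffixes taken from the MZmine folder layout shown above.
EXPECTED_SUFFIXES = (
    "_full_feature_table.csv",
    "_edges_ms2_cosine.csv",
    "_annotations.csv",
    "_sirius.mgf",
)


def check_mzmine_export(folder: Path) -> list:
    """Return problems for one MZmine export folder (empty list if it looks fine)."""
    names = [p.name for p in folder.iterdir() if p.is_file()]
    problems = []
    for suffix in EXPECTED_SUFFIXES:
        count = sum(1 for name in names if name.endswith(suffix))
        if count == 0:
            problems.append(f"no file ending in {suffix}")
        elif count > 1:
            problems.append(f"{count} files ending in {suffix} (more than one export?)")
    return problems
```

Calling `check_mzmine_export(Path("Project_Folder/MZmine_pos"))` before node 1 flags a folder that mixes several exports or lacks one of the expected files.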

Module 1: Data Import

# DO NOT LEAVE EMPTY - use "xxxxx" if no value

target_genus: "Cannabis" # Seed elevation criterion

target_family: "Cannabaceae" # Seed elevation criterion

topK_insilicoDB: 50 # Balance: speed vs coverage

inhouse_DBmatch: 10 # Look for level 3a only within the top n ranked in silico matches

chromato_mode: "_C18pos" # For POS/NEG merging

IIN_filtering: TRUE # Keep most common adducts only
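
Two points from this module lend themselves to a quick sketch: no field may be left empty, and the number of candidates to process grows as features × topK. The helper names below (`fill_empty`, `candidate_space`) are hypothetical and only illustrate the arithmetic; they are not MS-Net functions.

```python
PLACEHOLDER = "xxxxx"  # value the tutorial asks for when a field has no value


def fill_empty(params: dict) -> dict:
    """Replace None or blank values with the placeholder string."""
    return {
        key: PLACEHOLDER if value is None or str(value).strip() == "" else value
        for key, value in params.items()
    }


def candidate_space(n_features: int, top_k: int) -> int:
    """Upper bound on in silico candidates: number of features x topK hits."""
    return n_features * top_k
```

For 2000 features, lowering `topK_insilicoDB` from 50 to 20 reduces this upper bound from 100000 to 40000 candidates.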

Module 2: Network Population

cosine_threshold: 0.70 # MSS network cutoff (0.6-0.8)

deltaRT_threshold: 8.0 # RT window (minutes) threshold between two nodes

apply_RT_clustering: TRUE # ⏱️ slow! Disable for short gradient acquisition (<12 min)

RT_bucket_size: 0.01 # For UHPLC: 0.0015
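
To make the two thresholds concrete, the sketch below keeps an edge only when the cosine score reaches `cosine_threshold` and the retention-time difference stays within `deltaRT_threshold`. Whether MS-Net treats the bounds as inclusive, and how it combines the two criteria internally, is an assumption here; `keep_edge` is a hypothetical name.

```python
def keep_edge(cosine: float, rt_a: float, rt_b: float,
              cosine_threshold: float = 0.70,
              delta_rt_threshold: float = 8.0) -> bool:
    """Decide whether an edge between two features stays in the MSS network.

    Assumption: both bounds are inclusive and both conditions must hold.
    Retention times are in minutes.
    """
    similar_enough = cosine >= cosine_threshold
    close_in_rt = abs(rt_a - rt_b) <= delta_rt_threshold
    return similar_enough and close_in_rt
```

Raising `cosine_threshold` toward 0.8 gives a sparser network; lowering it toward 0.6 connects more features at the cost of weaker spectral links.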

Module 3: Annotation

fingerprint_type: "PubChem" # PubChem/Morgan/RDKit

topN_link_score: 5 # Neighbor links (5-20)

alpha: 0.3 # 0.2-0.3 for most datasets

deltaRT_xlogP: 4.0 # C18 methods only; avoids spurious annotations where XlogP and RT disagree

double_ID_filtering: TRUE # Remove ID duplicates

Module 4: Export

tanimoto_threshold: 0.8 # Structural network cutoff

look_for_DB_identifiers: TRUE # ⏱️ Slow, need internet!
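
The structural network cutoff refers to the Tanimoto coefficient, which for two bit fingerprints is the number of shared on-bits divided by the number of on-bits in either. The sketch below computes it on plain Python sets of on-bit positions; it does not use RDKit and is not the workflow's own code.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)
```

With `tanimoto_threshold: 0.8`, fingerprints sharing 4 of 5 distinct on-bits would sit exactly at the cutoff, while 3 of 5 (0.6) would not be linked.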

🚨 Common Issues & Solutions

Issue: Workflow fails on first execution

Solution: Click Reset, then Start again (normal behavior)

Issue: "Processing time very long"

Solutions:

- Close all other programs

- Lower `topK_insilicoDB` (try 20 instead of 50)

- Disable RT clustering if not critical

- Do NOT run POS and NEG simultaneously

Issue: "No Level 3a annotations"

Solutions:

- Check taxonomic spelling (case-sensitive; based on NCBI taxonomy)

- Try broader filters (family instead of genus, in-house SMILES list used in annotation tools)

Issue: "POS/NEG merging fails"

Solution: Files must have:

- Identical names across modes

- Same number of samples

- Same acquisition order

→ If not possible, process the modes separately and concatenate manually afterwards
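
The three merging conditions above can be tested on the sample name lists before running node 5. This is a minimal sketch; `compare_sample_lists` is a hypothetical helper, and it assumes the sample names have already been read from the POS and NEG tables in acquisition order.

```python
def compare_sample_lists(pos: list, neg: list) -> list:
    """Return reasons why POS/NEG merging may fail (empty list if none found)."""
    problems = []
    if len(pos) != len(neg):
        problems.append(f"different number of samples: {len(pos)} POS vs {len(neg)} NEG")
    only_pos = sorted(set(pos) - set(neg))
    only_neg = sorted(set(neg) - set(pos))
    if only_pos:
        problems.append(f"only in POS: {only_pos}")
    if only_neg:
        problems.append(f"only in NEG: {only_neg}")
    if not problems and pos != neg:
        problems.append("same names but different order")
    return problems
```

If the returned list is not empty, either rename or reorder the samples, or fall back to processing each mode separately.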

📊 Output Files Explained

| File | Description | Use |
|------|-------------|-----|
| `AllAnnotation[mode].csv` | Raw annotations before processing | Reference check |
| `EdgestopK_MSMS_[mode].csv` | MSS network edges | Cytoscape |
| `EdgestopK_Tanimoto_[mode].csv` | Structural network edges | Cytoscape |
| `Nodes[mode].csv` | Node metadata | Cytoscape |
| `Results[mode].xlsx` | Final annotated feature table | Main output |
| `RTCluster[mode].csv` | RT clustering results | QC |
| `StatTable[mode].csv` | Intensity matrix (sum normalized) | Statistics |
| `Report[mode].pdf` | Figures and summaries | QC/Publication |
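
For readers unfamiliar with the "sum normalized" intensity matrix, the sketch below shows the usual meaning of the term: each feature intensity in a sample is divided by that sample's total intensity. That MS-Net scales to a total of 1 (rather than another constant) is an assumption, and `sum_normalize` is an illustrative name.

```python
def sum_normalize(intensities: list) -> list:
    """Divide each intensity of one sample by the sample's total intensity."""
    total = sum(intensities)
    if total == 0:
        return [0.0 for _ in intensities]  # avoid division by zero for empty samples
    return [value / total for value in intensities]
```

After this step the values of each sample add up to 1, which makes samples with different overall signal comparable in downstream statistics.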

🎨 Fingerprint Selection Guide

| Fingerprint | Size | Best for | Similarity score |
|-------------|------|----------|------------------|
| PubChem | 881 bits | General-purpose, diverse datasets | Higher score (less discriminative) |
| Morgan | 2048 bits | Structurally similar compounds/MSS cluster | Lower score (highly discriminative) |
| RDKit | 2048 bits | Balanced, medium diversity/large MSS clusters | Medium |

📈 Alpha Parameter (α) Selection

| α | Emphasis | When to use |
|---|----------|-------------|
| 0.2-0.3 | Spectral/structural similarity | High-quality seeds, well-clustered MSS network |
| 0.4-0.5 | Balanced | Mixed quality, moderate network density |
| 0.6-0.8 | In silico predictions | Few seeds, high network density |
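
One way to picture the role of α is as a weight between the in silico score and the network similarity. The sketch below uses a simple linear blend. This formula is an assumption made for illustration, consistent only with the direction given in the table (low α favors similarity, high α favors in silico predictions); it is not MS-Net's documented scoring function.

```python
def combined_score(in_silico: float, network_similarity: float, alpha: float = 0.3) -> float:
    """Hypothetical linear blend: alpha weights the in silico score,
    (1 - alpha) weights the spectral/structural similarity to neighbors."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * in_silico + (1.0 - alpha) * network_similarity
```

Under this assumption, a candidate with a strong in silico score but weak similarity to its seeded neighbors is ranked lower at α = 0.3 than at α = 0.7.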

🔬 Confidence Level Summary

| Level | Description | Confidence |
|-------|-------------|------------|
| 1 | Authentic standard | Highest confidence |
| 2a | High-confidence library match (>0.85) | Very high confidence |
| 2b | Moderate library match (0.70-0.85) | High confidence |
| 3a | Taxonomically or user-defined elevated in silico annotation | Levels 1 to 3a: seeds for propagation |
| 3b | Network-propagated annotation | Medium confidence |
| 4 | MS/MS analogs | Low confidence |
| 5 | Unassigned | Lowest confidence |
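
The library-match cutoffs in the table can be written as a small lookup. This sketch covers levels 2a and 2b only, uses the score bounds exactly as listed (0.85 itself falls in 2b), and the function name `library_match_level` is hypothetical.

```python
from typing import Optional


def library_match_level(score: float) -> Optional[str]:
    """Map a library match score to level 2a or 2b, or None below 0.70."""
    if score > 0.85:
        return "2a"
    if score >= 0.70:
        return "2b"
    return None  # not a level 2 match; other levels depend on other evidence
```

Matches returning `None` are not discarded by this sketch; they would simply need another line of evidence (in silico, network propagation, analog search) to receive a level.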

⏱️ Expected Processing Times

Dataset Size Guidelines

- Small (<500 features): 5-15 min per mode

- Medium (500-2000 features): 15-30 min per mode

- Large (>2000 features): 30-60+ min per mode

Time-Consuming Steps

1. RT Clustering: Can take minutes for large datasets

2. DB Identifier Lookup: Depends on internet connection speed

3. Network Propagation: Scales with network density

Speed Optimization

- Process POS and NEG separately (never simultaneously)

- Lower topK values (20 instead of 50)

- Disable RT clustering if not essential

- Skip DB identifier lookup for initial runs

🎓 Tutorial Contact Information

Authors:

- Guillaume Marti (guillaume.marti@utoulouse.fr)

Institutions:

- Laboratoire de Recherche en Sciences Végétales (LRSV)

- Metatoul-AgromiX Platform, Toulouse, France

- MetaboHUB National Infrastructure

📚 Key Citation

MS-Net Manuscript (in preparation):

Pereira Francisco V, Duthen S, Crossay E, et al. (2025)

MS-Net: Multi-Similarity based network annotation for untargeted metabolomics

Workflow nodes (as labelled on the canvas): 1. Data Upload-POS and 1. Data Upload-NEG (select data to import in POS or NEG mode); 2. Populate Network (tune the network length); 3. Network Based annotation (annotate the final network); 5. Concat POSNEG; 6. Save Data; Plots; Report Template Creator (reset this node before processing); Report Concatenate; Report Page Break; Report PDF Writer.
