
Synthetic Data Augmentation with Copulas

<p>The University of Saskatchewan<br>Ph.D. in Interdisciplinary Studies<br><br>Created by: Carlos Enrique Diaz, MBM, P.Eng.<br>Email: carlos.diaz@usask.ca<br><br>Supervisor: Lori Bradford, Ph.D.<br>Email: lori.bradford@usask.ca</p><p><strong>Description:</strong></p><p>This workflow demonstrates how to assess the quality of synthetic data generated with the <strong>Synthetic Data (Copulas)</strong> component in KNIME. It uses the well-known <strong>Iris dataset</strong> as a reference.</p><p><strong>Section 1: Original Data Analysis with 150 Observations</strong></p><ul><li><p>Loads and preprocesses the Iris dataset (150 rows).</p></li><li><p>Uses the <strong>Linear Correlation</strong> and <strong>Statistics</strong> nodes to explore the original data’s structure and relationships.</p></li></ul><p><strong>Section 2: Mixed Data with 650 Observations</strong></p><ul><li><p>Generates <strong>500 synthetic rows</strong> using the <strong>Synthetic Data (Copulas)</strong> component.</p></li><li><p>Merges the synthetic data with the original data (total: 650 rows).</p></li><li><p>Applies the same analysis nodes to compare the combined dataset with the original.</p></li></ul><p><strong>Section 3: Pure Synthetic Data with 500 Observations</strong></p><ul><li><p>Filters the combined table to keep only the <strong>500 synthetic rows</strong>.</p></li><li><p>Runs the correlation and statistical analysis again to evaluate the synthetic data on its own.</p></li></ul><p>Comparing the three sections side by side shows how closely the synthetic data reproduces the original: when the copula model fits well, the correlation matrices and summary statistics of the original, mixed, and purely synthetic tables should be similar. The whole comparison uses only built-in KNIME nodes.</p>
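<p>For readers who want to see the idea outside KNIME, the three-way comparison above can be sketched in plain Python. This is only an illustration under stated assumptions, not the implementation of the <strong>Synthetic Data (Copulas)</strong> component: it assumes a bivariate Gaussian copula with empirical marginals, it uses randomly generated toy columns as a stand-in for two Iris measurements, and every function name in it is hypothetical.</p>

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def normal_scores(values):
    """Map each value to a standard-normal score through its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF with linear interpolation, for 0 <= u <= 1."""
    pos = u * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def sample_gaussian_copula(x, y, n_samples, seed=0):
    """Draw synthetic (x, y) pairs from a Gaussian copula fitted to x and y."""
    rng = random.Random(seed)
    # Dependence is estimated on the normal scores, not on the raw values.
    rho = pearson(normal_scores(x), normal_scores(y))
    residual_scale = max(0.0, 1.0 - rho * rho) ** 0.5
    xs, ys = sorted(x), sorted(y)
    pairs = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual_scale * rng.gauss(0.0, 1.0)
        # Push the correlated normals back through the empirical marginals.
        pairs.append((empirical_quantile(xs, STD_NORMAL.cdf(z1)),
                      empirical_quantile(ys, STD_NORMAL.cdf(z2))))
    return pairs


# Toy stand-in for two numeric columns of the original table (150 rows).
toy_rng = random.Random(42)
orig_x = [toy_rng.gauss(5.0, 1.0) for _ in range(150)]
orig_y = [0.8 * v + toy_rng.gauss(0.0, 0.6) for v in orig_x]

# 500 synthetic rows, then the mixed table of 650 rows.
synthetic = sample_gaussian_copula(orig_x, orig_y, 500, seed=1)
syn_x = [p[0] for p in synthetic]
syn_y = [p[1] for p in synthetic]
mixed_x, mixed_y = orig_x + syn_x, orig_y + syn_y

for label, xs, ys in [("original", orig_x, orig_y),
                      ("mixed", mixed_x, mixed_y),
                      ("synthetic", syn_x, syn_y)]:
    print(f"{label:<10} n={len(xs):>3}  r={pearson(xs, ys):.3f}")
```

<p>The printed correlations play the role of the Linear Correlation node in each section. Because this sketch samples marginals by interpolating between observed values, synthetic values never fall outside the observed range; other marginal models would behave differently.</p>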

URL: Synthetic Data (Copulas) Component https://hub.knime.com/s/UBYggD5QV5lD8kfu
