3. Data Aggregation

<p>I nearly lost the plot on this one! It really does feel as if data preparation makes up 80% of the analytics process.</p><p>In this workflow, I bring in a different dataset for data aggregation. Aggregation condenses the data into a more manageable set of values grouped by category, instead of listing every row individually. This can be done with the following nodes:</p><ul><li><p>Row Aggregator</p></li><li><p>Pivot</p></li></ul><p>The transformation stage of this workflow is more tedious, because from the very first step the values in the dataset are strings that use the suffixes 'M' and 'B' to indicate millions and billions. As soon as those characters are removed from a value, there is no way to tell its true magnitude in such a large dataset. The unit therefore has to be identified before the suffix is stripped.</p><p>The comment below each node describes its role in getting these values ready for exploration.</p><p>Some of the expressions I used in the nodes to get the data ready for aggregation are:</p><ul><li><p>regexMatcher()</p></li><li><p>removeChars()</p></li><li><p>floor()</p></li></ul>
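The suffix handling and the two aggregation steps can be sketched outside KNIME to show the order of operations. This is a minimal Python sketch, not the workflow itself. It assumes values look like "$1.2B" or "350.5 M". The regular expression, the function names parse_amount, aggregate and pivot, and the sample rows are all hypothetical and not taken from the dataset. The regex roughly plays the part of regexMatcher() (detecting the suffix) and removeChars() (stripping it), and math.floor stands in for floor().

```python
import math
import re
from collections import defaultdict
from decimal import Decimal

# Multipliers for the suffixes described above: 'M' = millions, 'B' = billions.
SCALE = {"M": Decimal(1_000_000), "B": Decimal(1_000_000_000)}

# Assumed format: optional '$', a (possibly negative) number, optional spaces, then M or B.
PATTERN = re.compile(r"\$?\s*(-?\d+(?:\.\d+)?)\s*([MB])")


def parse_amount(text: str) -> int:
    """Turn a suffixed string such as '$1.2B' into a whole number."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised amount: {text!r}")
    number, suffix = match.groups()
    # The suffix is read before it is discarded, so the magnitude is not lost.
    # Decimal keeps the multiplication exact; floor then drops any fraction.
    return math.floor(Decimal(number) * SCALE[suffix])


def aggregate(rows):
    """Group-and-sum: one total per category instead of one line per row."""
    totals = defaultdict(int)
    for category, _column, amount in rows:
        totals[category] += parse_amount(amount)
    return dict(totals)


def pivot(rows):
    """Pivot: categories as rows, a second field as columns, summed cells."""
    table = defaultdict(lambda: defaultdict(int))
    for category, column, amount in rows:
        table[category][column] += parse_amount(amount)
    return {cat: dict(cols) for cat, cols in table.items()}


# Hypothetical sample rows (country, sector, value); the values are made up.
ROWS = [
    ("US", "Tech", "$1.2B"),
    ("US", "Energy", "350.5M"),
    ("JP", "Tech", "$800M"),
    ("US", "Tech", "0.3 B"),
]

if __name__ == "__main__":
    print(aggregate(ROWS))
    print(pivot(ROWS))
```

Using Decimal rather than float is a deliberate choice. Applying floor to a float product such as 0.29 * 100 can silently drop a unit, because the result is stored as 28.999999999999996.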

Kaggle dataset used: https://www.kaggle.com/datasets/soumyodippal000/top-2000-companies-financial-data-2024-dataset
