ciullajn_FinalProject_IT4015

John Ciulla
Final Project
IT 4015C
August 1, 2023
Link to Datasets: https://www.kaggle.com/datasets/kaggleprollc/planetary-systems-dataset-nasa?resource=download

Data Cleansing
In this section, I read from the CSV file and applied several filters.
1. I removed every row with a missing value in the hd_name column.
2. I kept only the rows whose sy_snum is 3 or above.
3. The first two columns (Row ID and rowid) were off by one, so I made them the same for consistency.

Data Manipulation
In this section, I joined the two CSV files together and then performed a GroupBy on the result. I wanted to see the number of different pl_names that are present and the total of them.

Visualizations
For the visualizations, I created a pie chart that showed the averages of the occurrences of the pl_names. I then created a bar chart that showed the sums in an easily visible way.

Data Science Models
Lastly, for the data science models, I passed the GroupBy output into a Data to Report node for a cleaner and more appropriate table display of the results. Then, I moved the bar chart image into an Image to Report node for, again, a more appropriate display.

Workflow nodes and their labels:
CSV Reader: NASA_Exoplanet_Composite.csv
Row Filter: Remove missing value in hd_name column
Row Filter: Only include sy_snum 3 or above
Column Expressions: Made rowid and Row ID same value - they were 1 apart
CSV Reader: NASA_Exoplanet_Composite.csv
CSV Reader: NASA_Exoplanet_Planetary.csv
Joiner: Inner Join
GroupBy: Number of names
Bar Chart: Number of names
Pie/Donut Chart: Average of Occurrences
Image to Report: Node 12
Data to Report: Node 13
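
For readers who want to reproduce the cleansing, join, and GroupBy steps outside KNIME, the following is a minimal pandas sketch, not the workflow itself. The column names hd_name, sy_snum, rowid, and pl_name come from the workflow description. Several things are assumptions: the function names cleanse and count_names are hypothetical, the join key is assumed to be pl_name (the workflow only says "Inner Join"), and KNIME's Row ID is approximated by the DataFrame index, which is simply set equal to rowid.

```python
import pandas as pd


def cleanse(composite: pd.DataFrame) -> pd.DataFrame:
    """Approximate the three Data Cleansing steps on the composite table."""
    # Step 1: drop rows whose hd_name is missing.
    out = composite.dropna(subset=["hd_name"])
    # Step 2: keep only rows with sy_snum of 3 or above.
    out = out[out["sy_snum"] >= 3].copy()
    # Step 3 (assumption): treat the index as KNIME's Row ID and make it
    # equal to the rowid column so the two no longer differ by one.
    out.index = out["rowid"].to_numpy()
    return out


def count_names(composite: pd.DataFrame, planetary: pd.DataFrame) -> pd.DataFrame:
    """Inner-join the two tables and count the rows per pl_name."""
    # Assumption: pl_name is the join key; the source does not name it.
    joined = composite.merge(
        planetary, on="pl_name", how="inner", suffixes=("", "_planetary")
    )
    # One row per pl_name with the number of joined rows for that name.
    return joined.groupby("pl_name").size().rename("occurrences").reset_index()
```

With the Kaggle files downloaded, the two inputs would be loaded with pd.read_csv("NASA_Exoplanet_Composite.csv") and pd.read_csv("NASA_Exoplanet_Planetary.csv"). The mean of the occurrences column then corresponds roughly to what the pie chart summarizes, and the per-name counts to what the bar chart shows.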
