Challenge 22 - Analyzing Top Streaming Artists

Level: Medium

Description: You're part of a music analytics team exploring what drives global listening trends. Using a dataset of the most streamed songs up to 2024, your goal is to uncover which artists dominate the charts. Clean and transform the data, then answer three key questions: (1) Which artists release the most music? (2) Who ranks highest in average track score? (3) Who leads in total streams? Finally, find out which artists make the cut on both popularity and consistency.

Beginner-friendly objectives:
1. Load the dataset and keep only these columns: Track, Artist, Release Date, Track Score, and Spotify Streams.
2. Convert data types where needed for accurate analysis, such as turning string-based numbers into numeric formats.
3. Handle missing values and transform the data, such as converting date strings to dates and extracting specific date parts.
4. Aggregate the data to answer the question: how many tracks were released per year? Visualize the result with a bar chart.

Intermediate-friendly objectives:
1. Filter the data to identify top-performing artists: the top 10 artists by mean track score (with a minimum of 10 tracks), and the top 10 artists by total streams.
2. Find the artists who appear in both of these lists.

How many artists make the final list?

Solution Summary: The solution is a workflow that processes and analyzes a dataset of Spotify's most streamed songs. It reads the dataset and filters it down to the key columns, converts data types for accurate analysis, handles missing values, and transforms the date column. Aggregations then calculate per-artist metrics such as track count, mean track score, and total streams. Filtering and sorting operations identify the top artists, a join finds the artists who appear in both top-10 lists, and a bar chart shows the number of tracks per year.

Solution Details: The workflow starts with a CSV Reader node configured to read the "Most Streamed Spotify Songs 2024.csv" file with the correct header, delimiter, and encoding settings. A Column Filter node follows, retaining only the essential columns: "Track," "Artist," "Release Date," "Track Score," and "Spotify Streams." The String to Number node converts the "Spotify Streams" column to a Long integer so that it can be aggregated numerically. A Missing Value node removes rows with missing data. Next, the String to Date&Time node converts the "Release Date" column to a date format, followed by a Date&Time Part Extractor node that extracts the year from the release date.

The GroupBy node aggregates the data by artist, calculating track count, mean track score, and total streams. A Number to String node converts the "Year" column to a string so that it can serve as a category in the chart. A Row Filter node retains artists with at least 10 tracks, and Top k Row Filter nodes identify the top 10 artists by total streams and by mean track score. A Sorter node arranges the data by mean track score in descending order. The workflow concludes with a Joiner node that merges the two top-10 tables on their row keys, and a Bar Chart node that visualizes the track count per year. Finally, a Table View node displays the sorted and filtered data for further exploration.
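For readers who want to cross-check the result outside KNIME, the steps described above can be approximated in pandas. This is only a minimal sketch, not the workflow itself. The function names, the `min_tracks` and `k` parameters, and the resulting column names (`track_count`, `mean_score`, `total_streams`) are hypothetical. It assumes that stream counts are stored as strings with thousands separators, that release dates are month/day/year strings, and that the 10-track minimum applies only to the mean-score ranking, as the objectives are worded.

```python
import pandas as pd

# Column names as listed in the challenge description.
COLUMNS = ["Track", "Artist", "Release Date", "Track Score", "Spotify Streams"]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the five columns, fix types, drop incomplete rows, add a Year column."""
    out = df[COLUMNS].copy()
    # Assumption: stream counts look like "1,234,567"; strip the separators first.
    out["Spotify Streams"] = pd.to_numeric(
        out["Spotify Streams"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )
    out["Track Score"] = pd.to_numeric(out["Track Score"], errors="coerce")
    # Assumption: dates are month/day/year strings; unparseable values become NaT.
    out["Release Date"] = pd.to_datetime(
        out["Release Date"], format="%m/%d/%Y", errors="coerce"
    )
    out = out.dropna()  # counterpart of the Missing Value node (remove rows)
    out["Spotify Streams"] = out["Spotify Streams"].astype("int64")
    out["Year"] = out["Release Date"].dt.year
    return out


def tracks_per_year(df: pd.DataFrame) -> pd.Series:
    """Number of tracks released in each year."""
    return df.groupby("Year")["Track"].count()


def artist_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Per-artist track count, mean track score, and total streams."""
    return df.groupby("Artist").agg(
        track_count=("Track", "count"),
        mean_score=("Track Score", "mean"),
        total_streams=("Spotify Streams", "sum"),
    )


def final_artists(stats: pd.DataFrame, min_tracks: int = 10, k: int = 10) -> pd.DataFrame:
    """Artists in both the top-k by mean score and the top-k by total streams."""
    eligible = stats[stats["track_count"] >= min_tracks]
    top_score = eligible.nlargest(k, "mean_score")
    top_streams = stats.nlargest(k, "total_streams")
    # Keep the mean-score ordering; retain only artists also in the streams list.
    return top_score[top_score.index.isin(top_streams.index)]
```

A typical run would load the file with `pd.read_csv`, pass the result through `clean`, and then call `len(final_artists(artist_stats(cleaned)))` to answer the challenge question. The file's encoding may need to be specified when loading. With matplotlib installed, `tracks_per_year(cleaned).plot.bar()` draws a chart comparable to the Bar Chart node.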
