
06 Aggregations - Solution

Solution to an exercise for data aggregation.

Calculate summary statistics for subgroups of data with the GroupBy and Pivoting nodes.

CHECK YOUR ANSWERS:

GroupBy:
a. The average age is 37 years for women and 39 years for men.
b. Husband is the most common family relationship among the German people in the data.
c. The number of rows in the data is 32549.
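To make clear what the GroupBy node computes for these answers, here is a minimal sketch in plain Python. It is not part of the KNIME workflow. The rows are hypothetical toy data, and the column names (sex, age, native-country, relationship) are assumptions about adult.csv, so the numbers will not match the answers above.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy rows standing in for adult.csv (column names are assumed).
rows = [
    {"sex": "Female", "age": 30, "native-country": "Germany", "relationship": "Wife"},
    {"sex": "Male", "age": 40, "native-country": "Germany", "relationship": "Husband"},
    {"sex": "Male", "age": 50, "native-country": "Germany", "relationship": "Husband"},
    {"sex": "Female", "age": 40, "native-country": "Italy", "relationship": "Unmarried"},
    {"sex": "Male", "age": 30, "native-country": "Italy", "relationship": "Husband"},
]

# Group by gender, then aggregate: row count and mean age per group.
ages_by_sex = defaultdict(list)
for row in rows:
    ages_by_sex[row["sex"]].append(row["age"])

summary_by_sex = {
    sex: {"row_count": len(ages), "mean_age": mean(ages)}
    for sex, ages in ages_by_sex.items()
}

# Group by native country, then take the mode of a string column.
relationships_by_country = defaultdict(Counter)
for row in rows:
    relationships_by_country[row["native-country"]][row["relationship"]] += 1

mode_by_country = {
    country: counts.most_common(1)[0][0]
    for country, counts in relationships_by_country.items()
}

# Total number of rows, independent of any grouping.
total_rows = len(rows)
```

In the workflow itself, the same results come from configuring the group column and the aggregation method in the GroupBy node dialog.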

Pivoting:
a. The most common combination of age bin and work class is <35 years and Private. 10936 people belong to this group.
b. The most widespread education level in the Private workclass is HS-grad
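The Pivoting answers can be read as a cross-tabulation: one column defines the rows (group), another defines the columns (pivot), and each cell holds an aggregate. The following plain-Python sketch illustrates the idea. It uses hypothetical toy data, and the field names (age_bin, workclass, education) are assumptions about adult_binned.csv.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows standing in for adult_binned.csv (field names are assumed).
people = [
    {"age_bin": "<35", "workclass": "Private", "education": "HS-grad"},
    {"age_bin": "<35", "workclass": "Private", "education": "Bachelors"},
    {"age_bin": "35-50", "workclass": "Private", "education": "HS-grad"},
    {"age_bin": "<35", "workclass": "State-gov", "education": "Masters"},
    {"age_bin": "35-50", "workclass": "State-gov", "education": "Masters"},
]

# Pivot table of counts: age bin as the group (rows), work class as the pivot (columns).
pivot_counts = defaultdict(Counter)
for person in people:
    pivot_counts[person["age_bin"]][person["workclass"]] += 1

# The most common combination is the cell with the largest count.
cell_counts = Counter((p["age_bin"], p["workclass"]) for p in people)
top_cell, top_count = cell_counts.most_common(1)[0]

# Mode of education per work class, ignoring the age bin.
education_by_workclass = defaultdict(Counter)
for person in people:
    education_by_workclass[person["workclass"]][person["education"]] += 1

top_education_private = education_by_workclass["Private"].most_common(1)[0][0]
```

With the real data, the largest cell would correspond to the combination and count reported in answer a.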








Exercise: GroupBy
1) Read the adult.csv file by executing the CSV Reader node
2) Calculate the total number of rows and average age by gender
3) Calculate the modes of all string columns separately for each native country
4) Calculate
- the number of missing values in the occupation column
- the number of non-missing rows in the occupation column
- the number of rows in the occupation column
- the number of rows in the marital-status column
Notice that the last two aggregations should provide the same numbers!

Exercise: Pivoting
1) Read the adult_binned.csv file by executing the CSV Reader node
2) Calculate the number of people in groups according to their work class and age bin
- What is the most common combination of age bin and work class?
- How many people belong to this group?
3) Calculate the mode of education level in groups according to their work class and age bin
- What is the most widespread education level in the private workclass independently of the age bin?

The most common combination of age bin and work class is 34 years old or younger and Private. 10936 people belong to this group.
The most widespread education level in the private workclass is number 9 (HS-grad).
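Step 4 of the GroupBy exercise notes that the row counts for occupation and marital-status should agree. The reason is that a row count includes missing cells, while the missing and non-missing counts split that total. A minimal plain-Python sketch with hypothetical records (None stands in for a missing value) shows the relationship:

```python
# Hypothetical toy records; None marks a missing occupation.
records = [
    {"occupation": "Sales", "marital-status": "Married"},
    {"occupation": None, "marital-status": "Divorced"},
    {"occupation": "Tech-support", "marital-status": "Never-married"},
    {"occupation": None, "marital-status": "Married"},
]

def count_rows(column):
    """Count every row that has the column, whether or not the value is missing."""
    return sum(1 for record in records if column in record)

missing_occupation = sum(1 for r in records if r["occupation"] is None)
non_missing_occupation = sum(1 for r in records if r["occupation"] is not None)

rows_occupation = count_rows("occupation")
rows_marital_status = count_rows("marital-status")
```

Here the missing and non-missing counts add up to the row count, and the row count is the same for both columns because it does not depend on the cell values.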
Workflow nodes:
- CSV Reader: Read adult.csv
- GroupBy: The total number of rows and average age by gender
- GroupBy: The modes of string columns by native country
- GroupBy: The number of missing and non-missing values in occupation and total rows in occupation and in marital-status
- CSV Reader: Read adult_binned.csv
- Pivoting: Create the table with age-bin as a group and workclass as a pivot and calculate the number of people in groups
- Pivoting: Create the table with age-bin as a group and workclass as a pivot and find the most widespread level of education in the private workclass
