
06 Aggregations - Solution

Solution to an exercise for data aggregation.

Calculate summary statistics for subgroups of data with the GroupBy and Pivoting nodes.

CHECK YOUR ANSWERS:

GroupBy:
a. The average age is 37 years for the women and 39 years for the men
b. Husband is the most common family relationship among the people in the data whose native country is Germany
c. The number of rows in the data is 32561
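For readers who want to check the logic outside the workflow, the GroupBy aggregations can be sketched in plain Python. This is only an illustration of the aggregation semantics, not how the GroupBy node is implemented. The rows below are made-up toy data (not taken from adult.csv), and the column names `gender`, `age`, and `occupation` as well as the helper names are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows (NOT taken from adult.csv); None marks a missing value.
rows = [
    {"gender": "Female", "age": 30, "occupation": "Sales"},
    {"gender": "Female", "age": 40, "occupation": None},
    {"gender": "Male", "age": 25, "occupation": "Tech-support"},
    {"gender": "Male", "age": 35, "occupation": "Sales"},
    {"gender": "Male", "age": 45, "occupation": None},
]


def group_by(rows, key):
    """Collect the rows that share the same value in the column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def age_summary(rows):
    """Row count and mean age per gender (exercise step 2)."""
    return {
        gender: {
            "row_count": len(members),
            "mean_age": mean(r["age"] for r in members),
        }
        for gender, members in group_by(rows, "gender").items()
    }


def count_summary(rows, column):
    """Missing, non-missing, and total row counts for one column (step 4)."""
    missing = sum(1 for r in rows if r[column] is None)
    return {
        "missing": missing,
        "non_missing": len(rows) - missing,
        "rows": len(rows),
    }


summary = age_summary(rows)
occupation_counts = count_summary(rows, "occupation")
```

The total row count does not depend on which column is inspected, which is why the row counts for occupation and marital-status agree in the exercise.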

Pivoting:
a. The most common combination of age bin and work class is <35 years and Private. 10936 people belong to this group.
b. The most widespread education level in the Private workclass is HS-grad
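The pivoting step can be sketched the same way: one column defines the rows of the result (the group), another defines its columns (the pivot), and an aggregation fills each cell. The snippet below is a minimal plain-Python sketch under stated assumptions, not the Pivoting node's implementation. The rows are made-up toy data (not taken from adult_binned.csv), and the column names, the bin labels `<35` and `35+`, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows (NOT taken from adult_binned.csv).
rows = [
    {"age_bin": "<35", "workclass": "Private", "education": "HS-grad"},
    {"age_bin": "<35", "workclass": "Private", "education": "Bachelors"},
    {"age_bin": "<35", "workclass": "Private", "education": "HS-grad"},
    {"age_bin": "<35", "workclass": "State-gov", "education": "Masters"},
    {"age_bin": "35+", "workclass": "Private", "education": "HS-grad"},
    {"age_bin": "35+", "workclass": "State-gov", "education": "Masters"},
]


def pivot(rows, group_col, pivot_col, value_col, aggregate):
    """Build {group value: {pivot value: aggregate(cell values)}}."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[group_col], row[pivot_col])].append(row[value_col])
    table = defaultdict(dict)
    for (group, pivot_value), values in cells.items():
        table[group][pivot_value] = aggregate(values)
    return dict(table)


def mode(values):
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


# Number of people per (age bin, work class) cell.
counts = pivot(rows, "age_bin", "workclass", "education", len)

# Most common education level per (age bin, work class) cell.
modes = pivot(rows, "age_bin", "workclass", "education", mode)

# The (age bin, work class) combination with the most people.
largest_cell = max(
    ((g, p) for g in counts for p in counts[g]),
    key=lambda cell: counts[cell[0]][cell[1]],
)
```

Passing the aggregation as a function keeps the count and mode variants identical except for the last argument, which mirrors how the same node is configured twice with different aggregations.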







Exercise: GroupBy
1) Read the adult.csv file by executing the File Reader node
2) Calculate the total number of rows and average age by gender
3) Calculate the modes of all string columns separately for each native country
4) Calculate
- the number of missing values in the occupation column
- the number of non-missing rows in the occupation column
- the number of rows in the occupation column
- the number of rows in the marital-status column
Notice that the last two aggregations should provide the same numbers!

Exercise: Pivoting
1) Read the adult_binned.csv file by executing the File Reader node
2) Calculate the number of people in groups according to their work class and age bin
- What is the most common combination of age bin and work class?
- How many people belong to this group?
3) Calculate the mode of education level in groups according to their work class and age bin
- What is the most widespread education level in the private workclass independently of the age bin?

The most common combination of age bin and work class is 34 or less years old and Private. 10936 people belong to this group.

The most widespread education level in the private workclass is number 9 (HS-grad).
Workflow nodes:
- File Reader: Read data adult.csv
- GroupBy: The total number of rows and average age by gender
- GroupBy: The modes of string columns by native country
- GroupBy: The number of missing and non-missing values in occupation and total rows in occupation and in marital-status
- File Reader: Read data adult_binned.csv
- Pivoting: Create the table with age-bin as a group and workclass as a pivot and calculate the number of people in groups
- Pivoting: Create the table with age-bin as a group and workclass as a pivot and find the most widespread level of education in the private workclass
