B) Exercise - Census EDA

Data ingestion
a) Data integration and selection
Exploratory Data Analysis: b) Univariate; c) Bivariate

INSTRUCTIONS: The data sources of the exercise are organized in two different tables: "Census_DB_Data.xlsx" and "Census_DB_Income.xlsx". These are the steps to follow to complete the exercise:

(a) Data integration and selection:
1. Merge the two tables, using the ID as the key column.
2. Filter out unemployed citizens (use the "Work type" variable).

(b) Univariate analysis:
1. Compute summary statistics for the numeric variables in the dataset.
2. Plot a graph that gives quick information about the distribution of the numeric variables and the presence of outliers.
3. Produce the frequency distribution table for the variable "Work type" and use an appropriate graph to display the distribution of the data.

(c) Bivariate analysis:
1. Analyze with appropriate tools (graph, table, test, etc.) the relationship between "Family status" and "Work type".
2. Analyze with appropriate tools (graph, table, test, etc.) the relationship between "Work type" and "Working Hours per week".
3. Analyze the correlation between "Age" and "Working hours per week" using a graph and a summary measure.

EXTRA STEP: Create a comprehensive bivariate analysis to highlight the relationship between INCOME and all other variables, in order to evaluate the most promising predictors of INCOME.

Workflow labels: Census DB DATA, Census DB INCOME, Excel Reader, Univariate analysis, Bivariate analysis, EXTRA STEP: INCOME bivariate analysis.
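
For readers who want to check their workflow results outside KNIME, the steps of the exercise can be sketched in Python with pandas. This is only an illustrative sketch, not the exercise solution: the rows below are made-up stand-ins for the two Excel files, the column spellings and the "Unemployed" label are assumptions taken from the instructions, and the "Income" values are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for Census_DB_Data.xlsx and Census_DB_Income.xlsx.
# With the real files, pd.read_excel("Census_DB_Data.xlsx") would be used instead.
data = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Age": [25, 40, 33, 58, 47, 29],
    "Work type": ["Private", "Public", "Unemployed",
                  "Private", "Self-employed", "Public"],
    "Family status": ["Single", "Married", "Single",
                      "Married", "Married", "Single"],
    "Working hours per week": [40, 38, 0, 45, 50, 36],
})
income = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    # Hypothetical income classes; the real coding may differ.
    "Income": ["low", "high", "low", "high", "high", "low"],
})

# (a) Data integration and selection: merge on ID, then drop unemployed rows.
# "Unemployed" is an assumed label for the "Work type" variable.
merged = data.merge(income, on="ID", how="inner")
employed = merged[merged["Work type"] != "Unemployed"]

# (b) Univariate analysis: summary statistics and a frequency table.
numeric_cols = ["Age", "Working hours per week"]
summary = employed[numeric_cols].describe()
work_type_freq = employed["Work type"].value_counts()

# (c) Bivariate analysis.
# Categorical vs categorical: contingency table.
crosstab = pd.crosstab(employed["Family status"], employed["Work type"])
# Categorical vs numeric: mean hours per work type.
hours_by_type = employed.groupby("Work type")["Working hours per week"].mean()
# Numeric vs numeric: Pearson correlation as the summary measure.
pearson_r = employed["Age"].corr(employed["Working hours per week"])

print(summary)
print(work_type_freq)
print(crosstab)
print(hours_by_type)
print(pearson_r)
```

For the graphs, a box plot of the numeric columns (for example `employed[numeric_cols].boxplot()`, which requires matplotlib) covers distribution and outliers, and a scatter plot suits "Age" against "Working hours per week". A chi-square test on the contingency table, for instance with `scipy.stats.chi2_contingency`, is one option for the "Family status" and "Work type" relationship.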
