workflow for crime-final

Data Exploration

Data Cleaning

Justification:

This is a large dataset, so we reduce the number of columns and remove the optional columns. Rows with outliers or missing values can also be deleted directly.

We also found that the time format is inconsistent and that the date formats are incorrect and inconsistent. To be safe, we write the standardized values into new columns instead of overwriting the originals.

Because the data is complex, we group some values into broader categories. For example, we bin TIME OCC into a "time of day" category, which makes the analysis easier. We can also restrict the analysis to the categories that make up a large share of the data, such as the ten most frequent crimes. These choices are applied in later steps.

Data Transformation-Study Relationship Between Crime Types and Area

Data Transformation-Study Crime Frequency by Time of Day

Data Transformation-Find the Top 10 Crimes

Final Visualization-Supporting the Data Analysis

Load the 2024 crime dataset
CSV Reader
Missing values: remove the 'optional' columns because they contain a large number of blank cells
Column Filter
Check the frequency of each crime type
Histogram
Keep only the Date and Primary Type columns.
Column Filter
Select disorder crimes
Row Filter
Select serious crimes
Row Filter
Missing values: exclude records with age == 0 AND Crm cd missing
Row Filter
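A minimal Python sketch of the exclusion rule above, assuming each record is a dict and that, as written, a record is dropped only when both conditions hold. The column names `Vict Age` and `Crm Cd` are assumptions about the dataset's headers, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def is_missing(value: Optional[Any]) -> bool:
    """Treat None and blank strings as missing cells."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def exclude_invalid(rows: List[Row], age_col: str = "Vict Age",
                    code_col: str = "Crm Cd") -> List[Row]:
    """Drop rows whose age is 0 AND whose crime code is missing.

    Column names are assumed; age is assumed to be parsed as an int already.
    """
    return [
        row for row in rows
        if not (row.get(age_col) == 0 and is_missing(row.get(code_col)))
    ]
```

If the intended rule is to drop records that meet either condition, replacing `and` with `or` gives that behaviour instead.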
Merge disorder and serious counts by Area
Joiner
Remove columns with >50% missing values
Missing Value Column Filter
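The 50% rule can be sketched in plain Python, assuming rows are dicts that share the same keys and that `None` or a blank string counts as missing. The helper names are hypothetical, not part of KNIME.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing cells."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_sparse_columns(rows: List[Row], threshold: float = 0.5) -> List[Row]:
    """Remove every column whose share of missing cells exceeds `threshold`."""
    if not rows:
        return rows
    columns = list(rows[0].keys())
    n = len(rows)
    keep = [
        col for col in columns
        if sum(is_missing(row.get(col)) for row in rows) / n <= threshold
    ]
    return [{col: row.get(col) for col in keep} for row in rows]
```

A column exactly at 50% missing is kept here, matching the strict "> 50%" wording.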
Column Renamer
Remove rows with outliers
Numeric Outliers
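The Numeric Outliers node works with the interquartile range (IQR). A stdlib sketch of that idea, assuming the common 1.5 × IQR fences and Python's default quartile method (KNIME may compute quartiles slightly differently). `remove_outlier_rows` is a hypothetical helper.

```python
import statistics
from typing import Any, Dict, List

Row = Dict[str, Any]


def remove_outlier_rows(rows: List[Row], column: str, k: float = 1.5) -> List[Row]:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]
```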
Count serious crimes per area.
GroupBy
Remove rows with missing value
Missing Value
Count disorder crimes per community area
GroupBy
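The two GroupBy counts and the Joiner can be sketched together, assuming the rows are already split into serious and disorder crimes by the Row Filters and that both counts are keyed on the same area column (the name `AREA` is an assumption). The join shown is an outer join that fills a missing side with 0; an inner join would keep only areas present in both counts.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def count_by_area(rows: Iterable[Dict[str, Any]], area_col: str = "AREA") -> Counter:
    """Number of crimes per area (GroupBy with a count aggregation)."""
    return Counter(row[area_col] for row in rows)


def join_counts(serious: Counter, disorder: Counter) -> List[Dict[str, Any]]:
    """Outer-join two per-area counts; an area absent on one side gets 0."""
    areas = sorted(set(serious) | set(disorder))
    return [
        {"area": a, "serious": serious[a], "disorder": disorder[a]}
        for a in areas
    ]
```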
Denormalizer
Test whether a column contains a constant value
Constant Value Column Filter
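A short sketch of the constant-column test, assuming dict rows with hashable cell values. A column with a single distinct value carries no information for the analysis, so it can be dropped. The helper names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def constant_columns(rows: List[Row]) -> List[str]:
    """Columns holding a single distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row.get(col) for row in rows}) == 1]


def drop_constant_columns(rows: List[Row]) -> List[Row]:
    """Remove the columns reported by constant_columns."""
    constant = set(constant_columns(rows))
    return [{c: v for c, v in row.items() if c not in constant} for row in rows]
```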
Format: keep only the date part of DATE OCC and put it in a new column named OCCDate
String Manipulation
Format: keep only the date part of Date Rptd and put it in a new column named ReportDate
String Manipulation
Column Renamer
Format: standardize the TIME OCC format into a new column named OCCTime
String Manipulation
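One way to standardize TIME OCC, assuming it stores 24-hour HHMM values with leading zeros dropped (for example `930` for 09:30). That storage format is an assumption, not confirmed by the workflow, and `format_time_occ` is a hypothetical helper.

```python
def format_time_occ(raw) -> str:
    """Turn an HHMM value such as 930 or "0930" into "09:30"."""
    digits = str(raw).strip()
    if not digits.isdigit() or len(digits) > 4:
        raise ValueError(f"unexpected TIME OCC value: {raw!r}")
    padded = digits.zfill(4)  # 930 -> "0930", 5 -> "0005"
    hours, minutes = int(padded[:2]), int(padded[2:])
    if hours > 23 or minutes > 59:
        raise ValueError(f"out-of-range TIME OCC value: {raw!r}")
    return f"{hours:02d}:{minutes:02d}"
```

Raising on unexpected values, rather than guessing, surfaces malformed rows so they can be handled in the missing-value steps.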
Format: convert the ReportDate and OCCDate columns to Date format
String to Date&Time
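The two date steps (keep only the date part, then convert to a Date type) can be sketched with `datetime.strptime`, assuming raw values look like `03/01/2024 12:00:00 AM` (month/day/year followed by a time). Both the sample format and `to_date` are assumptions for illustration.

```python
from datetime import date, datetime


def to_date(raw: str, fmt: str = "%m/%d/%Y") -> date:
    """Keep only the date part of a timestamp string and parse it."""
    date_part = raw.strip().split(" ")[0]  # drop a trailing "12:00:00 AM"-style time
    return datetime.strptime(date_part, fmt).date()
```

Once parsed, ReportDate and OCCDate values compare and sort as real dates rather than as text.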
Rank the crimes
Sorter
Add the TimeofDay column (Morning, Afternoon, ...) based on the time of occurrence
Rule Engine
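A sketch of the TimeofDay rules, assuming OCCTime holds `HH:MM` strings. Only Morning and Afternoon are named in the workflow; the Evening and Night labels and all hour boundaries below are hypothetical. As we understand it, the Rule Engine applies the first matching rule, so the order of the checks matters in the same way.

```python
def time_of_day(occ_time: str) -> str:
    """Map an "HH:MM" string to a coarse period; boundaries are assumed."""
    hour = int(occ_time.split(":")[0])
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"  # 21:00 to 04:59
```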
The top 10 crime types already account for over 50% of all crimes
Pie Chart
Visualize as a box plot
Box Plot
So we keep only the top 10 crimes
Row Filter
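The check behind "top 10 covers over 50%" and the follow-up filter can be sketched as below, assuming the crime type is stored in the `Primary Type` column. Ties in frequency are broken by first appearance, which is how `Counter.most_common` orders equal counts. The helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple


def top_n_share(crime_types: Sequence[str], n: int = 10) -> Tuple[List[str], float]:
    """Return the n most frequent types and their share of all records."""
    top = Counter(crime_types).most_common(n)
    share = sum(count for _, count in top) / len(crime_types)
    return [crime for crime, _ in top], share


def keep_top_n(rows: List[Dict[str, Any]], top_types: List[str],
               col: str = "Primary Type") -> List[Dict[str, Any]]:
    """Row Filter: keep only records whose crime type is in top_types."""
    allowed = set(top_types)
    return [row for row in rows if row[col] in allowed]
```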
Relationship between age and crime types
Box Plot
Table View
Relationship between victim age and the time a crime occurs
Scatter Plot
Relationship between gender and the time a crime occurs
Box Plot
Check for duplicates and delete them
Duplicate Row Filter
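A sketch of duplicate removal that keeps the first occurrence of each fully identical row, assuming dict rows with hashable values. Whether the node was configured to keep the first or another occurrence is not stated in the workflow, so that choice is an assumption.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_duplicate_rows(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row; key order does not matter."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # same cells -> same key
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique
```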
Check the relationship between areas and the time crimes occur
Heatmap
Assign the crime levels to clusters
Expression
Rank from the highest to the lowest crime level
Sorter
Overview of the summary statistics (min, median, std, ...); also check for missing values
Statistics View
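A stdlib sketch of the per-column overview, assuming `None` marks a missing cell. It uses the sample standard deviation (`statistics.stdev`); whether the Statistics node uses the sample or population form is not confirmed here. `column_summary` is a hypothetical helper.

```python
import statistics
from typing import Any, Dict, List, Optional


def column_summary(values: List[Optional[float]]) -> Dict[str, Any]:
    """Min, max, median, sample std and missing count for one numeric column."""
    present = [v for v in values if v is not None]
    summary: Dict[str, Any] = {"missing": len(values) - len(present)}
    if present:
        summary.update(
            min=min(present),
            max=max(present),
            median=statistics.median(present),
            std=statistics.stdev(present) if len(present) > 1 else 0.0,
        )
    return summary
```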
Overview of all columns and their types (string, number, ...)
Table Manipulator
Cluster areas into three profiles
k-Means (deprecated)
Recheck for any remaining missing values and fix them
Missing Value
Normalizer
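The Normalizer and k-Means nodes together group areas into three profiles. A stdlib sketch of the same two steps, assuming min-max scaling to [0, 1] and one feature tuple per area, such as (serious count, disorder count). Centroids are seeded with a deterministic farthest-point heuristic chosen here for reproducibility; this is not necessarily how the k-Means node initializes, and all helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def min_max_normalize(points: Sequence[Sequence[float]]) -> List[Point]:
    """Rescale every feature to [0, 1]; a constant feature becomes 0.0."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return [
        tuple((p[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
              for d in dims)
        for p in points
    ]


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Start from the first point, then add the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: List[Point], k: int = 3, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: assign to the nearest centroid, then move centroids to the mean."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Normalizing first keeps a feature with large raw counts from dominating the Euclidean distance.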
Count each crime in each time-of-day category
GroupBy
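Counting each crime per time-of-day category is a GroupBy on two columns. A sketch, assuming the `Primary Type` and `TimeofDay` column names and the four period labels used in the earlier TimeofDay sketch (Evening and Night are assumptions). The pivot turns the counts into one row per crime, which is a convenient shape for a heatmap.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Sequence


def crimes_by_time_of_day(rows: Iterable[Dict[str, Any]], crime_col: str = "Primary Type",
                          tod_col: str = "TimeofDay") -> Counter:
    """Count records for each (crime type, time of day) pair."""
    return Counter((row[crime_col], row[tod_col]) for row in rows)


def pivot(counts: Counter,
          categories: Sequence[str] = ("Morning", "Afternoon", "Evening", "Night")
          ) -> Dict[str, Dict[str, int]]:
    """One dict per crime type, with a 0 for periods that have no records."""
    table: Dict[str, Dict[str, int]] = {}
    for (crime, tod), n in counts.items():
        table.setdefault(crime, {c: 0 for c in categories})[tod] = n
    return table
```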
Keep only Crime Type and Area.
Column Filter
Assign color to clusters
Color Manager
Count each crime
GroupBy
Sort categories based on the numeric rank.
Sorter
Bar Chart
