Justification:
This is a large dataset, so we reduce the number of columns by dropping optional ones that the analysis does not need. Rows containing outliers or missing values can simply be deleted.
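The column reduction and row deletion described above could look roughly like the sketch below. It works on plain Python records and is only an illustration, not the actual cleaning code. The helper `reduce_and_clean`, the sample rows, and the column names `DR_NO` and `Cross Street` are hypothetical, and the exact spelling `TIME OCC` is assumed.

```python
def reduce_and_clean(rows, keep_cols, required_cols):
    """Keep only keep_cols, then drop rows where a required column is missing.

    required_cols is expected to be a subset of keep_cols.
    """
    cleaned = []
    for row in rows:
        # Column reduction: copy over only the columns we decided to keep.
        reduced = {col: row.get(col) for col in keep_cols}
        # Treat None and empty strings as missing values and skip the row.
        if any(reduced.get(col) in (None, "") for col in required_cols):
            continue
        cleaned.append(reduced)
    return cleaned


# Hypothetical sample records, for illustration only.
sample_rows = [
    {"DR_NO": 1, "TIME OCC": "1330", "Cross Street": ""},
    {"DR_NO": 2, "TIME OCC": None, "Cross Street": "MAIN ST"},
    {"DR_NO": 3, "TIME OCC": "0045", "Cross Street": ""},
]
cleaned_rows = reduce_and_clean(sample_rows, ["DR_NO", "TIME OCC"], ["TIME OCC"])
```

Here the optional `Cross Street` column is dropped from every record, and the record with a missing `TIME OCC` is removed entirely.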
We also find that the time values are formatted inconsistently, and that the date values are both incorrect and inconsistent. We therefore choose a safer method to standardize both formats.
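One possible reading of a "safer" standardization is a parser that tries a few known formats and returns `None` instead of raising when a value cannot be parsed, so bad entries can be inspected or dropped later. The sketch below follows that assumption. The list of candidate date formats, the four-digit `HHMM` time layout, and the function names are all assumptions, not taken from the data.

```python
from datetime import datetime

# Assumed candidate input formats; adjust to what the raw data actually contains.
DATE_FORMATS = ("%m/%d/%Y %I:%M:%S %p", "%m/%d/%Y", "%Y-%m-%d")


def standardize_date(raw):
    """Return the date as ISO 'YYYY-MM-DD', or None if no format matches."""
    text = str(raw).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_time(raw):
    """Turn values like 45, '45', or '1330' into 'HH:MM', or None if invalid."""
    digits = str(raw).strip().zfill(4)  # left-pad so 45 becomes '0045'
    if len(digits) != 4 or not digits.isdecimal():
        return None
    hour, minute = int(digits[:2]), int(digits[2:])
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"
```

For example, `standardize_time(45)` yields `"00:45"`, while an impossible value such as `2500` yields `None` rather than a silently wrong time.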
Because the data is too fine-grained to analyze directly, we can group some values into broader categories. For example, we can bin the time occ values into a "time of day" category, which makes analysis easier. We can also restrict the analysis to the categories that account for the largest share of records, such as the ten most frequent crimes. Both choices are reflected in later steps.
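The two simplifications above, binning times into a "time of day" label and keeping only the most frequent crimes, can be sketched as follows. This is a minimal illustration under stated assumptions: the bucket boundaries, the bucket names, the `HH:MM` input format, and the sample crime labels are all hypothetical.

```python
from collections import Counter


def time_of_day(hhmm):
    """Map an 'HH:MM' string to a coarse bucket (bucket edges are an assumption)."""
    hour = int(hhmm[:2])
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"  # 21:00 through 04:59


def top_categories(values, n=10):
    """Return the n most frequent values with their counts, most frequent first."""
    return Counter(values).most_common(n)


# Hypothetical crime labels, for illustration only.
crimes = ["THEFT", "BURGLARY", "THEFT", "ASSAULT", "THEFT", "BURGLARY"]
top_two = top_categories(crimes, n=2)
```

With the default `n=10`, `top_categories` gives the top ten crimes. Rows whose crime is not in that list can then be filtered out before plotting or aggregation.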