Question:
Please arrange to provide the following deliverables for your project.
1. Data exploration: a complete review and analysis of the dataset including:
- Load and describe the data elements (columns): provide descriptions, data types, and ranges or values of each element as appropriate. Use pandas, NumPy and any other Python packages.
- Statistical assessments, including means (averages) and correlations.
- Missing-data evaluation. Use pandas, NumPy and any other Python packages.
- Graphs and visualizations. Use pandas, Matplotlib, seaborn, NumPy and any other Python packages; you can also use Power BI Desktop.
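A minimal pandas sketch of the column-description, statistics and missing-data steps above. The DataFrame is a small hypothetical stand-in: its column names (road_class, visibility, hour, vehicles_involved, fatal) are illustrative and are not taken from the KSI dataset, whose real fields should come from loading the file itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the KSI data; column names are illustrative only.
df = pd.DataFrame({
    "road_class": ["Major Arterial", "Local", None, "Collector", "Major Arterial"],
    "visibility": ["Clear", "Rain", "Clear", None, "Snow"],
    "hour": [8, 17, 23, 13, 2],
    "vehicles_involved": [2, 1, 3, 2, np.nan],
    "fatal": [0, 1, 0, 0, 1],
})


def describe_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, distinct values, missing count/percentage, numeric range."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),                      # NaN is not counted as a value
        "n_missing": frame.isna().sum(),
        "pct_missing": (frame.isna().mean() * 100).round(1),
    })
    numeric = frame.select_dtypes(include="number")
    summary["min"] = numeric.min()                        # NaN for non-numeric columns
    summary["max"] = numeric.max()
    return summary


print(describe_columns(df))
print(df.describe())                                  # count, mean, std, quartiles
print(df.select_dtypes(include="number").corr())      # pairwise Pearson correlations
```

The same summary table doubles as input for the "Data exploration and findings" section of the report, and the correlation matrix can be passed to a seaborn heatmap for the visualizations.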
2. Data modelling:
- Data transformations, including handling missing data, managing categorical data, and normalization and standardization as needed.
- Feature selection. Use pandas and scikit-learn. (The group needs to justify each feature used and any data columns discarded.)
- Train/test data splitting. Use NumPy and scikit-learn.
- Managing imbalanced classes, if needed. For background, see: https://elitedatascience.com/imbalanced-classes
- Use the Pipeline class to streamline all the preprocessing transformations.
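One way to combine the transformations, the train/test split and class-imbalance handling in a single scikit-learn Pipeline is sketched below. It assumes hypothetical feature names and synthetic data. Median and most-frequent imputation, one-hot encoding and standard scaling are example choices, and class_weight="balanced" (penalizing mistakes on the minority class more heavily) is only one common imbalance strategy; resampling the minority or majority class is another.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature lists standing in for the selected KSI columns.
NUMERIC = ["hour", "vehicles_involved"]
CATEGORICAL = ["road_class", "visibility"]

rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n).astype(float),
    "vehicles_involved": rng.integers(1, 5, n).astype(float),
    "road_class": rng.choice(["Major Arterial", "Local", "Collector"], n),
    "visibility": rng.choice(["Clear", "Rain", "Snow"], n),
})
X.loc[rng.choice(n, 10, replace=False), "vehicles_involved"] = np.nan  # inject missing values
y = (rng.random(n) < 0.15).astype(int)  # imbalanced synthetic target, roughly 15% positive

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),                      # standardization
    ]), NUMERIC),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])

model = Pipeline([
    ("preprocess", preprocess),
    # class_weight="balanced" weights samples inversely to class frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# stratify=y keeps the same class ratio in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Because imputation and scaling live inside the pipeline, their statistics are learned from the training split only, and the fitted object can later be pickled as one unit for deployment.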
3. Predictive model building
- Use, at a minimum, the logistic regression, decision tree, SVM, random forest and neural network algorithms. Use scikit-learn.
- Fine-tune the models using grid search and randomized search.
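A sketch of training the five required algorithm families and tuning them with GridSearchCV and RandomizedSearchCV. It runs on synthetic data from make_classification in place of the preprocessed KSI features; the parameter grids, F1 as the selection metric and the number of random iterations are assumptions to adjust, not recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the preprocessed KSI features.
X, y = make_classification(n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grids; scale-sensitive models are wrapped with a StandardScaler.
searches = {
    "logistic_regression": GridSearchCV(
        Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))]),
        {"clf__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1"),
    "decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}, cv=5, scoring="f1"),
    "svm": RandomizedSearchCV(
        Pipeline([("scale", StandardScaler()), ("clf", SVC())]),
        {"clf__C": loguniform(1e-2, 1e2), "clf__gamma": loguniform(1e-3, 1e0)},
        n_iter=8, cv=5, scoring="f1", random_state=0),
    "random_forest": RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]},
        n_iter=4, cv=5, scoring="f1", random_state=0),
    "neural_network": GridSearchCV(
        Pipeline([("scale", StandardScaler()),
                  ("clf", MLPClassifier(max_iter=1000, random_state=0))]),
        {"clf__hidden_layer_sizes": [(16,), (32, 16)], "clf__alpha": [1e-4, 1e-2]},
        cv=5, scoring="f1"),
}

for name, search in searches.items():
    search.fit(X_train, y_train)  # cross-validated search, then refit on all training data
    print(f"{name}: best CV F1={search.best_score_:.3f} params={search.best_params_}")
```

Keeping the scaler inside each Pipeline means it is refit on every cross-validation training fold, so the validation fold never leaks into preprocessing. Tree-based models are left unscaled because they do not depend on feature scale.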
4. Model scoring and evaluation
- Present results as accuracy, precision, recall and F1 scores and confusion matrices, and plot the ROC curves of the models. Use scikit-learn.
- Select and recommend the best performing model
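A sketch of the scoring step for two fitted models on synthetic data. It prints accuracy, precision, recall, F1 and ROC AUC, prints each confusion matrix, and draws all ROC curves on one set of axes saved to roc_curves.png (a hypothetical file name). Ranking by F1 at the end is an assumption; with imbalanced classes, recall or ROC AUC may be the better criterion for the recommendation.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# In the project these would be the tuned best_estimator_ objects from the searches.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000).fit(X_train, y_train),
    "random_forest": RandomForestClassifier(random_state=0).fit(X_train, y_train),
}

results = {}
fig, ax = plt.subplots()
for name, model in models.items():
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_score),
    }
    print(name, results[name])
    print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class
    fpr, tpr, _ = roc_curve(y_test, y_score)
    ax.plot(fpr, tpr, label=f"{name} (AUC={results[name]['roc_auc']:.2f})")

ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
fig.savefig("roc_curves.png")

best = max(results, key=lambda name: results[name]["f1"])
print("best model by F1:", best)
```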
5. Deploying the model
- Using the Flask framework, turn your selected machine-learning model into an analytics API.
- Using the pickle module, serialize and deserialize your model.
- Deploy your model on localhost.
- Build a client to test your model API service, using test data that was not previously used to train the model. You can use simple Jinja HTML templates with or without JavaScript, React or any other technology, but at a minimum use Postman as the API client.
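A minimal sketch of the deployment step: it pickles a fitted model to model.pkl, loads it back, and exposes it through a Flask POST endpoint. The file name, the /predict route and the {"features": [[...], ...]} request format are hypothetical choices, and the logistic regression on synthetic data stands in for the selected model; in the real project, the saved object would be the full preprocessing-plus-model pipeline.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for the selected model (hypothetical; replace with the tuned pipeline).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
trained = LogisticRegression(max_iter=1000).fit(X, y)

MODEL_PATH = "model.pkl"  # hypothetical file name

# Serialization: write the fitted model to disk.
with open(MODEL_PATH, "wb") as f:
    pickle.dump(trained, f)

# Deserialization: the API loads the model once at startup.
with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    # Expects a JSON body like {"features": [[f1, f2, f3, f4], ...]}.
    payload = request.get_json(silent=True)
    if not payload or "features" not in payload:
        return jsonify(error="JSON body must contain a 'features' list"), 400
    predictions = model.predict(payload["features"])
    return jsonify(predictions=predictions.tolist())
```

Saved as app.py, the service can be started with flask --app app run (Flask 2.2 or later), which serves on http://127.0.0.1:5000 by default. In Postman, send a POST request to http://127.0.0.1:5000/predict with a raw JSON body such as {"features": [[0.1, -1.2, 0.3, 0.8]]}, using rows from the held-out test data.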
6. Prepare a report explaining your project and detailing all the assumptions and constraints you applied. The report should have the following sections:
- Table of contents
- Executive summary (to be written near the end of the project work; it should describe the problem, the solution and the key findings)
- Overview of your solution (to be written near the end of the project work)
- Data exploration and findings (dataset field descriptions, graphs, visualizations, tools and libraries used, etc.)
- Feature selection (tools and techniques used, results of different combinations, etc.)
- Data modeling (data cleaning strategy, results of data cleaning, data wrangling techniques, assumptions and constraints)
- Model building (train/test data, sampling, algorithms tested, results such as confusion matrices, etc.)
4) Data Set
This dataset includes all traffic collision events in the city of Toronto from 2006 to 2020 in which a person was either Killed or Seriously Injured (KSI). (The range might extend to 2021, depending on how often the data is updated.)
In accordance with the Municipal Freedom of Information and Protection of Privacy Act, the Toronto Police Service has taken the necessary measures to protect the privacy of individuals involved in the reported occurrences. No personal information related to any of the parties involved in an occurrence will be released as open data. The location of each incident occurrence has been deliberately offset to the nearest road intersection node to protect the privacy of the parties involved. All location data must be considered approximate, and users are advised not to interpret any of these locations as related to a specific address or individual. The reported dataset is intended to provide communities with information regarding public safety and awareness. The data supplied to the Toronto Police Service by the reporting parties is preliminary and may not have been fully verified.
KSI dataset - https://data.torontopolice.on.ca/datasets/TorontoPS::ksi/explore?location=43.722473%2C-79.380682%2C11.66&showTable=true