Module 2: Project Data Wrangling
For this portion of the project, you will examine your dataset for incorrect data. Any incorrect data should be removed, corrected, or imputed. Follow these steps:
- Remove irrelevant data. If you are unsure whether a field is irrelevant, keep it.
- Remove duplicate records.
- Make sure numbers are interpreted as numerical data types.
- Fix typos.
- Investigate outliers.
- Check and manage missing values.
- Format and normalize data if needed.
- Change categorical values into numbers if needed.
Once you have completed this, you will need to provide a Word document summarizing the pre-processing steps performed on your dataset.
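As a rough illustration, the steps above could be sketched in pandas as follows. The toy dataset, the column names (id, city, price, notes), the wrangle helper, and the choices of median imputation, the 1.5 x IQR outlier rule, and min-max scaling are all assumptions made for this example; your own dataset may call for different choices.

```python
import pandas as pd

# Hypothetical toy dataset; in practice you would load your own file,
# for example with pd.read_csv(...).
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5],
    "city": ["Boston", "boston ", "boston ", "Bostn", "Denver", "Denver"],
    "price": ["100", "120", "120", "110", None, "9000"],
    "notes": ["a", "b", "b", "c", "d", "e"],  # assumed irrelevant here
})


def wrangle(df):
    """Apply the cleaning steps in order and return a new DataFrame."""
    # Remove irrelevant data (assumption: 'notes' is not needed).
    df = df.drop(columns=["notes"])
    # Remove duplicate records.
    df = df.drop_duplicates().reset_index(drop=True)
    # Make sure numbers are interpreted as numerical data types;
    # values that cannot be parsed become NaN.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    # Fix typos: trim whitespace, unify case, map a known misspelling.
    df["city"] = (
        df["city"].str.strip().str.title().replace({"Bostn": "Boston"})
    )
    # Investigate outliers: flag (do not delete) values outside 1.5 * IQR.
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["price_outlier"] = (
        (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
    )
    # Check and manage missing values: impute with the median.
    df["price"] = df["price"].fillna(df["price"].median())
    # Format and normalize: min-max scale to the range [0, 1].
    span = df["price"].max() - df["price"].min()
    df["price_scaled"] = (df["price"] - df["price"].min()) / span
    # Change categorical values into numbers.
    df["city_code"] = df["city"].astype("category").cat.codes
    return df


clean = wrangle(raw)
print(clean)
```

Flagging outliers in a separate column, rather than dropping them, keeps the decision visible so it can be justified in the summary document.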
Module 3: Project Exploratory Analysis
In this assignment, you will perform an exploratory analysis that will allow you to get a feel for the data and start exploring potential relationships. This may include:
- Descriptive statistics
- Bar charts
- Heat maps
- Line graphs
- Box plots
- Frequency tables
Once your analysis is complete, you will need to provide a Word document showing and describing the results of your exploratory analysis.
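A minimal sketch of the numeric side of such an analysis is shown below. The dataset and its column names (region, ad_spend, sales) are hypothetical and stand in for your own cleaned data.

```python
import pandas as pd

# Hypothetical cleaned dataset; substitute your own DataFrame.
df = pd.DataFrame({
    "region": ["East", "West", "East", "South", "West", "East"],
    "ad_spend": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "sales": [25.0, 41.0, 62.0, 79.0, 105.0, 118.0],
})

# Descriptive statistics for the numeric fields.
summary = df[["ad_spend", "sales"]].describe()

# Frequency table for a categorical field.
freq = df["region"].value_counts()

# Correlation matrix: the numbers that a heat map would color.
corr = df[["ad_spend", "sales"]].corr()

# Group means: the numbers that a bar chart would draw.
by_region = df.groupby("region")["sales"].mean()

print(summary)
print(freq)
print(corr)
print(by_region)
```

In a Jupyter notebook with matplotlib installed, calls such as freq.plot(kind="bar") or df.boxplot(column="sales", by="region") should render the corresponding charts from the same objects.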
Module 4: Project Model Development
- Using your chosen dataset, reevaluate the heat map from the last module.
- Consider ways to perform a visual check to see if there is a relationship between fields.
- With this insight, develop a model using either linear regression or multiple linear regression.
- Report the intercept, slope(s), model accuracy, a comparison of actual to predicted output, and a scatterplot with a line portraying the model.
Once you complete these steps, you will need to provide a Word document showing and explaining the results of your model development.
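One possible way to produce these quantities for a simple linear regression is sketched below using NumPy. The data values and the plot_fit helper are hypothetical, and "model accuracy" is interpreted here as the coefficient of determination (R squared), which is an assumption; check which metric your instructor expects.

```python
import numpy as np

# Hypothetical predictor (x) and response (y); use your own fields.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([25.0, 41.0, 62.0, 79.0, 105.0, 118.0])

# Least-squares fit of y = intercept + slope * x.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, deg=1)

# Predicted output for each observed x.
predicted = intercept + slope * x

# R squared: share of the variance in y explained by the model.
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r_squared:.4f}")

# Output-to-predicted comparison, row by row.
for actual, fitted in zip(y, predicted):
    print(f"actual={actual:7.2f} predicted={fitted:7.2f}")


def plot_fit(x, y, predicted):
    """Scatterplot of the data with the fitted line (hypothetical helper)."""
    # Imported here so the numeric part runs without a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="actual")
    ax.plot(x, predicted, color="red", label="fitted line")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig
```

Calling plot_fit(x, y, predicted) in a Jupyter cell should display the scatterplot with the fitted line. For multiple linear regression the same idea extends to several predictors, for example with np.linalg.lstsq on a design matrix that includes a column of ones.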
After finishing the proposal, create a final report of 5-6 pages.
Use Python and Jupyter, and include the visuals from your data analysis along with an introduction and a conclusion.