Question:
Step 1: Data Loading and Analysis
- Download the "sample_dataset_finalproject" dataset provided on Moodle.
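A first look at the data could be sketched as follows. This is a minimal sketch, not the course's prescribed method: the source does not state the file format, so a CSV file is assumed, and the helper name `load_and_summarize` and the toy values are invented for illustration only.

```python
import io
import pandas as pd

# Toy stand-in for the Moodle file; these values are invented for illustration.
TOY_CSV = """Feature1,Feature2,Feature3,Feature4,Feature5
10,100,1000,0.5,0.1
,110,,0.7,0.2
30,120,1200,-0.4,0.3
40,130,1300,,0.4
"""

def load_and_summarize(source):
    """Read a CSV (assumed format) and return the frame plus a per-column summary."""
    df = pd.read_csv(source)
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),     # column types as readable strings
        "missing": df.isna().sum(),         # number of missing cells per column
        "non_missing": df.notna().sum(),    # number of populated cells per column
    })
    return df, summary

# In Colab, pass the path of the uploaded file instead of the in-memory toy data.
df, summary = load_and_summarize(io.StringIO(TOY_CSV))
print(df.shape)
print(summary)
print(df.describe())
```

The `missing` column of the summary gives the per-column count of missing values that Step 2 asks you to report.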
Step 2: Data Visualization and Cleaning
- Examine the data structure and contents. Plot the data points on a graph and examine the trend over time. Give the plot an x-axis label, a y-axis label, and a title.
- Identify and handle missing values by imputing them with an appropriate technique. Present 'before' and 'after' plots of the dataset to demonstrate the effectiveness of your technique. Explain how many missing values you have and describe the technique you used to handle missing values.
- Identify and describe the outliers.
- Perform correlation analysis on the cleaned dataset. Identify relevant variables and calculate their correlation coefficients. Interpret the correlation coefficients to understand the relationships between variables.
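The cleaning steps above can be sketched as below. This is one possible approach under stated assumptions, not the required one: linear interpolation is chosen as the imputation technique, the 1.5 × IQR rule is chosen to flag outliers, Pearson correlation is used, and the toy data and the output file name `before_after_feature1.png` are invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented toy data with two gaps and one extreme value (illustration only).
raw = pd.DataFrame({
    "Feature1": [10.0, 12.0, np.nan, 16.0, 18.0, 95.0, 22.0, 24.0],
    "Feature2": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0, 8.0],
})

# Count missing values per column before imputing.
n_missing = raw.isna().sum()

# Assumed technique: fill each gap by linear interpolation between neighbours.
cleaned = raw.interpolate(method="linear", limit_direction="both")

def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Flag outliers column by column, then compute pairwise Pearson correlations.
outlier_mask = cleaned.apply(iqr_outliers)
corr = cleaned.corr(method="pearson")

# 'Before' and 'after' plots of one feature, with axis labels and titles.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
raw["Feature1"].plot(ax=axes[0], marker="o", title="Feature1 before imputation")
cleaned["Feature1"].plot(ax=axes[1], marker="o", title="Feature1 after imputation")
for ax in axes:
    ax.set_xlabel("Observation index")
    ax.set_ylabel("Feature1 value")
fig.tight_layout()
fig.savefig("before_after_feature1.png")

print(n_missing)
print(cleaned[outlier_mask.any(axis=1)])
print(corr)
```

Interpolation suits data with a trend over time. If the rows have no meaningful order, filling with the column median may be a better fit. Note that an extreme value left in the data can distort the correlation coefficients, so it is worth describing the outliers before interpreting them.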
Step 3: Regression Modeling
- Select appropriate variables and define the dependent and independent variables for regression.
- Apply a regression model (similar to the one you practiced in class activity 3) to the cleaned dataset.
- Plot the error values during the iterations and interpret the regression output.
- Export the cleaned dataset to a CSV file using the following code in Colab:
df.to_csv('/content/cleaned_dataset.csv', index=False)
- Download your Python code in .ipynb format, as well as your cleaned dataset as a CSV file.
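The regression step can be sketched as follows. The source does not say which model class activity 3 used. Because the assignment asks for error values over iterations, this sketch assumes linear regression fitted by batch gradient descent on mean squared error. The function name `fit_linear_gd`, the learning rate, the iteration count, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, n_iters=500):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error.

    Returns the weights and intercept on the original feature scale,
    plus the MSE recorded at the start of every iteration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize features so one learning rate works for all columns.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma
    w = np.zeros(Xs.shape[1])
    b = 0.0
    history = []
    for _ in range(n_iters):
        residual = Xs @ w + b - y
        history.append(float(np.mean(residual ** 2)))
        # Gradient of the MSE with respect to w and b.
        w -= lr * (2.0 / len(y)) * (Xs.T @ residual)
        b -= lr * 2.0 * residual.mean()
    # Map the parameters back to the unstandardized features.
    w_orig = w / sigma
    b_orig = b - np.sum(w * mu / sigma)
    return w_orig, b_orig, history

# Synthetic data (invented): y = 3x + 5 plus unit-variance noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

w, b, history = fit_linear_gd(X, y)
print(w, b, history[0], history[-1])
```

Plotting `history` against the iteration number gives the error curve the assignment asks for. A curve that falls and then flattens suggests the fit has converged, while a curve that grows usually means the learning rate is too large.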
Step 4: Interactive Visualizations by Tableau
- Import the cleaned dataset into Tableau.
- Create a scatter plot of each feature in Tableau. Scatter plots typically involve two variables (x and y) to visualize the relationship between them. To plot a single feature on its own, create a calculated field that holds a constant: in the Data pane, right-click cleaned_dataset.csv and select Create Calculated Field. Name the calculated field (e.g., "Time") and replace the formula with the number 1.
Now, you can create a scatter plot:
- Drag each feature to the Columns shelf.
- Drag the "Time" calculated field to the Rows shelf.
For each feature, apply an appropriate filter to remove the outliers, and present 'before' and 'after' plots of the feature to demonstrate the effectiveness of your technique.
[Table: sample rows of the dataset with columns Feature1, Feature2, Feature3, Feature4, Feature5; cell values not recoverable from the extracted text]
