Question: Use RStudio. Part 1 - Data Exploration (40%)

Use RStudio.
Part 1 - Data Exploration (40%)
Descriptive Statistics (15%)
1.(2%) What is the average value of the "motor_UPDRS" score?
2.(3%) What is the average value of the "total_UPDRS" score for subjects over the
age of 70?
3.(5%) Calculate the following statistics for the "Jitter(%)" column:
o Mean:
o Median:
o Min:
o Max:
o Standard Deviation:
o Interquartile Range (75th percentile - 25th percentile):
4.(5%) What is the correlation coefficient between "motor_UPDRS" and "RPDE"?
5.(5%) What is the average age of distinct subjects?
o Mean age:
o Median age:
o Minimum age:
o Maximum age:
o Standard Deviation:
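The descriptive statistics above could be computed in base R roughly as follows. This is a minimal sketch, not the assignment's reference solution: the data frame name pd and every value in it are invented stand-ins for the real file, and the column names are assumed to match the column description exactly (hence check.names = FALSE and the backticks).

```r
# Toy data standing in for the real file; every value here is invented.
pd <- data.frame(
  `subject#`  = c(1, 1, 2, 2, 3, 3),
  age         = c(65, 65, 72, 72, 80, 80),
  motor_UPDRS = c(20, 22, 30, 32, 25, 27),
  total_UPDRS = c(28, 30, 40, 42, 35, 37),
  `Jitter(%)` = c(0.004, 0.006, 0.005, 0.007, 0.003, 0.011),
  RPDE        = c(0.40, 0.45, 0.55, 0.60, 0.50, 0.52),
  check.names = FALSE  # keep "subject#" and "Jitter(%)" as written
)

# Q1: mean motor_UPDRS over all recordings
mean_motor <- mean(pd$motor_UPDRS)

# Q2: mean total_UPDRS over recordings from subjects older than 70
mean_total_over70 <- mean(pd$total_UPDRS[pd$age > 70])

# Q3: summary statistics for Jitter(%); IQR() uses R's default quantile type
jit <- pd$`Jitter(%)`
jitter_stats <- c(mean = mean(jit), median = median(jit),
                  min = min(jit), max = max(jit),
                  sd = sd(jit), iqr = IQR(jit))

# Q4: Pearson correlation between motor_UPDRS and RPDE
r_motor_rpde <- cor(pd$motor_UPDRS, pd$RPDE)

# Q5: keep one row per subject so repeated visits do not weight the ages
subjects <- pd[!duplicated(pd$`subject#`), c("subject#", "age")]
age_stats <- c(mean = mean(subjects$age), median = median(subjects$age),
               min = min(subjects$age), max = max(subjects$age),
               sd = sd(subjects$age))

print(jitter_stats)
print(age_stats)
```

With the real data, pd would come from something like read.csv(..., check.names = FALSE); the file name is not given in the question, so it is left out here.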
Visualization (25%)
1.(8%) Create a bar chart to display the distribution of subjects by gender.
o (2%) What is the gender ratio in the trial?
2.(10%) Create a scatter plot of "motor_UPDRS" vs "age".
o (3%) Do you observe any trends or patterns between motor UPDRS
scores and age?
o (2%) What potential insight could this give about disease progression?
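One way to approach the two plots with base R graphics is sketched below. The data are invented, and the coding of sex as 0 = male, 1 = female is an assumption (the question does not say how the column is coded), so check the real file before relabelling. Subjects are counted once each, since the data contain repeated recordings per subject.

```r
# Toy data; values and the 0/1 coding of sex are assumptions for illustration.
pd <- data.frame(
  `subject#`  = c(1, 1, 2, 2, 3, 3, 4, 4),
  age         = c(58, 58, 65, 65, 72, 72, 80, 80),
  sex         = c(0, 0, 0, 0, 1, 1, 0, 0),
  motor_UPDRS = c(18, 20, 22, 25, 27, 29, 31, 34),
  check.names = FALSE
)

# Count each subject once, not once per recording
subjects <- pd[!duplicated(pd$`subject#`), ]
sex_counts <- table(factor(subjects$sex, levels = c(0, 1),
                           labels = c("male", "female")))

# Bar chart of subjects by gender
barplot(sex_counts, main = "Subjects by gender",
        xlab = "Gender", ylab = "Number of subjects")

# Gender ratio (male : female)
n_male <- as.numeric(sex_counts["male"])
n_female <- as.numeric(sex_counts["female"])
male_to_female <- n_male / n_female

# Scatter plot of motor_UPDRS against age, with a least-squares line as a guide
plot(pd$age, pd$motor_UPDRS, xlab = "Age", ylab = "motor_UPDRS",
     main = "motor_UPDRS vs age")
abline(lm(motor_UPDRS ~ age, data = pd), lty = 2)
```

The dashed line is only a visual aid for spotting a trend; any statement about disease progression should rest on the real data, not on this toy example.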
Part 2- Feature Engineering (30%)
Creating New Features:
1. Jitter to Shimmer Ratio (5%)
o Create a feature called Jitter_Shimmer_Ratio by dividing Jitter(%) by
Shimmer. This will help analyze the balance between these vocal
features.
2. UPDRS Progression (5%)
o Create a feature called UPDRS_Progression_Rate, which captures the
rate of change in the "motor_UPDRS" score relative to test_time.
3. Vocal Features Mean (5%)
o Compute the average of the Jitter(%), Shimmer, NHR, and HNR values for
each subject. Store it as Vocal_Features_Mean.
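The three features could be built along these lines. Two of the definitions are open to interpretation, so the choices here are assumptions: UPDRS_Progression_Rate is taken as each subject's least-squares slope of motor_UPDRS on test_time, and Vocal_Features_Mean as the row-wise mean of the four vocal columns averaged within each subject. The data are invented.

```r
# Toy data; all values are invented for illustration.
pd <- data.frame(
  `subject#`  = c(1, 1, 1, 2, 2, 2),
  test_time   = c(0, 10, 20, 0, 10, 20),
  motor_UPDRS = c(20, 21, 22, 30, 32, 34),
  `Jitter(%)` = c(0.004, 0.006, 0.005, 0.008, 0.010, 0.012),
  Shimmer     = c(0.02, 0.03, 0.025, 0.04, 0.05, 0.04),
  NHR         = c(0.01, 0.02, 0.03, 0.02, 0.02, 0.02),
  HNR         = c(22, 21, 20, 18, 17, 19),
  check.names = FALSE
)

# 1. Jitter to Shimmer ratio, one value per recording
pd$Jitter_Shimmer_Ratio <- pd$`Jitter(%)` / pd$Shimmer

# 2. Progression rate (assumed definition): per-subject slope of
#    motor_UPDRS on test_time, repeated on every row of that subject
slope_by_subject <- sapply(split(pd, pd$`subject#`), function(d) {
  coef(lm(motor_UPDRS ~ test_time, data = d))[["test_time"]]
})
pd$UPDRS_Progression_Rate <-
  unname(slope_by_subject[as.character(pd$`subject#`)])

# 3. Vocal features mean (assumed definition): row-wise mean of the four
#    columns, then averaged within subject (ave() defaults to the mean)
vocal_cols <- c("Jitter(%)", "Shimmer", "NHR", "HNR")
row_mean <- rowMeans(pd[, vocal_cols])
pd$Vocal_Features_Mean <- ave(row_mean, pd$`subject#`)

print(pd)
```

Because HNR is on a much larger scale than the other three columns, it dominates this unweighted mean; standardising the columns first (for example with scale()) is a possible refinement if the question intends a balanced average.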
Part 3- Predictive Modeling (30%)
Linear Regression (15%)
1.(10%) Fit a multiple linear regression model to predict "motor_UPDRS" using age,
gender, jitter, shimmer, NHR, RPDE, and DFA as predictors.
2.(5%) Evaluate the model using Mean Absolute Error (MAE) and R-squared on the
test set.
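A possible shape for the fit and its evaluation is sketched below. Everything about the data is simulated, the 80/20 train/test split and the seed are arbitrary choices, and "gender" and "jitter" are assumed to map to the sex and Jitter(%) columns. R-squared on the test set is computed here as 1 - SSE/SST, which is one common definition.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in data; the relationship below is invented.
n <- 200
pd <- data.frame(
  age         = round(runif(n, 50, 85)),
  sex         = factor(sample(c("male", "female"), n, replace = TRUE)),
  `Jitter(%)` = runif(n, 0.002, 0.02),
  Shimmer     = runif(n, 0.01, 0.1),
  NHR         = runif(n, 0.001, 0.1),
  RPDE        = runif(n, 0.2, 0.8),
  DFA         = runif(n, 0.5, 0.8),
  check.names = FALSE
)
pd$motor_UPDRS <- 5 + 0.25 * pd$age + 20 * pd$RPDE + rnorm(n, sd = 3)

# Hold out 20% of rows as the test set
train_idx <- sample(n, size = floor(0.8 * n))
train <- pd[train_idx, ]
test  <- pd[-train_idx, ]

# Linear regression with the predictors named in the question
fit <- lm(motor_UPDRS ~ age + sex + `Jitter(%)` + Shimmer + NHR + RPDE + DFA,
          data = train)

# Evaluate on the held-out rows
pred <- predict(fit, newdata = test)
mae <- mean(abs(test$motor_UPDRS - pred))
sse <- sum((test$motor_UPDRS - pred)^2)
sst <- sum((test$motor_UPDRS - mean(test$motor_UPDRS))^2)
r2 <- 1 - sse / sst

cat("MAE:", mae, " R-squared:", r2, "\n")
```

Since the real data contain several recordings per subject, splitting by subject rather than by row would give a stricter test set; the row-wise split above is the simpler option.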
Scatter Plot Analysis (5%)
1. Create scatter plots for "motor_UPDRS" vs "Jitter(%)" and "motor_UPDRS" vs
"Shimmer".
o (3%) Do you notice any non-linear patterns in the relationships?
Non-Linear Relationships (10%)
1. Non-Linear Features (15%) : Generate polynomial terms (e.g., square of the
jitter and shimmer) to explore potential non-linear relationships. Use these
features in subsequent modeling.
2. Once non-linearity is discovered, try fitting a polynomial regression model to
account for curved relationships between vocal features (e.g., jitter, shimmer)
and motor UPDRS.
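The scatter plots and the polynomial step could be sketched as follows. The data are simulated with a deliberately curved jitter effect, so the improvement seen here says nothing about the real data; Jitter_sq and Shimmer_sq are hypothetical names for the squared terms.

```r
set.seed(7)  # arbitrary seed so the sketch is reproducible

# Simulated data with an invented quadratic effect of jitter
n <- 150
pd <- data.frame(
  `Jitter(%)` = runif(n, 0.002, 0.02),
  Shimmer     = runif(n, 0.01, 0.1),
  check.names = FALSE
)
pd$motor_UPDRS <- 15 + 2000 * pd$`Jitter(%)` - 60000 * pd$`Jitter(%)`^2 +
  50 * pd$Shimmer + rnorm(n, sd = 2)

# Scatter plots to look for curvature
plot(pd$`Jitter(%)`, pd$motor_UPDRS, xlab = "Jitter(%)", ylab = "motor_UPDRS")
plot(pd$Shimmer, pd$motor_UPDRS, xlab = "Shimmer", ylab = "motor_UPDRS")

# Polynomial (squared) terms as explicit features
pd$Jitter_sq  <- pd$`Jitter(%)`^2
pd$Shimmer_sq <- pd$Shimmer^2

# Linear baseline versus the model with squared terms added
linear_fit <- lm(motor_UPDRS ~ `Jitter(%)` + Shimmer, data = pd)
poly_fit   <- lm(motor_UPDRS ~ `Jitter(%)` + Jitter_sq + Shimmer + Shimmer_sq,
                 data = pd)

r2_linear <- summary(linear_fit)$r.squared
r2_poly   <- summary(poly_fit)$r.squared

# F-test for whether the squared terms improve on the linear model
print(anova(linear_fit, poly_fit))
```

In-sample R-squared can never decrease when terms are added, so the anova() comparison or a held-out test set is a better guide than r2_poly alone.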
Column name    Description
subject#       Unique identifier for each subject, ranging from 1 to 31 (Numeric Data Type)
age            Age of Subject (Numeric Data Type)
sex            Gender of Subject (Categorical Data Type)
test_time      Time since recruitment into trial (Numeric Data Type)
motor_UPDRS    Motor UPDRS score (Numeric Data Type)
total_UPDRS    Total UPDRS score (Numeric Data Type)
Jitter(%)      Percentage of local variation in fundamental frequency (Numeric Data Type)
Jitter(Abs)    Absolute jitter (Numeric Data Type)
Jitter         Difference between consecutive differences of fundamental frequency (Numeric Data Type)
Shimmer        Difference between consecutive differences of amplitude (Numeric Data Type)
Shimmer(dB)    Shimmer in decibels (Numeric Data Type)
NHR            Noise-to-harmonics ratio (Numeric Data Type)
HNR            Harmonics-to-noise ratio (Numeric Data Type)
RPDE           Recurrence period density entropy (Numeric Data Type)