Question: Use RStudio.
Part 1 - Data Exploration (40%)
Descriptive Statistics (15%)
1. (2%) What is the average value of the "motorUPDRS" score?
What is the average value of the "totalUPDRS" score for subjects over the age of …?
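A minimal R sketch for the two averages. It assumes the data have been read into a data frame whose columns are named as in the column description. The age threshold is missing from the question, so `age_cutoff` is a hypothetical parameter, and the toy rows are made up for illustration only.

```r
# Hypothetical helpers; column names follow the column description.
mean_motor <- function(df) mean(df$motorUPDRS, na.rm = TRUE)

# `age_cutoff` is a placeholder: the actual age threshold is not given.
mean_total_over_age <- function(df, age_cutoff) {
  mean(df$totalUPDRS[df$age > age_cutoff], na.rm = TRUE)
}

# Made-up toy rows, for illustration only (not the trial data).
toy <- data.frame(age        = c(55, 62, 70, 74),
                  motorUPDRS = c(20, 25, 30, 35),
                  totalUPDRS = c(28, 32, 40, 46))
mean_motor(toy)
mean_total_over_age(toy, age_cutoff = 65)
```

With the real file, `read.csv()` will turn a header such as `subject#` into a syntactic name (for example `subject.`), so check `names(df)` and rename columns if needed.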
Calculate the following statistics for the "Jitter" column:
o Mean:
o Median:
o Min:
o Max:
o Standard Deviation:
o Interquartile Range (75th percentile - 25th percentile):
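The statistics above can be gathered in one call. This is a sketch with a hypothetical helper name; it uses R's default `quantile()` interpolation (type 7), which matches `IQR()`, though other software may compute percentiles slightly differently. The toy values are made up.

```r
# Summary statistics for one numeric column (intended for the Jitter column).
describe_numeric <- function(x) {
  # R's default quantile() method (type 7), the same one IQR() uses.
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),   # sample SD (divides by n - 1)
    p25    = q[1],
    p75    = q[2],
    iqr    = q[2] - q[1])
}

# Made-up values, for illustration only.
toy_jitter <- c(0.002, 0.004, 0.006, 0.008, 0.010)
stats <- describe_numeric(toy_jitter)
round(stats, 5)
```

On the real data this would be called as `describe_numeric(df$Jitter)`.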
What is the correlation coefficient between "motorUPDRS" and "RPDE"?
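A sketch of the correlation step. The question does not name a method, so Pearson correlation is assumed here; `method = "spearman"` would be the rank-based alternative. The toy values are made up.

```r
# Made-up toy values, for illustration only.
toy <- data.frame(motorUPDRS = c(10, 15, 20, 25, 30),
                  RPDE       = c(0.40, 0.45, 0.52, 0.55, 0.61))

# Pearson correlation (assumed method).
# use = "complete.obs" drops rows where either value is missing.
r <- cor(toy$motorUPDRS, toy$RPDE, method = "pearson", use = "complete.obs")
r

# cor.test() reports the same coefficient plus a p-value and confidence interval.
ct <- cor.test(toy$motorUPDRS, toy$RPDE, method = "pearson")
```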
What is the average age of distinct subjects? Count each subject once and report:
o Mean age:
o Median age:
o Minimum age:
o Maximum age:
o Standard Deviation:
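Because each subject contributes many recordings, the age statistics should use one row per subject. This sketch assumes the `subject#` column has been renamed to `subject` and that age is constant within a subject; the helper name and toy rows are hypothetical.

```r
# One age per subject: keep only the first row of every subject ID.
subject_ages <- function(df) {
  df$age[!duplicated(df$subject)]
}

# Made-up toy rows: three subjects, some recorded more than once.
toy <- data.frame(subject = c(1, 1, 2, 2, 3),
                  age     = c(72, 72, 58, 58, 65))
ages <- subject_ages(toy)
c(mean = mean(ages), median = median(ages),
  min = min(ages), max = max(ages), sd = sd(ages))
```

If age could change during the trial for a subject, aggregating first (for example with `aggregate(age ~ subject, data = df, FUN = min)`) would be the safer choice.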
Visualization
Create a bar chart to display the distribution of subjects by gender.
o What is the gender ratio in the trial?
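A base-R sketch of the bar chart and ratio, counting each subject once rather than each recording. It assumes columns named `subject` and `sex` with text labels; if `sex` is stored as 0/1, relabel it with `factor()` using the codes documented for the dataset. Base graphics are used so no extra package is needed; the toy rows are made up.

```r
# Count each subject once, then tabulate gender.
gender_counts <- function(df) {
  per_subject <- df[!duplicated(df$subject), ]
  table(per_subject$sex)
}

# Made-up toy rows, for illustration only.
toy <- data.frame(subject = c(1, 1, 2, 3, 3, 4),
                  sex     = c("male", "male", "female", "male", "male", "male"))
counts <- gender_counts(toy)
barplot(counts, main = "Subjects by gender",
        xlab = "Gender", ylab = "Number of subjects")

# Gender ratio expressed as males per female.
ratio <- as.numeric(counts["male"]) / as.numeric(counts["female"])
ratio
```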
Create a scatter plot of "motorUPDRS" vs "age".
o Do you observe any trends or patterns between motor UPDRS scores and age?
o What potential insight could this give about disease progression?
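A base-R sketch of the scatter plot with a least-squares line overlaid as a rough guide to the overall trend. The toy rows are made up. In the real data each subject appears many times, so this plot mixes differences between subjects of different ages with changes within a subject over time; an age trend alone is not direct evidence of progression.

```r
# Made-up toy rows, for illustration only.
toy <- data.frame(age        = c(55, 60, 65, 70, 75),
                  motorUPDRS = c(18, 21, 24, 26, 31))

plot(toy$age, toy$motorUPDRS, pch = 19,
     xlab = "Age (years)", ylab = "motor UPDRS",
     main = "motor UPDRS vs age")

# Overlay a least-squares line as a rough guide to any overall trend.
trend <- lm(motorUPDRS ~ age, data = toy)
abline(trend, col = "red", lwd = 2)
slope <- unname(coef(trend)["age"])
slope
```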
Part 2 - Feature Engineering
Creating New Features:
Jitter to Shimmer Ratio
Create a feature called JitterShimmerRatio by dividing Jitter by
Shimmer. This will help analyze the balance between these vocal
features.
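A minimal sketch of the ratio feature. The helper name is hypothetical, and returning `NA` when Shimmer is zero (instead of `Inf`) is a design choice, not part of the task.

```r
# Hypothetical helper: adds JitterShimmerRatio = Jitter / Shimmer.
# Rows with Shimmer == 0 get NA instead of Inf.
add_jitter_shimmer_ratio <- function(df) {
  df$JitterShimmerRatio <- ifelse(df$Shimmer == 0, NA_real_,
                                  df$Jitter / df$Shimmer)
  df
}

# Made-up toy rows, for illustration only.
toy <- data.frame(Jitter  = c(0.006, 0.004, 0.002),
                  Shimmer = c(0.030, 0.020, 0.000))
res <- add_jitter_shimmer_ratio(toy)$JitterShimmerRatio
res
```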
UPDRS Progression
o Create a feature called UPDRSProgressionRate, which captures the
rate of change in the "motorUPDRS" score relative to testtime.
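The task does not define the rate precisely. One reasonable reading, used below, is the slope of a per-subject linear fit of motorUPDRS on testtime (UPDRS points per unit of testtime), copied back to every row of that subject. The helper name and the `subject` column are assumptions; a simpler alternative is (last score - first score) / (last time - first time).

```r
# Per-subject slope of motorUPDRS over testtime, copied to each row.
add_progression_rate <- function(df) {
  slopes <- sapply(split(df, df$subject), function(d) {
    # A slope needs at least two distinct time points.
    if (length(unique(d$testtime)) < 2) return(NA_real_)
    unname(coef(lm(motorUPDRS ~ testtime, data = d))["testtime"])
  })
  df$UPDRSProgressionRate <- unname(slopes[as.character(df$subject)])
  df
}

# Made-up toy rows: subject 1 worsens, subject 2 improves.
toy <- data.frame(subject    = c(1, 1, 1, 2, 2),
                  testtime   = c(0, 10, 20, 0, 50),
                  motorUPDRS = c(20, 21, 22, 30, 25))
out <- add_progression_rate(toy)
out$UPDRSProgressionRate
```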
Vocal Features Mean
Compute the average of the Jitter, Shimmer, NHR, and HNR values for
each subject. Store it as VocalFeaturesMean.
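The instruction can be read as one average per feature per subject, or as a single combined average. This sketch takes the second reading: it averages the four features within each row, then averages those row values within each subject. HNR is on a much larger scale than the other three features, so it dominates this unweighted mean; standardizing the columns first (for example with `scale()`) is one option. The helper name and the `subject` column are assumptions.

```r
# Row-wise mean of the four vocal features, then averaged per subject.
add_vocal_features_mean <- function(df,
                                    features = c("Jitter", "Shimmer", "NHR", "HNR")) {
  row_mean <- rowMeans(df[, features], na.rm = TRUE)
  df$VocalFeaturesMean <- ave(row_mean, df$subject,
                              FUN = function(v) mean(v, na.rm = TRUE))
  df
}

# Made-up toy rows, for illustration only.
toy <- data.frame(subject = c(1, 1, 2),
                  Jitter  = c(0.01, 0.03, 0.02),
                  Shimmer = c(0.05, 0.07, 0.06),
                  NHR     = c(0.02, 0.04, 0.03),
                  HNR     = c(20, 22, 25))
out <- add_vocal_features_mean(toy)
out$VocalFeaturesMean
```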
Part 3 - Predictive Modeling
Linear Regression
Fit a linear regression model to predict "motorUPDRS" using age,
gender, Jitter, Shimmer, NHR, RPDE, and DFA as predictors.
Evaluate the model using Mean Absolute Error (MAE) and R-squared on the
test set.
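A sketch of the fit-and-evaluate loop. The task mentions a test set without describing the split, so an 80/20 random split is assumed here, and the data are synthetic stand-ins generated only to make the block runnable. R-squared is computed on the test predictions as 1 - SS_res / SS_tot. With the real data, splitting by subject (so no subject appears in both sets) avoids leakage from repeated recordings; that is a design choice, not something the assignment specifies.

```r
# Metric helpers (hypothetical names).
mae <- function(actual, predicted) mean(abs(actual - predicted))
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Synthetic stand-in for the trial data, for illustration only.
set.seed(42)
n <- 200
toy <- data.frame(
  age     = runif(n, 40, 85),
  sex     = factor(sample(c("male", "female"), n, replace = TRUE)),
  Jitter  = runif(n, 0.001, 0.02),
  Shimmer = runif(n, 0.01, 0.10),
  NHR     = runif(n, 0.005, 0.10),
  RPDE    = runif(n, 0.30, 0.70),
  DFA     = runif(n, 0.50, 0.80)
)
toy$motorUPDRS <- 5 + 0.3 * toy$age + 200 * toy$Jitter + rnorm(n, sd = 3)

# Assumed 80/20 random train/test split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

fit <- lm(motorUPDRS ~ age + sex + Jitter + Shimmer + NHR + RPDE + DFA,
          data = train)
pred <- predict(fit, newdata = test)

test_mae <- mae(test$motorUPDRS, pred)
test_r2  <- r_squared(test$motorUPDRS, pred)
c(MAE = test_mae, R2 = test_r2)
```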
Scatter Plot Analysis
Create scatter plots for "motorUPDRS" vs "Jitter" and "motorUPDRS" vs
"Shimmer".
Do you notice any nonlinear patterns in the relationship?
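A base-R sketch that draws both scatter plots side by side and overlays a `lowess()` smoother, which makes curvature easier to spot than the raw points alone. The helper name is hypothetical, and the data are synthetic with a deliberately curved Jitter effect, for illustration only.

```r
# Scatter plot of motorUPDRS against one feature, with a lowess smoother.
# Returns the smoother invisibly.
plot_vs_feature <- function(df, feature) {
  plot(df[[feature]], df$motorUPDRS, pch = 19, cex = 0.6,
       xlab = feature, ylab = "motorUPDRS",
       main = paste("motorUPDRS vs", feature))
  smooth <- lowess(df[[feature]], df$motorUPDRS)
  lines(smooth, col = "blue", lwd = 2)
  invisible(smooth)
}

# Synthetic data, for illustration only.
set.seed(1)
toy <- data.frame(Jitter  = runif(100, 0.001, 0.02),
                  Shimmer = runif(100, 0.01, 0.10))
toy$motorUPDRS <- 20 + 3e4 * toy$Jitter^2 + rnorm(100)

old_par <- par(mfrow = c(1, 2))  # two panels side by side
s_j <- plot_vs_feature(toy, "Jitter")
s_s <- plot_vs_feature(toy, "Shimmer")
par(old_par)
```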
Nonlinear Relationships
Nonlinear Features: Generate polynomial terms (e.g., the squares of
Jitter and Shimmer) to explore potential nonlinear relationships. Use these
features in subsequent modeling.
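A minimal sketch of the squared terms; the column names `Jitter_sq` and `Shimmer_sq` are hypothetical. Raw x and x^2 are strongly correlated, so centering x before squaring (or using `poly()` inside a model formula) can make the coefficients more stable.

```r
# Hypothetical helper: adds squared terms for Jitter and Shimmer.
add_squared_terms <- function(df) {
  df$Jitter_sq  <- df$Jitter^2
  df$Shimmer_sq <- df$Shimmer^2
  df
}

# Made-up toy rows, for illustration only.
toy <- data.frame(Jitter = c(0.01, 0.02), Shimmer = c(0.05, 0.10))
out <- add_squared_terms(toy)
out
```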
Once nonlinearity is discovered, try fitting a polynomial regression model to
account for curved relationships between the vocal features (e.g., Jitter,
Shimmer) and motor UPDRS.
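A sketch comparing a linear and a degree-2 polynomial model. `poly(x, 2)` builds an orthogonal polynomial basis spanning x and x^2, so the linear model is nested in the polynomial one and `anova()` can run an F-test on the added terms. The data are synthetic, with a deliberately curved Jitter effect, for illustration only.

```r
# Synthetic data, for illustration only.
set.seed(7)
n <- 150
toy <- data.frame(Jitter  = runif(n, 0.001, 0.02),
                  Shimmer = runif(n, 0.01, 0.10))
toy$motorUPDRS <- 15 + 3e4 * toy$Jitter^2 + 50 * toy$Shimmer + rnorm(n, sd = 1)

linear_fit <- lm(motorUPDRS ~ Jitter + Shimmer, data = toy)
# Orthogonal polynomial terms avoid the strong correlation of raw x and x^2.
poly_fit <- lm(motorUPDRS ~ poly(Jitter, 2) + poly(Shimmer, 2), data = toy)

# Nested-model F-test: do the squared terms improve the fit?
comparison <- anova(linear_fit, poly_fit)
comparison
c(linear     = summary(linear_fit)$adj.r.squared,
  polynomial = summary(poly_fit)$adj.r.squared)
```

On the real data, the same comparison would also be worth checking on held-out test rows, since adjusted R-squared on the training data can still favor extra terms.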
Column Description
Name: Description (Data type)
subject#: Unique identifier for each subject, which ranges from … (Numeric)
age: Age of subject (Numeric)
sex: Gender of subject (Categorical)
testtime: Time since recruitment into the trial (Numeric)
motorUPDRS: Motor UPDRS score (Numeric)
totalUPDRS: Total UPDRS score (Numeric)
Jitter: Percentage of local variation in fundamental frequency (Numeric)
JitterAbs: Absolute jitter (Numeric)
Jitter: Difference between consecutive differences of fundamental frequency (Numeric)
Shimmer: Difference between consecutive differences of amplitude (Numeric)
ShimmerdB: Shimmer in decibels (Numeric)
NHR: Noise-to-harmonics ratio (Numeric)
HNR: Harmonics-to-noise ratio (Numeric)
RPDE: Recurrence period density entropy (Numeric)