Question: CIS 4 7 0 Intro to Data Analytics Lab Title: Logistic Regression in Python: Building Your First Model Estimated Time: 2 hours Learning Objectives By
CIS Intro to Data Analytics
Lab Title: Logistic Regression in Python: Building Your First Model
Estimated Time: hours
Learning Objectives
By the end of this lab, you will be able to:
Load and preprocess data for logistic regression.
Understand and apply logistic regression for binary classification.
Evaluate the model's performance using key metrics like accuracy, precision, recall, and ROCAUC.
Prerequisites
This lab assumes familiarity with Python programming and basic data analysis using libraries such as pandas and numpy. No prior knowledge of regression models is required.
You may need to install packages such as sklearn Sci kit learn using commands such as the below. I ran this in my vscode terminal on a windows PC Please use VS Code.
Dataset
Well use the Titanic dataset, a popular dataset for binary classification tasks. The Titanic dataset contains information about passengers, such as age, sex, class, and survival status. Our goal is to predict the likelihood of a passenger surviving based on these features.
Dataset Info:
Target Variable: Survived if the passenger survived, if not
Features: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
Lab Steps
Step : Import Required Libraries
In this step, you will import the essential libraries needed for data manipulation, visualization, and model building.
Step : Load and Explore the Dataset
Load the Titanic dataset.
Check the datasets structure and identify missing values or data types that need processing.
Step : Data Preprocessing
Handle Missing Values: Fill in missing values for the Age column with the median age.
Convert Categorical Variables: Convert the Sex and Embarked columns into dummy variables.
Select Features and Target: Choose relevant columns for prediction.
Step : Split the Data
Split the dataset into training and testing sets.
Step : Build the Logistic Regression Model
Initialize the logistic regression model.
Fit the model to the training data.
Step : Make Predictions
Use the trained model to make predictions on the test data.
Step : Evaluate the Model
Calculate evaluation metrics to assess model performance:
Accuracy: Proportion of correct predictions.
Precision: How often the model is correct when it predicts survival.
Recall: How often the model correctly identifies survivors.
ROCAUC: The area under the ROC curve.
Step : Visualize Results
Confusion Matrix: Plot the confusion matrix to understand the models performance in terms of truefalse positives and negatives.
ROC Curve: Plot the ROC curve to visualize the model's performance.
Step :
Interpretation and Conclusion
Discuss the following:
The significance of each metric accuracy precision, recall, ROCAUC and how it reflects the models performance.
How well the logistic regression model performed and potential limitations of using a logistic regression model
Possible steps for improving the model, such as adding more features, tuning hyperparameters, or testing different classification methods.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
