Question:

Instructions:

- You should submit your answer as a Jupyter Notebook file only. Submit only the .ipynb file, which should contain all your answers for this assignment.

- This is an individual assignment. Any two identical answers will be graded ZERO automatically.

- Assignments submitted after the due date will NOT be accepted.

Data Set Description:

We'll be using the "mammographic masses" public dataset from the UCI repository. This data contains 961 instances of masses detected in mammograms, and contains the following attributes:

1. BI-RADS assessment: 1 to 5 (ordinal)

2. Age: patient's age in years (integer)

3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)

4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)

5. Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal)

6. Severity: benign=0 or malignant=1 (binominal) (The dependent variable)
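A minimal sketch of reading a file with this layout into `masses_data`. It assumes the raw file is comma-separated, has no header row, and marks missing values with `?`; the description above states none of these, so check them against the actual file. The column names and the sample rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical column names matching the six attributes listed above.
COLUMNS = ["BI-RADS", "age", "shape", "margin", "density", "severity"]

# Made-up rows standing in for the real file; "?" marks a missing value (assumption).
SAMPLE_TEXT = (
    "5,67,3,5,3,1\n"
    "4,43,1,1,?,1\n"
    "5,58,4,5,3,1\n"
    "4,28,1,1,3,0\n"
    "5,74,1,5,?,1\n"
)


def load_masses(source):
    """Read the data, name the columns, and convert '?' to NaN."""
    return pd.read_csv(source, names=COLUMNS, na_values=["?"])


# For the real assignment, pass the file path instead of the in-memory sample.
masses_data = load_masses(io.StringIO(SAMPLE_TEXT))

# Rows that contain at least one missing value.
rows_with_missing = masses_data[masses_data.isnull().any(axis=1)]

print(masses_data.shape)
print(rows_with_missing)
```

Passing `names=` tells pandas that the file itself has no header line, so the first data row is not consumed as column labels.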

BI-RADS is an assessment of how confident the severity classification is; it is not a "predictive" attribute and so we will discard it. The age, shape, margin, and density attributes are the features that we will build our model with, and "severity" is the classification we will attempt to predict based on those attributes.
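The feature selection described in this paragraph could be sketched as follows. The column names and the four sample rows are illustrative assumptions. Note that the task list further down asks for "BI-RADS" to be included in X; if you follow the task list literally, add it to `feature_names`.

```python
import pandas as pd

# Made-up rows with the six attributes; the column names are assumptions.
masses_data = pd.DataFrame(
    {
        "BI-RADS": [5, 4, 5, 4],
        "age": [67, 43, 58, 28],
        "shape": [3, 1, 4, 1],
        "margin": [5, 1, 5, 1],
        "density": [3, 3, 3, 3],
        "severity": [1, 1, 1, 0],
    }
)

# Keep only the predictive attributes; BI-RADS is left out as described above.
feature_names = ["age", "shape", "margin", "density"]
X = masses_data[feature_names]

# The dependent variable: benign=0, malignant=1.
y = masses_data["severity"]

print(X.shape, y.shape)
```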

Although "shape" and "margin" are nominal data types, which sklearn typically doesn't deal with well, they are close enough to ordinal that we shouldn't just discard them. The "shape" attribute, for example, is ordered from round to irregular.

Note: You may use pandas, numpy, sklearn or any essential python package you need.

A. [1.5 POINTS] Read the given "mammographic_masses.data.txt" data set into masses_data. The output should be like this: (output not shown)
B. [1.5 POINTS] Provide a plot to show the counts of both classes in the "severity" dependent variable. The output should be like this: (output not shown)

For data preprocessing:
A. [1 POINT] List all rows that contain missing values in the data set. The output should be like this: 130 rows, 6 columns
B. [1 POINT] Impute the missing values in the data set using the median value for each column containing missing values.

A. [0.5 POINTS] Separate the columns of "masses_data" into "X" and "y", where "X" contains the columns "BI-RADS", "age", "shape", "margin", and "density" and "y" contains "severity" only.
B. [0.5 POINTS] Use the train_test_split command to prepare X_train, X_test, y_train, and y_test.

A. [1 POINT] Use the logistic regression algorithm to model the prepared training data.
B. [1 POINT] Use the logistic regression algorithm to predict the test data.
C. [1 POINT] Use the sklearn "accuracy_score" to find the accuracy of your developed model on the testing data.
D. [1 POINT] Print the obtained accuracy from previous
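The preprocessing and modelling tasks above can be sketched end to end as below. This is an illustration under stated assumptions, not a reference solution: the data is synthetic (generated with a fixed seed, not the UCI file), the column names are assumed, and the split size, `random_state`, and `max_iter` are arbitrary choices the assignment does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data set (assumption: same six columns).
rng = np.random.default_rng(0)
n = 200
age = rng.integers(20, 90, size=n).astype(float)
shape = rng.integers(1, 5, size=n).astype(float)
margin = rng.integers(1, 6, size=n).astype(float)
masses_data = pd.DataFrame(
    {
        "BI-RADS": rng.integers(1, 6, size=n).astype(float),
        "age": age,
        "shape": shape,
        "margin": margin,
        "density": rng.integers(1, 5, size=n).astype(float),
        # Made-up labelling rule so that the classes are learnable.
        "severity": ((age > 55) | (shape + margin > 7)).astype(int),
    }
)

# Knock out every tenth density value to simulate missing entries.
masses_data.loc[masses_data.index[::10], "density"] = np.nan

# Class counts; class_counts.plot(kind="bar") would draw them with matplotlib.
class_counts = masses_data["severity"].value_counts()

# Impute each column's missing values with that column's median.
masses_data = masses_data.fillna(masses_data.median())

# Separate features and target as the task list specifies.
X = masses_data[["BI-RADS", "age", "shape", "margin", "density"]]
y = masses_data["severity"]

# Hold out 25% of the rows for testing (arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit on the training data, then predict the test data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on the test data: {accuracy:.3f}")
```

Imputing before the split, as done here, lets test-set values influence the medians; computing the medians on the training rows only is the stricter alternative, though the task list orders imputation first.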

