Question: please include code and dataset name is bace.csv, please include in code! Exercise set 8 1. The attached paper by Subramanian et ai. uses Quantitative

please include code and dataset name is "bace.csv", please include in code!

Exercise set 8 1. The attached paper by Subramanian et ai. uses Quantitative Structure-Activity Relationsh (QSAR) models using statistical approaches to estimate the binding affinities (IC50) for diverse siructural and chemical classes of human -secretase 1 (BACE-1) inhibitors. Results are compared with values reported in literature. Govindan Subramanian, Bharath Ramsundar, Vijay Pande, and Rajiah Aldrin Denny, "Computational Modeling of -Secretase 1 (BACE-1) Inhibitors Using Ligand Based Approaches", Joumal of Chemical Information and Modeling (2016),56, 1936-1949. Use the scikit-learn methods to perform statistical analysis of the BACE dataset and to predict the binding affinity. The dataset is provided within the exercise set. - import the required libraries and modules: numpy, matplotlib.pyplot, seaborn, pandas, datasets and linear_model from sklearn, LinearRegression from sklearn.linear_model, accuracy_score from sklearn.metrics, train_test_split from sklearn.model_selection - downioad the bace.csv dataset A. Read the dataset using pandas. Print information about the data contained in the dataset, such as the header, description or summary. B. Build a 9-feature input dataset using the following physical descriptors: molecular weight ('MW'), partition coefficient ('AlogP'), hydrogen bond acceptor ('HBA'), hydrogen bond donor ('HBD'), rotatable bonds ('RB'), polar surface area ('PSA'), electrotopological states ('ESTATE'), molar refractivity ('MR'), molecular polarizability ('Polar'). Graph the dataset using a pair plot representation. C. Assign the measured value of the measured binding affinity ('pIC50') to the output variable. - Use statsmodels ordinary least squares (OLS) regression model to perform a multiple linear regression of the binding affinity on the set of 9 predictors. Print the statistics using the summary table (use the summary() function in statsmodels). Determine the residuals, standardized (studentized) residuals, the leverages and plot the Residuals versus the fitted values and the Standardized Residuals versus the Leverages. What do these plots tell you? - Split the dataset into a training set, comprising 80% of the data randomly selected, and a test set, comprising the remaining 20% of the original data. - Perform a multiple linear regression of the training set of solubility on the training set of 9 predictors and determine the regression coefficients. - Assess the fit by obtaining the Residual Standard Error (RSE) and the R2 statistic for the training and the test sets. How do these values compare with the RMSE and R2 values in Table 2 of the Subramanian et al. paper? - Perform a simple linear regression of the test output variable on the predicted test values and illustrate the fitted line in a graph along with the scatter plot of the test and predicted test output data. D. Generate an 3 single-feature datasets by selecting only the column ' MW ', or 'HBD', or 'Estate' from the above set in part B. - Perform a single linear regression on each of the single-feature datasets using the approach in part C. - How do the results for each simple linear regression compare with the multiple linear regression from part C

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

how to fix erroe on line 17 in code please!! assignment attached inport nunpy as ing import pandas as pd inpert seaboim as sns. import nstplot lib.pyplot as ptt: fron sklearn inport datapets fron...

please help with code, linear regression line plot not showing up Exercise set 8 1. The attached paper by Subramanian ot al. uses Quantitative Structure-Activity Relationship (QSAR) models using...

JBR-07575; No of Pages 12 Journal of Business Research xxx (2012) xxx-xxx Contents lists available at SciVerse ScienceDirect Journal of Business Research Organizational innovation as an enabler of...

JPMA-01726; No of Pages 12 Available online at www.sciencedirect.com ScienceDirect International Journal of Project Management xx (2015) xxx - xxx www.elsevier.com/locate/ijproman Does Agile work? A...

Need help with the document attached. The company that this paper is about is Nvidia. Project Descriptions SEC 10-K Paper You will be asked to select a company that is publically traded. You must...

Please read chapter 6 and answer the questions and see the ( guide to answer number 3) For each case study, you will view the material as the student's teacher, read the information provided and...

Can someone help me with the Accounting Course Project. Instructions are there in the Course Project Instruction and the 10k for Nike and UNder Armour attached. I have also attached the Teamplate....

I need help with this project. I have attached all of the instructions and relevant information. The final draft is due August 20th. Course Project: A Financial Statement Analysis A Comparative...

Pricing Strategies Professor Jean-Pierre Dub Homework Schedule for Pricing Strategies, Summer 2015 Section 85 Overview of Homework...

Please help me with this assignment, 100% human! Reference book George, J. M. (2024). Contemporary management (12th ed.). McGraw-Hill Education. keiser library Syahbinah, S., & Suhardianto, N....

Tim's portfolio contains two stocks, Lightco and Shineco. Last year his portfolio returned 14 percent. Lightcos return was 5 percent and Shineco returned 20 percent. What are the weights of each in...

(rz, 4 4ryz, cz) be the velocity field of a fluid. Compute the flux of v across the surface 162? + 2522 3D (y 20) where 0

Incorporating Stakeholder Impacts into Business Sustainability Analyses and Decisions Stylz Company, a recent start-up fashion retailer based in the United States, is deciding between opening its...

9. Which of the following companies is most likely to use A) Crimpson Color, a company selling customized garments for niche customers B) Effel & Associates, a consulting firm providing various audit...

3. What are potential solutions?

2. Have the team members identify the potential sources for conflict.

4. I can tell when team members dont mean what they say.