Question: PLEASE HELP IN PYTHON

Bagging and boosting are considered ensemble methods of machine learning. Ensemble learning is a concept in which multiple models (for example, decision trees) are trained using the same learning algorithm to produce better predictive performance than a single model. Ensembles belong to a larger family of methods, called multi-classifiers, in which a set of hundreds or thousands of learners with a common objective are combined to solve the problem.
For the given code, implement either a bagging or a boosting version to solve the problem of a pharmaceutical drug analysis. Fill out the missing code sections titled "Write Your Code Here". One section has an error stating: "DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given". A data file is provided for this code. Also, try to explain how each section of the code works if you can. Thank you.
import pandas as pd
import numpy as np #needed for np.random.seed and np.array below
import matplotlib.pyplot as plt #for plotting
import seaborn as sns #for plotting
from sklearn.ensemble import RandomForestClassifier #for the model
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz #plot tree
from sklearn.model_selection import train_test_split #for data splitting
np.random.seed(123) #ensure reproducibility
#Let's load the data. This is clinical data to predict whether a patient is likely to suffer a heart attack (1) or not (0). For each patient (303 in total), 14 variables have been taken into account for the prediction. We will use all these data to train a random forest as a predictor for heart attacks:
df = pd.read_csv('heart2.csv')
print(df.shape)
df.head()
#Let's have a better look at the data (visualization):
X=np.array(df[df.columns[0:14]])
f,a = plt.subplots(4,4)
f.set_figheight(10)
f.set_figwidth(10)
a = a.ravel()
for idx, ax in enumerate(a):
    if idx < 14: #only the first 14 subplots correspond to columns
        ax.hist(list(X[:, idx]), color='r')
        ax.set_title(df.columns[idx])
plt.tight_layout()
#Now, let's train our random forest. Not all available data will be used for training, some will be used for testing (20%):
X_train, X_test, y_train, y_test = train_test_split(df.drop('target', axis=1), df['target'], test_size=.2, random_state=10)
model = RandomForestClassifier(max_depth=5)
model.fit(X_train, y_train)
#The original call df.drop('target', 1) passed the axis positionally, which raised: DataFrame.drop() takes from 1 to 2 positional arguments but 3 were given. Recent pandas versions require the keyword form df.drop('target', axis=1) (or, equivalently, df.drop(columns='target')).
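To illustrate the fix on a toy DataFrame (a minimal sketch, independent of the heart2.csv data):

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'target': [0, 1]})

# toy.drop('target', 1) raises TypeError in pandas >= 2.0,
# because the axis argument must now be passed by keyword.
features = toy.drop('target', axis=1)  # keyword form works
same = toy.drop(columns='target')      # equivalent alternative

print(list(features.columns))  # -> ['a', 'b']
```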
#Now, let's predict the class (likely to have an attack (1) or not (0)) for our testing data of patients, using our trained random forest:
y_predict = model.predict(X_test)
#These are the predicted labels by the random forest:
y_predict
#...and these are the actual labels of the testing data:
y_test=np.array(list(y_test))
y_test
#Let's see the difference between the predicted and the actual labels. See that, out of 61 predictions, the random forest mistook 12:
sum(abs(y_predict - y_test ))
#Now, let's train five random forests and combine them (the question allows either bagging or boosting; averaging independently trained forests is the bagging approach):
#Write your code here
#Let's see the predictions for each random forest:
#Write your code here
#Now, our final prediction will be an average of the predictions of the five random forests. If more than two random forests say that the label must be 1, then, accordingly, the final prediction will be 1 and 0 otherwise:
#Write your code here
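One possible way to fill the three "Write your code here" sections (a minimal sketch: the names models, preds, and final are my own choices, and since heart2.csv is not included here, I substitute a synthetic dataset from sklearn's make_classification purely as a stand-in for the real train/test split):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(123)  # ensure reproducibility

# Stand-in data: in the original notebook, X_train/X_test/y_train/y_test
# come from heart2.csv; make_classification is only a placeholder.
X, y = make_classification(n_samples=303, n_features=13, random_state=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=10)

# --- Section 1: train five random forests, each on a bootstrap sample ---
models = []
for i in range(5):
    idx = np.random.choice(len(X_train), size=len(X_train), replace=True)
    m = RandomForestClassifier(max_depth=5, random_state=i)
    m.fit(X_train[idx], y_train[idx])
    models.append(m)

# --- Section 2: predictions of each random forest on the test set ---
preds = np.array([m.predict(X_test) for m in models])  # shape (5, n_test)

# --- Section 3: majority vote; label 1 if more than two of the five forests say 1 ---
final = (preds.sum(axis=0) > 2).astype(int)
print(sum(abs(final - np.array(y_test))))  # number of mistakes of the combined model
```

Each forest sees a slightly different bootstrap resample of the training data, so their errors are partly independent; the majority vote then cancels some of those errors out, which is why the combined prediction can beat the single model.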
#See that our final (averaged) predictions make only 3 mistakes; that is, the ensemble does indeed improve on the predictions of our initial single model (12 mistakes):
sum(abs(y_predict - np.array(final)))