Question: In this question, we will use the Titanic dataset from the Kaggle competition Titanic: Machine Learning from Disaster. The dataset includes information about passenger characteristics as well as whether they survived the disaster.

Import the Titanic data using the following R code:

df <- read.csv("Titanic.csv",header=TRUE, sep=",") 

(a) Calculate P(Survived) and P(Survived | Pclass = 1) using R. A value of 1 for the Survived variable means the passenger survived; 0 means the passenger did not survive. (1 mark)
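
One way to approach part (a) is to note that the mean of a 0/1 indicator equals the proportion of ones, so both probabilities reduce to calls to mean(). The sketch below is only an illustration: the data frame toy and its eight rows are made up so the code runs without Titanic.csv, and the column names Survived and Pclass are assumed to match those in the real file.

```r
# Toy stand-in for Titanic.csv: made-up rows, NOT the real data.
# With the real file, use the df loaded by read.csv() instead of toy.
toy <- data.frame(
  Survived = c(1, 0, 1, 0, 0, 0, 0, 1),
  Pclass   = c(1, 3, 1, 2, 3, 1, 3, 2)
)

# P(Survived): proportion of rows where Survived equals 1
p_survived <- mean(toy$Survived == 1)

# P(Survived | Pclass = 1): keep only first-class rows, then take the proportion
p_survived_given_first <- mean(toy$Survived[toy$Pclass == 1] == 1)

print(p_survived)
print(p_survived_given_first)
```

On the real data the same two lines would be applied to df$Survived and df$Pclass; the numbers printed here describe the toy rows only.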

(b) Calculate the entropies H(Embarked) and H(Pclass) using log2(). Which entropy is higher? Why? Do not use an entropy function. (1 mark)
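
For part (b), the entropy can be written directly from its definition, H(X) = -sum over x of p(x) * log2(p(x)), without calling a packaged entropy function. The helper name entropy_bits and the toy vectors below are hypothetical, chosen only so the sketch is runnable. Note that table() treats an empty string as its own category, so blank Embarked values in the real file would need a deliberate decision (drop them or keep them as a level).

```r
# Entropy in bits, computed from the empirical distribution of x.
# entropy_bits is a hypothetical helper name, not a library function.
entropy_bits <- function(x) {
  p <- prop.table(table(x))  # empirical probabilities of each observed value
  p <- p[p > 0]              # drop zero-probability levels (0 * log2(0) is taken as 0)
  -sum(p * log2(p))
}

# Made-up toy columns, NOT the real Titanic data
embarked <- c("S", "S", "S", "S", "S", "S", "C", "Q")  # heavily skewed
pclass   <- c(1, 3, 1, 2, 3, 1, 3, 2)                  # closer to uniform

h_embarked <- entropy_bits(embarked)
h_pclass   <- entropy_bits(pclass)

print(h_embarked)
print(h_pclass)
```

In general, for a variable with k categories the entropy is at most log2(k), reached when all categories are equally likely; the more a distribution concentrates on one category, the lower its entropy. Which of the two is higher on the real data should be checked by running entropy_bits(df$Embarked) and entropy_bits(df$Pclass).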

(c) In this competition, you must predict the fate of the passengers aboard the Titanic. Caroline used two methods to predict the survival of passengers and saved the prediction results as the variables Survived_guess1 and Survived_guess2. Calculate H(Survived_guess1 | Pclass) and H(Survived_guess2 | Pclass). Which entropy is higher? (1 mark)
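
For part (c), the conditional entropy can be computed as a weighted average of per-group entropies: H(Y | X) = sum over x of P(X = x) * H(Y | X = x). The sketch below assumes the two predictions are available as vectors (or columns of df); the helper names and the toy vectors are hypothetical and say nothing about Caroline's actual results.

```r
# Entropy in bits from the empirical distribution (hypothetical helper)
entropy_bits <- function(x) {
  p <- prop.table(table(x))
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Conditional entropy H(y | x) in bits (hypothetical helper)
cond_entropy_bits <- function(y, x) {
  groups  <- split(y, x, drop = TRUE)                   # values of y within each level of x
  weights <- lengths(groups) / sum(lengths(groups))     # P(X = x)
  h_each  <- vapply(groups, entropy_bits, numeric(1))   # H(Y | X = x)
  sum(weights * h_each)
}

# Made-up toy data, NOT the real predictions
pclass <- c(1, 1, 1, 1, 2, 2, 3, 3)
guess1 <- c(1, 0, 1, 0, 1, 0, 0, 0)  # varies within a class
guess2 <- c(1, 1, 1, 1, 0, 0, 0, 0)  # fully determined by class in this toy example

h1 <- cond_entropy_bits(guess1, pclass)
h2 <- cond_entropy_bits(guess2, pclass)

print(h1)
print(h2)
```

With the real data, the calls would be cond_entropy_bits(df$Survived_guess1, df$Pclass) and cond_entropy_bits(df$Survived_guess2, df$Pclass), assuming the predictions are stored as columns of df. A conditional entropy of 0 means the prediction is completely determined by Pclass; a larger value means the prediction still varies among passengers of the same class.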

(d) Can you guess which algorithm Caroline used to obtain the prediction Survived_guess2? Hint: she used two variables, Pclass and Embarked, in the prediction. (Optional, 1 bonus mark)
