Question: In this question, we will use the Titanic dataset from the Kaggle competition "Titanic: Machine Learning from Disaster". The dataset includes information about passenger characteristics as well as whether they survived the disaster.
Import the Titanic data using the following R code:
df <- read.csv("Titanic.csv", header=TRUE, sep=",")
(a) Calculate P(Survived) and P(Survived | Pclass = 1) using R. A value of 1 for the Survived variable means the passenger survived; 0 means the passenger did not survive. (1 mark)
(b) Calculate the entropies H(Embarked) and H(Pclass), using log2(). Which entropy is higher, and why? Do not use an entropy function. (1 mark)
(c) In this competition, you must predict the fate of the passengers aboard the Titanic. Caroline used two methods to predict the survival of passengers. She saved the prediction results as the variables Survived_guess1 and Survived_guess2. Calculate H(Survived_guess1 | Pclass) and H(Survived_guess2 | Pclass). Which entropy is higher? (1 mark)
(d) Can you guess which algorithm Caroline used to obtain the prediction Survived_guess2? Hint: she used two variables, Pclass and Embarked, in the prediction. (Optional, 1 bonus mark)
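The quantities asked for in (a) to (c) can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not a worked answer: the data frame `toy` and its column `Survived_guess` are made-up stand-ins (Titanic.csv is not reproduced here), and the helper names `entropy_bits` and `cond_entropy_bits` are hypothetical. The entropy is computed by hand from relative frequencies, H(X) = -sum p(x) log2 p(x), and the conditional entropy as H(Y | X) = sum over x of P(X = x) H(Y | X = x).

```r
# Toy stand-in for Titanic.csv: made-up rows, NOT the Kaggle data.
toy <- data.frame(
  Survived       = c(1, 0, 1, 0, 0, 1, 0, 0),
  Pclass         = c(1, 1, 1, 2, 2, 3, 3, 3),
  Embarked       = c("S", "C", "S", "S", "Q", "S", "S", "C"),
  Survived_guess = c(1, 1, 1, 0, 0, 0, 0, 0)  # hypothetical prediction column
)

# (a) Empirical probabilities: the mean of a logical vector is the share of TRUEs.
p_survived <- mean(toy$Survived == 1)
p_survived_given_first <- mean(toy$Survived[toy$Pclass == 1] == 1)

# (b) Entropy in bits, computed from relative frequencies.
entropy_bits <- function(x) {
  p <- prop.table(table(x))  # relative frequency of each observed value
  p <- p[p > 0]              # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# (c) Conditional entropy H(Y | X): weight the entropy of y within each
# group of x by that group's share of the rows.
cond_entropy_bits <- function(y, x) {
  groups  <- split(y, x)
  weights <- lengths(groups) / sum(lengths(groups))
  sum(weights * sapply(groups, entropy_bits))
}

h_embarked    <- entropy_bits(toy$Embarked)
h_pclass      <- entropy_bits(toy$Pclass)
h_guess_class <- cond_entropy_bits(toy$Survived_guess, toy$Pclass)
```

With the real data, the same calls would be applied to `df` in place of `toy`. Note that H(Y | X) is zero exactly when Y is fully determined by X, which is the case for the made-up `Survived_guess` column above; comparing the two conditional entropies in (c) therefore indicates how much each prediction depends on information beyond Pclass.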
