Question: 1. Click-Through Rate Prediction (3 points) In this problem, will build a model to predict the click-through rate of an advertising demand- side platform (DSP).

1. Click-Through Rate Prediction (3 points)

In this problem, will build a model to predict the click-through rate of an advertising demand- side platform (DSP). Your job is to predict whether an advertisement will be clicked by a mobile phone customer.

This dataset contains the following 8 variables:

dc = Click-through. 1 means clicked, 0 means not clicked. This is the outcome variable the DSP cares about.

atype = the AdExchange platform where the advertisement slot is traded.

bidf = the lowest bid price of the advertisers on the AdExchange.

instl = full-screen advertisment. 1 means full-screen, 0 means half-screen.

isp = code for the telecommunications company of the customer. 0 means unknown, 1 means China Mobile, 2 means China Unicom, 3 means China Telecome.

nt = the broadband cellular network technology code, 0 means unknown, 1 means Wi-Fi, 2 means 2G, 3 means 3G, 4 means 4G, 5 means 5G

mfr = the cellphone manufacturer.

period = time period of a day.

Use the data set "RTB.csv" to address the following questions. You may use the function rtb_new=pd.get_dummies(rtb) to transform the categorical variables in the data frame rtb into dummy variables (i.e., 0-1 variables). For building regularized logistic regression models, please

use the penalty and C parameters in the LogisticRegression module of sklearn. Refer to the online documentation for details: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

(a) (1 point) Build an L2-regularized logistic regression model and use cross-validation to select the best regularization parameter. Please include all the features into the model.

(b) (1 point) Build an L1-regularized logistic regression model and use cross-validation to select the best regularization parameter. Please include all the features into the model.

(c) (1 point) Consider the models you build in parts (a)-(b). Which model you would recommend to predict the click-throughs of this DSP? Report your model's generalization error (0-1 loss) and out-of-sample ROC-AUC. Which factor do you think is most important in predicting the click-throughs of this DSP? Based on your results, what recommendation(s) would you have about the advertising strategies for the DSP?

2. Predicting Titanic Survivals (3 points)

In this problem, you will try to build random forest and gradient boosting tree models to pre- dict whether passengers on Titanic could survive. We use the data set Titanic_Survival.csv, which contains the demographic information of passengers on Titanic and their survival status. This data set has the following variables:

Passengerid: The ID of the passenger Survived: Whether the passenger survived (1=Yes, 0=No) PClass: Ticket class (1=1st, 2=2nd, 3=3rd) Sex: Male or female Age: Age of the passenger SibSp: Number of siblings or spouses aboard the Titanic for the passenger Parch: Number of parents or children aboard the Titanic for the passenger Ticket: Ticket number of the passenger Fare: Passenger fare Cabin: Cabin number Embarked: Port of embarkation (S=Southampton, C=Cherbourg, Q=Queenstown) Name: Name of the passenger

Remark: You can also try your results on this forever-ongoing Kaggle competition: https://www. kaggle.com/c/titanic/overview.

(a) (1.5 points) Please use cross-validation to train and validate a random forest model to predict whether a passenger survived in the Titanic tragedy. Report the out-of-sample ROC-AUC and overall accuracy of our selected model.

(b) (1.5 points) Please use cross-validation to train and validate a gradient boosting tree model to predict whether a passenger survived in the Titanic tragedy. Report the out-of-sample ROC-AUC and overall accuracy of our selected model.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!

I need two ratios provided based on the financial statement of a company (fullfinancial attached). I need the price-earnings ratio (the formula of price per share divided by earnings per share) and...

help get the answers.. its a complete question 1. (10 points) Annie and David are painting their apartment. At the paint store, David says he prefers Canary Yellow to Bumblebee Yellow, Lime Yellow,...

This question concerns lexical grammars. (a) Tree Adjoining Grammars contain two types of elementary tree. (i) What are these trees called? [1 mark] (ii) If one were building a grammar for English...

If you invested $320 million into a project today, and achieved operating cash flows of $180million, $60 million, $50 million, $50 million, and $40 million, how would you measure the following: a....

An economy has two types of jobs, Good and Bad, and two types of workers, Qualified and Unqualified. The population consists of 60% Qualified and 40% Unqualified. In a Bad job, either type of worker...

From the following illustrations, determine if the probability is greater than 0.5, lower than 0.5, or cannot be sure. (a) (d) (b) (c) AAA (e) (f)

My Motto Is You will often see inspirational quotes or sayings that people have chosen to include at the bottom of their e-mails. Some are humorous, some relay a personal agenda, and others are...

Toni Pena is the sole owner of Deer Park, a public camping ground near the Lake Mead National Recreation Area. Toni has compiled the following financial information as of December 31, 2012...

[ The following information applies to the questions displayed below. ] Banele had a taxable estate of $ 1 5 . 5 million when they died this year. Calculate the amount of estate tax due ( if any )...

1. What opportunities did Melissa identify, and how did she capitalize on those opportunities? 2. How did Melissas background help her identify and tackle the problems she discovered in the beauty...

3. (12 points) Give simple English language descriptions for the following grammars: b) G2: S OS0 00 0

Who are Netflix s competitors and why would they be likely to take action against Netflix? Include the role of market commonality and resource similarities. Why does Netflix appear to have a...

As shown in [Figure 1-14], the rotatable dump is connected to the truck's bed by a hinge at the support point C and supported by the hydraulic cylinder AB . The weight of the dump is 5 [kN], and the...

The present value of a future amount was calculated to be R207,54 at a discount rate of 5% and the present value of the same future amount was calculated to be R183,30 at a discount rate of 6%. Use...

Use the data from the following file (Fish.csv) table pasted below for running linear regression. Weight Length1 Length2 Length3 Height Width 242 23.2 25.4 30 11.52 4.02 290 24 26.3 31.2 12.48 4.3056...

In the mechanism shown in Fig. 2.53, the crank AB, 70 mm long, rotates clockwise at 110 rev/min. CD is 140 mm long. Link BD is 260 mm long and slides through a swivelling pin E at the lower end of...

Use a for - loop to define a function to convert a text representing a binary number to decimal integer, and then test it as described below. ( 1 ) Write a user - defined MATLAB function ( named...

Vince, Inc. has developed and patented a new laser disc reading device that will be marketed internationally. Which of the following factors should Vince consider in pricing the device? I. Quality of...

Drug testing provides a widely used and legal method for employers to deal with increasing drug problems at work.

Employers increasingly are facing privacy and free speech issues in areas such as blogs and wikis and whistle-blowing.

To be effective, HR policies, procedures, and rules should be consistent, necessary, applicable, understandable, reasonable, and communicated.