Question:
The provided datasets nyt1.csv, nyt2.csv, and nyt3.csv represent three (simulated) days of ads shown and clicks recorded on the New York Times homepage. Each row represents a single user. There are 5 columns: age, gender (0 = female, 1 = male), number of impressions, number of clicks, and logged-in. Use R to handle this data. Perform some exploratory data analysis:
Make a new variable, age_group, that categorizes users as "<20", "20-29", "30-39", "40-49", "50-59", "60-69", and "70+".
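One way to build age_group is with base R's cut(). The following is a minimal sketch, not a definitive solution. It assumes the age column is named `Age` (the question does not give the CSV header names), and the helper name `add_age_group` and the small demo data frame are made up for illustration.

```r
# Assumption: the age column is called `Age`; change age_col to match the CSV header.
add_age_group <- function(df, age_col = "Age") {
  # Left-closed intervals: [-Inf, 20), [20, 30), ..., [70, Inf)
  breaks <- c(-Inf, 20, 30, 40, 50, 60, 70, Inf)
  labels <- c("<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70+")
  df$age_group <- cut(df[[age_col]], breaks = breaks, labels = labels, right = FALSE)
  df
}

# Tiny made-up data frame, only to show where the boundary ages land.
demo <- data.frame(Age = c(0, 19, 20, 45, 69, 70))
demo <- add_age_group(demo)
print(demo)
```

With the real data, the same helper could be applied to the result of read.csv("nyt1.csv"), after checking the actual column names with names().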
For each day:
  o Plot the distribution of number of impressions and click-through-rate (CTR = #clicks / #impressions) for these age categories.
  o Define a new variable to segment or categorize users based on their click behavior.
  o Explore the data and make visual and quantitative comparisons across user segments/demographics (<20-year-old males versus <20-year-old females or logged-in versus not, for example).
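The per-day steps could be sketched in base R as follows. This is one possible approach under stated assumptions: the column names `Impressions`, `Clicks`, and `Gender` are guesses (the question does not give the header names), an age_group column is assumed to exist already, the helper names are hypothetical, and the click-segment cut points are an arbitrary choice.

```r
# Assumed column names (not given in the question): Impressions, Clicks, Gender,
# plus an existing age_group column.
add_click_vars <- function(df) {
  # CTR is undefined for users with zero impressions, so leave it as NA.
  df$ctr <- ifelse(df$Impressions > 0, df$Clicks / df$Impressions, NA_real_)
  # One possible behavior segment: 0 clicks, exactly 1 click, 2 or more clicks.
  df$click_segment <- cut(df$Clicks, breaks = c(-Inf, 0, 1, Inf),
                          labels = c("no clicks", "1 click", "2+ clicks"))
  df
}

# Aggregate CTR (total clicks / total impressions) for each combination of group_cols.
ctr_by <- function(df, group_cols) {
  totals <- aggregate(df[c("Impressions", "Clicks")], by = df[group_cols], FUN = sum)
  totals$ctr <- totals$Clicks / totals$Impressions
  totals
}

# Boxplots of impressions and per-user CTR by age group for one day.
plot_day <- function(df, day_label) {
  boxplot(Impressions ~ age_group, data = df,
          main = paste(day_label, "- impressions by age group"),
          xlab = "Age group", ylab = "Impressions")
  boxplot(ctr ~ age_group, data = df[df$Impressions > 0, ],
          main = paste(day_label, "- CTR by age group"),
          xlab = "Age group", ylab = "CTR")
}

# Made-up rows, only to illustrate the calculations.
demo <- data.frame(
  Gender = c(0, 0, 1, 1),
  Impressions = c(4, 0, 5, 5),
  Clicks = c(1, 0, 0, 2),
  age_group = factor(c("<20", "<20", "<20", "<20"))
)
demo <- add_click_vars(demo)
demo_ctr <- ctr_by(demo, "Gender")
print(demo_ctr)
```

For a comparison such as <20-year-old males versus females, `ctr_by(df, c("age_group", "Gender"))` would give one row per combination; calling `plot_day(df, "Day 1")` on a full day's data draws the two boxplots.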
Extend your analysis across days. Visualize some metrics and distributions over time.
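To compare days, one option is to stack the three data frames with a day index and summarize per day. The sketch below assumes columns named `Impressions` and `Clicks` (an assumption, since the header names are not given); `stack_days` and `daily_summary` are hypothetical helper names, and the demo numbers are invented.

```r
# Hypothetical helper: tag each day's rows with a day number and stack them.
stack_days <- function(day_list) {
  tagged <- Map(function(df, day) {
    df$day <- day
    df
  }, day_list, seq_along(day_list))
  do.call(rbind, tagged)
}

# Total impressions, total clicks, and aggregate CTR per day.
daily_summary <- function(all_days) {
  totals <- aggregate(all_days[c("Impressions", "Clicks")],
                      by = all_days["day"], FUN = sum)
  totals$ctr <- totals$Clicks / totals$Impressions
  totals
}

# With the real files (assumed to be in the working directory):
# days <- lapply(c("nyt1.csv", "nyt2.csv", "nyt3.csv"), read.csv)
# s <- daily_summary(stack_days(days))
# plot(s$day, s$ctr, type = "b", xlab = "Day", ylab = "CTR")

# Made-up data for three days, only to show the shape of the result.
demo_days <- list(
  data.frame(Impressions = c(5, 5), Clicks = c(1, 0)),
  data.frame(Impressions = c(4, 6), Clicks = c(1, 1)),
  data.frame(Impressions = c(10, 10), Clicks = c(0, 1))
)
demo_summary <- daily_summary(stack_days(demo_days))
print(demo_summary)
```

Keeping the day as a column means the same grouping approach used within a day (by age group, gender, or logged-in status) can be extended across days by adding "day" to the grouping columns.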
