Question: Correcting incorrect hotel_country_code values in a hotels dataset using R

Use the R language.

Scenario 1: Consider the hotels dataset that contains the following columns for all the hotels in the entire world:

Column               Description
Hotel_id             Hotel's unique id
Hotel_name           Hotel's name
Hotel_city           Name of the city the hotel is in
Hotel_country_code   2-letter code of the country the hotel is in (e.g., FR for France)
Latitude             Latitude of the hotel's location
Longitude            Longitude of the hotel's location

The dataset has 400,000 rows (hotels). For simplicity, assume that no row or column contains NULL values. However, the latitude, longitude, or hotel_country_code columns might contain incorrect values. According to a prior analysis, approximately 5% of the dataset (20,000 hotels) has incorrect hotel_country_code values, and only 1% of the dataset (4,000 hotels) has incorrect latitude or longitude values. This dataset is going to be used for a very important project. Therefore, the incorrect hotel_country_code values should be found and corrected first.

Question 1: What approach would you implement to correct the incorrect hotel_country_code values?

Question 2: If you think the columns in the given dataset are not enough to solve the problem and you might need additional data, what data would you need, and why?
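
One possible starting point for Question 1, sketched in base R under stated assumptions: since coordinates are said to be wrong far less often (about 1%) than country codes (about 5%), each hotel's stated code can be compared with the majority code of its geographically nearest hotels, and disagreements flagged for review. The toy data, the helper names `haversine_km` and `suggest_code`, the output columns `Suggested_code` and `Flagged`, and the choice of k = 3 are all illustrative assumptions, not part of the original question.

```r
# Toy data (hypothetical): H4 is located among the Paris hotels but labelled "DE".
hotels <- data.frame(
  Hotel_id = c("H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"),
  Hotel_country_code = c("FR", "FR", "FR", "DE", "DE", "DE", "DE", "DE"),
  Latitude  = c(48.85, 48.86, 48.87, 48.88, 52.50, 52.51, 52.52, 52.53),
  Longitude = c( 2.35,  2.34,  2.36,  2.33, 13.40, 13.41, 13.39, 13.42),
  stringsAsFactors = FALSE
)

# Great-circle (haversine) distance in km from one point to a vector of points.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Majority country code among the k nearest *other* hotels of row i.
suggest_code <- function(i, data, k = 3) {
  d <- haversine_km(data$Latitude[i], data$Longitude[i],
                    data$Latitude, data$Longitude)
  d[i] <- Inf                          # exclude the hotel itself
  nearest <- order(d)[seq_len(k)]      # indices of the k closest hotels
  votes <- table(data$Hotel_country_code[nearest])
  names(votes)[which.max(votes)]       # ties resolve to the first code alphabetically
}

hotels$Suggested_code <- vapply(seq_len(nrow(hotels)), suggest_code,
                                character(1), data = hotels, k = 3)

# Flag rows whose stated code disagrees with the neighbourhood majority.
hotels$Flagged <- hotels$Suggested_code != hotels$Hotel_country_code
print(hotels[hotels$Flagged, c("Hotel_id", "Hotel_country_code", "Suggested_code")])
```

This sketch compares every hotel with every other hotel, which would not scale to 400,000 rows without some form of spatial indexing or blocking. A flagged row may also reflect a wrong coordinate rather than a wrong code, and hotels near borders can be outvoted by neighbours across the border. This is where Question 2 comes in: an external reference such as country boundary data or a reverse-geocoding source would likely be needed to confirm the suggested corrections.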
