Question: Here is the dataset: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings

1.c Data Sanity Checks

1.c.1) It is important to check whether there are any internal inconsistencies within the dataset. One natural question to ask is: are "Global_Sales" consistent with the regional sales? That is, is the sum of "NA_Sales", "EU_Sales", "JP_Sales", and "Other_Sales" equal to "Global_Sales" for every entry? Examine this problem by:

1. Creating a new column in df named "Total_Sales" that contains the sum of the columns "NA_Sales", "EU_Sales", "JP_Sales", and "Other_Sales".
2. Calculating the absolute difference between "Total_Sales" and "Global_Sales" for each entry, and reporting the largest absolute difference. Store this maximal deviation in a new variable named maxdeviation.

Is maxdeviation 0? If not, what are the possible reasons? Is the dataset still acceptable despite nonzero deviations? (You don't need to write any answers.)

In [14]:
## Your code here
maxdeviation = ...
print('The max deviation between "Total_Sales" and "Global_Sales" is', maxdeviation)

In [ ]:
grader.check("q1c1")

1.c.2) Recall that we have removed all duplicated entries from the dataframe, but we still want to make sure there are no subtle web-scraping issues, such as misspellings, that prevent redundant entries from being removed. Does each entry represent one unique game? This question can be divided into two parts. How many entries (rows) are there in the dataframe now? Store the answer (an integer) in a variable named len_total. How many distinct game names (in the column "Name") are there in the dataset? Store the integer result in a variable named len_name_unique. Each entry represents one unique game if and only if the two numbers are equal.

In [17]:
## Your code here
len_total = ...
len_name_unique = ...
print("The number of non-duplicative entries is", len_total)
print("The number of distinct game names is", len_name_unique)
print("The two numbers are {0}".format("equal" if len_total == len_name_unique else "not equal"))

In [ ]:
grader.check("q1c2")

1.c.3) To take a deeper look into the structure of the dataset:

1. Create a subset of the DataFrame containing only the entries whose game names appear more than once among all entries.
2. Sort the new DataFrame by "Name" alphabetically in ascending order.

Hint: pandas.DataFrame.groupby and pandas.core.groupby.DataFrameGroupBy.filter may be useful for the tasks in step 1. For concrete illustrations and usages, see the Data 100 lecture. Store the result in df_name_multi_sorted. This exercise is intended to explore why there are duplicated game names.

In [20]:
## Your code here

df_name_multi_sorted.head(5)

In [ ]:
grader.check("q1c3")

Important: Before proceeding to the following sections, please make sure you have passed the tests for the problems in 1.b. This will ensure df is ready for the following analyses.

In [4]:
## Load the required modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

The dataset for this homework is based on https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings. Please read the Kaggle page for the complete description of the dataset. We've replaced the column name "Platform" with "Console" to avoid a conflict during dummy-variable generation (see 3.c). We start by loading the dataset with pandas.

In [5]:
## No need for modification, just run this cell
df = pd.read_csv("HW1_dataset.csv")
df.head(5)

Out[5]:
   Name                      Console  Year_of_Release  Genre         Publisher  NA_Sales  EU_Sales  JP_Sales  Other
0  Wii Sports                Wii      2006.0           Sports        Nintendo   41.36     28.96     3.77
1  Super Mario Bros.         NES      1985.0           Platform      Nintendo   29.08     3.58      6.81
2  Mario Kart Wii            Wii      2008.0           Racing        Nintendo   15.68     12.76     3.79
3  Wii Sports Resort         Wii      2009.0           Sports        Nintendo   15.61     10.93     3.28
4  Pokemon Red/Pokemon Blue  GB       1996.0           Role-Playing  Nintendo   11.27     8.89      10.22
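For 1.c.1, here is a minimal sketch of one way the In [14] cell could be filled in, assuming df already contains the four regional columns plus "Global_Sales" (the Out[5] preview is cut off before those last columns). The helper names add_total_sales and max_deviation are hypothetical, and the toy rows are made-up numbers used only so the snippet runs on its own; in the notebook you would call the helpers on df instead.

```python
import pandas as pd

# Regional columns named in the 1.c.1 prompt.
REGION_COLS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def add_total_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Add a "Total_Sales" column holding the row-wise sum of the regional sales."""
    # sum(axis=1) adds across columns for each row; missing values are skipped by default.
    df["Total_Sales"] = df[REGION_COLS].sum(axis=1)
    return df


def max_deviation(df: pd.DataFrame) -> float:
    """Return the largest absolute difference between "Total_Sales" and "Global_Sales"."""
    return (df["Total_Sales"] - df["Global_Sales"]).abs().max()


# Hypothetical toy rows (not taken from the Kaggle file), used only to make the sketch runnable.
toy = pd.DataFrame({
    "NA_Sales":     [1.00, 0.50, 0.50],
    "EU_Sales":     [0.50, 0.25, 0.25],
    "JP_Sales":     [0.25, 0.00, 0.25],
    "Other_Sales":  [0.25, 0.25, 0.00],
    "Global_Sales": [2.00, 1.01, 1.00],
})

add_total_sales(toy)
maxdeviation = max_deviation(toy)
print('The max deviation between "Total_Sales" and "Global_Sales" is', maxdeviation)
```

Because these are floating-point values, a row whose regional figures should add up exactly can still differ from "Global_Sales" by a tiny amount, so it may help to also compare with a tolerance (for example numpy.isclose) rather than relying only on an exact test for zero.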

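For 1.c.2 and 1.c.3, a minimal sketch along the same lines, again with hypothetical helper names (count_entries_and_names, rows_with_repeated_names) and made-up toy rows standing in for df. len(df) counts rows, Series.nunique counts distinct non-missing names, and, as the 1.c.3 hint suggests, groupby("Name").filter keeps only the rows whose name group has more than one member.

```python
import pandas as pd


def count_entries_and_names(df: pd.DataFrame) -> tuple[int, int]:
    """Return (number of rows, number of distinct non-missing names)."""
    len_total = len(df)
    len_name_unique = int(df["Name"].nunique())  # nunique() ignores NaN by default
    return len_total, len_name_unique


def rows_with_repeated_names(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose "Name" occurs more than once, sorted by name in ascending order."""
    # filter() keeps every row of each group for which the function returns True.
    multi = df.groupby("Name").filter(lambda group: len(group) > 1)
    # mergesort is stable, so rows sharing a name keep their original relative order.
    return multi.sort_values("Name", ascending=True, kind="mergesort")


# Hypothetical toy rows (not taken from the Kaggle file).
toy = pd.DataFrame({
    "Name": ["Game B", "Game A", "Game B", "Game C", "Game A"],
    "Console": ["PS3", "Wii", "X360", "PC", "DS"],
})

len_total, len_name_unique = count_entries_and_names(toy)
print("The number of non-duplicative entries is", len_total)
print("The number of distinct game names is", len_name_unique)
print("The two numbers are {0}".format("equal" if len_total == len_name_unique else "not equal"))

df_name_multi_sorted = rows_with_repeated_names(toy)
print(df_name_multi_sorted.head(5))
```

When applied to df, the rows of df_name_multi_sorted can be compared side by side (for example, their "Console" and "Year_of_Release" values) to see why a given name repeats.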