Use the read_csv function from pandas to read in the dataset from the file path and...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
• Use the read_csv function from pandas to read in the dataset from the file path and return the resulting dataframe. In [ ]: def read_data(file_path): ... Reads in a dataset using pandas. Parameters file_path string containing path to a file Returns. pandas dataframe with data read in from the file path # YOUR CODE HERE In [ ]: tips = read_data('data/tips.csv') assert_equal (type (tips), pd.core.frame.DataFrame, msg="Your function does't return a dataframe") assert_equal(len (tips), 244, msg="The dataset should have 244 rows. Your solution only has %s"%len(tips)) print("2 random rows of the dataset tips:") tips.sample (2) Activa Go to S For this problem you will work on the DataFrame tips created from problem 1 autograder cell. Encode the categorical feature 'sex' by using LabelEncoder. Store encoded sex in a new column named 'sex_code'. After this problem, DataFrame tips should have one more column 'sex_code' in addition to the original columns. In []: from sklearn.preprocessing import LabelEncoder # YOUR CODE HERE In [ ]: assert_true ('sex_code' in tips.columns, msg="tips doesn't have 'sex_code' column") assert true (0 in tips.sex_code.unique(), msg="sex is not properly encoded") assert true (1 in tips.sex_code.unique (), msg="sex is not properly encoded") tips.head (2) Activato! For this problem you will work on the DataFrame tips created from problem 1 and updated by problem 2. In the problem, you will prepare dependent and independent variables from DataFrame tips for a regression problem. To complete this process, do the following: Define dependent variable y which is the 'tip' column. Define independent variable x which contains 'total_bill' and 'sex_code' columns. After this problem, there are two new variables defined, x and y. y is a Pandas Series and x is a Pandas DataFrame with two columns. In [ ]: # YOUR CODE HERE In [ ]: assert_equal (type (x), pd.core.frame.DataFrame, msg="x should be a DataFrame") assert equal (type (y), pd.core.frame. Series, msg="x should be a Series") assert_equal(len (x.columns), 2, msg="x should have two columns") assert true('sex_code' in x.columns, msg="sex_code is not in the independent variable list") assert_equal (y [0], 1.01, msg="dependent variable values are not right") Activate Windows Go to Settings to activate Windows. This problem works on the variables x and y created in problem 3. Split the independent and dependent variables to training and testing set. To complete this process, do the following: • Name the training and testing independent variable to x_train and x_test • Name the training and testing dependent variable to y_train and y_test • The test size argument in train_test_split should be set to 0.3. •The random_state argument in train_test_split should be set to 23. After this problem, there are 4 new variables defined, x_train, x_test, y_train and y_test In []: from sklearn.model_selection import train_test_split # YOUR CODE HERE In [ ]: assert_equal(x_train.shape [0], 170, msg="Training set doesn't have correct size") assert_equal(x_test.shape [0], 74, msg="Testing set doesn't have correct size") # Test independent values assert_equal (x_train.total_bill[0], 16.99, msg="Training indenpendent data is wrong"). assert_equal (y_train [0], 1.01, msg="Training dependent data is wrong") Activat Go to Se • Use the read_csv function from pandas to read in the dataset from the file path and return the resulting dataframe. In [ ]: def read_data(file_path): ... Reads in a dataset using pandas. Parameters file_path string containing path to a file Returns. pandas dataframe with data read in from the file path # YOUR CODE HERE In [ ]: tips = read_data('data/tips.csv') assert_equal (type (tips), pd.core.frame.DataFrame, msg="Your function does't return a dataframe") assert_equal(len (tips), 244, msg="The dataset should have 244 rows. Your solution only has %s"%len(tips)) print("2 random rows of the dataset tips:") tips.sample (2) Activa Go to S For this problem you will work on the DataFrame tips created from problem 1 autograder cell. Encode the categorical feature 'sex' by using LabelEncoder. Store encoded sex in a new column named 'sex_code'. After this problem, DataFrame tips should have one more column 'sex_code' in addition to the original columns. In []: from sklearn.preprocessing import LabelEncoder # YOUR CODE HERE In [ ]: assert_true ('sex_code' in tips.columns, msg="tips doesn't have 'sex_code' column") assert true (0 in tips.sex_code.unique(), msg="sex is not properly encoded") assert true (1 in tips.sex_code.unique (), msg="sex is not properly encoded") tips.head (2) Activato! For this problem you will work on the DataFrame tips created from problem 1 and updated by problem 2. In the problem, you will prepare dependent and independent variables from DataFrame tips for a regression problem. To complete this process, do the following: Define dependent variable y which is the 'tip' column. Define independent variable x which contains 'total_bill' and 'sex_code' columns. After this problem, there are two new variables defined, x and y. y is a Pandas Series and x is a Pandas DataFrame with two columns. In [ ]: # YOUR CODE HERE In [ ]: assert_equal (type (x), pd.core.frame.DataFrame, msg="x should be a DataFrame") assert equal (type (y), pd.core.frame. Series, msg="x should be a Series") assert_equal(len (x.columns), 2, msg="x should have two columns") assert true('sex_code' in x.columns, msg="sex_code is not in the independent variable list") assert_equal (y [0], 1.01, msg="dependent variable values are not right") Activate Windows Go to Settings to activate Windows. This problem works on the variables x and y created in problem 3. Split the independent and dependent variables to training and testing set. To complete this process, do the following: • Name the training and testing independent variable to x_train and x_test • Name the training and testing dependent variable to y_train and y_test • The test size argument in train_test_split should be set to 0.3. •The random_state argument in train_test_split should be set to 23. After this problem, there are 4 new variables defined, x_train, x_test, y_train and y_test In []: from sklearn.model_selection import train_test_split # YOUR CODE HERE In [ ]: assert_equal(x_train.shape [0], 170, msg="Training set doesn't have correct size") assert_equal(x_test.shape [0], 74, msg="Testing set doesn't have correct size") # Test independent values assert_equal (x_train.total_bill[0], 16.99, msg="Training indenpendent data is wrong"). assert_equal (y_train [0], 1.01, msg="Training dependent data is wrong") Activat Go to Se
Expert Answer:
Answer rating: 100% (QA)
Your code looks great It passes all of the assertions Here is a summary of your code PYTHON def read... View the full answer
Related Book For
Introduction to Java Programming, Comprehensive Version
ISBN: 978-0133761313
10th Edition
Authors: Y. Daniel Liang
Posted Date:
Students also viewed these programming questions
-
What considerations (ethical or otherwise) do professional sports leagues prioritize when making decisions about relocating teams, particularly in light of the economic impact on local communities...
-
Luzadis Company makes furniture using the latest automated technology. The company uses a job-order costing system and applies manufacturing overhead cost to products based on machine-hours. The...
-
This assignment reviews object-oriented programming concepts such as classes, methods, constructors, accessor methods, and access modifiers. It makes use of an array of objects as a class data...
-
Alistair bought a house on 1 April 2000 for 125,000 and occupied the entire house as his principal private residence until 1 November 2008. As from that date, he rented out two rooms (comprising...
-
Suppose that you are given an n n checkerboard and a checker. You must move the checker from the bottom edge of the board to the top edge of the board according to the following rule. At each step...
-
Presently, cost estimates for construction, operations & maintenance, and salvage value have been provided for students to perform present, annual, and future worth calculations. This assignment is...
-
Consider the hypotheses for the general linear model, which are of the form \[H_{0}: \mathbf{T} \beta=\mathbf{c}, \quad H_{1}: \mathbf{T} \beta eq \mathbf{c}\] where $\mathbf{T}$ is a $q \times p$...
-
What contemporary factors are contributing to the internationalization of the subject of accounting?
-
1.In Australian juice market, what is the market research and pricing and positioning of new launch juice in the existing market? 2. What kinds of market strategies are used by new product of juice...
-
1. What are the advantages and disadvantages of the in-home method of selling Project Home products? 2. What other channels of distribution might Project Home use? 3. What do you think about the name...
-
The projection is defined by the equations x and y 12 ln ((1 sin ) (1 sin )), where is the longitude of the point in the center of the map Compose a program that accepts and the latitude and...
-
The ethical perspective that suggests that organizational decisions should be made in accordance with established rules or guidelines is known as __________. A. the self-interest view B. the justice...
-
Is it necessary that the five steps in the strategic management process be performed sequentially? Why or why not?
-
A no-frills product targeted at the market at large is consistent with the __________. A. low-cost strategy B. differentiation strategy C. focus strategy D. none of the above
-
The assessment of strategies and related processes that promote superior performance from both market and environmental perspectives is known as _________. A. CSR B. managerial ethics C. management...
-
Which of the following is not part of the marketing strategy? A. pricing B. distribution C. promotion D. none of the above
-
Question 15 (1 point) Which of the following is not a quality perspective? Product-centric Quality Supply chain-centric Quality User-centric Quality Manufacturing-centric Quality
-
One of the significant and relevant accounts for this cycle is equipment. For this account, what would typically be the most relevant assertions for the auditor to consider? Why is it important for...
-
Programming Exercise 7.35 presents a console version of the popular hangman game. Write a GUI program that lets a user play the game. The user guesses a word by entering one letter at a time, as...
-
Write a method to sort a two-dimensional array using the following header: public static void sort(int m[][]) The method performs a primary sort on rows and a secondary sort on columns. For example,...
-
Write a program that animates the AVL tree insert, delete, and search methods, as shown in Figure 26.1. 2 i = hash(key) An entry ikey value N-1 Hash function FIGURE 27.1 A hash function maps a key to...
-
Longfellow Ranch in Pecos County, Texas. While Amerimex would usually provide mobile bunkhouses for its crews on site, the present contract required them to locate the bunkhouses 30 miles north in...
-
T-Mobile customers with qualifying plans can participate in a promotional service called T-Mobile Tuesdays, which offers free items and discounts from various well-known stores. Messages are sent...
-
Frankie and Trena Gibbs and Joel and Madeira Glenn were members of the Creek Baptist Church. On Saturday, May 31, 2008, the Gatlin Creek Baptist Church, which both couples had attended, held a...
Study smarter with the SolutionInn App