Question:
Project for my Python class.
These are the requirements for the project. I'm asking for guidance on how I should start the project and how to present my dataset.
Transcribed Image Text:
Your dataset must have more than 1000 observations/records, 4+ numeric variables, and 1+ categorical variables. If you have a dataset that you are particularly interested in using and it does not fit these criteria, please contact me.

Jupyter notebook: Your notebook should include the following sections:

1. Set-up. You must include a short description of your dataset, why you are interested in it, a link to where others can access the data, and a link to documentation about the data (if different from the data access link).

2. Thorough exploration of the non-value qualities of the data. You must demonstrate, for example, that you can identify the number of records, number of features, variable names, variable datatypes (including whether the data is numerical vs. categorical, discrete vs. continuous, etc.), number of unique values for categorical data, and number of missing values. You must also explain whether your findings make sense: do they match your expectations? Are they consistent with the documentation for the data?

3. Thorough exploration of the data values. You must demonstrate, for example, that you can get summary statistics about the data, identify the range of values for each variable, identify abnormal values (outliers, null values, infinities, etc.), identify the categorical values for categorical data, and quantify relationships between variables (such as by calculating correlation coefficients). You must explain whether your findings make sense: are there any unusual values, or are they consistent with your expectations? Can you double-check any values against reasonable ranges (for example, are all ages between 0 and ~100)? If there are missing values, you must eliminate them (e.g., by removing records, removing a variable from your entire dataset, or filling in the missing value with an appropriate alternative value). You must comment on whether any values would be better represented in another form (e.g., do you think a categorical variable would be better represented in numerical form? Or a numerical variable in categorical form? Is there a variable that might provide useful insights if it were instead binned, rescaled, or transformed?). Based on these comments, create at least one new column in your dataset that applies a transformation to an existing variable (binning, rescaling, numerical transformation, conversion of numerical to categorical, etc.).
   - If you have not identified any variable on which to do this, just pick one to show the process.
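One possible starting point for section 2 (the non-value qualities of the data) is the minimal pandas sketch below. It makes several assumptions:

- pandas and NumPy are installed.
- A small synthetic DataFrame stands in for your real dataset. Its columns `age`, `income`, `height_cm`, `visits` and `region` are invented for illustration.
- The helpers `describe_structure` and `kind_of` are hypothetical names.
- Reading integer columns as discrete and float columns as continuous is only a rough heuristic. Confirm it against your data's documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data. With a real file you would load it instead, e.g.
# df = pd.read_csv("your_data.csv")   # placeholder file name
rng = np.random.default_rng(0)
n = 1200
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "income": rng.normal(50_000, 15_000, size=n).round(2),
    "height_cm": rng.normal(170, 10, size=n).round(1),
    "visits": rng.poisson(3, size=n),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
})
# Blank out 2% of incomes so there is something to count as missing
df.loc[df.sample(frac=0.02, random_state=0).index, "income"] = np.nan


def kind_of(dtype) -> str:
    """Heuristic label: integer columns read as discrete, floats as continuous."""
    if pd.api.types.is_integer_dtype(dtype):
        return "numeric, discrete"
    if pd.api.types.is_float_dtype(dtype):
        return "numeric, continuous"
    return "categorical"


def describe_structure(data: pd.DataFrame) -> pd.DataFrame:
    """One row per variable: dtype, kind, number of unique values, number missing."""
    return pd.DataFrame({
        "dtype": data.dtypes.astype(str),
        "kind": [kind_of(t) for t in data.dtypes],
        "n_unique": data.nunique(),
        "n_missing": data.isna().sum(),
    })


n_records, n_features = df.shape
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(f"{n_records} records, {n_features} features")
print("Variable names:", df.columns.tolist())
print(describe_structure(df))

# The project's size requirements: >1000 records, 4+ numeric, 1+ categorical
meets_criteria = n_records > 1000 and len(numeric_cols) >= 4 and len(categorical_cols) >= 1
print("Meets project criteria:", meets_criteria)
```

This puts the record count, feature count, names, dtypes, unique counts and missing counts in one place. That makes it straightforward to compare them with the documentation and explain whether they match your expectations.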
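For the value-exploration part of section 3, here is one way to collect the requested checks in pandas. The sketch assumes pandas and NumPy and uses invented synthetic data. It plants one implausible age (250) and one infinite income so the checks have something to find. The 1.5 × IQR outlier rule is one common convention, not the only valid choice. The correlation shown is pandas' default, Pearson.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: swap in your own DataFrame here.
rng = np.random.default_rng(1)
n = 1200
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "visits": rng.poisson(3, size=n),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
})
df.loc[0, "age"] = 250        # deliberately implausible value, for illustration
df.loc[1, "income"] = np.inf  # deliberately infinite value, for illustration

numeric = df.select_dtypes(include="number")

# Abnormal values: nulls and infinities per column
n_null = numeric.isna().sum()
n_inf = numeric.isin([np.inf, -np.inf]).sum()

# Treat infinities as missing so they do not distort the statistics below
finite = numeric.replace([np.inf, -np.inf], np.nan)

# Summary statistics and the range (min, max) of each numeric variable
print(finite.describe())
ranges = finite.agg(["min", "max"]).T
print(ranges)

# Outliers by the 1.5 * IQR rule (one common convention, not the only one)
q1, q3 = finite.quantile(0.25), finite.quantile(0.75)
iqr = q3 - q1
outlier_mask = (finite < q1 - 1.5 * iqr) | (finite > q3 + 1.5 * iqr)
n_outliers = outlier_mask.sum()
print("Outliers per column:\n", n_outliers)

# Plausibility check against a reasonable range: are all ages between 0 and ~100?
bad_ages = df[(df["age"] < 0) | (df["age"] > 100)]
print("Implausible ages:", len(bad_ages))

# Categorical values and how often each occurs
print(df["region"].value_counts())

# Relationships between numeric variables (Pearson correlation by default)
corr = finite.corr()
print(corr.round(2))
```

Each printed result is a prompt for the written explanation the assignment asks for. Ask whether the ranges are plausible, whether the outliers are real or errors, and whether the correlations are what you expected.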
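The requirements list three ways to eliminate missing values: removing records, removing a variable, or filling in an alternative value. This sketch shows all three side by side in pandas on a tiny hypothetical DataFrame. Filling numeric gaps with the median and categorical gaps with the most frequent value is a common default, not the only reasonable choice. Justify whichever option fits your data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with gaps in one numeric and one categorical column
df = pd.DataFrame({
    "income": [42_000.0, np.nan, 58_500.0, 61_000.0, np.nan, 39_900.0],
    "region": ["north", "south", None, "east", "south", "west"],
    "visits": [3, 1, 4, 2, 5, 0],
})
print(df.isna().sum())

# Option 1: remove every record that has any missing value
dropped_rows = df.dropna()

# Option 2: remove every variable (column) that has any missing value
dropped_cols = df.dropna(axis="columns")

# Option 3: fill in a reasonable alternative value
# (median for a numeric column, most frequent value for a categorical one)
filled = df.copy()
filled["income"] = filled["income"].fillna(filled["income"].median())
filled["region"] = filled["region"].fillna(filled["region"].mode().iloc[0])

# Confirm nothing is missing afterwards
assert filled.isna().sum().sum() == 0
```

On a dataset with more than 1000 records, dropping a handful of rows is often acceptable. Filling becomes more attractive when many rows are affected.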
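The last requirement is to create at least one new column by transforming an existing variable. The sketch below shows several options in pandas on invented data:

- binning a numeric variable into categories with `pd.cut`
- min-max rescaling
- a log transform
- converting a categorical variable to numeric indicator columns with `pd.get_dummies`

The bin edges and labels are arbitrary choices for illustration. Pick ones that make sense for your own variable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({
    "age": [23, 35, 47, 52, 68, 81],
    "income": [18_000.0, 42_000.0, 58_500.0, 61_000.0, 150_000.0, 39_900.0],
    "region": ["north", "south", "south", "east", "west", "north"],
})

# Binning (numeric -> categorical): right=False makes bins [0, 30), [30, 50), ...
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 70, 120],
                         labels=["<30", "30-49", "50-69", "70+"], right=False)

# Rescaling: min-max scale income into the range [0, 1]
inc = df["income"]
df["income_scaled"] = (inc - inc.min()) / (inc.max() - inc.min())

# Numerical transformation: a log compresses a right-skewed variable such as income
df["log_income"] = np.log10(df["income"])

# Categorical -> numeric: one 0/1 indicator column per region
dummies = pd.get_dummies(df["region"], prefix="region", dtype=int)
df = pd.concat([df, dummies], axis=1)
print(df)
```

One transformation is enough to meet the requirement. Briefly explain why the new form of the variable might give more useful insights than the original.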