Question: Help me find a dataset that satisfies these requirements. It should be a .csv file, preferably from a U.S. Government open data site.
1. Dataset Acquisition: Visit one of the following sources to download a dataset related to consumer spending:
- U.S. Government's Open Data
- Johns Hopkins University Data Guide
- R Datasets Package

Choose a dataset that includes variables such as date/time of purchase, amount spent, category of expenditure, and demographic information of the spender (if available). Your dataset must satisfy the following complexity requirements: the total data size must be at least 1 GB, and all data *being analyzed* must, taken together, meet one of these two criteria:
- Large Observation Criterion: > 50,000 observations and > 25 features
- Large Feature Criterion: > 300 observations and > 5,000 features

2. Data Preparation: Write and test your R code. Load the dataset into R and perform the necessary data cleaning steps, such as handling missing values, correcting data types, and removing duplicates.

3. Exploratory Data Analysis: Generate summary statistics to understand the distribution and central tendencies of key variables. Create visualizations such as histograms, box plots, and scatter plots to observe relationships and patterns in the data.

4. Statistical Analysis:
- Implement one of the week 6-10 topics for your dataset.
- Use linear regression to explore factors that predict the amount spent.

5. Report Writing:
- Compile your findings into a structured report. Include an introduction to the dataset and the objectives of the analysis.
- Present the visualizations and discuss the statistical findings. Conclude with actionable business insights based on the analysis. For instance, suggest marketing strategies that could target specific consumer segments identified as high spenders.

Deliverables:
- R scripts used for analysis.
- A comprehensive report (4-6 pages) detailing the methodology, findings, and business recommendations.

Assessment Criteria:
- Accuracy and completeness of data cleaning and analysis.
- Clarity and relevance of visualizations.
- Depth of statistical analysis and interpretation of results.
- Quality and professionalism of the final report.

Timeline:

Checkpoint 1: Provide a link to your dataset and a description of the data, including the total size of the dataset, the number and description of features, and the number of observations. Note that if your dataset does not satisfy the complexity requirements, you will be required to find a new dataset before proceeding with the project. (Note to instructor: This makes sure they are starting with an appropriate dataset; important to check early on! I would assign this once they learn a definition of "big data".)

Checkpoint 2: Identify one numerical feature in your dataset. Prepare your data, and write code that finds the observations corresponding to the top ten values of that feature. Compute the average value of that feature. If there were missing or corrupted data, explain what you did for those cases. How does your code reflect the size or complexity of your data? Did this affect the tools or packages you used?
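The logic behind Checkpoints 1 and 2 can be sketched as follows. The assignment itself requires R; this is only a minimal, language-neutral illustration written in standard-library Python. Several details are assumptions, not part of the assignment: "1 GB" is read as 10^9 bytes, the column name `amount` is hypothetical, and the cleaning rules in the hypothetical helper `parse_amount` (strip `$` and thousands separators; treat blanks, non-numeric text, NaN and infinities as missing) are one possible policy among many.

```python
import csv
import heapq
import io
import math

# Assumption: "at least 1GB" read as 10**9 bytes (1 GiB would be 2**30).
MIN_BYTES = 1_000_000_000


def meets_complexity(total_bytes, n_obs, n_features):
    """Checkpoint 1: size rule plus either the large-observation or large-feature criterion."""
    large_obs = n_obs > 50_000 and n_features > 25
    large_feat = n_obs > 300 and n_features > 5_000
    return total_bytes >= MIN_BYTES and (large_obs or large_feat)


def parse_amount(raw):
    """Hypothetical cleaning rule: return a float, or None for missing/corrupted cells."""
    try:
        value = float(raw.strip().replace(",", "").lstrip("$"))
    except (ValueError, AttributeError):  # non-numeric text, or a missing cell (None)
        return None
    return value if math.isfinite(value) else None  # reject NaN and +/-inf


def top_ten_and_mean(rows, column, k=10):
    """Checkpoint 2 in one streaming pass: k largest rows, mean, and count of skipped rows."""
    heap = []  # min-heap of (value, row_index, row); holds the k largest seen so far
    total, count, skipped = 0.0, 0, 0
    for i, row in enumerate(rows):
        value = parse_amount(row.get(column))
        if value is None:
            skipped += 1  # missing or corrupted: excluded from both results
            continue
        total += value
        count += 1
        item = (value, i, row)  # row_index breaks ties, so dicts are never compared
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif value > heap[0][0]:
            heapq.heapreplace(heap, item)  # drop the current smallest of the top k
    top = sorted(heap, key=lambda t: (-t[0], t[1]))  # largest value first
    mean = total / count if count else None
    return [row for _, _, row in top], mean, skipped


if __name__ == "__main__":
    print(meets_complexity(2_500_000_000, 120_000, 30))  # True
    sample = io.StringIO('id,amount\n1,10.5\n2,\n3,"$1,200"\n4,abc\n5,7\n')
    top, mean, skipped = top_ten_and_mean(csv.DictReader(sample), "amount", k=2)
    print([r["id"] for r in top], round(mean, 2), skipped)  # ['3', '1'] 405.83 2
```

For a real file, pass `csv.DictReader(open(path, newline=""))` with the actual column name. Because the pass is single and only k rows are kept in the heap, memory use does not grow with the file size, which is one possible answer to the "size or complexity" question. If the data fit in memory, the R equivalents are `mean(x, na.rm = TRUE)` and `head(df[order(-df$x), ], 10)`.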