Question: CSC 170 Final Project Details
The final project is meant to be a comprehensive data analysis. You are expected to thoroughly complete the following steps:

1. Acquire a dataset, read it into R, and define your research question (points)
   a. I'm not placing restrictions on the size of your data set, but it shouldn't be too small: at least a hundred rows, preferably more, and roughly columns.
   b. You may get your data from any publicly available source. Your original data may be in CSV format, scraped from a web page, or in any other format that R can read. Here are some good examples of sites that allow downloading of data files in CSV or similar, compatible formats:
      - the US Census Bureau (the Data Sets link on the left, or any links that say Direct File Access, are good bets)
      - the United Nations data explorer
      - the University of California-Irvine's machine learning repository
      - the CDC Covid Data bank
      - MeasuringWorth.com, which features data about Gross Domestic Product, Consumer Price Index, wage earnings, interest rates, stocks, exchange rates, and more for the USA, the UK, and Australia
      Any topic is fair game: sociology, public health, economics, sports, movies, whatever interests you!
   c. Your question may change as you explore your data in step two, but you should have something fairly specific in mind when you start, and it should be documented in your final report.

2. Explore your data (points)
   a. Start with the basics to get a feel for your data set: look at the summaries of the variables, the measures of central tendency like mean, median, and quartile values, and do some basic visualization such as histograms, scatterplots, or box plots.
   b. Look for any special cases in the data set. For example, are there outliers? Are there missing values? Do the variables use a common scale or unit, or are they scaled differently?
   c. All of these questions, and any others that you consider, should be answered in your final report. Even if the answers are mostly "no," you should demonstrate that you investigated them.

3. Wrangling and preparing your data (points)
   a. Do you remove outliers or keep them?
   b. What strategy do you employ, if any, for missing values?
   c. Should you transform, rescale, or normalize any of your data?
   d. Do you need to create any calculated variables to help with your analysis?
   e. Do you want to group and aggregate any of your variables?
   f. Do you need to join multiple tables together?
   g. All of these questions should be mentioned in your final report. Even if the answers are mostly "no," you should demonstrate that you investigated them.

4. Choose the appropriate statistical or modeling techniques and perform your analysis (points)
   a. Refine your original research question. Are you searching for correlations in the data? Testing a hypothesis? Making predictions? Clustering or categorizing?
   b. Write code to perform your analysis and implement your chosen models. You should implement multiple models and choose the best one in the next section.
   c. Comment your code thoroughly and use good coding practices!

5. Evaluate your model (points)
   a. Write code to evaluate your model in detail. For example, if you used regression, did the data and your model meet all of the required assumptions? What can you conclude from tests of inference? How much of the variance in the data can be explained by your model? Are there confounding factors present in your data? Provide numeric results and visualizations.
   b. Interpret the results. What did you find out? How successful was your model? How might it be improved? As a result of your evaluation, will you exclude any variables from your model that were previously included, or vice versa?

6. Write a report using RMarkdown (points)
   a. Integrate your code, results, and plots into a narrative and produce a single HTML file for your finished report.
   b. Clearly state your results and conclusions in such a way that a portion of your report could be excerpted and given to a lay audience and be understood.
   c. I should be able to reproduce your entire report by acquiring a copy of your data and running your RMarkdown file.

At the end, be sure to state your conclusions and any unanswered questions, future work, or things you would do differently if you were starting over.

Detecting fraudulent transaction patterns. Use this dataset: the Bank Transaction Dataset for Fraud Detection on Kaggle.
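As a rough illustration of how the exploring, wrangling, modeling, and evaluation steps might begin in R, here is a minimal sketch. It runs on synthetic data, not on the Kaggle file, and every column name in it (amount, login_attempts, channel) is a hypothetical placeholder that may not match the real dataset. Median imputation, the 1.5 * IQR rule, and k-means clustering are only example choices; the assignment does not prescribe them.

```r
# Sketch only: synthetic data stands in for the Kaggle file, and all
# column names here are hypothetical placeholders.
set.seed(170)
n <- 500
tx <- data.frame(
  amount         = round(rexp(n, rate = 1 / 250), 2),
  login_attempts = sample(1:5, n, replace = TRUE,
                          prob = c(0.85, 0.06, 0.04, 0.03, 0.02)),
  channel        = factor(sample(c("ATM", "Online", "Branch"), n,
                                 replace = TRUE))
)
tx$amount[sample(n, 10)] <- NA  # inject missing values to practise on

# Explore: variable summaries and a count of missing values per column.
print(summary(tx))
missing_per_column <- colSums(is.na(tx))
print(missing_per_column)

# Wrangle: impute missing amounts with the median (one possible strategy),
# then add a log-transformed variable to tame the right skew.
tx$amount[is.na(tx$amount)] <- median(tx$amount, na.rm = TRUE)
tx$log_amount <- log1p(tx$amount)

# Flag (rather than drop) high outliers using the 1.5 * IQR rule.
q <- unname(quantile(tx$log_amount, c(0.25, 0.75)))
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
tx$high_amount <- tx$log_amount > upper_fence

# Model: k-means on standardized features, as one example technique.
features <- scale(tx[, c("log_amount", "login_attempts")])
km <- kmeans(features, centers = 3, nstart = 20)
tx$cluster <- factor(km$cluster)

# Evaluate: share of total variance that lies between clusters, plus
# per-cluster means to help interpret what each cluster represents.
explained <- km$betweenss / km$totss
print(explained)
print(aggregate(cbind(amount, login_attempts) ~ cluster, data = tx, FUN = mean))
```

For the real project, the synthetic data frame would be replaced by a call such as read.csv() on the downloaded file, and calls like hist() and boxplot() would supply the visualizations the exploration step asks for. Comparing several values of centers, or a second technique entirely, would address the requirement to implement multiple models.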
