Question: I had initially picked a different data set that did not work out. This is the new one I found on Kaggle; it is similar to what I initially had but at a much larger scale. I am stuck on where to even start with this data set.

I need help completing Part 3 using the new data set.

Data set: Fertility Rate (kaggle.com), fertility rate of 186 countries, 1960-2020

Project.

1. Finding a data set of interest, downloading it, and describing it

There are many publicly available data sets that you can use for your project. The library has compiled a list of many possible sources of data. Click on the link below to explore these sources.

https://davenport.libguides.com/data

The data set you select must have:

  • At least 50 observations (50 rows) and at least 4 variables (columns) excluding identification variables
  • At least one dependent variable

You must provide:

  • A proper citation of the data source using APA style format
  • A discussion on how the data was collected and by whom
  • The number of variables in the data set
  • The number of observations/subjects in the data set
  • A description of each variable together with an explanation of how it is measured (e.g. the unit of measurement).
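
As a starting point for the counts requested above, here is a minimal sketch that counts observations and non-identifier variables in a CSV. It assumes the Kaggle file uses a wide layout (one row per country, one column per year) with an identifier column named `Country Name`; that column name, the helper `describe_shape`, and the sample values are all hypothetical, so check them against the real file.

```python
import csv
import io

# Hypothetical sample in the wide layout ASSUMED for the Kaggle file:
# one row per country, one column per year. All values are made up.
SAMPLE_CSV = """Country Name,1960,1990,2020
Aland,6.1,4.2,2.3
Borduria,5.4,,1.9
Cascara,3.2,2.1,1.6
"""


def describe_shape(csv_text, id_columns=("Country Name",)):
    """Count observations and list the non-identifier variables in a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Identification columns do not count toward the 4-variable minimum.
    variables = [name for name in reader.fieldnames if name not in id_columns]
    return {"observations": len(rows), "variables": variables}


summary = describe_shape(SAMPLE_CSV)
print(summary["observations"], "observations")
print(len(summary["variables"]), "variables:", summary["variables"])
```

For the real file, read its contents with `open(path, newline="").read()` and pass the text to the same function.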

2. Cleaning the data by checking for outliers and missing data

Data cleaning is the process of inspecting your data for:

  • Unusual entries or outliers
  • Missing data
  • Incorrect data entries
  • Taking action on any data issues identified and accurately documenting the action taken.
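
The checks above can be sketched in a few lines. This is one possible approach, not the required one: it treats empty cells as missing and flags outliers with the 1.5 × IQR rule that a box plot uses. The helper names and the sample values are made up for illustration.

```python
import statistics


def find_missing(rows, columns):
    """Return (row_index, column) pairs whose cell is empty or absent."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in columns
        if row.get(col) in (None, "")
    ]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Made-up rows in the shape csv.DictReader would produce.
rows = [
    {"Country Name": "Aland", "2020": "2.3"},
    {"Country Name": "Borduria", "2020": ""},
    {"Country Name": "Cascara", "2020": "1.6"},
]
missing = find_missing(rows, ["2020"])

# Made-up rates with one suspicious entry at the end.
rates = [1.6, 1.8, 1.9, 2.0, 2.1, 2.3, 9.9]
outliers = iqr_outliers(rates)

print("missing cells:", missing)
print("possible outliers:", outliers)
```

A flagged value is only a candidate for review. Record what you decide to do with each one (keep, correct, or drop) so the cleaning step is documented.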

For more information on data cleaning and exploration, read the article at the following link: https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/

3. Exploring the data and summarizing it using descriptive statistics, graphs, etc.

You will need to provide summary statistics of each variable in your data set. There are many ways to summarize your data, and you are encouraged to be creative but also accurate in how you summarize and present your data. In general:

  • A categorical variable is summarized using a frequency table and visualized using bar charts and pie charts
  • A pair of categorical variables is summarized using a contingency table
  • A numeric variable is summarized using descriptive statistics: measures of central tendency (mean, median, and mode), measures of variation or dispersion (range, standard deviation), and measures of position (z-scores, percentiles).
  • A histogram, dot plot, or stem-and-leaf plot is used to provide visual information on the distribution of a variable
    • An outlier can easily be identified using a box plot
    • Visual inspection of a histogram can also be used to assess whether a variable is approximately normally distributed
  • A pair of numeric variables is summarized using a scatter plot
    • A scatter plot is usually a good indicator of whether two variables are correlated or not
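
The numeric summaries listed above can be computed with Python's standard library alone. The sketch below is illustrative: the function names are hypothetical, the numbers are made up, and the region labels assume you add a categorical grouping yourself, since the fertility file may not contain one. Plots (histogram, box plot, scatter plot) would need a plotting tool and are not shown.

```python
import math
import statistics
from collections import Counter


def summarize_numeric(values):
    """Central tendency, dispersion, and position for one numeric variable."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
        "stdev": stdev,
        "quartiles": statistics.quantiles(values, n=4),
        "z_scores": [(v - mean) / stdev for v in values],
    }


def frequency_table(labels):
    """Counts per category, most frequent first."""
    return Counter(labels).most_common()


def pearson_r(xs, ys):
    """Pearson correlation coefficient for a pair of numeric variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up values standing in for one year's fertility rates.
stats = summarize_numeric([2.0, 4.0, 4.0, 6.0, 9.0])

# Hypothetical region labels added by the analyst.
table = frequency_table(["Africa", "Asia", "Africa", "Europe"])

# Made-up paired values for two years.
r = pearson_r([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])

for name, value in stats.items():
    print(name, value)
print(table)
print("r =", r)
```

With a wide year-per-column file, one reasonable plan is to run `summarize_numeric` once per year column and `pearson_r` on pairs of years.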

4. Conducting multivariate analysis such as multiple regression analysis

You will be required to use at least one of the following advanced statistical methods covered in the course for your analysis:

  • Multiple Regression Analysis
  • T-tests and ANOVA
  • Time Series Analysis
  • Logistic Regression Analysis
  • Factor Analysis
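
Because the fertility data are yearly, a simple first step toward time series analysis is a least-squares trend line for one country. Note that this is an ordinary least squares fit with a single predictor (year), not multiple regression, and it is only a sketch: `fit_trend` is a hypothetical helper and the rates are made up.

```python
def fit_trend(years, rates):
    """Least-squares line: rate = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up series for one hypothetical country.
years = [1960, 1980, 2000, 2020]
rates = [6.0, 5.0, 4.0, 3.0]

slope, intercept = fit_trend(years, rates)
print("change in rate per year:", slope)
print("fitted rate for 2020:", intercept + slope * 2020)
```

The slope is the estimated change in fertility rate per year. A full analysis would also examine the residuals, and a multiple regression would need additional predictor variables beyond year.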
