Question:

For this Capstone/Final project there are no specific tasks to follow. However, the following is a general outline of what you will be able to demonstrate from the various skills learned in Chapters 1 through 5. Chapter 12 provided a decent summary of most (but not all) tasks required. The majority of your work and analysis should be evident in your JupyterLab notebook with detailed Headings, Code, and Comments. The following is a general outline of expectations:

  1. Get the Data:
    • Find the data on a website or in one of your company's databases or spreadsheets. (health_data.csv)
    • Read the data into a DataFrame or build a DataFrame from the data.
      • Possible Techniques Demonstrated: Finding and importing CSV, Excel, Stata, Zip, Database, or JSON using Python, Chaining methods, Markdown, and/or Magic Commands.
  2. Clean the data:
    • Remove unnecessary rows and columns.
    • Handle invalid or missing values.
    • Change object data types to datetime or numeric data types.
      • Possible Techniques Demonstrated: Drop columns and rows, Rename columns, Fix object columns, Fix data, Early plots with Pandas, Save the DataFrame, info(), nunique(), describe(), accessing subsets of rows and columns, statistical methods, pivot, melt, group, and/or aggregate
  3. Prepare the data:
    • Add columns that are derived from other columns.
    • Shape the data into the forms that are needed for your analysis.
    • Make preliminary visualizations to better understand the data.
      • Possible Techniques Demonstrated: Add columns for grouping and filtering, Create a new DataFrame in long form, Take an early plot of the long data with Seaborn, Add bins to the DataFrame, Add an average percent column, Save the wide and long DataFrames, Pandas library methods, Long vs. Wide data visualizations, plots (line, scatter, bar, histogram, density, box, pie), and/or Enhanced plots.
  4. Analyze the data:
    • Get new views of the data by grouping and aggregating the data.
    • Make visualizations that provide insights and show relationships.
      • Possible Techniques Demonstrated: Plot significant rows/columns, Compare data types, dates, gaps, and/or select values.
  5. Enhance Analysis with More preparation and analysis:
    • Enhance your visualizations so they're appropriate for your target audience.
    • Be sure to demonstrate your knowledge of Predictive Analysis and Visualization with Regression Models.
      • Possible Techniques Demonstrated: Advanced plots, Seaborn library methods, subplots, titles, labels, ticks, limits, Relational plots, Categorical plots, Distribution plots, Annotating plots, Colors, and/or Sizing.
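The first two outline steps (get the data, clean the data) could be sketched roughly as follows. This is only a minimal sketch, not a required approach: the sample rows are made up so the block runs without the real health_data.csv, the column names are taken from the headers listed at the end of this question, and the helper name `load_and_clean` is hypothetical.

```python
import io

import pandas as pd

# Made-up sample rows standing in for health_data.csv; real values will differ.
SAMPLE_CSV = """Country,Year,Indicator Name,Value,Sex,Age Group
Atlantis,2000,Life expectancy,70.1,Both sexes,All ages
Atlantis,2010,Life expectancy,72.4,Both sexes,All ages
Atlantis,2010,Life expectancy,unknown,Male,All ages
Borduria,2000,Life expectancy,65.0,Both sexes,All ages
Borduria,2010,Life expectancy,,Both sexes,All ages
"""


def load_and_clean(source):
    """Read the CSV, tidy the column names, and coerce the data types."""
    df = (
        pd.read_csv(source)
        # "Indicator Name" -> "indicator_name", "Age Group" -> "age_group", ...
        .rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    )
    # Entries that are not numbers (such as "unknown") become NaN.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df["year"] = pd.to_numeric(df["year"], errors="coerce").astype("Int64")
    # Drop rows that have no usable measurement.
    return df.dropna(subset=["value"]).reset_index(drop=True)


# With the real file this would be load_and_clean("health_data.csv").
df = load_and_clean(io.StringIO(SAMPLE_CSV))
df.info()
```

In a notebook, `df.nunique()` and `df.describe()` are natural next calls for a first look at the cleaned frame.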
  • Data Selection:
    • The World Health Organization's (WHO) "Global Health Observatory (GHO) Data" is what I have chosen for my final capstone project. This dataset, which is accessible via WHO GHO, captures a variety of global health indicators, including death rates, health system resources, and illness prevalence in various nations. I selected this dataset because I'm interested in worldwide trends in public health and how they affect policymaking. The knowledge gained from this data can help in understanding global health issues and resource allocation strategies. This dataset also provides a strong platform for analyzing health disparities and progress toward the health-related Sustainable Development Goals (SDGs), and it is in line with global health priorities.
  • Dataset Description:
    • The Global Health Observatory dataset has over 200,000 rows and 30 columns, making it extensive. Key variables of importance consist of:
      • Country: A category that represents several nations or areas.
      • Year: A numerical variable that represents the year in which the data was collected.
      • Indicator Name: A categorical variable that lists the health indicators, such as life expectancy and newborn mortality rate.
      • Value: A numerical variable that expresses the measurement of each indicator.
      • Sex: A categorical variable that, when appropriate, breaks the data down by sex.
      • Age Group: A categorical variable that, when relevant, provides information on age.
    • The dataset, which covers several decades, provides insights into health trends across time. The two main data types in this dataset are numerical and categorical. Its diversity of health indicators and worldwide coverage, which offer a multidimensional perspective on health data, are noteworthy aspects. Peculiarities include different approaches to data collection and reporting, which may result in discrepancies that require cautious handling during analysis.
  • Research Questions:
    • Over the past three decades, what have been the global trends in life expectancy? - The purpose of this inquiry is to find variations in life expectancy and trends that may point to advancements or problems in health.
    • What differences exist in health resources between nations with high and low incomes? - Knowing the differences in health resources could help identify important areas that require assistance or intervention.
    • What connection exists between access to clean water and newborn mortality rates in various geographic areas? - Examining this connection can help guide associated policy actions and provide insights into how basic amenities affect health outcomes.
    • Based on present trends in illness prevalence, is it possible to identify which nations are more vulnerable to public health emergencies? - Proactive resource allocation and preventive measure implementation may be aided by predictive analysis.
  • Analysis Plan:
    • Data Cleaning and Preprocessing:
      • Imputation or removal can be used to deal with missing values.
      • Standardize categorical data, make sure variable names are consistent, and, when necessary, convert non-numeric data to numeric form.
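One possible way to handle the two cleaning items above is sketched below. The rows are invented, the "BTSX" label is an assumed variant spelling, and linear interpolation within each country is just one imputation choice among several, not something the plan prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the values are invented for illustration only.
df = pd.DataFrame({
    "country": ["Atlantis"] * 4 + ["Borduria"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "sex": ["Both sexes", "both sexes ", "BTSX", "Both sexes"] * 2,
    "value": [70.0, np.nan, 71.0, 71.5, 60.0, 60.5, np.nan, np.nan],
})

# Standardize category labels: trim, lowercase, then map known variants.
sex_map = {"btsx": "both sexes"}  # assumed variant spelling
df["sex"] = df["sex"].str.strip().str.lower().replace(sex_map)

# Impute within each country. Linear interpolation treats rows as evenly
# spaced, so the data is sorted by year first. limit_area="inside" fills
# only gaps that have known values on both sides.
df = df.sort_values(["country", "year"])
df["value_filled"] = df.groupby("country")["value"].transform(
    lambda s: s.interpolate(limit_area="inside")
)

# Removal: rows that could not be imputed (trailing gaps) are dropped.
clean = df.dropna(subset=["value_filled"])
print(clean)
```

Keeping the original `value` column next to `value_filled` makes it easy to report how many points were imputed.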
    • Exploratory Data Analysis (EDA):
      • Use visualizations and descriptive statistics to comprehend data distributions.
      • To determine how variables relate to one another, use correlation matrices.
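As a rough sketch of the EDA step, the long-form data can be summarized per indicator and then pivoted to wide form so that a correlation matrix can be computed. The indicator names and numbers below are invented for illustration.

```python
import pandas as pd

# Invented long-form rows: one row per country, year, and indicator.
long_df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "year": [2015] * 8,
    "indicator_name": ["Clean water access (%)", "Infant mortality rate"] * 4,
    "value": [95.0, 5.0, 80.0, 20.0, 60.0, 45.0, 40.0, 60.0],
})

# Descriptive statistics for each indicator.
summary = long_df.groupby("indicator_name")["value"].describe()

# Reshape to wide form so each indicator becomes a column, then correlate.
wide = long_df.pivot_table(
    index=["country", "year"], columns="indicator_name", values="value"
)
corr = wide.corr()

print(summary[["mean", "min", "max"]])
print(corr.round(2))
```

A correlation matrix only describes linear association, so it is a starting point for the research questions rather than evidence of cause.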
    • Visualizations:
      • Line plots are used for temporal analysis, such as patterns in life expectancy.
      • To compare the disparities in resource allocation between nations, use bar charts.
      • Use scatter plots to examine correlations (such as between access to clean water and infant mortality).
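The line and scatter plots described above might look something like this in Matplotlib. All values are invented, and the column names `water_access` and `infant_mortality` are hypothetical; a bar chart would follow the same pattern with `ax.bar`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in JupyterLab
import matplotlib.pyplot as plt
import pandas as pd

# Invented values for illustration only.
life = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "year": [1990, 2005, 2020] * 2,
    "value": [65.0, 70.0, 74.0, 50.0, 56.0, 63.0],
})
pairs = pd.DataFrame({
    "water_access": [95.0, 80.0, 60.0, 40.0],
    "infant_mortality": [5.0, 20.0, 45.0, 60.0],
})

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: one line per country to show the trend over time.
for country, grp in life.groupby("country"):
    ax_line.plot(grp["year"], grp["value"], marker="o", label=country)
ax_line.set(title="Life expectancy over time", xlabel="Year", ylabel="Years")
ax_line.legend(title="Country")

# Scatter plot: one point per country to show the relationship.
ax_scatter.scatter(pairs["water_access"], pairs["infant_mortality"])
ax_scatter.set(
    title="Clean water access vs. infant mortality",
    xlabel="Access to clean water (%)",
    ylabel="Infant mortality rate",
)

fig.tight_layout()
```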
    • Statistical Methods and Machine Learning:
      • Regression models can be used to forecast possible health emergencies by analyzing patterns in indicators.
      • Find country groups with comparable health profiles by using clustering techniques.
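For the regression item, a very simple starting point is a straight-line trend fitted by ordinary least squares. The sketch below uses `numpy.polyfit` on invented numbers; it stands in for whatever regression model the course expects (for example, a Seaborn or scikit-learn model), and a straight line is an assumption that real indicators may not satisfy.

```python
import numpy as np
import pandas as pd

# Invented series for one country and one indicator.
df = pd.DataFrame({
    "year": [2000, 2005, 2010, 2015, 2020],
    "value": [66.0, 67.5, 69.0, 70.5, 72.0],
})

# Center the years so the fit is numerically well behaved.
base_year = df["year"].min()
x = df["year"] - base_year

# Degree-1 least-squares fit: value is approximately slope * x + intercept.
slope, intercept = np.polyfit(x, df["value"], deg=1)


def predict(year):
    """Extrapolate the fitted linear trend to a given year."""
    return slope * (year - base_year) + intercept


forecast_2030 = predict(2030)
print(f"Trend: {slope:.2f} per year; 2030 forecast: {forecast_2030:.1f}")
```

Extrapolating ten years past the data is shown only to illustrate the mechanics; any real forecast should report its uncertainty.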
  • Potential Challenges:
    • First, the availability and completeness of the data may be a significant barrier to starting. The analysis may have gaps because some nations have not yet provided complete or current health data. One way to deal with this would be to concentrate on areas with the most comprehensive data sets and, where necessary, use statistical techniques to estimate missing values.
    • Second, because different nations have varied standards and approaches for gathering health data, there may be difficulties with data comparability and consistency. Using data normalization and standardization approaches that enable meaningful comparisons across many datasets is crucial to reducing this.
    • Last but not least, the intricacy of global health concerns may cause the analysis to be overly simplistic. To address this, a more thorough understanding can be obtained using multifactorial analysis models that consider various socioeconomic and environmental factors alongside the health data.
    • The dataset's limitations could include differences in the frequency and techniques of data collection, which could affect how reliable the findings are. Additionally, using secondary data implies that results may be impacted by biases present in the original data-gathering method, which should be noted in the study.
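The normalization and standardization mentioned among the challenges above could, for instance, be done as a z-score within each indicator, so that indicators measured on different scales become comparable. This is a minimal sketch on invented numbers; the indicator names are hypothetical and z-scoring is only one of several possible approaches.

```python
import pandas as pd

# Invented values for two indicators measured on very different scales.
df = pd.DataFrame({
    "country": ["A", "B", "C"] * 2,
    "indicator_name": ["Life expectancy"] * 3 + ["Physicians per 10,000"] * 3,
    "value": [60.0, 70.0, 80.0, 5.0, 20.0, 35.0],
})

# z-score within each indicator: (value - group mean) / group std.
# pandas uses the sample standard deviation (ddof=1) by default.
grouped = df.groupby("indicator_name")["value"]
df["z_score"] = (df["value"] - grouped.transform("mean")) / grouped.transform("std")

print(df)
```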
  • I need help with this final project.
  • Here are the column headers:
Country, Year, Indicator Name, Value, Sex, Age Group
