Question 2 (30 marks)

This question is designed to get you started on a data investigation that will be developed into a larger investigation for the end-of-module assessment (EMA). The UK Food Standards Agency (FSA) publishes a wide range of data relating to food establishments and food standards [https://www.food.gov.uk/our-data]. As well as a data catalogue [https://data.food.gov.uk/catalog], the FSA also publishes information on accessing their data via an open data landing page [https://ratings.food.gov.uk/open-data].

While answering this question, you should ask yourself what stories the dataset might contain that could be explored using the techniques you have studied. You may find it useful to treat your data exploration notebooks as portfolio pieces in which you demonstrate what you have learned through studying TM351, applying data analysis and visualisation techniques in an investigative or exploratory setting. Where appropriate, you are encouraged to use a database containing cleaned and validated data when completing this question; for example, you may use the MongoDB database that you created in Question 1. If you are unable to use a database, or prefer not to, you will not be penalised for working with data loaded directly from a file into a DataFrame, although you should justify why you have not used a database approach. If instead you create a DataFrame from the original data files (for example, 2024J_TMA02_data/FSA/FHRS870en-GB.json) and use that for your investigation, you should address any data cleaning and validation considerations, as explored in Question 1, part b, before making analytical use of it.
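
Working directly from a file means repeating the cleaning and validation from Question 1(b) before any analysis. The following is a minimal pandas sketch, not a prescribed method. It assumes the JSON file nests its records under FHRSEstablishment, then EstablishmentCollection, then EstablishmentDetail, and that records carry FHRSID, RatingValue and a nested Geocode object with Latitude and Longitude; check these against the actual file. The helper names clean_fhrs and load_fhrs_file, and the derived RatingScore column, are hypothetical.

```python
import json

import pandas as pd


def clean_fhrs(raw):
    """Flatten and clean FHRS establishment records (hypothetical helper).

    `raw` is the parsed JSON object. The nesting below is an assumption;
    adjust the keys if the provided file is laid out differently.
    """
    records = raw["FHRSEstablishment"]["EstablishmentCollection"]["EstablishmentDetail"]
    # Flatten nested objects such as Geocode into dotted column names
    df = pd.json_normalize(records)

    # RatingValue mixes scores ("0" to "5") with text such as "Exempt";
    # keep the original column and add a numeric score (NaN for text values)
    df["RatingScore"] = pd.to_numeric(df["RatingValue"], errors="coerce")

    # Coordinates may be missing or stored as strings: coerce them to floats
    for col in ["Geocode.Latitude", "Geocode.Longitude"]:
        if col not in df.columns:
            df[col] = float("nan")
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Each establishment should appear only once
    return df.drop_duplicates(subset="FHRSID").reset_index(drop=True)


def load_fhrs_file(path):
    """Read an FHRS JSON file from disk and clean it."""
    with open(path, encoding="utf-8") as f:
        return clean_fhrs(json.load(f))
```

Keeping RatingValue alongside the numeric RatingScore means non-numeric statuses such as "Exempt" remain available for counting, rather than being silently discarded.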

2(a)

Question 1 will have given you a basic feel for the Food Standards Agency food hygiene rating scheme (FHRS) data. In this question, you should explore the FHRS data further and investigate another aspect of it, or answer a question of your own devising based on it. You are not limited to reporting on food hygiene rating scores. For example, you might use the FHRS dataset as the basis for a range of other exploratory questions, such as geographically profiling various food-related businesses, comparing the sizes of corporate groupings, or other forms of "competitive intelligence".
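
As one illustration of profiling food-related businesses, a cross-tabulation of business type against rating shows which kinds of business dominate an area and how their ratings are distributed. This is a sketch only: it assumes a cleaned DataFrame with BusinessType and RatingValue columns (field names used in the FHRS data), the helper profile_business_types is hypothetical, and the example rows are made up for illustration.

```python
import pandas as pd


def profile_business_types(df):
    """Count establishments per business type and rating (hypothetical helper)."""
    # Rows: business types; columns: rating values; cells: number of establishments
    table = pd.crosstab(df["BusinessType"], df["RatingValue"])
    # Add a total per business type and list the most common types first
    table["Total"] = table.sum(axis=1)
    return table.sort_values("Total", ascending=False)


# Made-up rows, only to show the shape of the output
example = pd.DataFrame({
    "BusinessType": ["Restaurant/Cafe/Canteen", "Restaurant/Cafe/Canteen",
                     "Takeaway/sandwich shop"],
    "RatingValue": ["5", "4", "5"],
})
print(profile_business_types(example))
```

The same pattern works for other groupings, for example replacing BusinessType with a postcode district column to compare areas.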

Using additional data

You may retrieve additional data from the Food Standards Agency as part of your investigation if you wish, but you are not required to do so. The additional data may be obtained either as downloaded files or via the API, and should be of the same form as the provided sample data. A code fragment demonstrating how to call the API is provided in the yourPI_q2a_lab_notebook.ipynb notebook. If you do retrieve any additional data, you should describe and justify how you manage it, for example by adding it to your database or by using a sensible file naming and management strategy.
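
The fragment in the lab notebook should be preferred; what follows is an independent, hedged sketch of the same idea plus a simple file naming strategy. It assumes the FHRS API is served from https://api.ratings.food.gov.uk, expects an x-api-version: 2 header, and offers an /Establishments endpoint accepting localAuthorityId, pageNumber and pageSize parameters. Verify these against the API documentation, and note that the API's authority identifier may differ from the number in the supplied file name. The function names, folder name and file-name pattern are hypothetical; raw responses are saved unchanged, named by authority and retrieval date, in a folder separate from the supplied data.

```python
import json
from datetime import date
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL for the FHRS API; check against the lab notebook fragment
API_BASE = "https://api.ratings.food.gov.uk"


def build_establishments_request(local_authority_id, page_number=1, page_size=1000):
    """Build (but do not send) a request for one page of establishments."""
    query = urlencode({
        "localAuthorityId": local_authority_id,
        "pageNumber": page_number,
        "pageSize": page_size,  # arbitrary default; the API may cap this
    })
    return Request(
        f"{API_BASE}/Establishments?{query}",
        headers={"x-api-version": "2", "Accept": "application/json"},
    )


def download_path(local_authority_id, retrieved, folder="FSA_api_downloads"):
    """Keep API downloads apart from the supplied files, named by authority and date."""
    return Path(folder) / f"establishments_la{local_authority_id}_{retrieved.isoformat()}.json"


def fetch_and_save(local_authority_id, folder="FSA_api_downloads"):
    """Fetch one page, save the raw response unchanged and return the file path.

    This makes a network call; only the first page is retrieved.
    """
    with urlopen(build_establishments_request(local_authority_id)) as response:
        data = json.load(response)
    path = download_path(local_authority_id, date.today(), folder)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return path
```

Including the retrieval date in the file name records when each snapshot was taken, which matters because ratings change over time.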

Presenting your answer

Use level 1 headings in Markdown cells in the notebook to help your tutor identify the regions of the notebook that show you have performed the required steps. In addition, each discrete manipulation of the data should be presented in its own code cell (or cells, if it is clearer to break the code up), preceded by at least one Markdown cell explaining what the code is intended to do and followed by at least one Markdown cell explaining the code cell's output or return value.

In summary, in your notebook you should investigate a question of your own devising using the FHRS data. Question 1 will have given you a basic feel for the data. You should:

  • Summarise the main characteristics of the dataset that allow you to perform your exploration.
  • Make and label at least two different plots to visualise different aspects of the data. You should use at least two different types of plot, e.g. scatter, line, bar, etc.
    • Bear in mind that your plots should have a purpose: they should be used for exploring or explaining some aspect of the data. So when presenting your plots, you should also interpret the plot in terms of what it says about the data, and what it means for exploring the data.
    • Any plots you include in your report for Part 2(b) must have a meaningful title and appropriately labelled axes, and all text should be legible.
  • Make at least one folium map.
    • Your folium map should illustrate a different aspect of the data from your map in Question 1(e).
    • As with your other plots, your map should reflect and communicate something that you have discovered about the data.
  • Include notes critically evaluating what you think your investigations and visualisations tell you. If you use a prompt-based coding assistant such as Copilot or ChatGPT to generate any of your code, add a comment to the code describing the prompt(s) you used to help generate it.

(20 marks)

2(b)

For this part of question 2, you should work in your solution document, under the heading "Question 2(b)". In this part you will use your findings from part 2(a) to write a brief report using the following outline structure:

  • Aims and objectives
  • Background
  • Sources of data (original source; locally managed source)
  • Analysis pipeline
  • Findings
  • Conclusions
  • References

Your report should be no more than 650 words. Some sections may be very short. You should present your results in a form that highlights the most relevant findings; you must include at least two, but no more than four, visualisations (including at least one folium map and at least one other visualisation).

You should critically evaluate your results and their presentation, including any confounding factors that may weaken your conclusions. These could include concerns about the reliability or coverage of the data, or other influences that the data does not capture; you may want to consider what you learnt about the datasets in Question 1.

You should use references in your report, as appropriate, to support your conclusions and give a context for your investigation. All references (both the in-text citations and the entries in the reference list) must be given in Cite Them Right Harvard style (Open University, 2023). You must include a reference to the notebook you used in your investigation so that your results can be independently verified.

(10 marks)
