Question: Analyse the data using R

Description: In June 2015, a Knowledge Transfer Partnership between Surrey County Council and the University of Surrey began, with the aim of increasing patronage of bus services in the county.

The United Kingdom consists of the countries England, Wales, Scotland and Northern Ireland. England is divided into 48 counties and had an estimated population of about 56.0 million in 2020. The county of Surrey lies in the south-east of England; it has an area of 1,663 km² and a population of 1,189,934 (2018 estimate; our database holds 1,161,256, i.e. about 1.16M). This population is spread over about 29.1k postcode areas. The number rises to 50.4k when a buffer is included and all delivery points are considered. There are 500.6k delivery points, of which 481.4k are domestic ones within the 29.1k postcode geometries. Including the buffer, there are 800.8k delivery points, of which 760k are domestic ones within 43.9k geometries. Delivery points, addresses and other general geographical data can be obtained from Royal Mail's Postcode Address File (PAF) and Ordnance Survey's portal. Please see BusAnalytics.uk for more details. The population used in the database is 1.16M, or 1.86M including the buffer. The 481.4k domestic delivery points will be called households, i.e. there are about 2.41 people per household on average.
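The household figure quoted above can be checked with a few lines of R. This is only a minimal sketch using the numbers stated in the description; the variable names are chosen here for illustration and do not correspond to database columns.

```r
# Figures quoted in the description (Surrey, excluding the buffer)
population      <- 1161256   # population held in the database
delivery_points <- 500.6e3   # all delivery points
households      <- 481.4e3   # domestic delivery points

# Average household size and share of domestic delivery points
people_per_household <- population / households
domestic_share       <- households / delivery_points

round(people_per_household, 2)   # 2.41
round(100 * domestic_share, 1)   # 96.2 (percent)
```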

Our bus service information shows 37 bus operators (as of 2018) in the county of Surrey. Several bus operators in Surrey participated in the subsequent case study. However, due to non-disclosure agreements (NDA), details about them cannot be mentioned. There are 270 routes with distinct bus service numbers operating in Surrey. Sometimes a bus service number is assigned to several slightly differing routes, depending on the time of day or on route deviations. In this study, the most frequent variant (usually the longest) had to be used. Furthermore, a subset of 87 routes with distinct service numbers (i.e. 32.2% of all routes) was analysed. Again, due to the NDA, these routes will not be identified. This means that routes displayed to illustrate concepts are not necessarily related to data provided by bus operators. Information about bus stops and train stations can be obtained from the UK government's website by accessing the National Public Transport Access Node (NaPTAN) data (see BusAnalytics.uk).
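As a rough illustration of how the most frequent variant per service number could be picked, the following sketch uses a small made-up table. The data frame `journeys` and its columns `service_no` and `variant_id` are hypothetical and are not taken from the database; the last line simply re-derives the 32.2% share quoted above.

```r
# Hypothetical journeys table: one row per observed journey
journeys <- data.frame(
  service_no = c("1", "1", "1", "2", "2", "2", "2"),
  variant_id = c("1a", "1a", "1b", "2a", "2b", "2b", "2b"),
  stringsAsFactors = FALSE
)

# Count journeys per (service number, variant) combination
journeys$n <- 1L
counts <- aggregate(n ~ service_no + variant_id, data = journeys, FUN = sum)

# Sort so that the most frequent variant comes first within each service,
# then keep the first row per service number
counts <- counts[order(counts$service_no, -counts$n), ]
main_variant <- counts[!duplicated(counts$service_no), ]
main_variant

# Share of analysed routes among all routes, as quoted in the text
round(100 * 87 / 270, 1)   # 32.2
```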

Figure 1 gives a geographic overview of the area of interest. The underlying heatmap was derived for the domestic households in Surrey (excluding buffer zone) using Kernel Density Estimators (KDE).

Figure 1: Surrey overview map.
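A density surface of this kind can be sketched in R with a two-dimensional kernel density estimate. The example below is an assumption-laden illustration: it uses synthetic coordinates rather than the household table, and it relies on `kde2d()` from the MASS package, which is not necessarily the method used for Figure 1.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R distributions

set.seed(1)
# Synthetic household coordinates (hypothetical, not Surrey data):
# two clusters standing in for two towns
easting  <- c(rnorm(300, mean = 0, sd = 1), rnorm(200, mean = 4, sd = 0.7))
northing <- c(rnorm(300, mean = 0, sd = 1), rnorm(200, mean = 3, sd = 0.7))

# Two-dimensional Gaussian kernel density estimate on a 50 x 50 grid
dens <- kde2d(easting, northing, n = 50)

# Draw the estimated density as a heatmap
image(dens, xlab = "Easting", ylab = "Northing",
      main = "Household density (synthetic data)")
```

With real data, the two coordinate vectors would come from the household table, and the bandwidth argument `h` of `kde2d()` would control how smooth the heatmap looks.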

The tables needed to plot the figure above are:

Bus routes: bus_route;
Bus stops: bus_stop;
Train stations: train_station;
Hospitals: hospital;
Delivery points: household (covers business, household (= domestic properties) and other delivery points);
Households: household_domestic_surrey (used for the heatmap);
Town and village names: town;
Border of Surrey: surrey_boundary.

Essential passenger and model data are in aggregated form in two tables:

Route information over entire time period: routes_aggr;
Daily route information: routes_daily.
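The relationship between a daily table and a per-route aggregate can be illustrated with a small sketch. The columns `route_id`, `date` and `passengers` and the demo data are hypothetical; the actual structure of routes_daily and routes_aggr is defined by the data dictionary, not by this example.

```r
# Hypothetical daily data: one row per route and day
routes_daily_demo <- data.frame(
  route_id   = c("A", "A", "A", "B", "B"),
  date       = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03",
                         "2018-01-01", "2018-01-02")),
  passengers = c(120, 150, 90, 40, 60)
)

# Aggregate over the entire period: count, sum, min, max and mean per route.
# do.call(data.frame, ...) flattens the matrix column returned by aggregate()
routes_aggr_demo <- do.call(data.frame, aggregate(
  passengers ~ route_id,
  data = routes_daily_demo,
  FUN  = function(x) c(days = length(x), total = sum(x),
                       min = min(x), max = max(x), mean = mean(x))
))
routes_aggr_demo
```

The same summary could equally be written as a SQL query with count, min and max, or with dplyr's `group_by()` and `summarise()`.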

A database backup and related information can be found on SurreyLearn.

Challenges: The aim is to predict passenger numbers for new and existing routes.

Review the related literature. SQL queries should include aggregation functions (e.g. count, min, max). Visualisations should be created using Power BI (PBI) and R (e.g. density/histogram plots, correlation plots); the PBI report should include a header, filters, interactive graphs and calls to R code visualisations. Methods should be briefly explained, including feature creation/selection/map. Three accuracy measures should be given for several machine learning methods and compared in tabular form. The R code should use appropriately chosen variable types, control statements, loops and functions, and make use of dplyr, ggplot, etc.
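The tabular comparison of accuracy measures could look roughly like the following sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names `households_nearby` and `stops` are invented, the three measures (RMSE, MAE, MAPE) are one possible choice, and the two methods shown (a mean baseline and a linear model) merely stand in for the machine learning methods to be compared.

```r
set.seed(42)
n <- 200
# Synthetic route features (hypothetical names and values)
demo <- data.frame(
  households_nearby = runif(n, 500, 5000),
  stops             = sample(5:40, n, replace = TRUE)
)
demo$passengers <- 50 + 0.02 * demo$households_nearby +
  3 * demo$stops + rnorm(n, sd = 15)

# Simple train/test split
train <- demo[1:150, ]
test  <- demo[151:200, ]

# Three accuracy measures
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

# Two methods: predict the training mean, or fit a linear model
fit_lm <- lm(passengers ~ households_nearby + stops, data = train)
preds <- list(
  mean_baseline = rep(mean(train$passengers), nrow(test)),
  linear_model  = predict(fit_lm, newdata = test)
)

# Comparison table: one row per method, one column per measure
results <- data.frame(
  method = names(preds),
  RMSE   = sapply(preds, rmse, actual = test$passengers),
  MAE    = sapply(preds, mae,  actual = test$passengers),
  MAPE   = sapply(preds, mape, actual = test$passengers),
  row.names = NULL
)
results
```

Further methods can be added by appending their test-set predictions to the `preds` list; the table then grows by one row per method.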

Ensure that the slides in the presentation are supported by the evidence. The descriptions and interpretation must be comprehensible, useful and actionable for a professional.

Presentation format: individual presentation (type: Rpres or R Markdown presentation)

The completed presentation will take the following form:

Title: 1 slide (must include URN);
Executive summary: 1 slide (include key findings, recommended method);
Literature review: 3 slides (include up to six related references + key points);
Data explanation: 3 slides (include derived features, correlation matrix; assume problem specification and data dictionary are known);
Visualisation: 4 slides (include PBI time series with forecasts, data insights);
Prediction/classification methods: 3 slides (include results and comparison for different feature sets);
SQL queries, R code (appendix): 6 slides (include only essential parts).
