Question: Overview There are several publicly available data sets available to encourage citizen data scientists to use data to answer questions, investigate problems, or explore relationships.
Overview
There are several publicly available data sets available to encourage citizen data scientists to use data to answer questions, investigate problems, or explore relationships. For instance, several questions that could be investigated with the consumer complaint database include:
Which banks provide the best service for mortgages in a specific state? Service could be defined as having a low number of complaints relative to their total number of customers and providing a timely response to complaints that occur
Looking at the openended responses provided by consumers, which are the top complaints related to mortgages? Do these complaints suggest additional subcategories of complaints that could be added?
How have complaints changed over time?
How does demographic information change the nature of the complaints compare wealthy vs poor areas, areas with high level of education vs low, urban vs suburban, etc.
Project Components and Due Dates
For this project, you will initially work alone on Part and then will complete Part with other person. You will demonstrate your knowledge of BI tools, and analyze a data set with a specific question or questions in mind.
Getting Started InClass Activity
Find a suitable and Primary Fact Table that is based on a personal area of interest. Aim for a table with rows, and at least fields columns with both numbers and categorical text attributes. Less than records is OK but aim for greater than cells.
Use Google searches to gain access to large data sets that are supportive of your interests, and include in your search, terms which may render unique fields usually categorical that can be joined with other tables.
For instance, Google Violent crime statistics by zip codeEducation outcomes by University IDNFL players by college institutionSection housing statistics by FIPS code
FIPS codes, GEOID codes, hospital ID county and State, Vehicle Identification Number VIN Airport Code are all unique identifiers. And the gov websites have data sets that you are free to use, as well as others that you may find at work or other sources during your research project.
Avoid using Year as your unique identifier. It will work, but your data set will end up summarized by year. Even with years data, when you join with your dimension table you have an N of There are unique FIPS codes in the US unique university IDs
The important thing to keep in mind is that you are demonstrating skills learned in class using power pivot. If you search in vain for hours on a topic of interest, change your topic.
Start the following:
o Load data into PowerPivot
o Identify questions that would be of interest to you that may be supported
o Identify unique fields that could be candidates for table joins with a dimension table
Part : Individual Data Model and Metadata of term grade
Due date is Wed by : pm This is an Individual Assignment! Even if you are already have a partner, you will work by yourself for this part of the assignment. You will combine the work from both partners in Part
The Excel file you submit should
Include Primary Fact table This is the same table you presented for the Data Challenge kickoff
Add at least calculated columns to the Primary Fact table that could be relevant for your reporting.
Add dimension table of your choice to model. There must be a valid relationship between the table you add and the Primary Fact Table.
o You cannot use a data set that I provided for you in class.
Add at least calculated columns to the Dimension table that could be relevant for your reporting.
Create PivotTables.
o On each PivotTable, add textboxes describing what the tables mean
o On each PivotTable, use appropriate formatting for the data type. For example, if working with money, change to use Currency data type. When displaying averages, show a reasonable number of numbers after the decimal point.
o Use conditional formatting on at least of the Pivot Tables
o Use sorting on at least of the Pivot Tables
o Use filtering on at least of the Pivot Tables
The Word document you submit should
Your research questions that you wish to investigate using your data.
Describe the metadata for your Primary Fact Table, and provide URL for your dataset
Describe the metadata for dimension table you added and provide the URL for your dataset
For the metadata, for each column in your data set list the Column Name, Data type, range of valid values or a sample list of values, and a brief description. For data sets with columns, I would delete at least that you will not use. Below is an excerpt of a welldocumented data set about Universities.
Column Name Data Type Valid Values Description
Unit ID Whole Number ID Universities Identifier
Universities Name Text no mo
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
