Question: Overview There are several publicly available data sets available to encourage citizen data scientists to use data to answer questions, investigate problems, or explore relationships.

Overview
There are several publicly available data sets available to encourage citizen data scientists to use data to answer questions, investigate problems, or explore relationships. For instance, several questions that could be investigated with the consumer complaint database include:
1. Which banks provide the best service for mortgages in a specific state? (Service could be defined as having a low number of complaints relative to their total number of customers and providing a timely response to complaints that occur)
2. Looking at the open-ended responses provided by consumers, which are the top complaints related to mortgages? Do these complaints suggest additional subcategories of complaints that could be added?
3. How have complaints changed over time?
4. How does demographic information change the nature of the complaints (compare wealthy vs. poor areas, areas with high level of education vs. low, urban vs. suburban, etc.)?
Project Components and Due Dates
For this project, you will initially work alone on Part 1, and then will complete Part 2 with 1 other person. You will demonstrate your knowledge of BI tools, and analyze a data set with a specific question or questions in mind.
Getting Started In-Class Activity
Find a suitable and Primary Fact Table that is based on a personal area of interest. Aim for a table with 5000 rows, and at least 6-8 fields (columns) with both numbers and categorical (text) attributes. Less than 5000 records is OK, but aim for greater than 35,000 cells.
Use Google searches to gain access to large data sets that are supportive of your interests, and include in your search, terms which may render unique fields (usually categorical) that can be joined with other tables.
For instance, Google Violent crime statistics by zip code,Education outcomes by University ID,NFL players by college institution,Section 8 housing statistics by FIPS code.
FIPS codes, GEOID codes, hospital ID, county and State, Vehicle Identification Number (VIN), Airport Code are all unique identifiers. And the .gov websites have 200,000 data sets that you are free to use, as well as others that you may find at work or other sources during your research project.
Avoid using Year as your unique identifier. It will work, but your data set will end up summarized by year. Even with 10 years data, when you join with your dimension table you have an N of 10. There are 4000 unique FIPS codes in the US,7500 unique university IDs.....
The important thing to keep in mind is that you are demonstrating skills learned in class using power pivot. If you search in vain for hours on a topic of interest, change your topic.
Start the following:
o Load data into PowerPivot
o Identify questions that would be of interest to you that may be supported
o Identify unique fields that could be candidates for table joins with a dimension table
Part 1: Individual Data Model and Metadata 10% of term grade
Due date is Wed 3/27 by 11:59 pm. This is an Individual Assignment! Even if you are already have a partner, you will work by yourself for this part of the assignment. You will combine the work from both partners in Part 2.
The Excel file you submit should
Include Primary Fact table (This is the same table you presented for the Data Challenge kickoff).
Add at least 4 calculated columns to the Primary Fact table that could be relevant for your reporting.
Add 1 dimension table of your choice to model. There must be a valid relationship between the table you add and the Primary Fact Table.
o You cannot use a data set that I provided for you in class.
Add at least 4 calculated columns to the Dimension table that could be relevant for your reporting.
Create 3 PivotTables.
o On each PivotTable, add textboxes describing what the tables mean
o On each PivotTable, use appropriate formatting for the data type. For example, if working with money, change to use Currency data type. When displaying averages, show a reasonable number of numbers after the decimal point.
o Use conditional formatting on at least 1 of the Pivot Tables
o Use sorting on at least 1 of the Pivot Tables
o Use filtering on at least 1 of the Pivot Tables
The Word document you submit should
Your research questions that you wish to investigate using your data.
Describe the metadata for your Primary Fact Table, and provide URL for your dataset
Describe the metadata for dimension table you added and provide the URL for your dataset
For the metadata, for each column in your data set list the Column Name, Data type, range of valid values or a sample list of values, and a brief description. For data sets with 1600 columns, I would delete at least 1550 that you will not use. Below is an excerpt of a well-documented data set about Universities.
Column Name Data Type Valid Values Description
Unit ID Whole Number ID >10000 Universities Identifier
Universities Name Text no mo

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!