STAT 3011 Workshop on Data Analysis and Statistical Computing
October, 2015

Instructor:
  SONG, Xinyuan   Room 114 (39437929)   Email: xysong@sta.cuhk.edu.hk
  WEI, Yingying   Room 111 (39437922)   Email: ywei@sta.cuhk.edu.hk
Teaching Assistant:
  LAI, Yingying   Room G30 (39438534)   Email: s1155023579@sta.cuhk.edu.hk

Project 2
Complete datasets: https://s3.amazonaws.com/ed-college-choice-public/CollegeScorecard_Raw_Data.zip
Data dictionary (meanings of variables): https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary-09-08-2015.csv
More messy, less structured. You can use various statistical methods to analyze the data, draw conclusions and report results.

Project Description
Main objectives:
1. Identify interesting problems.
2. Explore possible findings from the data.
3. Write a technical report to present and illustrate your findings. The report should be well organized and written in English.
4. Give a clear presentation of your report in English.

Requirements
You are asked to work in groups to study the given dataset.
Each group should give a presentation and write a report on its findings.
Each member of a group should participate in the project and presentation.
Presentation + Q&A = 15 + 5 = 20 minutes.
The report should not exceed 15 pages including all tables and figures.

Academic Honesty and Plagiarism
Attention is drawn to University policy and regulations on honesty in academic work, and to the disciplinary guidelines and procedures applicable to breaches of such policy and regulations. Details may be found at the CUHK academic honesty website.
References
(1) Cite any resource you use.
(2) Pick a citation style and use it consistently.
(3) Include every resource you cite in a bibliography at the end of your report.

Report Structure
Introduction (A brief introduction of the topic)
Method (What statistical methods have you used?)
Results (Tables, graphs, charts etc.)
Limitations (What could have been done better?)
Conclusions (What can we infer from your results? Possible extensions?)
References (A list of the papers, books, websites, etc. that you used)
Include programs you have written in the appendix.

Assessment Scheme
Methodology    30%   creativity, innovation, proper use of statistical methods
Report         30%   organization, completeness, critical thinking
Presentation   40%   clarity, time management, fluency in presentation

To assess individual contribution within each group, please write down an agreed percentage contribution of each member in the report. Individual marks can be adjusted according to this percentage.

Submission: Send both the PPT and the report to s1155023579@sta.cuhk.edu.hk by 12:00am, Nov 30 2015. Submissions must also go through VeriGuide and include the similarity report.

Presentation time list (Project 2):
Groups 1-9: 23/11/2015, 4:30pm-7:30pm
Venue: LSB C2
Note: The time and classroom may change depending on the availability of rooms and students. Please check them on the Blackboard e-learning system before the presentation.

Data Background
This data collection is designed to compare the costs of postsecondary education with students' income prospects in the US. The US Department of Education combined information from institutions' federal reporting, the student financial aid system, and federal tax returns. All data are summarized at the college/university level rather than at the individual student level.

Detailed data documentation:
Website: https://collegescorecard.ed.gov/data/documentation/
CSV file: https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary-09-08-2015.csv
PDF report: https://collegescorecard.ed.gov/assets/FullDataDocumentation.pdf

Overview of the dataset:
College ID: 6-digit institution level; 8-digit campus level
School information: location, religion, degree type, programs offered by type
Admission: admission rate, SAT scores, ACT scores
Student: number, ethnic groups, family (income, education background)
Costs: average annual costs, average annual net price
Financial aid: % of students with federal loans, Pell grants, debt level
Completion rate: 150% time (2yr -> 3yr and 4yr -> 6yr), 200% time
Repayment
Earnings

Tips:
Define your problem and use the corresponding part of the data.
How do you select the data: which years, variables, and colleges?
Which population do you base your inference on? How generalizable are your conclusions?
How do you clean the messy data? State your data inclusion/exclusion criteria clearly.
Discuss potential confounding factors: chicken and egg?
Missing data?
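
The tips on cleaning, exclusion criteria and missing data can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions, not a prescribed method for the project. It assumes that missing values appear as empty cells or as the tokens NULL and PrivacySuppressed, and that the columns are named UNITID, INSTNM, ADM_RATE and C150_4; check both assumptions against the data dictionary before relying on them. The four sample rows are invented for the demonstration and are not taken from the real dataset.

```python
import csv
import io

# Assumption: tokens that mark a missing value. Verify against the data documentation.
MISSING_TOKENS = {"", "NULL", "PrivacySuppressed"}


def to_float(raw):
    """Return the cell as a float, or None if it is missing or not numeric."""
    text = raw.strip()
    if text in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def load_complete_cases(handle, id_col, numeric_cols):
    """Split the rows of a CSV into complete cases and the IDs of excluded rows.

    A row is excluded if any of the requested numeric columns is missing.
    Returning the excluded IDs makes the exclusion criterion easy to report.
    """
    kept, excluded = [], []
    for row in csv.DictReader(handle):
        values = {col: to_float(row[col]) for col in numeric_cols}
        if any(value is None for value in values.values()):
            excluded.append(row[id_col])
        else:
            kept.append({id_col: row[id_col], **values})
    return kept, excluded


# Invented rows for illustration only; the column names are assumptions.
SAMPLE = """UNITID,INSTNM,ADM_RATE,C150_4
000001,College A,0.50,0.60
000002,College B,NULL,0.40
000003,College C,0.80,PrivacySuppressed
000004,College D,0.25,0.90
"""

kept, excluded = load_complete_cases(
    io.StringIO(SAMPLE), "UNITID", ["ADM_RATE", "C150_4"]
)
print(f"kept {len(kept)} of {len(kept) + len(excluded)} rows")
print("excluded IDs:", excluded)
```

Dropping incomplete rows (complete-case analysis) is only one possible choice. If the excluded colleges differ systematically from the retained ones, the conclusions may not generalize, so the report should state how many rows were dropped and why.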
