Question:

Make sure you have loaded the necessary libraries in R:

    library(haven)
    library(tidyverse)
    library(stargazer)

1a) Preliminary data manipulation (5 points)

Investigate the AGE_12 variable:

    head(mydata$AGE_12)

Delete (filter command) observations for the youngest workers, 15 to 19 years old ("AGE_12==1"), and for the oldest workers, 70 and over ("AGE_12==12"). Delete more age categories depending on the last digit of your student ID. If your student ID ends with a zero or a five, delete "AGE_12==2 | AGE_12==11". If your student ID ends with a one or a six, delete "AGE_12==2 | AGE_12==3". If your student ID ends with a two or a seven, delete "AGE_12==11". If your student ID ends with a three or an eight, delete "AGE_12==2". If your student ID ends with a four or a nine, delete "AGE_12==10 | AGE_12==11".

How many observations are there in your sample? You can use glimpse() or just check the global environment.

1b) (15 points)

Summarize hourly wages, HRLYEARN. Do not forget to prefix the variable name with the dataframe name. How many observations with zero hourly wages are there? How many observations with missing hourly wages? Who do you think are those individuals with missing wages?

Compute ("mutate") the log of hourly wages, lwage = log(HRLYEARN). What is the mean hourly wage? What is the mean log wage?

Note: the simplest way to make a histogram (using the ggplot visualisation from the tidyverse library):

    ggplot(mydata, aes(x = HRLYEARN)) + geom_histogram()
    ggplot(mydata, aes(x = lwage)) + geom_histogram()

These plots are basic, and you can improve on the visualisation if you are feeling ambitious by adding parameters to the ggplot commands (in the answer key I will leave it as in the basic example above). Here is one possible source of inspiration (there are many): http://www.sthda.com/english/wiki/ggplot2-histogram-plot-quick-start-guide-r-software-and-data-visualization

From here on, use the dataframe with non-missing lwage:

    filter(!is.na(lwage))

It is not a bad idea to give the dataframe a new name at this point.

1c) (5 points)

Consider the following variables: HRLYEARN, AGE_12, EDUC, SEX, COWMAIN, UNION. Investigate each variable either from the drop-down arrow in the global environment or using the "head" or "glimpse" commands, e.g.

    glimpse(mydata$HRLYEARN)

Which variables are continuous? Which variables are categorical (factors)?

1d) (10 points)

Investigate the EDUC variable. For factor variables, R has imported data labels from Stata, which it stores as attributes. Rather than reading the LFS codebook, look at the values of the factor variable EDUC using the "head" command:

    head(mydata$EDUC)

Tabulate the fraction of individuals in each education category. You can use a simple "table" command, or you can ask R to compute the proportions for you:

    prop.table(table(mydata$EDUC))

Report in a table the fraction of individuals in each education category, rounding fractions to three decimal places. Refer to each education category by its label, not by its number. For instance, label EDUC=0 as "0 to 8 years". Which is the largest education category?

1e) (15 points)

Familiarize yourself with the values and labels of these factor variables using the "head" command: EDUC, AGE_12, SEX, COWMAIN. Create ("mutate") a gender variable taking the values 0 and 1, for instance 0 for men and 1 for women: sex = SEX - 1

Run a log wage regression including the following regressors: EDUC, AGE_12, sex, COWMAIN. Do not forget to wrap the categorical variables in "factor" (if there is a single indicator like sex, you may still declare it as a factor if you want to):

    lm(lwage ~ factor(EDUC) + factor(AGE_12) + sex + factor(COWMAIN), mydata)

What is the base category for each of the factor variables? Report the model, including coefficients, variable (category) names, and R-squared. Either (i) use the same format to report regression results as was used in Assignment 1:

    $lnwage = \hat{\beta}_0 + \hat{\beta}_1\,\text{Some High school} + \dots$

where you have to replace the beta hats with estimated coefficients, or (ii) report the same model in a table format (obtained, for instance, using the stargazer library). In future assignments we will report regression results in tables.

Discuss the R-squared. Does the model have a good fit?

1f) (20 points)

Interpret the coefficients from the wage regression, referring to each base category for the factor variables.

Data

Download and import in R the LFS for October 2022. This is the same file used in the labs. You will create your own sample based on certain age requirements later on in Question 1a).

Data download: Retrieve the Labour Force Survey from Odesi (using the Nesstar web retrieval system).
1) Go to http://odesi2.scholarsportal.info/webview/
2) Navigate to: Labour and Employment -> Canada -> Labour Force Survey (LFS) -> 2020s -> 2022
3) Choose October 2022
4) Select the "Save" button from the top right panel and
4b) Choose "Download as Stata v8"
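The data-preparation steps in 1a), 1b) and 1e) can be sketched on a small made-up dataframe. This is only an illustration under stated assumptions, not the answer key: `toy` is a hypothetical stand-in for `mydata`, every value in it is invented, and its columns are plain numerics, whereas columns imported from the Stata file may carry labels. The student-ID-specific age categories are left out and would be added to the filter condition.

```r
library(dplyr)

# Hypothetical toy data standing in for the LFS dataframe `mydata`;
# all values are invented for illustration.
toy <- tibble(
  AGE_12   = c(1, 2, 3, 5, 7, 11, 12, 6),
  SEX      = c(1, 2, 1, 2, 1, 2, 1, 2),
  HRLYEARN = c(15.5, 20, NA, 32.25, 18, 41, 28, NA)
)

# 1a) Drop the youngest (AGE_12 == 1) and oldest (AGE_12 == 12) groups.
# Further student-ID-specific categories would be added to this condition.
toy_age <- toy %>%
  filter(!(AGE_12 == 1 | AGE_12 == 12))

n_obs <- nrow(toy_age)  # sample size after the age filter

# 1b) Count zero and missing wages, then create the log wage.
n_zero    <- sum(toy_age$HRLYEARN == 0, na.rm = TRUE)
n_missing <- sum(is.na(toy_age$HRLYEARN))

toy_age <- toy_age %>%
  mutate(lwage = log(HRLYEARN))

# Keep non-missing log wages under a new name; 1e) recode SEX (1/2) to 0/1.
toy_wage <- toy_age %>%
  filter(!is.na(lwage)) %>%
  mutate(sex = SEX - 1)

# Shares by category, rounded to three decimals (shown here for AGE_12).
age_shares <- round(prop.table(table(toy_wage$AGE_12)), 3)
```

Giving the filtered dataframe a new name (`toy_wage` here) keeps the age-filtered sample available for the missing-wage counts while the regression runs on the non-missing sample.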

