Question: Shell Script: Data Cleaning using Bash, shell tools, and awk

Shell Script:
Task: Data Cleaning: you can use Bash / shell tools and awk scripts.
Our World in Data is a truly excellent resource for high-quality data across a wide range of topics. For example, during the acute phase of the Covid-19 pandemic, Our World in Data created and hosted many analyses of Covid-19 disease data.
Relevant to this assignment are three linked datasets from the Gallup world happiness surveys, averaged by country for a range of years and then aligned with data from national sources on GDP, homicide rates per 100,000 population, population size, and life expectancy. Tab-separated (.tsv) versions are linked below:
gdp-vs-happiness.tsv (headers: Entity, Code, Year, Cantril ladder score, "GDP per capita, PPP (constant 2017 international $)" [a single header that contains a comma], Population (historical estimates), Continent)
homicide-rate-unodc.tsv (headers: Entity, Code, Year, "Homicide rate per 100,000 population - Both sexes - All ages")
life-satisfaction-vs-life-expectancy.tsv (headers: Entity, Code, Year, "Life expectancy - Sex: all - Age: at birth - Variant: estimates", Cantril ladder score, Population (historical estimates), Continent)
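Since at least one header contains a comma, it can help to list the header cells by splitting on tabs only. The following is a minimal sketch; the function name list_headers is hypothetical and not part of the assignment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each header cell of a TSV file with its
# column number, splitting on tab characters only (commas are kept).
list_headers() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}
```

For example, `list_headers gdp-vs-happiness.tsv` prints one numbered line per column, which makes it easy to confirm the column positions used in later steps.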
[Screenshots are provided for the data files below]
You will be implementing a data-cleaning stage.
Data Cleaning:
You are to create a top-level Bash script, called cantril_data_cleaning (Note: no suffix!), which will use Bash plus shell tools to clean the data. cantril_data_cleaning expects three .tsv files corresponding to the three files from Our World in Data. The input files may be given in any order. The output is expected to be tab-separated data directed, as before, to standard output.
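As a starting point, the script can verify that it received exactly three readable files before doing any work. This is a minimal sketch; the function name check_args and the wording of the messages are assumptions, not something the assignment prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when exactly three readable files are given.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "Usage: cantril_data_cleaning <file1.tsv> <file2.tsv> <file3.tsv>" >&2
        return 1
    fi
    local f
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "Error: cannot read file: $f" >&2
            return 1
        fi
    done
    return 0
}
```

In the top-level script this could be called as `check_args "$@" || exit 1`. Usage errors go to standard error here so that they do not mix with the cleaned data on standard output.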
The overall program should, for a given datafile:
1. Based on the header (i.e. top) line, make sure that the file is in tab-separated format.
2. Also based on the header line, report any lines that do not have the same number of cells as the header. (Cells are allowed to be empty.) [Ignore such rows, and print the error message to stdout.]
3. Remove the column with the header Continent, which is sparsely populated and is not present in one of the files.
4. Ignore the rows that do not represent countries (the country code field is empty).
5. Ignore the rows for years outside those for which we have at least some Cantril data. (Cantril data may be absent in certain years within the range of those for which there otherwise is data; those cells should be retained.) [The range of years is 2011 to 2021, so consider only rows whose year is between 2011 and 2021, both inclusive.]
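The five steps above can be sketched as a single awk pass over one file. This is an illustration under stated assumptions, not a definitive solution: the function name clean_file and the error wording are hypothetical, a header with no tab at all is treated as "not tab-separated", and Code and Year are assumed to be columns 2 and 3 as in the listed headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clean one TSV file and write the result to stdout.
clean_file() {
    awk -F'\t' -v OFS='\t' -v fname="$1" '
    NR == 1 {
        # Step 1: a header without any tab is treated as not tab-separated.
        if (NF < 2) {
            print "Error: " fname " does not look tab-separated"
            exit 1
        }
        ncols = NF
        # Step 3: remember the position of the Continent column (0 if absent).
        drop = 0
        for (i = 1; i <= NF; i++) if ($i == "Continent") drop = i
    }
    NR > 1 {
        # Step 2: report and skip rows whose cell count differs from the header.
        if (NF != ncols) {
            print "Error: " fname " line " NR " has " NF " cells, expected " ncols
            next
        }
        if ($2 == "") next                    # Step 4: no country code
        if ($3 < 2011 || $3 > 2021) next      # Step 5: year out of range
    }
    {
        # Print the header or row without the Continent column.
        line = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == drop) continue
            line = line sep $i
            sep = OFS
        }
        print line
    }' "$1"
}
```

Because the field separator is a single tab, awk keeps empty cells as empty fields, so rows with missing Cantril values still pass the cell-count check. The error messages go to stdout, as the task requests.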
The output file sent to stdout should have rows with the data in the following order (tab separated):
While the contents of the input files may change, and the order in which they are provided to cantril_data_cleaning may vary, you can assume that the order of the columns in the various input files will not change.
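Because the files may arrive in any order while their columns stay fixed, one option is to recognise each file from a distinctive header cell. The sketch below assumes the header texts listed earlier in the question; the function name file_kind and the labels it prints are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an input file by its header line.
# Prints gdp, homicide, or life; prints unknown and fails otherwise.
file_kind() {
    local header
    header=$(head -n 1 "$1")
    case "$header" in
        *"GDP per capita"*)  echo gdp ;;
        *"Homicide rate"*)   echo homicide ;;
        *"Life expectancy"*) echo life ;;
        *) echo unknown; return 1 ;;
    esac
}
```

The match deliberately avoids "Cantril ladder score" and "Population", since those headers appear in two of the three files and would not tell them apart.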
Hint: You will notice that the country-year combination of cells is unique within each of the three input files. [That is, each combination of country code and year appears only once in each file.]
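The hint suggests using the country code plus year as a join key. The following sketch merges three files with awk associative arrays. Several things here are assumptions: the function name merge_by_code_year is hypothetical, the files are passed in the fixed order GDP, homicide, life expectancy, the inputs are assumed to be already filtered, and the output column order and header labels are placeholders guessed from the truncated sample header in the screenshots, since the question's own column list is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join three TSV files on (Code, Year).
# Arguments, in this order: gdp file, homicide file, life-expectancy file.
merge_by_code_year() {
    local tab
    tab=$(printf '\t')
    # Placeholder header labels (assumed, not taken from the assignment).
    printf 'Entity\tCode\tYear\tGDP per capita\tPopulation\tHomicide Rate\tLife Expectancy\tCantril Ladder score\n'
    awk -F'\t' -v OFS='\t' '
    FNR == 1 { nfile++; next }            # skip each header, count the files
    {
        key = $2 SUBSEP $3                # unique per file, per the hint
        entity[key] = $1; code[key] = $2; year[key] = $3
    }
    nfile == 1 { cantril[key] = $4; gdp[key] = $5; pop[key] = $6 }
    nfile == 2 { hom[key] = $4 }
    nfile == 3 {
        life[key] = $4
        # Fill Cantril and population from this file only if still empty.
        if (cantril[key] == "") cantril[key] = $5
        if (pop[key] == "") pop[key] = $6
    }
    END {
        for (key in entity)
            print entity[key], code[key], year[key], gdp[key], pop[key], hom[key], life[key], cantril[key]
    }' "$1" "$2" "$3" | sort -t "$tab" -k1,1 -k3,3n
}
```

Keys that are missing from one file simply produce empty cells in the merged row. The rows are sorted by entity and then numerically by year, because awk's `for (key in array)` order is unspecified.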
So, in short, the cantril_data_cleaning script takes three tab-separated files in any order (the columns/headers inside them do not change) and must produce tab-separated output with the headers mentioned above.
Screenshots of the three files and a sample expected file are below:
[Screenshot: spreadsheet excerpt showing the Entity and Code columns, with rows for Monaco (MCO), Hong Kong (HKG), Macao (MAC), Japan (JPN), and Australia (AUS); the remaining columns are not legible]
life-satisfaction-vs-life-expec (screenshot label, truncated)

[Screenshot: sample rows with the homicide-rate-unodc.tsv headers, shown comma-separated here; the file itself is tab-separated. Note the empty Code cell in the Africa (UN) rows.]
Entity,Code,Year,Homicide rate per 100,000 population - Both sexes - All ages
Afghanistan,AFG,2009,4.0715265
Afghanistan,AFG,2010,3.4870927
Afghanistan,AFG,2011,4.208668
Afghanistan,AFG,2012,6.393913
Afghanistan,AFG,2015,9.975262
Afghanistan,AFG,2016,6.6924186
Afghanistan,AFG,2017,6.8006945
Afghanistan,AFG,2018,6.7435727
Afghanistan,AFG,2019,7.180397
Afghanistan,AFG,2020,6.594439
Afghanistan,AFG,2021,4.0224977
Africa (UN),,2000,13.645948
Africa (UN),,2001,13.589387
Africa (UN),,2002,13.524751
Africa (UN),,2003,13.095996
Africa (UN),,2004,12.850431
Africa (UN),,2005,12.7091255
Africa (UN),,2006,12.759147
Africa (UN),,2007,12.555357
Africa (UN),,2008,12.498256
Africa (UN),,2009,12.342634
Africa (UN),,2010,12.226504
Africa (UN),,2011,12.290624
Africa (UN),,2012,12.340445
Africa (UN),,2013,12.359684
Africa (UN),,2014,12.483554
Africa (UN),,2015,12.366553
Africa (UN),,2016,12.334789
Africa (UN),,2017,12.51506
Africa (UN),,2018,12.317689
Africa (UN),,2019,12.313706
Africa (UN),,2020,12.066697
Africa (UN),,2021,12.655763
(the next row is cut off in the screenshot)

gdp-vs-happiness (screenshot label)
[Screenshot: spreadsheet with columns A to I; the header row is truncated in the source: Entity, Code, Year, GDP per capi…, Population (historical estimates), Homicide rate per 100,000 population - Both sexes - All ag…, Life…]
