Question: As a data scientist, write a Python script for the project below with an extensive exploratory data analysis (EDA).


I need help with a Python script for the problem statement below, using machine learning models. As a data scientist at EasyVisa, you have to analyze the data provided and provide solutions:

Context

Business communities in the United States are facing high demand for human resources, but one of the constant challenges is identifying and attracting the right talent, which is perhaps the most important element in remaining competitive. Companies in the United States look for hard-working, talented, and qualified individuals both locally and abroad.

The Immigration and Nationality Act (INA) of the US permits foreign workers to come to the United States to work on either a temporary or permanent basis. The act also protects US workers against adverse impacts on their wages or working conditions by ensuring US employers' compliance with statutory requirements when they hire foreign workers to fill workforce shortages. The immigration programs are administered by the Office of Foreign Labor Certification (OFLC).

OFLC processes job certification applications for employers seeking to bring foreign workers into the United States and grants certifications in those cases where employers can demonstrate that there are not sufficient US workers available to perform the work at wages that meet or exceed the wage paid for the occupation in the area of intended employment.

Objective

In FY 2016, the OFLC processed 775,979 employer applications for 1,699,957 positions for temporary and permanent labor certifications. This was a nine percent increase in the overall number of processed applications from the previous year. The process of reviewing every case is becoming a tedious task as the number of applicants is increasing every year.

The increasing number of applicants every year calls for a machine-learning-based solution that can help shortlist the candidates with higher chances of visa approval. OFLC has hired the firm EasyVisa for data-driven solutions. As a data scientist at EasyVisa, you have to analyze the data provided and, with the help of a classification model:

Facilitate the process of visa approvals.

Recommend a suitable profile for the applicants for whom the visa should be certified or denied based on the drivers that significantly influence the case status.
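Here is one way to make "drivers that significantly influence the case status" concrete. Fit a tree-based classifier and rank its feature importances, summing the importances of all one-hot columns that come from the same original field. This is a minimal sketch under stated assumptions, not the expected solution. The rows are randomly generated, and a signal is deliberately planted in `education_of_employee`; column names follow the data dictionary below. `importance_by_column` is a hypothetical helper. Impurity-based importance is only one option; permutation importance is a common alternative.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000

# Random placeholder data (NOT the EasyVisa file); column names follow the data dictionary.
df = pd.DataFrame({
    "education_of_employee": rng.choice(["High School", "Bachelor's", "Master's", "Doctorate"], n),
    "has_job_experience": rng.choice(["Y", "N"], n),
    "no_of_employees": rng.integers(1, 5000, n),
})
# Planted signal for the demo only: Master's/Doctorate mostly certified, 10% of labels flipped.
high_degree = df["education_of_employee"].isin(["Master's", "Doctorate"]).to_numpy()
y = (high_degree ^ (rng.random(n) < 0.1)).astype(int)  # 1 = Certified (assumed encoding)

# One-hot encode the categorical columns and fit a shallow, readable tree.
X = pd.get_dummies(df, columns=["education_of_employee", "has_job_experience"])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)


def importance_by_column(model, feature_names, original_cols):
    """Sum impurity-based importances of all encoded columns derived from each original column."""
    imp = pd.Series(model.feature_importances_, index=feature_names)
    totals = {
        col: imp[[f for f in feature_names if f == col or f.startswith(col + "_")]].sum()
        for col in original_cols
    }
    return pd.Series(totals).sort_values(ascending=False)


drivers = importance_by_column(tree, X.columns, df.columns)
print(drivers)
```

On the real data, the top-ranked fields (and the split thresholds of a shallow tree) are the raw material for describing which applicant profiles tend to be certified or denied.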

Data Description

The data contains the different attributes of the employee and the employer. The detailed data dictionary is given below.

case_id: ID of each visa application

continent: Continent of the employee

education_of_employee: Education level of the employee

has_job_experience: Does the employee have any job experience? Y = Yes; N = No

requires_job_training: Does the employee require any job training? Y = Yes; N = No

no_of_employees: Number of employees in the employer's company

yr_of_estab: Year in which the employer's company was established

region_of_employment: Foreign worker's intended region of employment in the US

prevailing_wage: Average wage paid to similarly employed workers in a specific occupation in the area of intended employment. The purpose of the prevailing wage is to ensure that the foreign worker is not underpaid compared to other workers offering the same or similar service in the same area of employment.

unit_of_wage: Unit of prevailing wage. Values include Hourly, Weekly, Monthly, and Yearly.

full_time_position: Is the position full-time? Y = Full-Time Position; N = Part-Time Position

case_status: Flag indicating whether the visa was certified or denied
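A first pass over these columns usually checks the shape, data types, missing values, duplicates, and the class balance of `case_status`. The sketch below assumes the column names listed above. The actual data file is not part of this question, so `make_toy_frame` is a hypothetical generator of random placeholder rows. Its category labels, such as the continents and regions, are made up. The file name in the `read_csv` comment is also an assumption.

```python
import numpy as np
import pandas as pd


def make_toy_frame(n: int = 200, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the EasyVisa data: random placeholder values, NOT real records."""
    rng = np.random.default_rng(seed)
    yn = ["Y", "N"]
    return pd.DataFrame({
        "case_id": [f"case_{i}" for i in range(n)],
        "continent": rng.choice(["Asia", "Europe", "Africa"], n),
        "education_of_employee": rng.choice(["High School", "Bachelor's", "Master's", "Doctorate"], n),
        "has_job_experience": rng.choice(yn, n),
        "requires_job_training": rng.choice(yn, n),
        "no_of_employees": rng.integers(1, 50_000, n),
        "yr_of_estab": rng.integers(1850, 2017, n),
        "region_of_employment": rng.choice(["Northeast", "South", "West"], n),
        "prevailing_wage": rng.uniform(10, 200_000, n).round(2),
        "unit_of_wage": rng.choice(["Hourly", "Weekly", "Monthly", "Yearly"], n),
        "full_time_position": rng.choice(yn, n),
        "case_status": rng.choice(["Certified", "Denied"], n),
    })


def data_overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, count and percentage of missing values, number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),
    })


# With the real data, read the file instead, e.g. pd.read_csv("EasyVisa.csv") (file name assumed).
df = make_toy_frame()
summary = data_overview(df)

print(df.shape)
print(summary)
print("duplicate rows:", df.duplicated().sum())
print("duplicate case_id values:", df["case_id"].duplicated().sum())
print(df.describe().T)  # numeric columns: no_of_employees, yr_of_estab, prevailing_wage
print(df["case_status"].value_counts(normalize=True))  # class balance informs the metric choice
```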

Please make sure that all the information required by the rubric is included, and do an extensive EDA covering as many angles as possible.

Include a detailed explanation of the approach taken, inferences, and insights.

Include outputs such as graphs, tables, and all other relevant information.

Criteria

Exploratory Data Analysis
- Problem definition
- Univariate analysis
- Bivariate analysis
- Use appropriate visualizations to identify the patterns and insights
- Key meaningful observations on individual variables and the relationship between variables
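The univariate and bivariate steps can be sketched with pandas and matplotlib. The sketch builds count tables for categorical columns and a combined boxplot and histogram for a numeric column such as `prevailing_wage`. It also builds row-normalized crosstabs and box plots against `case_status`. This is a minimal sketch on random placeholder data, so any pattern it shows is meaningless. `make_toy_frame` and the plotting helpers are hypothetical names. With the real data, the same calls would produce the tables and graphs the rubric asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_toy_frame(n: int = 300, seed: int = 0) -> pd.DataFrame:
    """Hypothetical random data with a few of the dictionary's columns (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "education_of_employee": rng.choice(["High School", "Bachelor's", "Master's", "Doctorate"], n),
        "has_job_experience": rng.choice(["Y", "N"], n),
        "prevailing_wage": rng.lognormal(mean=10, sigma=1, size=n).round(2),
        "case_status": rng.choice(["Certified", "Denied"], n),
    })


def labeled_counts(df, col):
    """Univariate table for a categorical column: counts and percentages."""
    counts = df[col].value_counts(dropna=False)
    return pd.DataFrame({"count": counts, "percent": (100 * counts / counts.sum()).round(1)})


def hist_box(df, col):
    """Univariate plot for a numeric column: boxplot (outliers) above a histogram (shape)."""
    fig, (ax_box, ax_hist) = plt.subplots(
        2, 1, sharex=True, figsize=(8, 5), gridspec_kw={"height_ratios": [1, 3]}
    )
    values = df[col].dropna()
    ax_box.boxplot(values, vert=False)
    ax_hist.hist(values, bins=30)
    ax_hist.axvline(values.mean(), color="green", linestyle="--", label="mean")
    ax_hist.axvline(values.median(), color="black", label="median")
    ax_hist.set_xlabel(col)
    ax_hist.legend()
    return fig


def share_table(df, col, target="case_status"):
    """Bivariate table: share of each case_status within each category (rows sum to 1)."""
    return pd.crosstab(df[col], df[target], normalize="index").round(3)


def stacked_bar(df, col, target="case_status"):
    """Bivariate plot: stacked bars of the row-normalized crosstab."""
    ax = share_table(df, col, target).plot(kind="bar", stacked=True, figsize=(8, 4))
    ax.set_ylabel("share of applications")
    return ax.figure


def box_by_target(df, col, target="case_status"):
    """Bivariate plot: distribution of a numeric column for each case_status value."""
    groups = sorted(df[target].unique())
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.boxplot([df.loc[df[target] == g, col].dropna() for g in groups])
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(groups)
    ax.set_ylabel(col)
    return fig


df = make_toy_frame()
print(labeled_counts(df, "education_of_employee"))
print(share_table(df, "education_of_employee"))
# In a notebook these figures display inline; in a script, call fig.savefig(...) on each one.
figs = [
    hist_box(df, "prevailing_wage"),
    stacked_bar(df, "has_job_experience"),
    box_by_target(df, "prevailing_wage"),
]
```

Row-normalized crosstabs are used instead of raw counts so that categories of very different sizes can be compared on the same certification-rate scale.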

Data Preprocessing
- Prepare the data for analysis
- Feature engineering
- Missing value treatment
- Outlier treatment
- Ensure no data leakage among the train, test, and validation sets
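The preprocessing steps can be sketched as follows, assuming the dictionary's column names. Several choices here are assumptions, not requirements from the brief:
- `prevailing_wage` is annualized with factors of 2080 hours, 52 weeks, and 12 months per year.
- Company age is measured from 2016, the fiscal year mentioned in the problem statement.
- Missing values get median or most-frequent imputation.
- Outliers are capped at 1.5 times the IQR beyond the quartiles.
- The split is 60/20/20.

The key point for leakage is ordering. The split happens first, and every learned quantity (IQR bounds, imputation values, one-hot categories) is fitted on the training set only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumptions: 40 h/week and 52 weeks/year for annualizing; 2016 (FY 2016) as reference year.
ANNUAL_FACTOR = {"Hourly": 2080, "Weekly": 52, "Monthly": 12, "Yearly": 1}
REFERENCE_YEAR = 2016
NUM_COLS = ["no_of_employees", "annual_wage", "company_age"]
CAT_COLS = ["education_of_employee", "has_job_experience", "unit_of_wage"]


def make_toy_frame(n=600, seed=0):
    """Hypothetical random data with a subset of the dictionary's columns (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "case_id": [f"case_{i}" for i in range(n)],
        "education_of_employee": rng.choice(["High School", "Bachelor's", "Master's", "Doctorate"], n),
        "has_job_experience": rng.choice(["Y", "N"], n),
        "no_of_employees": rng.integers(1, 50_000, n),
        "yr_of_estab": rng.integers(1850, 2017, n),
        "prevailing_wage": rng.uniform(10, 200_000, n),
        "unit_of_wage": rng.choice(list(ANNUAL_FACTOR), n),
        "case_status": rng.choice(["Certified", "Denied"], n, p=[0.67, 0.33]),  # arbitrary imbalance
    })
    df.loc[rng.random(n) < 0.03, "education_of_employee"] = np.nan  # a few gaps to impute
    return df


def engineer_features(df):
    out = df.drop(columns=["case_id"])  # a unique ID carries no predictive signal
    out["annual_wage"] = out["prevailing_wage"] * out["unit_of_wage"].map(ANNUAL_FACTOR)
    out["company_age"] = REFERENCE_YEAR - out["yr_of_estab"]
    return out.drop(columns=["prevailing_wage", "yr_of_estab"])


def iqr_bounds(s, k=1.5):
    q1, q3 = s.quantile([0.25, 0.75])
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)


data = engineer_features(make_toy_frame())
X = data.drop(columns=["case_status"])
y = (data["case_status"] == "Certified").astype(int)  # 1 = Certified (assumed positive class)

# 60/20/20 split, stratified so every set keeps the same class balance.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=1
)

# Outlier capping: bounds are learned from the training split only, then applied to all splits.
bounds = {c: iqr_bounds(X_train[c]) for c in NUM_COLS}


def clip_outliers(d):
    return d.assign(**{c: d[c].clip(*bounds[c]) for c in NUM_COLS})


X_train, X_val, X_test = (clip_outliers(d) for d in (X_train, X_val, X_test))

# Imputation and one-hot encoding are fitted on the training split only.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), NUM_COLS),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), CAT_COLS),
])
X_train_t = preprocess.fit_transform(X_train)
X_val_t, X_test_t = preprocess.transform(X_val), preprocess.transform(X_test)
print(X_train_t.shape, X_val_t.shape, X_test_t.shape)
```

Capping outliers instead of dropping them keeps every application in the data. Whether extreme wages are data errors or genuine values should be checked on the real data before choosing.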

Model Building - Original Data
- Choose the appropriate metric for model evaluation
- Build 5 models (from decision trees, bagging, and boosting methods)
- Comment on the model performance

* You can choose NOT to build XGBoost if you are facing issues with the installation.
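Below is a sketch of the five-model comparison on the original (not oversampled) training data. It uses a decision tree plus bagging, random forest, AdaBoost, and gradient boosting from scikit-learn. Each model is scored on the training and validation sets so that overfitting is visible. Two assumptions apply. First, `make_classification` stands in for the preprocessed EasyVisa matrix, with an arbitrary 67/33 class imbalance. Second, F1 is the headline metric, on the reasoning that wrongly certifying and wrongly denying an application both carry a cost. If one error type matters more, recall or precision might be the better choice. XGBoost is left out, as the brief allows.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features; class 1 ("Certified") is the ~67% majority.
X, y = make_classification(n_samples=2000, n_features=15, n_informative=6,
                           weights=[0.33, 0.67], random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, stratify=y, random_state=1)


def make_models():
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=1),
        "Bagging": BaggingClassifier(random_state=1),
        "Random Forest": RandomForestClassifier(random_state=1),
        "AdaBoost": AdaBoostClassifier(random_state=1),
        "Gradient Boosting": GradientBoostingClassifier(random_state=1),
    }


def score(model, X, y):
    """Accuracy, recall, precision and F1 for the positive class (1 = Certified)."""
    pred = model.predict(X)
    return {"Accuracy": accuracy_score(y, pred), "Recall": recall_score(y, pred),
            "Precision": precision_score(y, pred), "F1": f1_score(y, pred)}


def compare(models, X_train, y_train, X_val, y_val):
    """Fit each model on the training split; report train and validation scores side by side."""
    rows = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_s = {f"train_{k}": v for k, v in score(model, X_train, y_train).items()}
        val_s = {f"val_{k}": v for k, v in score(model, X_val, y_val).items()}
        rows[name] = {**train_s, **val_s}
    return pd.DataFrame(rows).T.round(3)


results = compare(make_models(), X_train, y_train, X_val, y_val)
print(results.sort_values("val_F1", ascending=False))
```

A large gap between `train_F1` and `val_F1`, which is typical for an unpruned decision tree, signals overfitting. A natural next step would be hyperparameter tuning (for example with `GridSearchCV`) and a final check on the held-out test set.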

Model Building - Oversampled Data
- Oversample the train data
- Build 5 models (from decision trees, bagging, and boosting methods)
- Comment on the model performance

* You can choose NOT to build XGBoost if you are facing issues with the installation.
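For the oversampled variant, the essential rule is that only the training split is resampled. The validation and test sets stay untouched, so their scores reflect the real class balance. To stay dependency-free, the sketch below swaps in plain random oversampling, which duplicates minority-class rows with `sklearn.utils.resample`. In practice, SMOTE from the imbalanced-learn package (`imblearn.over_sampling.SMOTE` with `fit_resample`) is the more common choice. The data are again a synthetic stand-in from `make_classification`. `random_oversample` is a hypothetical helper that assumes a binary target.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def random_oversample(X, y, random_state=1):
    """Duplicate randomly chosen minority-class rows until both classes have equal counts (binary y)."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    X_extra = resample(X[y == minority], replace=True, n_samples=n_extra, random_state=random_state)
    return np.vstack([X, X_extra]), np.concatenate([y, np.full(n_extra, minority)])


# Synthetic stand-in for the preprocessed features; class 0 is the minority (~33%).
X, y = make_classification(n_samples=2000, n_features=15, n_informative=6,
                           weights=[0.33, 0.67], random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, stratify=y, random_state=1)

# Only the training split is oversampled; X_val / y_val keep the original class balance.
X_train_over, y_train_over = random_oversample(X_train, y_train)
print("class counts before:", np.bincount(y_train), "after:", np.bincount(y_train_over))

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

val_f1 = {}
for name, model in models.items():
    # Refit the same estimator on each training set and score on the untouched validation split.
    f1_orig = f1_score(y_val, model.fit(X_train, y_train).predict(X_val))
    f1_over = f1_score(y_val, model.fit(X_train_over, y_train_over).predict(X_val))
    val_f1[name] = (f1_orig, f1_over)
    print(f"{name:18s} val F1 original={f1_orig:.3f} oversampled={f1_over:.3f}")
```

The duplicated rows are exact copies, so random oversampling can encourage deep trees to memorize them. Comparing validation F1 with and without oversampling, as the loop does, shows whether the resampling actually helps.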
