Question:
Note

- For this homework, we will use a Python toolkit/package called "scikit-learn".
- For the preprocessing, we start with tokenization using the NLTK tokenizer. You may try the other tokenization methods in the last bonus problem.
- For the feature extraction, we will use the feature extractors from scikit-learn.
- For the text classifiers, we start with Naive Bayes and Logistic Regression classifiers, but you can explore other classifiers at this page, which has a list of supervised machine learning models. We will use the evaluation package to evaluate your model performance.

Feel free to read through the linked documentation, which will be helpful for you to finish the homework challenges.

# you should define all your import packages here
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import f1_score, accuracy_score

# Load and preprocess datasets
def load_and_preprocess(data_path, test=False):
    """Load and preprocess your datasets

    Parameters:
        data_path (str): your data path
        test (bool): if the file is a test file
    """
    x = []
    y = []
    with open(data_path) as dfile:
        cols = dfile.readline().strip().split('\t')
        review_idx = cols.index('review')
        rating_idx = cols.index('rating')
        for line in dfile:
            if len(line) < 5:
                continue
            line = line.strip().split('\t')
            x.append(line[review_idx])
            y.append(int(line[rating_idx]))
    return x, y

# Load your training, development, and test datasets
train_x, train_y = load_and_preprocess('./data/train.tsv')          # training set
dev_x, dev_y = load_and_preprocess('./data/dev.tsv')                # development set
test_x, test_y = load_and_preprocess('./data/test.tsv', test=True)  # test set

Task 1: Extract Features

def extract_features(x):
    """This function is to extract document features of the input documents (x)

    Parameters:
        x (list): your input documents
    """
    # Implement your function here
    pass

train_x_feats = extract_features(train_x)  # training set features
dev_x_feats = extract_features(dev_x)      # development set features

# # example code using count to vectorize documents
# vectorizer = CountVectorizer(ngram_range=(1, 2), max_features=2000, min_df=2)
# vectorizer.fit(x)
#
# def extract_features(x):
#     """This function is an example code to extract features using count vectorizer"""
#     return vectorizer.transform(x)
#
# train_x_feats = extract_features(train_x)
# dev_x_feats = extract_features(dev_x)

Task 2: Build your 1st classifier using Naive Bayes

def build_NB_classifier(x, y):
    """A Naive Bayes classifier

    Parameters:
        x (scipy.sparse.csr.csr_matrix): document features
        y (list): a list of document labels
    """
    # Implement your function here
    pass

nb_clf = build_NB_classifier(train_x_feats, train_y)

# # example code using Perceptron as a classifier
# def build_PN_classifier(x, y):
#     """A Perceptron classifier"""
#     clf = Perceptron()
#     clf.fit(x, y)
#     return clf
#
# pn_clf = build_PN_classifier(train_x_feats, train_y)
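For reference, the two tasks can be sketched end to end. This is one possible solution sketch, not the assignment's answer key: it assumes `MultinomialNB` for the Naive Bayes classifier (the notebook only stubs `build_NB_classifier`), substitutes a tiny inline corpus for the `./data/*.tsv` files, and relaxes the example's `min_df=2` to `min_df=1` so the toy vocabulary is not filtered away.

```python
# Sketch of Task 1 (count features) and Task 2 (Naive Bayes), evaluated on a
# toy dev split. The inline corpus and MultinomialNB choice are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

train_x = ["great movie loved it", "terrible plot awful acting",
           "loved the acting", "awful movie terrible"]
train_y = [1, 0, 1, 0]
dev_x = ["loved it great acting", "terrible awful"]
dev_y = [1, 0]

# Task 1: fit the vectorizer on training documents only, then transform each split
vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=1)
vectorizer.fit(train_x)

def extract_features(x):
    """Map raw documents to a sparse matrix of unigram/bigram counts."""
    return vectorizer.transform(x)

train_x_feats = extract_features(train_x)
dev_x_feats = extract_features(dev_x)

# Task 2: build and fit a Naive Bayes classifier on the training features
def build_NB_classifier(x, y):
    """A multinomial Naive Bayes classifier fit on (features, labels)."""
    clf = MultinomialNB()
    clf.fit(x, y)
    return clf

nb_clf = build_NB_classifier(train_x_feats, train_y)

# Evaluate on the development set with the metrics imported above
pred = nb_clf.predict(dev_x_feats)
print("accuracy:", accuracy_score(dev_y, pred))
print("f1:", f1_score(dev_y, pred))
```

The same pattern (fit the vectorizer and classifier on the training split, only `transform`/`predict` on dev and test) carries over unchanged to the real `.tsv` data loaded by `load_and_preprocess`.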