Question:

Note. For this homework, we will use a Python toolkit/package called scikit-learn.

- For the preprocessing, we start with tokenization using NLTK's tokenizer. You may try the other tokenization methods in the last bonus problem.
- For the feature extraction, we will use the feature extractors from scikit-learn.
- For the text classifiers, we start with Naive Bayes and Logistic Regression classifiers, but you can explore other classifiers at this page, which has a list of supervised machine learning models.
- We will use the evaluation package to evaluate your model performance.

Feel free to read through the linked documentation, which will be helpful for you to finish the homework challenges.

```python
# you should define all your import packages here
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import f1_score, accuracy_score


# Load and preprocess datasets
def load_and_preprocess(data_path, test=False):
    """Load and preprocess your datasets.

    Parameters:
        data_path (str): your data path
        test (bool): if the file is a test file
    """
    x = []
    y = []
    with open(data_path) as dfile:
        # the first line is a tab-separated header; locate the review and rating columns
        cols = dfile.readline().strip().split('\t')
        review_idx = cols.index('review')
        rating_idx = cols.index('rating')
        for line in dfile:
            if len(line) < 5:  # skip empty or malformed lines
                continue
            line = line.strip().split('\t')
            x.append(line[review_idx])
            y.append(int(line[rating_idx]))
    return x, y
```

```python
# Load your training, development, and test datasets
train_x, train_y = load_and_preprocess('./data/train.tsv')          # training set
dev_x, dev_y = load_and_preprocess('./data/dev.tsv')                # development set
test_x, test_y = load_and_preprocess('./data/test.tsv', test=True)  # test set
```

Task 1: Extract Features

```python
def extract_features(x):
    """This function is to extract document features of the input documents (x).

    Parameters:
        x (list): your input documents
    """
    # Implement your function here
    pass

train_x_feats = extract_features(train_x)  # training set features
dev_x_feats = extract_features(dev_x)      # development set features

# # example code using count to vectorize documents
# def extract_features(x):
#     """This function is an example code to extract features using count vectorizer."""
#     vectorizer = CountVectorizer(ngram_range=(1, 2), max_features=2000, min_df=2)
#     vectorizer.fit(x)
#     return vectorizer.transform(x)
#
# train_x_feats = extract_features(train_x)
# dev_x_feats = extract_features(dev_x)
```

Task 2: Build your 1st classifier using Naive Bayes

```python
def build_NB_classifier(x, y):
    """Build a Naive Bayes classifier.

    Parameters:
        x (scipy.sparse.csr.csr_matrix): document features
        y (list): a list of document labels
    """
    # Implement your function here
    pass

nb_clf = build_NB_classifier(train_x_feats, train_y)

# # example code using Perceptron as a classifier
# def build_PN_classifier(x, y):
#     """A Perceptron classifier."""
#     clf = Perceptron()
#     clf.fit(x, y)
#     return clf
#
# pn_clf = build_PN_classifier(train_x_feats, train_y)
```
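For reference, the note points to NLTK's tokenizer for preprocessing. A minimal sketch of word-level tokenization is below; the sample sentence and the download call are illustrative additions, and newer NLTK versions may ask for the `punkt_tab` resource instead of `punkt`:

```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # tokenizer models; a one-time download

print(word_tokenize("This film was surprisingly good!"))
# ['This', 'film', 'was', 'surprisingly', 'good', '!']
```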
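For Task 1, note that the commented example fits a new vectorizer on whatever input it receives, so train and dev documents would land in different feature spaces. A common pattern is to fit once on the training set and reuse the vocabulary; the sketch below does this with the already-imported TfidfVectorizer (the function name `extract_features_tfidf` and the fit-on-train pattern are suggestions, not part of the assignment):

```python
# fit once on the training documents, then reuse the same vocabulary everywhere
tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=2000, min_df=2)
tfidf.fit(train_x)

def extract_features_tfidf(x):
    """Map documents into the TF-IDF feature space fitted on the training set."""
    return tfidf.transform(x)

train_x_feats = extract_features_tfidf(train_x)
dev_x_feats = extract_features_tfidf(dev_x)
```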
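For Task 2, the Naive Bayes body is left as `pass`. One plausible completion uses scikit-learn's MultinomialNB; the assignment does not name a specific Naive Bayes variant, but MultinomialNB is the usual choice for count or TF-IDF features:

```python
from sklearn.naive_bayes import MultinomialNB

def build_NB_classifier(x, y):
    """A minimal sketch: fit MultinomialNB on sparse document features."""
    clf = MultinomialNB()
    clf.fit(x, y)
    return clf

nb_clf = build_NB_classifier(train_x_feats, train_y)
```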
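Finally, the note mentions scikit-learn's evaluation package, and the imports include `f1_score` and `accuracy_score`. A rough sketch of scoring a fitted classifier on the development set follows; the `'macro'` averaging choice is an assumption, since the assignment does not specify one:

```python
dev_preds = nb_clf.predict(dev_x_feats)
print('accuracy:', accuracy_score(dev_y, dev_preds))
print('macro F1:', f1_score(dev_y, dev_preds, average='macro'))
```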
Step by Step Solution
Let's break down the tasks and provide explanations for each part. Load and Preprocess Datasets: In thi...
