Question:

```python
# UnivBook_classification
# Google Colaboratory: https://colab.research.google.com
# Notebook: /usr/local/toku2/sample/UnivBook.ipynb
#
# Building the archive:
#   php book_category.php
#   tar --utc -cvzf book_category.tgz book_category

# Download and extract the archive (Colab/Jupyter "!" shell commands)
!wget http://www.cs.gunma-u.ac.jp/~michi/toku2/book_category.tgz
!tar zxf book_category.tgz

# Inspect the extracted directory
!ls ./book_category
!ls ./book_category | head

import glob
import re

import MeCab
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

# The 8 topics; each has its own subdirectory of .txt files
topics = [
    "computer_graphics",
    "operating_systems",
    "computer_security",
    "application_service",
    "computer_software",
    "artificial_intelligence",
    "search_engine",
    "information_society",
]

# One file = one document: join its non-empty lines into a single string
docs = []
for topic in topics:
    for f in glob.glob(f"./book_category/{topic}/*.txt"):
        with open(f, "r", encoding="utf-8") as fin:
            body = " ".join(line.strip() for line in fin if line.strip())
        docs.append((topic, body))

# Only the label column needs the category dtype
df = pd.DataFrame(docs, columns=["topic", "body"])
df["topic"] = df["topic"].astype("category")
print(df.head())
print(df.topic.value_counts())  # documents per topic

# Word segmentation with MeCab: "-Owakati" outputs space-separated words
tagger = MeCab.Tagger("-Owakati")

def parse_to_wakati(text):
    return tagger.parse(text).strip()

df = df.assign(body_wakati=df.body.apply(parse_to_wakati))
print(df.body_wakati.head())

# Encode topic names as integer labels (classes_ is sorted alphabetically)
le = LabelEncoder()
y = le.fit_transform(df.topic)
print(le.classes_)
print(le.transform(["computer_graphics"]))
print(le.transform(["operating_systems"]))

# Hold out 20% of the documents as the test set
X_train, X_test, y_train, y_test = train_test_split(
    df.body_wakati,  # tokenized text
    y,               # integer labels
    test_size=0.2,
    random_state=10,
    shuffle=True,
)

def report(pred):
    """Print the confusion matrix and per-topic precision, recall, and F-measure."""
    labels = list(range(len(le.classes_)))
    print(confusion_matrix(y_test, pred, labels=labels))
    print(classification_report(y_test, pred, labels=labels, target_names=le.classes_))

class RulebasedEstimator(ClassifierMixin, BaseEstimator):
    """Keyword-rule classifier: the first matching rule decides the topic."""

    def __init__(self, label_encoder):
        # Store under the parameter name so BaseEstimator.get_params() works
        self.label_encoder = label_encoder

    def fit(self, X, y=None):
        return self  # nothing to learn

    def predict(self, X):
        # (topic, keyword pattern) pairs, checked in order. Each pattern should
        # list that topic's keywords, e.g. (kw1|kw2); an empty "(|)" matches any text.
        rules = [
            ("computer_graphics", r"(|)"),
            ("operating_systems", r"(|)"),
            ("computer_security", r"(|)"),
            ("application_service", r"(|)"),
            ("computer_software", r"(|)"),
            ("artificial_intelligence", r"(|)"),
            ("search_engine", r"(|)"),
            ("information_society", r"(|)"),
        ]
        result = []
        for text in X:
            pred = 0  # label used when no rule matches
            for topic, pattern in rules:
                if re.search(pattern, text):
                    pred = self.label_encoder.transform([topic])[0]
                    break
            result.append(pred)
        return result

rulebased = RulebasedEstimator(label_encoder=le)
report(rulebased.predict(X_test))

# Random forest on TF-IDF features
# https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
rf_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier()),
])
rf_clf.fit(X_train, y_train)
report(rf_clf.predict(X_test))

# Multinomial naive Bayes on word counts
# https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
text_clf = Pipeline([
    ("count_vec", CountVectorizer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(X_train, y_train)
report(text_clf.predict(X_test))

# Linear SVM on TF-IDF features
# https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
svm_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])
svm_clf.fit(X_train, y_train)
report(svm_clf.predict(X_test))
```
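In the rule-based estimator every keyword pattern appears as `r"(|)"`. An empty alternation matches the empty string at the start of any text, so `re.search` always succeeds, the first rule always fires, and every test document gets the `computer_graphics` label. The sketch below, using only the standard `re` module, shows that behavior and a first-match keyword rule with non-empty patterns. The rule list `KEYWORD_RULES`, its English keywords, and the helper `rule_predict` are hypothetical placeholders for illustration, not the original keywords.

```python
import re

# An empty alternation matches the empty string, so it "finds" a match in any text.
EMPTY_PATTERN = r"(|)"

# Hypothetical keyword rules (placeholders, not the original keywords),
# checked in order; the first rule whose pattern matches decides the topic.
KEYWORD_RULES = [
    ("computer_graphics", r"(rendering|shader)"),
    ("operating_systems", r"(kernel|scheduler)"),
    ("computer_security", r"(malware|encryption)"),
]


def rule_predict(text, rules, default="unknown"):
    """Return the topic of the first rule whose regex matches `text`, else `default`."""
    for topic, pattern in rules:
        if re.search(pattern, text):
            return topic
    return default


if __name__ == "__main__":
    print(bool(re.search(EMPTY_PATTERN, "any text at all")))   # True
    print(rule_predict("the kernel scheduler", KEYWORD_RULES))  # operating_systems
    print(rule_predict("a cooking book", KEYWORD_RULES))        # unknown
```

Returning a separate default when no rule fires is a design choice worth considering: in the notebook the fallback is label 0, which `LabelEncoder` assigns to the alphabetically first topic, `application_service`, so unmatched documents are silently counted as that class.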

I want to increase the accuracy of the algorithm. Please help.
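One concrete factor worth checking: `CountVectorizer` and `TfidfVectorizer` split text with the default `token_pattern=r"(?u)\b\w\w+\b"`, which keeps only tokens of two or more characters. In MeCab wakati output many Japanese words are a single character, so they are dropped before the classifier ever sees them. The sketch below applies the default pattern and a relaxed `r"(?u)\b\w+\b"` with plain `re.findall`, which is the regex step the vectorizers use, to a made-up wakati sentence (hypothetical, not taken from the dataset). The helper `tokens` is a hypothetical name for illustration.

```python
import re

# scikit-learn's default token_pattern for CountVectorizer / TfidfVectorizer
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Relaxed pattern that also keeps one-character tokens
RELAXED_PATTERN = r"(?u)\b\w+\b"

# Hypothetical MeCab "-Owakati" output (space-separated words)
wakati = "私 は 本 を 読む"


def tokens(pattern, text):
    """Extract tokens with `pattern`, as the vectorizer's regex tokenizer does."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(tokens(DEFAULT_PATTERN, wakati))  # ['読む']
    print(tokens(RELAXED_PATTERN, wakati))  # ['私', 'は', '本', 'を', '読む']
```

To try this in the pipelines, pass the relaxed pattern to the vectorizer, e.g. `TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")`; the same argument works on the `CountVectorizer` in the naive Bayes pipeline. Whether it actually raises accuracy on this dataset has to be measured with the same train/test split.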
