Question: Provide the details and conclusions for the following section.

4. MODELING AND EVALUATION

The Process Documents from Data operator (Figure 2) was used during modeling to tokenize the text. The suboperators used in the process are Tokenize, Filter Tokens (by Length), Stem (Porter) and Filter Stopwords (English) (Figure 3). The Tokenize operator runs in "non letters" mode, so the text is split at every non-letter character and each run of letters becomes a token. For the Filter Tokens operator, the minimum word length is set to 3 characters (because words like "bad" can carry strong emotion) and the maximum to 25 characters. Porter's stemming algorithm is used to reduce words to their roots. Finally, an operator for removing English stopwords is applied.

[Figure 2: Process Documents from Data (operator parameters; prune method: none)]

[Figure 3: Process text suboperators: Tokenize (mode: non letters), Filter Tokens (by Length) (min chars: 3, max chars: 25), Stem (Porter), Filter Stopwords (English)]

Several classification models were tried: k-NN, W-IB1 (a Weka classifier) and Naive Bayes (Table 1). The Naive Bayes operator executes fastest, but its accuracy is lower than that of the other two operators (k-NN and W-IB1). The W-IB1 operator achieves accuracy similar to that of k-NN but needs much more time. Therefore the k-NN operator, which classifies each document by its nearest neighbors, is used in this case. The Optimize Parameters operator was used to find the optimal value of k, with "Cosine Similarity" as the measure type. For model evaluation, the X-Validation operator was used, with the number of validations set to 5 and stratified sampling.
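The preprocessing chain described in this section (Tokenize in "non letters" mode, Filter Tokens by Length with 3 to 25 characters, Stem, Filter Stopwords) can be approximated outside RapidMiner. The following is a minimal Python sketch, not the operators' actual implementation. The function name `preprocess`, the tiny `STOPWORDS` set and the lowercasing step are assumptions made for illustration, and the stemmer is a pluggable argument that defaults to the identity function; a real Porter stemmer (for example from a third-party library) would have to be passed in.

```python
import re

# Illustrative subset only (assumption); the real operator uses a full English list.
STOPWORDS = {"the", "and", "was", "are", "for", "this", "that", "with", "but"}


def preprocess(text, stem=lambda token: token, min_chars=3, max_chars=25):
    """Tokenize, filter by length, stem, then drop stopwords (in that order)."""
    # Tokenize: split at every non-letter character, discard empty pieces.
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    # Filter Tokens (by Length): keep tokens of min_chars..max_chars characters.
    tokens = [t for t in tokens if min_chars <= len(t) <= max_chars]
    # Stem: lowercase first (assumption), then apply the supplied stemmer.
    tokens = [stem(t.lower()) for t in tokens]
    # Filter Stopwords (English): remove tokens found in the stopword set.
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("The movie was bad, really bad! It's a 10/10."))
```

With a minimum length of 3, short tokens such as "It" and "s" are dropped while "bad" survives, which matches the reason the section gives for choosing that threshold.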
The quality criterion was accuracy (as required by the contest rules). The best accuracy obtained with the k-NN operator was 0.53 on 6% of the sample set (Figure 4). For the Naive Bayes operator, the best accuracy achieved was 0.22.

Table 1: Comparison of the results of the 3 operators

MODEL                               Accuracy             Time
k-NN, K = 1                         49.39% +/- 1.29%     1:59
k-NN, K = 2                         49.46% +/- 1.29%     2:00
k-NN, K = 3                         51.06% +/- 0.57%     1:52
k-NN, K = 4                         50.38% +/- 0.68%     1:37
W-IB1                               49.20% +/- 1.09%     16:43
Naive Bayes (Laplace correction)    20.37% +/- 0.93%     0:31
                                    20.37% +/- 0.93%     0:32
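The evaluation setup in this section (k-NN with cosine similarity, a search over k, and accuracy measured by stratified 5-fold cross-validation) can be illustrated with a small pure-Python sketch. This is not the RapidMiner operators' code: the function names, the dictionary-based sparse vectors, the round-robin fold assignment and the toy data are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def stratified_folds(data, n_folds):
    """Deal the examples of each class round-robin into n_folds folds."""
    by_label = defaultdict(list)
    for item in data:
        by_label[item[1]].append(item)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for items in by_label.values():
        for item in items:
            folds[position % n_folds].append(item)
            position += 1
    return folds


def cross_val_accuracy(data, k, n_folds=5):
    """Mean accuracy of k-NN over stratified folds."""
    folds = stratified_folds(data, n_folds)
    scores = []
    for i, test in enumerate(folds):
        if not test:
            continue
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        hits = sum(knn_predict(train, vec, k) == label for vec, label in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Toy data (assumption): two perfectly separable classes of term-weight vectors.
data = [({"good": 1.0 + i}, "pos") for i in range(5)]
data += [({"bad": 1.0 + i}, "neg") for i in range(5)]

# Simple stand-in for a parameter search over k = 1..4.
best_k = max(range(1, 5), key=lambda k: cross_val_accuracy(data, k))
```

Here the search over k simply keeps the value with the highest cross-validated accuracy, which is the same selection criterion the section reports, applied to made-up data.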
