Question: Drawing conclusions refers to information that is implied or inferred rather than stated outright. It necessitates studying the given records deeply and using those clues to uncover deeper information and provide detailed conclusions.
Because of the large number of data points, we decided to use a sample of the whole dataset: 10% of it was used to learn the algorithms' behavior. The algorithms we used are:

1. Decision tree - an algorithm used for classification on both nominal and numerical data; it builds a classification model that predicts the value of a target attribute (often called the class or label). (Quinlan, 1986)

2. Naive Bayes classifier - a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions.

3. K Nearest Neighbor (k-NN from now on) - one of those algorithms that are very simple to understand but work incredibly well in practice. It is also surprisingly versatile, with applications ranging from vision to proteins to computational geometry to graphs. In pattern recognition, k-NN is a non-parametric method used for classification and regression. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbors and assigned to the class most common among its k nearest neighbors. In k-NN regression, the output is the property value for the object, computed as the average of the values of its nearest neighbors. For both classification and regression, it can be useful to weight the contributions of the neighbors so that nearer neighbors contribute more than more distant ones. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. It is among the simplest of all machine learning algorithms. The basic k-NN algorithm consists of two steps: find the k training examples that are closest to the unseen example, then take the most commonly occurring classification among these k examples (or, in the case of regression, the average of their k label values). A shortcoming of k-NN is that it is sensitive to the local structure of the data. (Cover & Hart, 1967)

4. Neural networks try to mimic the human brain by using artificial 'neurons' to compare attributes to one another and look for strong connections. An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Neural networks are usually used to model complex relationships between inputs and outputs or to find patterns in data. In a feed-forward network, information moves in only one direction: forward, from the input nodes, through the hidden nodes (if any), to the output nodes. (Haykin, 1994)

We decided to use these algorithms because we had already found some results for this Ford competition obtained with other algorithms (Logistic Regression, SVM, etc.). For the evaluation we used two criteria:

The accuracy of our algorithm, calculated as in the formula below:

Accuracy = (TP + TN) / N (1)

where TP is the number of true positives, TN the number of true negatives, and N the number of data points in the sample.

The Area Under the Curve (AUC) - a measure of the discriminatory power of a classifier. It is well suited to measuring the performance of binary classifiers and takes values from zero to one. A value above 0.5 means the classifier performs better than random guessing; between 0.7 and 0.8 it is considered a good classifier, between 0.8 and 0.9 very good, and above 0.9 excellent.

Validation of the model, used to prevent overfitting, was done using 10-fold cross-validation.
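The two-step k-NN procedure described above can be sketched in a few lines of plain Python. This is a toy illustration on made-up 2-D points (not the competition data), using Euclidean distance:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    # Step 1: find the k training examples closest to the unseen example
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    # Step 2: take the most commonly occurring label among those k neighbors
    neighbors = [label for _, label in dists[:k]]
    return Counter(neighbors).most_common(1)[0][0]

# Toy 2-D data: class 0 clusters near the origin, class 1 near (5, 5)
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.5, 0.5)))  # -> 0
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # -> 1
```

Note that all the work happens at prediction time, which is exactly the "lazy learning" property mentioned above: there is no training step beyond storing the examples.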
This means that the original dataset was divided into 10 disjoint parts, of which 9 subsets were used for training the model and the remaining one for testing. This process is repeated 10 times, each time with a different subset held out for testing. (Kohavi, 1995)
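The 10-fold procedure can be sketched in plain Python. The splitting strategy here (shuffled index slicing) is one of several valid ways to form disjoint parts, and the majority-class "model" is a hypothetical stand-in for the real classifiers, used only to exercise the harness; per-fold accuracy is computed as (TP + TN) / N, i.e. the fraction of correct predictions:

```python
import random

def k_fold_cross_validate(X, y, train_fn, predict_fn, k=10, seed=0):
    # Shuffle indices, then split them into k disjoint folds
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test_idx = set(folds[i])
        # Train on the 9 remaining folds
        train = [(X[j], y[j]) for j in idx if j not in test_idx]
        model = train_fn([x for x, _ in train], [t for _, t in train])
        # Accuracy = (TP + TN) / N, the fraction of correct predictions
        correct = sum(predict_fn(model, X[j]) == y[j] for j in folds[i])
        accuracies.append(correct / len(folds[i]))
    return sum(accuracies) / k

# Hypothetical majority-class predictor, just to exercise the harness
train_fn = lambda X, y: max(set(y), key=y.count)
predict_fn = lambda model, x: model

X = list(range(100))
y = [0] * 70 + [1] * 30
print(round(k_fold_cross_validate(X, y, train_fn, predict_fn), 2))  # -> 0.7
```

Because every example appears in exactly one test fold, the averaged accuracy uses each data point for evaluation exactly once, which is what makes the estimate less prone to the optimism of a single train/test split.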
