Question: need only conclusions bb The idea for this paper came after observing the ease with which people with only basic computer training can make data

need only conclusions bb The idea for this paper

need only conclusions bb

The idea for this paper came after observing the ease with which people with only basic computer training can make data mining models using pre-made software solutions. With some data science background, it was possible for a non-programmer to create a system which gave back somewhat accurate results However, the question arose - are there any drawbacks to creating models using these tools? Several parameters were considered to possibly show differences, but they were all discarded. Prediction performance measures (accuracy, precision and recall, ROC curve) depend on the quality of the model chosen, and should be the same regardless of which type of implementation is used. On the other hand, CPU, memory and disk usage, as well as overall runtime appear to be most adequate performance measures for this case. Interest in performance analysis of data mining systems is usually focused on the system's content - on accuracy, recall, and similar. There are almost no scientific papers that are concerned with system performance from the aspect of the runtime, and no side-by-side comparison of such performances for tool- based systems For this paper, the test is comprised of a simple recommender system implemented as similarly as possible in Rapidminer and in C# and measuring the times required for implementation and average runtime. Prediction performance, as well as CPU, memory and disk usage are skipped in this paper. The expected results are that the runtime of recommender system custom-built for single purpose should be less than the runtime of a tool-based recommender system. However, the implementation time for the tool- based recommender system should be drastically lower than for custom-build system. 2. IMPLEMENTATION ENVRIONMENTS Rapidminer an open-source platform which can be traced back to the year 2001, when it was still known as YALE ("Yet Another Learning Environment", developed at the Technical University of Dortmund) (Mierswa et al, 2006). In the thirteen years since its beginnings, Rapidminer has become number one of the foremost software packages for data analysis (Piatetsky, 2013). Some of the reasons for this surely lie in its affordable pricing and high availability to students, but also in the high level of generic approach adopted by the creators. This allows for a wide variety of combinations and easy extensibility, making experimentation, modification and upgrades of existing systems a painless and quick process On the other hand, this generic approach comes with a cost, in the form of a large overhead that occurs during runtime. C# is an object-oriented programming language developed by Microsoft, and is an integral part of the NET framework. With the first appearance during the July of 2000, it predates Rapidminer by a year. It is one of the top programming languages used in the world (currently ranked in the 5th place according to TIOBE Index for April 2014 (TIOBE Software B.V., 2014)). The recommender engine presented in this paper is built from the scratch, and can be made to fit perfectly to all aspects of the data model. Both implementations were run on the same hardware specification, with Windows 8.1 Pro as the operating system 3. DATA AND ALGORITHM SPECIFICATION The problem considered is a textbook example of recommender engine implementations - the one of recommending movies to users, based on their previous ratings, and ratings of other users. The data used in this paper is a subset of the Netflix challenge (http://www.netflixprize.com/), no longer freely available due to privacy issues. Dataset contains around 3000 movies, as well as around 1.2 million of user ratings. The ratings are given in the MovieID. UserID, Ratingi format. The data for users is not explicitly given - we only have their IDs and ratings, but there are around 4000 distinct user IDs. Algorithm chosen for recommender engine is k-nearest neighbors. The algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by n attributes. Each example represents a point in an n-dimensional space. In this way, all of the training examples are stored in an n-dimensional pattern space. When given an unknown example, a k-nearest neighbor algorithm searches the pattern space for the k training examples that are closest to the unknown example. These k training examples are the k "nearest neighbors" of the unknown example, "Closeness" is defined in terms of a distance metric. For this paper, the distance metric chosen is the Euclidean distance. Recommender system that is considered in this paper is a user-based collaborative recommender, Collaborative recommenders try to predict the value of items for a particular user based on the items previously rated by other users, based on the "similarity" to the user (Adomavicius & Tuzhilin, 2005). For this dataset, the recommendation application tries to find the "neighbours" of user- other users that have similar tastes in movies (shown by rating the same movies similarly). Then, only the movies that are most liked by those neighbours would be recommended. 4. IMPLEMENTATION Recommender engines are not the part of the default Rapidminer 5 installation, but they are freely available as an extension. After the installation, we have to choose among 20 options for recommender system that suits our needs. Since the recommender that will be implemented in C# is the simple K NN collaborative filter, based on the similarity of users, that option is chosen. The implementation process for Rapidminer is straightforward and takes a short while the entire process is done in less than five minutes. The Rapidminer recommender system process is shown on Figure 1. Road Test CSV Set Test User Apply Model o que Read Train CSV M Set Train User Mod M Sot Movie Recommender Mod Figure 1 - Rapidminer recommender system process After the configuration, we are able to run the process, and get the recommendations for our test User. Runtime for Rapidminer process is on average between 9 and 10 minutes, after which the recommender system gives recommendations for all supplied User examples. Implementation of the recommender engine in C# is more open to choices, but for this paper, the choice is a simple WPF (Windows Presentation Foundation) user interface, together with filesystem access to data Domain classes are shown in Figure 2

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related General Management Questions!