Question: Abstract This article describes CRISP-DM (CRoss-Industry Standard Process for Data Mining), a non-proprietary, documented, and freely available data mining model. Developed by industry leaders

Abstract

This article describes CRISP-DM (CRoss-Industry Standard Process for Data Mining), a non-proprietary, documented, and freely available data mining model. Developed by industry leaders with input from more than 200 data mining users and data mining tool and service providers, CRISP-DM is an industry-, tool-, and application-neutral model. This model encourages best practices and offers organizations the structure needed to realize better, faster results from data mining.

[...] purchased in 1998) first provided services based on data mining principles in 1990 and launched Clementine, the first commercial data mining workbench, in 1994. NCR, aiming to deliver added value to its Teradata data warehouse customers, met its clients' needs with teams of data mining consultants. OHRA, one of the largest Dutch insurance companies, provided a valuable testing ground for live, large-scale data mining projects.

CRISP-DM organizes the data mining process into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. These phases help organizations understand the data mining process and provide a road map to follow while planning and carrying out a data mining project.
This article explores all six phases, including the tasks involved with each phase. Sidebar material, which takes a look at specific data mining problem types and techniques for addressing them, is provided.

A year later, a consortium formed with the goal of developing CRISP-DM. As CRISP-DM was intended to be industry-, tool-, and application-neutral, the consortium solicited input from a wide range of practitioners and others (such as data warehouse vendors and management consultants) with a vested interest in data mining. To gain this insight, the CRISP-DM Special Interest Group, or SIG, was created with the goal of developing a standard process model to service the data mining community. During the next several years, the SIG developed and refined the model [...] data mining projects at Daimler [...] and commercial data mining tools began [...]. The SIG proved invaluable, growing to more than 200 members and holding workshops in London, New York, and Brussels.

In 1996, while interest in data mining was mounting, no widely accepted approach to data mining existed. There was a clear need for a data mining process model that would standardize the industry and help organizations launch their own data mining projects. The development of a non-proprietary, documented, and freely available model would enable organizations to realize better results from data mining, encourage best practices in the industry, and help bring the market to maturity.

In 2000, the presentation of the next generation of CRISP-DM, version 1.0, reflects significant progress in the development of a standardized data mining process model. While future extensions and improvements are certainly expected, industry players are quickly accepting the CRISP-DM methodology.

CRISP-DM (CRoss-Industry Standard Process for Data Mining) was conceived in late 1996 by four leaders of the nascent data mining market: Daimler-Benz (now DaimlerChrysler), Integral Solutions Ltd. (ISL), NCR, and OHRA.
At the time, Daimler-Benz led most industrial and commercial organizations in applying data mining in its business operations. ISL (which SPSS Inc. [...]

The CRISP-DM Reference Model

[...] dependencies between phases [...] symbolizes the cyclical nature of data mining and illustrates that the lessons learned during the data mining process and from the deployed solution can trigger new, often more focused business questions. Figure 2 outlines each phase of the data mining process.

Figure 1. Phases of the CRISP-DM Reference Model

Figure 2. Tasks and Outputs of the CRISP-DM Reference Model

Phase One: Business Understanding

Perhaps the most important phase of any data mining project, the initial business understanding phase focuses on understanding the project objectives from a business perspective, converting this knowledge into a data mining problem definition, and then developing a preliminary plan designed to achieve the objectives. In order to understand which data should later be analyzed, and how, it is vital for data mining practitioners to fully understand the business for which they are finding a solution.

The business understanding phase involves several key steps, including determining business objectives, assessing the situation, determining the data mining goals, and producing the project plan.

[...] goal cannot be effectively translated into a data mining goal, it may be wise to consider redefining the problem at this point.

Determine the Business Objectives

Understanding a client's true goal is critical to uncovering the important factors involved in the planned project and to ensuring that the project does not result in producing the right answers to the wrong questions.
To accomplish this, the data analyst must uncover the primary business objective as well as the related questions the business would like to address. For example, the primary business goal could be to retain current customers by predicting when they are prone to move to a competitor. Examples of related business questions might be, "How does the primary channel (e.g., ATM, branch visit, Internet) of a bank customer affect whether they stay or go?" or "Will lower ATM fees significantly reduce the number of high-value customers who leave?" A secondary issue might be to determine whether lower fees affect only one particular customer segment.

Produce a Project Plan

The project plan describes the intended plan for achieving the data mining goals, including outlining specific steps and a proposed timeline, an assessment of potential risks, and an initial assessment of the tools and techniques needed to support the project. Generally accepted industry timeline standards are: 50 to 70 percent of the time and effort in a data mining project involves the Data Preparation Phase; 20 to 30 percent involves the Data Understanding Phase; only 10 to 20 percent is spent in each of the Modeling, Evaluation, and Business Understanding Phases; and 5 to 10 percent is spent in the Deployment Planning Phase.

Finally, the data analyst always determines the measure of success. Success may be measured by reducing lost customers by 10 percent or simply by achieving a better understanding of the [...]. [...] analysts should [...] make sure that each one relates to [...] of the specified business [...].

Phase Two: Data Understanding

The data understanding phase starts with an initial data collection. The analyst then proceeds to increase familiarity with the data, to identify data quality problems, to discover initial insights into the data, or to detect interesting subsets to form hypotheses about hidden information.
The data understanding phase involves four steps, including the collection of initial data, the description of data, the exploration of data, and the verification of data quality.

Assess the Situation

In this step, the data analyst outlines the resources, from personnel to software, that are available to accomplish the data mining project. Particularly important is discovering what data is available to meet the primary business goal. At this point, the data analyst also should list the assumptions made in the project, assumptions such as "To address the business question, a minimum number of customers over age 50 is necessary." The data analyst also should list the project risks, list potential solutions to those risks, create a glossary of business and data mining terms, and construct a cost-benefit analysis for the project.

Collect the Initial Data

Here a data analyst acquires the necessary data, including loading and integrating this data if necessary. The analyst should make sure to report problems encountered and his or her solutions to aid with future replications of the project. For instance, data may have to be collected from several different sources, and some of these sources may have a long lag time. It is helpful to know this in advance to avoid potential delays.

Determine the Data Mining Goals

The data mining goal states project objectives in business terms such as, "Predict how many widgets a customer will buy given their purchases in the past three years, demographic information (age, salary, city, etc.), and the item price." Success also should be defined in these terms; for instance, success could be defined as achieving a certain level of predictive accuracy. If the business [...]

Describe the Data

During this step, the data analyst examines the "gross" or "surface" properties of the acquired data and reports on the results, examining issues such as the format of the data, the quantity of the data, the number of records and fields in each table, the identities of the fields, and any other surface features of the data.
The key question to ask is: Does the data acquired satisfy the relevant requirements? For instance, if age is an important field and the data does not reflect the entire age range, it may be wise to collect a different set of data. This step also provides a basic understanding of the data on which subsequent steps will build.

Explore the Data

This task tackles the data mining questions, which can be addressed using querying, visualization, and reporting. For instance, a data analyst may query the data to [...] products that purchasers in a particular [...]. Or the analyst may run a visualization analysis to [...] fraud patterns. The data analyst would then [...] exploration report that outlines first findings, or an initial hypothesis, and the potential impact on the remainder of the [...].

Verify Data Quality

At this point, the analyst examines the quality of the data, addressing questions such as: Is the data complete? Missing values often occur, particularly if the data was collected across long periods of time. Some common items to check include missing attributes and blank fields, whether all possible values are represented, the plausibility of values, the spelling of values, and whether attributes with different values have similar meanings (e.g., low fat, diet). The data analyst also should review any attributes that give answers that conflict with common sense (e.g., teenagers with high income).

Clean Data

Without clean data, the results of a data mining analysis are in question. Thus, at this stage, the data analyst must either select clean subsets of data or incorporate more ambitious techniques such as estimating missing data through modeling analyses. At this point, data analysts should make sure they outline how they addressed each quality problem reported in the earlier "Verify Data Quality" step.

Construct Data

After the data is cleaned, the data analyst should undertake data preparation operations such as developing entirely new records or producing derived attributes.
An example of a new record would be the creation of an empty purchase record for customers who made no purchases during the past year. Derived attributes, in contrast, are new attributes that are constructed from existing attributes, such as Area = Length x Width. These derived attributes should only be added if they ease the model process or facilitate the modeling algorithm, not just to reduce the number of input attributes. For instance, perhaps "income per head" is a better attribute to use than "income per household." Another type of derived attribute is single-attribute transformations, usually performed to fit the needs of the modeling tools. These transformations may be necessary to transform ranges to symbolic fields (e.g., ages to age bands), or symbolic fields ("definitely yes," "yes," "don't know," "no") to numeric values. Modeling tools or algorithms often require these transformations.

Phase Three: Data Preparation

The data preparation phase covers all activities to construct the final data set, or the data that will be fed into the modeling tool(s), from the initial raw data. Tasks include table, record, and attribute selection, as well as transformation and cleaning of data for modeling tools. The five steps in data preparation are the selection of data, the cleansing of data, the construction of data, the integration of data, and the formatting of data.

Select Data

Deciding on the data that will be used for the analysis is based on several criteria, including its relevance to the data [...], as well as quality and technical constraints such as limits on data volume or data types. For instance, while [...] address may be used to determine which [...] from, the actual street address data can [...] reduce the amount of data that must be [...]. [...] data selection process should involve explaining why [...] was included or excluded. [...] or more attributes are more [...].

Integrate Data

Integrating data involves combining information from multiple tables or records to create new records or values.
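The derived attributes and single-attribute transformations described under Construct Data can be illustrated with a short Python sketch. It computes Area = Length x Width and income per head, maps an age to an age band, and maps a symbolic answer to a number. The field names, band boundaries, and numeric codes are assumptions made purely for illustration; the article does not prescribe any of them.

```python
# Illustrative sketch only: field names, band edges, and answer codes are
# assumptions made for this example, not values taken from the article.

def add_derived_attributes(record):
    """Return a copy of the record with two derived attributes added."""
    derived = dict(record)
    derived["area"] = record["length"] * record["width"]
    derived["income_per_head"] = (
        record["household_income"] / record["household_size"]
    )
    return derived


def age_to_band(age):
    """Single-attribute transformation: numeric range to symbolic field."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


# Symbolic field to numeric value (hypothetical coding scheme).
ANSWER_CODES = {"definitely yes": 3, "yes": 2, "don't know": 1, "no": 0}

record = {
    "length": 4.0,
    "width": 2.5,
    "household_income": 60000,
    "household_size": 3,
    "age": 34,
    "answer": "don't know",
}

prepared = add_derived_attributes(record)
prepared["age_band"] = age_to_band(record["age"])
prepared["answer_code"] = ANSWER_CODES[record["answer"]]
```

The original record is left untouched so that the raw data stays available if a transformation has to be revisited after stepping back from the modeling phase.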
With table-based data, an analyst can join two or more tables that have different information about the same objects. For instance, a retail chain has one table with information about each store's general characteristics (e.g., floor space, type of mall), another table with summarized sales data (e.g., profit, percent change in sales from previous year), and another table with information about the [...] of the surrounding area. Each of these tables contains one record for each store. These tables can be merged together into a new table with one record for each store, combining fields from the source tables.

Data integration also covers aggregations. Aggregations refer to operations where new values are computed by summarizing information from multiple records and/or tables. For example, an aggregation could include converting a table of customer purchases, where there is one record for each purchase, into a new table where there is one record for each customer. The table's fields could include the number of purchases, the average purchase amount, the percent of orders charged to credit cards, the percent of items under promotion, etc.

[...] data analyst develops the model based on one set of existing data and tests its validity using a separate set of data. This enables the data analyst to measure how well the model can predict history before using it to predict the future. It is usually appropriate to design the test procedure before building the model; this also has implications for data preparation.

Build the Model

After testing, the data analyst runs the modeling tool on the prepared data set to create one or more models.

Format Data

In some cases, the data analyst will change the format or design of the data. These changes might be simple (for example, removing illegal characters from strings or trimming them to a maximum length), or they may be more complex, such as those involving a reorganization of the information. Sometimes these changes are needed to make the data suitable for a specific modeling tool.
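The aggregation described above, turning one record per purchase into one record per customer, can be sketched in plain Python as follows. The sample purchases and the field names (`customer`, `amount`, `credit_card`) are hypothetical, and only three of the summary fields mentioned in the text are computed.

```python
from collections import defaultdict

# Hypothetical purchase table: one record per purchase.
purchases = [
    {"customer": "c1", "amount": 20.0, "credit_card": True},
    {"customer": "c1", "amount": 40.0, "credit_card": False},
    {"customer": "c2", "amount": 15.0, "credit_card": True},
]


def aggregate_by_customer(rows):
    """Summarize purchase records into one record per customer."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer"]].append(row)

    summary = {}
    for customer, items in grouped.items():
        count = len(items)
        summary[customer] = {
            "num_purchases": count,
            "avg_amount": sum(r["amount"] for r in items) / count,
            # True counts as 1, so the sum is the number of card purchases.
            "pct_credit_card": 100.0 * sum(r["credit_card"] for r in items) / count,
        }
    return summary


customer_table = aggregate_by_customer(purchases)
```

In a real project the same summary would more likely be produced by a database query or a data preparation tool; the sketch only shows the shape of the operation.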
In other instances, the changes are needed to pose the data mining questions.

Assess the Model

The data mining analyst interprets the models according to his or her domain knowledge, the data mining success criteria, and the desired test design. The data mining analyst judges the success of the application of modeling and discovery techniques technically, but he or she should also work with business analysts and domain experts in order to interpret the data mining results in the business context. The data mining analyst may even choose to have the business analyst involved when creating the models for assistance in discovering potential problems with the data.

For example, a data mining project may test the factors that affect bank account closure. If data is collected at different times of the month, it could cause a significant difference in the account balances of the data sets collected. (Because individuals tend to get paid at the end of the month, the data collected at that time would reflect higher account balances.) A business analyst familiar with the bank's operations would note such a discrepancy immediately.

Phase Four: Modeling

In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values. Typically, several techniques exist for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase may be necessary. Modeling steps include the selection of the modeling technique, the generation of test design, the creation of models, and the assessment of models.

In this phase, the data mining analyst also tries to rank the models. He or she assesses the models according to the evaluation criteria and takes into account business objectives and business success criteria. In most data mining projects, the data mining analyst applies a single technique more than once or generates data mining results with different alternative techniques.
In this task, he or she also compares all results according to the evaluation criteria.

Select the Modeling Technique

This task refers to choosing one or more specific modeling techniques, such as decision tree building with C4.5 or neural network generation with back propagation. If assumptions are attached to the modeling technique, these should be recorded.

Generate Test Design

After building a model, the data analyst must test the model's quality and validity, running empirical testing to determine the strength of the model. In supervised data mining tasks such as classification, it is common to use error rates as quality measures for data mining models. Therefore, we typically separate the data set into train and test set, build the model on the train set, and estimate its quality on the separate test set. In other words, the [...]

Phase Five: Evaluation

Before proceeding to final deployment of the model built by the data analyst, it is important to more thoroughly evaluate the model and review the model's construction to be certain it properly achieves the business objectives. Here it is critical to determine if some important business issue has not been sufficiently considered. At the end of this phase, the project leader then should decide exactly how to use the data mining results. The key steps here are the evaluation of results, the process review, and the determination of next steps.

[...] though it is often the customer, not the data analyst, who carries out the deployment steps, it is important for the customer to understand up front what actions must be taken in order to actually make use of the created models. The key steps here are plan deployment, plan monitoring and maintenance, the production of the final report, and review of the project.
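The test design described under Generate Test Design (split the data into a train set and a test set, build the model on the train set, and estimate its error rate on the test set) can be sketched as below. Everything specific here is an assumption for illustration: the synthetic data, the 30 percent test fraction, and the deliberately trivial majority-class "model", which merely stands in for a real technique such as a decision tree.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_class(train):
    """Placeholder model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def error_rate(predict, test):
    """Fraction of test records whose predicted label is wrong."""
    wrong = sum(1 for features, label in test if predict(features) != label)
    return wrong / len(test)


# Synthetic labeled data: (features, class label).
data = [({"age": a}, "loyal" if a > 40 else "disloyal") for a in range(20, 70)]

train, test = train_test_split(data)
majority = fit_majority_class(train)
test_error = error_rate(lambda features: majority, test)
```

Because the error is measured on records the model never saw, it estimates how well the model predicts history before it is used to predict the future.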
Plan Deployment

In order to deploy the data mining result(s) into the business, this task takes the evaluation results and develops a strategy for deployment.

Plan Monitoring and Maintenance

Monitoring and maintenance are important issues if the data mining result is to become part of the day-to-day business and its environment. A carefully prepared maintenance strategy avoids incorrect usage of data mining results.

Produce Final Report

[...] end of the project, the project leader and his or her team [...]. Depending on the deployment plan, this [...] of the project and its experiences [...] already been documented as an ongoing activity [...] and comprehensive presentation of the [...]. This includes all of the previous [...] and [...] the results. Also, [...] a meeting at the conclusion of the project, where the results are verbally presented to the customer.

Evaluate Results

Previous evaluation steps dealt with factors such as the accuracy and generality of the model. This step assesses the degree to which the model meets the business objectives and determines if there is some business reason why this model is deficient. Another option here is to test the model(s) on real-world applications, if time and budget constraints permit. [...] evaluation also seeks to unveil additional challenges, information, or hints [...].

[...] data analyst summarizes the assessment [...] business success criteria, including a final [...] whether the project already meets the initial [...].

Review Process

[...] a more thorough review of the data [...] to determine if there is any important [...] been overlooked. This review also [...] (e.g., did we correctly build the [...] allowable attributes that are available for future deployment?).

Determine Next Steps

At this stage, the project leader must decide whether [...] project and move on to deployment or whether to [...] iterations or set up new data mining projects.
Phase Six: Deployment

Model creation is generally not the end of the project. The knowledge gained must be [...] in a way that the customer can use it, which often [...] within an organization's decision [...] real-time personalization of Web [...] marketing databases.

Depending on the [...] simple as generating a [...] repeatable data mining [...].

Review Project

The data analyst should assess failures and successes as well as potential areas of improvement for use in future projects. This step should include a summary of important experiences during the project and can include interviews with the significant project participants. This document could include pitfalls, misleading approaches, or hints for selecting the best-suited data mining techniques in similar situations. In ideal projects, experience documentation covers [...] reports written by individual project members during the project phases and tasks.

Conclusion

CRISP-DM was designed to provide guidance to data mining beginners and to provide a generic process model that can be specialized according to the needs of any particular industry or company. The industry's initial use of the [...] that it is a valuable aid to beginners and [...] alike. At OHRA, a new [...] the CRISP-DM [...] model to plan and guide a highly successful data mining project. In addition, DaimlerChrysler [...] CRISP-DM [...] its own specialized customer relationship [...] tool to [...]

A Look at Data Mining Problem Types

Outlined below are several different types of data mining techniques that together can be used to solve a business problem.

Data Description and Summarization

Data Description and Summarization provides a concise description of the characteristics of data, typically in elementary and aggregated form, to give users an overview of the data's structure. Data description and summarization alone can be an objective of a data mining project. For instance, a retailer might be interested in the turnover of all outlets, broken down by categories,
summarizing changes and differences as compared to a previous period. In almost all data mining projects, data description and summarization is a sub-goal in the process, typically in [...] stages where initial exploratory data analysis can help to understand the nature of the data and to find potential hypotheses for hidden information. Summarization also plays an important role in the presentation of final results. [...] OLAP and EIS systems [...] data description and summarization but do not [...].

Conclusion (continued)

SPSS [...] CRISP-DM [...] data analysts [...] for the industry [...] referenced in invitations to tender [...].

CRISP-DM was [...] academic manner [...] elite [...] gurus [...]. CRISP-DM succeeds [...] is soundly based on the [...] data mining experience. In that respect, the data mining industry is indebted to the many practitioners who contributed [...] their ideas throughout the CRISP-DM project.

The CRISP-DM process model is not meant [...] instruction book that will instantly [...] succeed in data mining. However, [...] in data mining methodology and [...]

Segmentation

[...] separates the data [...] subgroups or classes that share [...]. For instance, in shopping basket analysis, [...] segments of baskets depending on the items they contain. An analyst can segment certain subgroups as relevant for the business question based on prior knowledge or [...] data description and summarization.
However, there are automatic clustering techniques that can detect [...] unsuspected and hidden structures in data that allow segmentation. Segmentation can be a data mining problem type of its own [...] the main purpose. For example, [...] with higher than average age [...]. [...] toward solving other problem types where the purpose is to keep the size of the data manageable or to find homogeneous data subsets that are easier to analyze.

Appropriate techniques: Clustering techniques, Neural nets, Visualization

Example

A car company regularly collects [...] its customers concerning their socioeconomic characteristics. Using [...] analysis, the company can divide its [...] understandable subgroups, analyze the structure of each subgroup, and [...] marketing strategies for each [...] separately.

Conclusion (continued)

[...] assistance from more experienced practitioners [...] tool to help less experienced data mining [...] value and the [...] involved in the entire [...].

Volume 5, Number [...], Fall 2000. The CRISP-DM Model, continued

Classification

Classification assumes that there is a set of objects characterized by some attribute or feature which belong to different classes. The class label is a discrete (symbolic) value and is known for each object. The objective is to build classification models [...] that assign the correct class label to [...] objects. Classification models [...].

Concept Descriptions

[...] aims at an understandable description of concepts or classes. The purpose is not to develop complete models with high prediction accuracy, but [...]. For instance, a company may be interested in [...] its loyal and disloyal customers. [...] of these concepts (loyal and disloyal customers), the company [...] what could be done to keep customers loyal or to transform disloyal customers to loyal customers.

Example

Using data about the buyers of new cars and using a rule induction technique, a car company could generate rules that describe its loyal and disloyal customers. Below are simplified examples of the generated rules:

If SEX = male and AGE > 51 then CUSTOMER = loyal
If SEX = female and AGE > 27 then CUSTOMER = loyal
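To make the simplified rules from the car-buyer example concrete, the sketch below encodes the two rules that survive in the text as a small Python function. The function name and the choice to return `None` when no rule fires are assumptions of this sketch; rule induction tools emit their own formats, and real rule sets would contain more rules (including ones for disloyal customers).

```python
def describe_customer(sex, age):
    """Apply the two simplified rules; return None if neither rule fires."""
    if sex == "male" and age > 51:
        return "loyal"
    if sex == "female" and age > 27:
        return "loyal"
    # No rule covers this case: the rule set describes part of the
    # population and makes no claim about the rest.
    return None


examples = [("male", 60), ("female", 30), ("male", 40)]
labels = [describe_customer(sex, age) for sex, age in examples]
```

Leaving some cases uncovered is what separates this kind of rule set from a full classifier, which would have to return a label for every customer.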
Typically, segmentation is performed before concept description. Some techniques, such as conceptual clustering techniques, perform segmentation and concept description at the same time. Concept descriptions also can be used for classification purposes. On the other hand, some classification techniques produce understandable classification models, which then can be considered concept descriptions. The important distinction is that classification aims to be complete in some sense. The classification model needs to apply to all cases in the selected population. On the other hand, concept descriptions need not be complete. It is sufficient if they describe important parts of the concepts or classes.

Appropriate techniques: Rule induction methods, Conceptual clustering

Example

[...] have information on the payment behavior of [...] applicants. By combining this [...] information with other information about the customers, such as [...], it is possible to develop a system to classify new customers as good or bad customers (i.e., the credit risk in [...] a customer is either low or high, respectively).

Prediction

Another important problem type that occurs in a wide range of applications is prediction. Prediction is very similar to classification, but unlike classification, the target attribute (class) in prediction is not a qualitative discrete attribute but a continuous one. The aim of prediction is to find the numerical value of the target attribute for unseen objects. This problem type is sometimes called regression. If prediction deals with time series data, then it is often called forecasting.

Dependency analysis has close connections to prediction and classification, where dependencies are implicitly used for the formulation of predictive models. There also is a connection to concept descriptions, which often highlight dependencies. In applications, dependency analysis often co-occurs with segmentation. In large data sets, dependencies are seldom significant because many influences overlay each other.
In such cases, it is advisable to perform a dependency analysis on more homogeneous segments of the data.

Sequential patterns are a special kind of dependencies where the order of events is considered. In the shopping basket domain, associations describe dependencies between items at a given time. Sequential patterns describe shopping patterns of one particular customer or a group of customers over time.

Appropriate techniques: Correlation analysis, Regression analysis, Association [...], Bayesian [...]

Example

[...] find a significant [...] and its price [...].

Appropriate techniques: Regression analysis, Regression trees, Neural nets, Nearest Neighbor, Box-Jenkins methods, Genetic algorithms

Example

The [...] revenue of an international company is [...] attributes such as advertisement, exchange rate, and [...]. Having these values [...], the company can predict its expected [...].

Dependency Analysis

[...] finds a model that describes [...] dependencies (or associations) between data items or events. Dependencies can be used to predict the value of a data item, given information on other data items. Although dependencies can be used for predictive modeling, they are mostly used for understanding. Dependencies can be strict or probabilistic.

Associations are a special case of dependencies that have recently become very popular. Associations describe affinities of data items (i.e., data items or events that frequently occur together). A typical application scenario for associations is the analysis of shopping baskets. There, a rule such as "in 30 percent of all purchases, beer and peanuts have been bought together" is a typical example of an association. Algorithms for detecting associations are very fast and produce many associations. Selecting the most interesting ones is often a challenge.

CRISP-DM Glossary

Task - Series of activities to produce one or more outputs; part of a phase.
Activity - Part of a task in User Guide; describes actions to perform a task.
User guide - Specific advice on how to perform data mining projects.
CRISP-DM methodology - The general term for all concepts developed and defined in CRISP-DM.
Data mining context - Set of constraints and assumptions such as problem type, techniques or tools, and [...] domain.
Data mining problem type - Class of [...] problems such as data description and summarization, segmentation, concept descriptions, classification, prediction, and dependency analysis.
Generic - A task that holds across all possible data mining projects [...] applicable to [...] the whole data mining [...] possible data mining [...] valid for [...] developments such as [...] modeling techniques.
Model - [...] apply to a [...].
Output - Tangible result of performing [...].
Phase - High-level term for part of the process model; consists of related tasks.
Process instance - A specific project described in terms of the process model.
Process model - Defines the structure of data mining projects and provides guidance for their execution; consists of reference model and user guide.
Reference model - Decomposition of data mining projects into phases, tasks, and outputs.
Specialized - A task that makes specific assumptions in specific data mining contexts.

BIOGRAPHY

Colin Shearer is Vice President, data mining business development with SPSS Business Intelligence. Since 1984, [...] has been [...] advanced software solutions to solving business problems. [...] with SD-Scicon and [...] Systems, [...] was one of the [...] of [...] Solutions [...] of data mining in the early [...]. [...] Clementine [...] data mining tool [...].

SPSS Inc., [...] Wacker [...], 11th Floor, Chicago [...]
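As a closing illustration of the shopping basket association quoted in the Dependency Analysis section ("in 30 percent of all purchases, beer and peanuts have been bought together"), the Python sketch below computes that percentage, usually called support, for a small made-up set of baskets. The baskets are invented so that the figure comes out at 30 percent; the `confidence` function is an extra measure commonly reported alongside support and is not mentioned in the article.

```python
# Invented baskets: 10 purchases, 3 of which contain both beer and peanuts.
baskets = [
    {"beer", "peanuts"},
    {"beer", "peanuts", "chips"},
    {"beer", "peanuts", "bread"},
    {"beer", "bread"},
    {"beer", "milk"},
    {"milk", "bread"},
    {"chips"},
    {"milk"},
    {"bread", "chips"},
    {"peanuts", "milk"},
]


def count_containing(itemset, all_baskets):
    """Number of baskets that contain every item in the itemset."""
    return sum(1 for basket in all_baskets if itemset <= basket)


def support(itemset, all_baskets):
    """Fraction of all baskets that contain the whole itemset."""
    return count_containing(itemset, all_baskets) / len(all_baskets)


def confidence(antecedent, consequent, all_baskets):
    """Of the baskets containing the antecedent, the share that also
    contain the consequent."""
    both = count_containing(antecedent | consequent, all_baskets)
    return both / count_containing(antecedent, all_baskets)


beer_peanuts_support = support({"beer", "peanuts"}, baskets)
beer_to_peanuts_confidence = confidence({"beer"}, {"peanuts"}, baskets)
```

Real association algorithms enumerate many itemsets and then filter them by thresholds on measures like these, which is one way of tackling the challenge of selecting the most interesting associations.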
