Question: Case Study

The following case study on QA testing in analytics was published on the Towards Data Science website. Read it and try to find answers to the following questions:

Why is QA testing important in data analytics?

Explain the two common problems in machine learning and predictive analytics, and the solution described in this case study.

Discuss the difference between cross-validation and QA testing.

QUALITY ASSURANCE TESTING IN ANALYTICS

QA Testing

Most people are familiar with the term Quality Assurance (QA) testing in engineering. The definition of QA testing given by TechTarget is:

In developing products and services, quality assurance is any systematic process of checking to see whether a product or service being developed is meeting specified requirements. [...] A quality assurance system is said to increase customer confidence and a company's credibility, to improve work processes and efficiency, and to enable a company to better compete with others. [...] Today's quality assurance systems emphasize catching defects before they get into the final product.

Why is QA testing a good idea? In engineering, there are many things that could go wrong, and the only way to know that a system will work correctly is through testing. To understand how complicated testing can be, here are just some of the types of software testing used in the industry:

Unit testing

System testing

User acceptance testing

Compatibility testing

Compliance testing

There are even international standards set up on QA testing for software.

QUALITY AND ASSURANCE IN DATA SCIENCE

Now, how many of you have heard of QA testing in analytics? Probably only a few. Every data scientist, however, is familiar with the concept of cross-validation: holding out parts of your dataset in order to test generalization performance. In simple words, you use your dataset to get an estimate of how well the model will do in the real world. So, what is the difference between cross-validation and QA testing?
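To make the idea of cross-validation concrete, here is a minimal sketch of k-fold cross-validation using only the Python standard library. The synthetic data, the straight-line model, and the helper names (k_fold_indices, fit_line, cross_val_mae) are assumptions made for illustration and do not come from the case study.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def cross_val_mae(xs, ys, k=5):
    """Mean absolute error, averaged over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training part only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(ys[i] - (a + b * xs[i])) for i in held_out]
        scores.append(statistics.fmean(errors))
    return statistics.fmean(scores)

# Synthetic data (an assumption for the example): y = 3 + 2x plus noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

cv_mae = cross_val_mae(xs, ys, k=5)
print(f"5-fold cross-validated MAE: {cv_mae:.3f}")
```

The number this prints is a statistical estimate of out-of-sample error. It says nothing yet about whether that error is acceptable for the business goal, which is the gap QA testing is meant to cover.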

QA testing has an inherent business component. It's not just about making sure that the system works properly, but also that it achieves its goal to a certain standard. In this respect, many data scientists make the following mistakes.

1) Not choosing an appropriate metric for the task at hand. We've already discussed this issue in the article about performance measures in predictive modelling.

2) Not making sure that the dataset at hand is representative of the real world. A common problem is concept drift: the concept being modelled has changed due to external factors. Think, for example, of how different the economy and investor behavior were before and after the 2008 financial crisis. Applying a model built in 2006 to data from 2009 would probably yield wrong results.

3) Overfitting and underfitting are two other very common problems in machine learning and predictive analytics. We use cross-validation to guard against them, but only through extensive testing can we be sure of our model's performance.

4) Not understanding how the model will translate into business terms. Reporting a prediction of, let's say, 2000 units sold in the next quarter can be useless without additional information. For example, what is the 99% confidence interval? It could be [1500, 2500] or [1900, 2100]. How does the performance translate into economic terms? An error of 100 units could translate to $1m or $10.
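The last point can be illustrated with a short sketch. It assumes forecast errors are normally distributed, which the case study does not state, and the standard errors and per-unit values below are hypothetical numbers chosen only to show how the same 2000-unit forecast can carry very different uncertainty and very different money at stake.

```python
from statistics import NormalDist

def normal_interval(point, std_error, level=0.99):
    """Two-sided interval, assuming normally distributed forecast errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * std_error, point + z * std_error

def error_cost(error_units, value_per_unit):
    """Money at stake for a forecast error of the given size."""
    return abs(error_units) * value_per_unit

forecast = 2000  # units sold next quarter, the figure used in the text

# Hypothetical forecast standard errors: a tight model and a loose one.
for std_error in (40.0, 200.0):
    low, high = normal_interval(forecast, std_error)
    print(f"std error {std_error:>5.0f}: 99% interval [{low:.0f}, {high:.0f}]")

# The same 100-unit miss under two hypothetical per-unit values.
for value_per_unit in (0.10, 10_000.0):
    cost = error_cost(100, value_per_unit)
    print(f"${value_per_unit:,.2f} per unit -> ${cost:,.0f} at stake")
```

Reporting the interval and the cost next to the point forecast is one way to turn a model score into something a decision maker can act on.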

Figure 1: Overfitting and underfitting are two common problems in machine learning that we need to guard against.
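One common way to spot both problems is to compare the error on the training data with the error on held-out data. The sketch below is only an illustration under assumed conditions: it uses a k-nearest-neighbour regressor on synthetic noisy sine data, neither of which appears in the case study, because the neighbourhood size k gives a simple knob between a very flexible model (k = 1) and a very rigid one (k equal to the training set size).

```python
import math
import random
import statistics

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])

def mse(train, data, k):
    """Mean squared error of the k-NN model built on `train`, scored on `data`."""
    return statistics.fmean([(y - knn_predict(train, x, k)) ** 2 for x, y in data])

rng = random.Random(0)

def sample(n):
    """Draw n points from a noisy sine curve (synthetic, for illustration)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 2 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return points

train, valid = sample(100), sample(100)

# k=1 memorises the training set; k=100 always predicts the overall mean.
results = {k: (mse(train, train, k), mse(train, valid, k)) for k in (1, 10, 100)}
for k, (train_err, valid_err) in results.items():
    print(f"k={k:>3}: train MSE {train_err:.3f}, validation MSE {valid_err:.3f}")
```

With k = 1 the training error is zero while the validation error is not, which is the signature of overfitting. With k = 100 both errors are high, which points to underfitting. An intermediate k typically sits between the two extremes.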

Being a data scientist entails more than just being good at statistics or machine learning. It also entails a proper understanding of the underlying business problem and of how to report results. Applying the concept of quality assurance testing in data science could go a long way towards improving the final outcome and reducing the risk inherent in model-based decision making.
