Application of Data Science in Product Testing

November 03, 2017

The product development process, once known simply as ‘code, test, and release’, has evolved continuously over the years. From waterfall and iterative models to agile and DevOps (Continuous Integration, Continuous Delivery), the changes have been significant. This evolution has been driven primarily by a competitive marketplace and the need for companies to stay ahead of the game by offering best-in-class product features to consumers as quickly as possible.

Secondly, the advent of tools and technologies (automation testing frameworks, automated deployment, infrastructure as code, the dynamic elasticity and scale of the Cloud, service fabric, containerization, serverless infrastructure) and more advanced practices like Test in Production have also been instrumental in this evolution.

While the agility of the product development cycle has improved by leaps and bounds, one basic requirement remains constant: the quality of the product and the customer experience should be paramount. This means Testing as a function (a gate to release, scanning through the prism of the spec) has had to evolve too, not only to adapt to the changes but also to explore how it can continue to be relevant and add value in the new paradigm.

Data Science dominates today's discussions on technology solutions across the board, be it industry solutions or horizontal functions like sales, marketing, operations, customer service, and so on. Product development itself has great potential to benefit from it, so in this blog we will explore how it can help improve the quality of a product and its lifecycle. We shall explore the application of Data Science both in the conventional practice of testing and in new, path-breaking ways of improving product quality, i.e. testing beyond the specs.

Enhance Conventional Testing using Test Result Prediction Engine

The Test Result Prediction Engine (a binary classification machine learning model) predicts which of the tests in your test repository are likely to fail or pass as a result of the changes in a build. The Prediction Engine learns from the historical record of test execution results against the changes made to the code, and produces a statistical judgment of the likely result (fail or pass) together with a confidence factor (the degree of certainty that the prediction is correct).

The accuracy of the Prediction Engine depends on how much information is available, or can be collected, about the test cases, their execution history, and the details of the changes. For instance, the fact that a file changed can itself be an indicator, but the number of lines changed in that file can be an even more useful indicator of the degree of change for a machine learning model to base its judgment on.
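
To make the idea concrete, here is a minimal sketch of such an engine, assuming a tiny logistic regression written in plain Python. The history rows, the two features (lines changed in files the test covers, and recent failures of that test), and all function names are hypothetical illustrations, not a description of any particular product. A real engine would train on far more data and features.

```python
import math

# Hypothetical history rows:
# (lines changed in covered files, failures in last 5 runs, failed this run: 1/0)
HISTORY = [
    (2, 0, 0), (5, 0, 0), (8, 1, 0), (12, 0, 0), (30, 0, 0),
    (40, 1, 1), (60, 2, 1), (85, 1, 1), (120, 3, 1), (15, 2, 1),
]

def features(lines_changed, recent_failures):
    # Bias term, log-scaled size of the change, recent failure count
    return [1.0, math.log1p(lines_changed), float(recent_failures)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(history, epochs=2000, learning_rate=0.05):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for lines_changed, recent_failures, failed in history:
            x = features(lines_changed, recent_failures)
            p_fail = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            weights = [w + learning_rate * (failed - p_fail) * xi
                       for w, xi in zip(weights, x)]
    return weights

def predict(weights, lines_changed, recent_failures):
    """Return the predicted result and a confidence factor in [0.5, 1.0]."""
    x = features(lines_changed, recent_failures)
    p_fail = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    label = "fail" if p_fail >= 0.5 else "pass"
    return label, max(p_fail, 1.0 - p_fail)

WEIGHTS = train(HISTORY)
```

Calling `predict(WEIGHTS, lines_changed, recent_failures)` for each test affected by a build yields the (result, confidence) pair described above.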

The Test Result Prediction Engine can be leveraged in many ways. Listed below are some use cases.

Expedited Test Cycles and Shortened time to market with lower Cost of testing

With Agile development methodology widely practiced today, changes are made to the code rapidly. Testing therefore becomes a bottleneck, as testers need to verify not only that the changes meet the requirements, but also that they did not cause a regression in existing functionality. In a complex distributed code environment, the scope of testing for every change is therefore many times larger than the scope of the change itself.

The predicted results from the Test Result Prediction Engine and their confidence factors, combined with human judgment (based on an understanding of priority and importance), can help obtain an Optimized Test Suite: a subset of test cases that must be run to cover both the new functionality and regression, excluding the test cases that can be safely skipped.

The Optimized Test Suite obtained via the Prediction Engine can help to A) reduce testing time, by executing only a subset of test cases instead of a full test pass, and B) reduce the cost of testing: person-hours in the case of manual testing, and infrastructure setup and compute cost in the case of automated testing. Together, these give a lasting benefit of lower costs and a safe, effective, shortened release cycle.
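
One possible way to derive such a suite is sketched below. It assumes each test comes with a predicted result, a confidence factor, and a human-assigned `critical` flag; the `Prediction` class, the sample data, and the 0.9 threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    test_id: str
    predicted: str      # "pass" or "fail", as returned by the engine
    confidence: float   # confidence factor, 0.5 .. 1.0
    critical: bool      # human judgment: always run, whatever the prediction

def optimized_suite(predictions, skip_confidence=0.9):
    """Keep every test except those confidently predicted to pass
    and not marked critical by a human."""
    selected = []
    for p in predictions:
        safe_to_skip = (
            p.predicted == "pass"
            and p.confidence >= skip_confidence
            and not p.critical
        )
        if not safe_to_skip:
            selected.append(p.test_id)
    return selected

# Hypothetical predictions for one build
PREDICTIONS = [
    Prediction("t_login", "fail", 0.80, False),
    Prediction("t_export", "pass", 0.95, False),
    Prediction("t_payment", "pass", 0.97, True),
    Prediction("t_search", "pass", 0.60, False),
]
SUITE = optimized_suite(PREDICTIONS)
```

Only `t_export` is skipped here: `t_payment` stays because a human marked it critical, and `t_search` stays because the engine is not confident enough that it will pass.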

Reduced Cost of Failure through Early Bug Detection

It is widely accepted that the later a bug is identified in the product lifecycle, the higher the cost of fixing it. The cost rises directly, through additional test cycles, and indirectly, through the even higher cost of remediating the impact after release to the consumer, in terms of customer support tickets and, more importantly, an aggrieved customer.

The Test Result Prediction Engine can reduce the cost of failure when its output is used as a prioritization criterion for test execution order, i.e. the higher the likelihood of a test failing, the earlier it is run in the test cycle, so that bugs are caught early.
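
As a small illustration of this ordering, the sketch below sorts tests by their predicted failure probability, highest first. The function name and the input mapping are hypothetical; ties are broken by test name only to keep the order deterministic.

```python
def execution_order(fail_probability):
    """Order test ids so that the most likely failures run first.

    fail_probability maps test id -> predicted probability of failure.
    """
    return sorted(fail_probability, key=lambda test_id: (-fail_probability[test_id], test_id))

# Hypothetical predictions for one build
ORDER = execution_order({"test_a": 0.1, "test_b": 0.8, "test_c": 0.8, "test_d": 0.4})
```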

Introducing the Test Result Prediction Engine into a continuous integration process (gated check-ins) can provide an even greater level of efficiency: it prevents a failure at the very beginning by not letting a possible bug be introduced in the first place, which saves even the triggering of the test cycle. Typically, gated check-ins are enforced via BVTs (Build Verification Tests). The Prediction Engine can enhance the effectiveness of gated check-ins by providing a gated check-in test suite in real time, based on the changes that are part of the check-in.
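
The sketch below shows one simplified way a check-in specific suite could be assembled. Note the assumption: instead of a trained model, it uses a historical failure rate per (file, test) pair as a stand-in for the Prediction Engine, and the data layout, names, and 0.2 threshold are hypothetical.

```python
def gated_suite(changed_files, failure_history, bvt_tests, min_failure_rate=0.2):
    """Return the standing BVTs plus tests that historically failed often
    when one of the changed files was modified.

    failure_history maps (file path, test id) -> (failures, runs).
    """
    selected = set(bvt_tests)
    for (path, test_id), (failures, runs) in failure_history.items():
        if path in changed_files and runs > 0 and failures / runs >= min_failure_rate:
            selected.add(test_id)
    return sorted(selected)

# Hypothetical history mined from build and test execution logs
FAILURE_HISTORY = {
    ("auth/login.py", "test_login"): (4, 10),
    ("auth/login.py", "test_profile"): (1, 20),
    ("cart/cart.py", "test_checkout"): (5, 10),
}
GATE = gated_suite({"auth/login.py"}, FAILURE_HISTORY, ["test_smoke"])
```

For a check-in touching only `auth/login.py`, the gate runs the standing BVT plus `test_login`, while `test_profile` (rarely affected) and `test_checkout` (unrelated file) are left to the regular test cycle.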

Tool for Effort Estimation and Planning

The Prediction Engine can also be used as a tool for planning. The predicted results can feed an estimate of the effort or time involved in releasing updates to the product: predict the test cases that would fail, then estimate the effort involved in fixing the failures and re-running those tests. When combined with the feature implementation effort (number of developers, testers, complexity, etc.), this may give an estimate of the time and effort involved in releasing to production.
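
A back-of-the-envelope version of this estimate might look like the sketch below. It assumes the expected number of failing tests is the sum of the predicted failure probabilities, and that fix and re-test effort per failure are known averages; the function name and every figure are hypothetical.

```python
def estimate_release_hours(fail_probabilities, fix_hours_per_failure,
                           retest_hours_per_failure, feature_hours):
    """Rough release effort: feature work plus expected fix and re-test work."""
    expected_failures = sum(fail_probabilities)
    rework_hours = expected_failures * (fix_hours_per_failure + retest_hours_per_failure)
    return feature_hours + rework_hours

# Three affected tests with hypothetical failure probabilities
ESTIMATE = estimate_release_hours([0.5, 0.5, 1.0], fix_hours_per_failure=3.0,
                                  retest_hours_per_failure=1.0, feature_hours=40.0)
```

With these inputs, two failures are expected, so the estimate is 40 hours of feature work plus 8 hours of rework.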

A Paradigm Shift for Test as a Function: Testing beyond the Spec

So far, we have discussed how Data Science can complement conventional or structured testing, i.e. testing against specs to ensure that the product works as everyone building it agreed and was able to articulate in the spec.

But what about customers? They are not always interested in what you have in the spec (in other words, that is a given). They are impressed by the experience they have while using the product. Many products have adopted practices of Testing in Production (sliced roll-out or experimentation). This makes it all the more pertinent for testers to consider out-of-the-box approaches to testing.

In this section, we shall explore how the role of QA can benefit from Data Science beyond testing what is in the specs (most of which can possibly be automated) and add real value by improving the quality of the product for customers in the field.

Techniques of Data Mining and Natural Language Processing can be leveraged effectively to understand the customers’ interaction with the product, gather feedback, and provide useful inputs to the testers for testing the product beyond the specs.

Product Telemetry

Organizations today are making a conscious effort to instrument their products to collect telemetry data. Telemetry, in this context, is a detailed timestamped log of events and of the customer's interactions with the product. This is a goldmine not only for product intelligence and feature planning, but also for testers defining their product testing strategy.

Here are some examples of how Test can use Telemetry data to tune its test strategy:

  • Mining customer interaction scenarios from Telemetry to identify the most common, most critical, or most error-prone scenarios
  • Identifying gaps in Telemetry to improve data collection
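
The first of these can be sketched in a few lines. The example assumes telemetry arrives as (session id, timestamp, event name, is_error) tuples and treats the ordered list of events in a session as one "scenario"; the event schema, names, and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def mine_scenarios(events):
    """Group events by session and count how often each event sequence
    occurs and how often it ends up containing an error."""
    steps = defaultdict(list)
    errored_sessions = set()
    for session_id, _timestamp, name, is_error in sorted(events, key=lambda e: (e[0], e[1])):
        steps[session_id].append(name)
        if is_error:
            errored_sessions.add(session_id)
    usage = Counter(tuple(sequence) for sequence in steps.values())
    errors = Counter(tuple(steps[session_id]) for session_id in errored_sessions)
    error_rate = {scenario: errors[scenario] / count for scenario, count in usage.items()}
    return usage, error_rate

# Hypothetical telemetry log
EVENTS = [
    ("s1", 1, "open", False), ("s1", 2, "search", False), ("s1", 3, "checkout", False),
    ("s2", 1, "open", False), ("s2", 2, "search", False), ("s2", 3, "checkout", True),
    ("s3", 1, "open", False), ("s3", 2, "export", False),
]
USAGE, ERROR_RATE = mine_scenarios(EVENTS)
```

`USAGE.most_common()` then ranks scenarios by how often customers follow them, and `ERROR_RATE` highlights the ones that deserve the most test attention.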

Customer Feedback

While Telemetry helps capture the customer interactions with the product in a structured manner, customer feedback is another important source of intelligence about what customers find good or bad about the product. Customer feedback can come from many channels:

  • Direct customer feedback (semi-structured or unstructured feedback from product surveys, official product forums, etc.)
  • Customer support issues (semi-structured or unstructured feedback from support tickets)
  • Social forums (unstructured feedback from customers’ references to the product on social forums like Facebook, Twitter, blogs, etc.)

The customer feedback data is another goldmine for testers to help define their product testing approach and strategy. Using Natural language processing, Text classification, and Auto Labeling techniques, the unstructured customer feedback can be mined and used to identify:

  • Features of the product that customers are having more problems with, so that a more sound testing strategy can be created for testing those features by fixing gaps in coverage
  • Patterns of problems the customers are facing, to add new scenarios in Testing
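
As a deliberately simple stand-in for the NLP and text classification techniques mentioned above, the sketch below labels each feedback item with the product features it mentions using keyword matching, and counts which features attract the most problem reports. The keyword lists, feature names, and feedback texts are hypothetical; a production system would more likely use a trained text classifier.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would learn these from labeled data
FEATURE_KEYWORDS = {
    "login": {"login", "password", "signin"},
    "export": {"export", "pdf", "download"},
    "search": {"search", "filter"},
}
PROBLEM_WORDS = {"crash", "crashes", "fail", "fails", "error", "slow", "broken", "cannot"}

def label_feedback(text):
    """Return the features a feedback item mentions and whether it reports a problem."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    features = sorted(name for name, keywords in FEATURE_KEYWORDS.items() if tokens & keywords)
    return features, bool(tokens & PROBLEM_WORDS)

def problem_counts(feedback_items):
    """Count problem reports per feature across all feedback items."""
    counts = Counter()
    for text in feedback_items:
        features, is_problem = label_feedback(text)
        if is_problem:
            counts.update(features)
    return counts

FEEDBACK = [
    "Login fails after password reset",
    "Export to PDF is slow",
    "Love the new search",
    "The app crashes on login",
]
COUNTS = problem_counts(FEEDBACK)
```

Features at the top of `COUNTS.most_common()` are candidates for closing coverage gaps, and the underlying feedback texts suggest new test scenarios.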

The application of Data Science in Product Development is a promising prospect, and what we have discussed above are some of the possible benefits that we could think of and articulate in this blog. The opportunities are limitless when it comes to Data Driven Quality. It may help transform the way testing as a role is perceived and functions in an organization. But as with any problem that seeks an answer in Data Science, the data itself, and in particular procuring good-quality data, is very important.

Hence, it is critical for organizations to start collecting the data that matters. Here are some recommendations on what to collect:

  • Test Cases repository
  • Bug Repository
  • Test Execution Result log (Bugs mapped to Failures)
  • Builds and Changes logs
  • Product Telemetry
  • Customer Feedback