Is Production Environment Really Sacred? Testing in Production

I was once tasked with testing a service that was integrated with ATMs. We approached a third-party ATM integrator to set up an ATM in our staging environment to test this scenario. The cost involved was very high, and our testing budget did not permit us to use a third-party ATM integrator. So what was the alternative? The program steering group decided unanimously to test in production, since the feature had been tested properly in the previous version. A couple of internal users were tasked with testing this feature immediately after the newer version was deployed to production.

Recently, I have noticed that testing in production is becoming a popular practice, either as part of defect and incident analysis or as a concluding test before going live. The aim is to eliminate uncertainty and give confidence to management and operations teams.

There are many reasons why organizations are forced to test particular scenarios in production. First, with shrinking IT budgets, organizations have difficulty creating test environments that represent the full functionality of the production environment, including load balancers, ATMs, SMS gateways, billing, etc. Most of these components are tested using simulators in test environments, but ideally organizations would like actual integration with all of this hardware and software in the staging environment, so they can perform UAT before moving to production. At the same time, business processes are becoming more and more integrated, which in turn demands test environments that are connected end-to-end. It is always a challenge to create a test environment that is equal to production.

With the advancement of technology, the services being rolled out are becoming more complex. One of the projects I worked on involved a mobile payment service for financial inclusion that spanned multiple organizations, third-party integrators, geographically distributed teams, and many stakeholders. For example, we had to work with banks, merchants, SMS integrators, ATMs, payment terminal vendors, bill payment aggregators, and the mobile payment platform provider. Add to this all the hardware in the data centers, such as servers and load balancers. Testing particular scenarios in such an integrated environment is possible only in production (the alternative is to spend huge sums of money to set up and maintain a staging environment with such a complicated integration).

Of course, this practice carries many risks: creating and maintaining test accounts and test data in production, security controls and accountability, the chance that testing in production itself causes production incidents, and the temptation to postpone real testing until deployment. (A separate blog post is required to address these risks.) But Testing in Production (TiP), when performed in a controlled manner within the organization’s IT policy, is a better way to eliminate any remaining risks or uncertainty before GTM.