Adding chaos to performance testing
November 13, 2023

Imagine the following scene. Your team has successfully finished development of a very important product change. The change was thoroughly tested during development, a regression testing cycle was executed, and an additional round of load tests was run. All test results were green, and your team was looking forward to seeing the positive impact of the delivered change in production.

But in the end, the change brought nothing but bad blood – the application started to crash under load so often that the entire change had to be rolled back. Instead of satisfaction from a job well done, everybody was left with frustration: developers, testers, and most importantly – the end users of the application.

I would bet that everybody who has ever dealt with software development or testing has experienced something similar: something that was thoroughly tested before release caused a lot of trouble once it was released. And I bet that many of you have wondered how to avoid bad scenarios like the one above. We did everything right, didn't we? Test suites with a great level of coverage were executed throughout the cycle, additional tests were run – all green, all passed. But still, there is a failure in production. How do you solve that problem?

What is Chaos Testing?

Chaos testing is a testing approach that relies on chaos theory concepts and on randomness deliberately injected into the test inputs, scenarios, and environment conditions.

All of that with the goal of adding the same level of unpredictability to our testing that the real world brings to our application in production. In other words, we need to make sure that our testing approximates real-world conditions, unexpected patterns, and behavior, so that we are able to capture hidden issues earlier than when everything is already released to production.

What You Miss When You Don't Use Chaos Testing

Let us first understand why our fine-tuned tests still fail to capture these hidden issues that have a tendency to appear seemingly out of nowhere.

When the application is running in production, it faces real users and real environment conditions – let's say it faces whatever the "real world" brings to it. And as we know from everyday personal experience, the world around us – nature, weather, people – is hardly deterministic and predictable. Despite the belief that we can predict test outcomes – and maybe in most cases we can, to some degree – we cannot predict them with 100% accuracy. There is always something that surprises us.

The same applies to the "real world" in which our applications and services delivered to production live. Perhaps 99.9% of users will follow the flows that were tested and everything may look OK, but there are always some users who do things differently – in an unexpected way. And not always intentionally; it may be just a simple and natural mistake, a typo, a wrong click or tap. The same applies to the environment where the app is deployed and to the dependencies your application relies on. In most cases they work as expected, but not always… And all these exceptions – or glitches, if you will – may cause serious trouble for your application. And what is worse – most often nobody can say in advance how big the trouble will be, because nobody has ever thought about these cases, let alone tested for them.

The fact is that most often we test the typical, predictable scenarios in a testing environment that is stable (or maybe it is not that stable, but that is a different story…). But our applications in production face something that is very different from these ideal, "laboratory" conditions of our test environment.

When I spoke to a number of software testers, they shared the following observation: a lot of defects in applications are discovered not because test plans are strictly followed, but because testers intentionally or even unintentionally (by a lucky mistake) deviate from them – actions are performed in a slightly different order than expected, the values entered differ from the expected ones, or instead of the "John Doe" test user, somebody used a test user with a middle name that nobody had ever considered before.

The Importance of Chaos Testing  

So the idea, obviously, is to reproduce this approach of intentional, unexpected deviations in our testing – even in the automated testing we rely on. You may have heard the term Chaos Engineering, or you may be familiar with tools like Chaos Monkey. Chaos Engineering is defined as follows (https://principlesofchaos.org): "Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production." In other words, it is the ability to monitor and test a system into which some random events are deliberately injected, to exercise the system under unexpected conditions. And what we are going to talk about here is the concept of "Chaos Testing," inspired by the Chaos Engineering principles.

Another term you may have heard before is "negative testing." It is often mentioned as a way to increase test coverage. Negative testing is a very important functional testing concept where a test is intentionally designed to try to break the functionality, in order to validate that the system behaves in a resilient way (e.g., instead of freezing or displaying nasty stack traces to the user, it shows at least a friendly error message and gives the user a way to retry or recover). So, in addition to the "happy path testing" approach that covers the expected, regular, positive scenarios, you also test the negative cases, which obviously increases the coverage. How, then, does negative testing differ from chaos testing? Essentially, chaos testing uses negative testing concepts but adds the factor of randomness into the mix. With happy path testing and negative testing, each test is an either-or – which, again, does not fully reflect the real-world conditions of production, where things are not always 100% positive or 100% negative. In the real world, random glitches are what happens. Chaos testing therefore mixes both approaches and adds randomness – the result is that in the typical cases everything goes fine, but there are cases where some parts of the test are negative and some parts are randomly shifted, and that is what comes closest to real-world conditions. The sketch below illustrates this mixing.
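
To make the mixing concrete, here is a minimal Python sketch of a test-input source that mostly produces happy-path data but randomly blends in negative and chaotic variants. All names, fields, and weights here are illustrative assumptions, not prescriptions:

```python
import random

# Hypothetical illustration: each strategy returns input data for one test run.
def happy_path_input():
    return {"name": "John Doe", "age": 42}

def negative_input():
    # Classic negative testing: deliberately invalid data.
    return random.choice([{"name": "", "age": 42}, {"name": "John Doe", "age": -1}])

def chaotic_input():
    # Random mutation of a valid input: partly positive, partly broken.
    data = happy_path_input()
    field = random.choice(list(data))
    data[field] = random.choice(["", None, "🙂", "x" * 10_000])
    return data

def next_test_input():
    # Weighted mix: mostly happy path, with negative and chaotic runs blended in.
    strategy = random.choices(
        [happy_path_input, negative_input, chaotic_input],
        weights=[0.8, 0.1, 0.1],
    )[0]
    return strategy()

for _ in range(5):
    print(next_test_input())
```

Most runs stay on the happy path, so regressions there are still caught, while the occasional negative or mutated input exercises the unexpected cases.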

Key Aspects of Chaos Testing 

Test data — The inputs that drive the test. Every test requires test data, and test data is what drives the test. So instead of testing only with the set of expected test data, there has to be a bigger variety of test data that includes the negative, the odd, and the unexpected values. This is where intelligent synthetic data generation helps: it creates unexpected inputs based on the expected ones and adds the necessary variety, as sketched below.
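
As a simple, hedged illustration of deriving unexpected inputs from expected ones, here is a Python sketch of a data stream that mostly emits known-good values and occasionally applies a random mutation. The mutators are examples, not an exhaustive or official list:

```python
import random

# Illustrative mutators that turn an expected string into an "unexpected" one.
MUTATORS = [
    lambda s: "",                          # empty value where content is expected
    lambda s: s * 100,                     # very long string
    lambda s: s + " 🙂",                   # emoji where alphanumeric is expected
    lambda s: s.replace(" ", "  "),        # whitespace quirks from copy-paste
    lambda s: "John Q. Doe" if s == "John Doe" else s[::-1],  # middle name / scrambled
]

def test_data_stream(expected_values, mutation_ratio=0.2):
    """Yield mostly expected values, with random odd variants mixed in."""
    while True:
        value = random.choice(expected_values)
        if random.random() < mutation_ratio:
            value = random.choice(MUTATORS)(value)
        yield value

gen = test_data_stream(["John Doe", "Jane Roe"])
print([next(gen) for _ in range(10)])
```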

Test environment and test dependencies — In order to simulate real-world conditions, the environment and application dependencies have to behave in a random (that is, unexpected) manner – be slow at times, have downtimes at times, or just send unexpected responses that may be valid (e.g., very looooooong text strings) or even invalid (empty values where non-empty ones are expected, letters instead of numbers, or emoji characters where simple alphanumeric content is expected…). The ability to mock the dependencies, and therefore control their behavior at will, is the prerequisite technique for achieving chaotic behavior of the test dependencies – see the sketch below.
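
The following is a minimal sketch of such a "chaotic" mock dependency using only the Python standard library. The endpoint, payload fields, and probabilities are assumptions made for illustration:

```python
import json
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChaoticMock(BaseHTTPRequestHandler):
    """A mock dependency that mostly behaves, but randomly misbehaves."""

    def do_GET(self):
        time.sleep(random.choice([0, 0, 0, 0.5, 3.0]))  # occasional slowness
        roll = random.random()
        if roll < 0.05:
            self.send_response(503)                      # simulated downtime
            self.end_headers()
            return
        body = {"name": "John Doe", "age": 42}
        if roll < 0.15:
            body = {"name": "", "age": "forty-two"}      # invalid field types
        elif roll < 0.20:
            body = {"name": "x" * 100_000, "age": 42}    # very long text value
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(body).encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ChaoticMock).serve_forever()
```

Pointing the application under test at a mock like this surfaces how it copes with slow, empty, malformed, or oversized responses long before production does the same.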

Test scenarios — Depending on whether the test is oriented toward functional or performance aspects, different approaches should be taken. For a functional test, the goal is to simulate scenarios that deviate from the expected, ideal "happy path" scenario – i.e., to mix the happy path with negative scenario variations and with unexpected scenario variations. As we discussed earlier, it may be a different sequence of steps (e.g., instead of filling the form from top to bottom, start from the middle to the top and then from the middle to the bottom), or it may be alternative actions like "click the button twice instead of once" or "select an already selected item in the dropdown." These scenario variations may seem odd, or too simple to be worth testing, but they are examples of what the application has to be ready to face in production. For example, two clicks on a button simulate a case where the user clicks the button twice – maybe accidentally, maybe intentionally, because the first click gives no visual feedback, so the user believes he or she has to click again. And it may happen that the second click triggers the same request again, but the application is not properly built to accept two identical requests, and something bad happens downstream. A sketch of such a randomized scenario follows below.
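
Here is a minimal Python sketch of scenario-level chaos, assuming a hypothetical `ui` driver with `fill()` and `click()` methods (e.g., a thin wrapper over a tool like Selenium). The stub driver just logs actions so the sketch runs on its own:

```python
import random

class StubDriver:
    """Hypothetical UI driver stand-in; it only logs the actions it receives."""

    def fill(self, field, value):
        print(f"fill {field} = {value!r}")

    def click(self, element):
        print(f"click {element}")

def fill_form_chaotically(ui):
    steps = [
        lambda: ui.fill("first_name", "John"),
        lambda: ui.fill("last_name", "Doe"),
        lambda: ui.fill("email", "john.doe@example.com"),
    ]
    random.shuffle(steps)            # fill the fields in an unexpected order
    for step in steps:
        step()
        if random.random() < 0.1:    # occasionally repeat an action
            step()

    ui.click("submit")
    if random.random() < 0.2:        # simulate an accidental double click
        ui.click("submit")

fill_form_chaotically(StubDriver())
```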

In the case of performance or load testing, the alternation of scenario steps may also rely on the functional techniques above (e.g., performing steps in a different order in the sequence) – and the same principle as for functional testing applies: the test should simulate the unpredictable behavior of production to best approximate production specifics. However, the configuration of the performance or load test should vary as well – your application hardly ever faces a steady, flat load of, say, 10,000 concurrent users. There are spikes, there are up-and-down variances in the number of users, and therefore the load patterns vary. Our performance test should vary in the same way – i.e., it should introduce a virtual-user setting that varies over time, simulating unexpected traffic-spike periods, as in the sketch below.
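
As an illustration, here is a small Python sketch that generates a non-flat load profile – a baseline of virtual users with random variance and occasional spikes, expressed as (minute, users) pairs that a load tool could consume as a stepped schedule. The baseline, variance range, and spike probability are assumed values:

```python
import random

def chaotic_load_profile(duration_min=60, baseline=10_000, spike_chance=0.1):
    """Return a per-minute schedule of virtual users with random spikes."""
    profile = []
    for minute in range(duration_min):
        users = int(baseline * random.uniform(0.7, 1.3))   # normal up/down variance
        if random.random() < spike_chance:
            users = int(baseline * random.uniform(2, 4))   # unexpected traffic spike
        profile.append((minute, users))
    return profile

for minute, users in chaotic_load_profile(duration_min=10):
    print(f"minute {minute:2d}: {users:,} virtual users")
```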

Chaos Testing With BlazeMeter

The BlazeMeter Continuous Testing platform is instrumental in implementing chaos testing practices. The recently introduced Test Data Pro helps to generate an unprecedented variety of test data with various characteristics. By leveraging AI, built-in rules to generate negative counterparts of the expected "happy path" data, and the ability to generate test data distributions based on randomness, BlazeMeter Test Data is an essential tool for getting started with chaos testing from the perspective of test data inputs and their variety – one of the key pillars of chaos testing.

Mock Services is another BlazeMeter capability that contributes to a successful implementation of chaos testing practices. Mock Services can stand in for the real dependencies of your application; you can control their behavior and data, and you can also control them in a way that supports unexpected scenarios. Using BlazeMeter Test Data Pro, you can drive your Mock Services with random distributions of expected as well as unexpected negative data that BlazeMeter generates for you. By using the "think time" delay option, it is possible to simulate random glitches in the response times of the components your application depends on – just like the glitches production brings.

Performance and load testing in BlazeMeter supports changing the number of virtual users during test execution, which is a key capability for simulating a variable number of users interacting with your system. At any point during the test run, it is possible to change the number of virtual users to simulate spikes or any non-flat volume characteristics in your test scenarios – exactly as may (and most likely will) happen in production.

As we get close to the end of this article, let me return to the story from the beginning. Not surprisingly, it is a true story, and the root cause of the entire production failure was a difference between how the application caches behaved in the test environment and how they behaved in the real production environment. In the test environment, the test used similar test data and a steady rate of requests – which caused many cache hits and artificially better performance results. In production, the data varied heavily, and the rate of requests varied heavily as well… As a result, there were far fewer cache hits – to the degree that performance was impacted and a hidden memory leak, introduced by the team's code change, surfaced. With better test data and scenario variety during testing, before the change was released, the issue would have been captured much earlier, rather than arriving as an unpleasant surprise in production.

Implement chaos testing in your performance testing with the industry's most-trusted performance testing platform. Start testing with BlazeMeter for FREE today!

Start Testing Now
