A common mistake made in our industry is to conflate performance testing with load testing. While both are important, neither is a substitute for the other.
The former assesses whether a system is performant: can it complete a given task quickly enough to be effective? The latter is used to
understand how that performance changes as more and more demand is put on the system. I came to appreciate the value of both while working for AOL, back when 30 million people looked to it as THE way to get online.
At the time, we tried to convert people from dial-up access to broadband while simultaneously maintaining the billing relationship with the customer. This strategy ultimately proved unsuccessful (people preferred to buy access directly from the phone or cable company), but it was the best hope the company had for maintaining a dominant position as an ISP.
My team was responsible for the qualification and provisioning systems in this effort. Qualification was particularly challenging. In order to be eligible for
DSL, you had to live within a short distance of a telephone exchange that had a Digital Subscriber Line Access Multiplexer (DSLAM) with open capacity. And just
to make it even more interesting, the workers in the local chapter of the Communications Workers of America couldn’t be on strike at the time.
We needed the system to qualify users during the login flow, using the phone number we had on file. If qualified, the users were then prompted to upgrade to DSL.
Our qualification system had to perform well. Delays would have lowered the conversion rate and degraded the user experience, since users had been conditioned to expect the login popups (remember those?) to appear earlier rather than later. In addition, the system had to handle substantial load. At peak, several hundred people per second would log in and begin the qualification process.
This entire experience taught me three fundamental lessons:
It’s important not to confuse performance testing with load testing. As mentioned, the former is all about ensuring that software reacts to inputs in an acceptably short period of time. The latter is about understanding how much hardware is needed to serve expected demand and whether any part of the architecture unacceptably limits this, e.g., a single shared database that all requests queue up behind.
Synthetic load testing is, at best, a very poor proxy for understanding how your system will perform under actual load. Real-world usage has traffic patterns that are far more complex and varied than you’re likely to come up with in a load testing script. This matters a great deal because varied traffic prevents code and data from being cached the way they can be in simple tests.
The demand on a system is rarely constant. Much more typical is a daily traffic pattern where load is high at predictable times of day and lower at others. The users still do the same basic stuff throughout the course of the day, but there are time clusters where there are more of them doing it. It’s only at the times of highest system demand that having a good handle on your capacity is important.
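The caching point in the second lesson can be illustrated with a small simulation. This is only a sketch under stated assumptions: the LRU cache, its size of 100 entries, and both workloads are hypothetical stand-ins, not a model of any real system. It replays a repetitive scripted workload and a more varied one through the same cache and compares hit rates.

```python
import random
from collections import OrderedDict


def hit_rate(keys, cache_size):
    """Replay a sequence of keys through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in keys:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


# Hypothetical scripted test: the same 10 requests repeated over and over.
scripted = [i % 10 for i in range(1000)]

# Hypothetical "real" traffic: requests spread over 10,000 distinct keys.
rng = random.Random(42)
varied = [rng.randrange(10_000) for _ in range(1000)]

scripted_rate = hit_rate(scripted, cache_size=100)
varied_rate = hit_rate(varied, cache_size=100)
print(f"scripted hit rate: {scripted_rate:.2f}, varied hit rate: {varied_rate:.2f}")
```

The scripted workload fits entirely in the cache after its first pass, so nearly every request is a hit, while the varied workload mostly misses. A test built on the first pattern will flatter the system.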
At any given stage, you need to be clear-headed about what it is you need to understand. For example, if you are concerned with how the user experience suffers under conditions of low bandwidth or less than modern hardware, you’re dealing with a performance concern. In such cases, tools that analyze individual transactions are sufficient. In Ruby, this includes things like New Relic and the bullet gem to study database queries that can be optimized. For web applications, Google PageSpeed Insights and Yahoo’s YSlow are invaluable.
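To make the "individual transactions" idea concrete, here is a minimal sketch of timing one operation in isolation. It is not how the tools named above work internally; `measure_latency` and the sample workload are hypothetical, and the 95th percentile is a simple approximation.

```python
import statistics
import time


def measure_latency(operation, runs=50):
    """Time sequential calls to `operation`; return (median, approx. p95) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    samples.sort()
    median = statistics.median(samples)
    # Approximate 95th percentile by index into the sorted samples.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return median, p95


# Hypothetical stand-in for a single user-facing transaction.
median, p95 = measure_latency(lambda: sum(range(100_000)), runs=20)
print(f"median: {median * 1000:.2f} ms, p95: {p95 * 1000:.2f} ms")
```

Note that the calls run one at a time. That is what makes this a performance measurement rather than a load test: nothing here says how the numbers change when many users arrive at once.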
If instead you’re trying to understand whether the system can handle expected demand and scale, then load testing is what’s called for. There are tools that purport to help with this; Apache’s JMeter is one example. But after my time at AOL, I’m skeptical of this class of tools. As mentioned above, there are too many differences between real and synthetic load to completely trust the results of a test.
To complete the story of my AOL days, we learned how to do load testing right. Once we realized that simple load tests weren’t getting us any closer to provisioning adequate hardware, we decided to capture actual production traffic, e.g., Apache server logs, and then build tools to replay it against test hardware. This was better, but it was difficult and expensive to do and still lacked key real-world characteristics that are hard to model.
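As a rough illustration of the replay idea (not the tooling we actually built), the sketch below turns Apache access log lines in Common Log Format into a replay schedule that preserves the original spacing between requests. The function name, the regular expression, and the sample lines are assumptions made for this example; actually sending the requests to test hardware is left out.

```python
import re
from datetime import datetime

# Matches the timestamp, method, and path in a Common Log Format line.
LOG_PATTERN = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)')


def build_replay_schedule(log_lines):
    """Return (offset_seconds, method, path) tuples ordered by request time."""
    requests = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not recognizable requests
        when = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        requests.append((when, match["method"], match["path"]))
    if not requests:
        return []
    requests.sort(key=lambda request: request[0])
    first = requests[0][0]
    return [
        ((when - first).total_seconds(), method, path)
        for when, method, path in requests
    ]


# Hypothetical log lines; the paths are made up for illustration.
sample_lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /login HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "POST /qualify HTTP/1.0" 200 512',
    "not a log line",
]
schedule = build_replay_schedule(sample_lines)
for offset, method, path in schedule:
    print(f"+{offset:.0f}s {method} {path}")
```

A replay tool would then wait until each offset and issue the request against the test environment. Even so, logs like these omit request bodies, session state, and client behavior, which is part of why replay still falls short of the real thing.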
This was our lightbulb moment: Who needs synthetic load testing when you can just use the real thing and get better information? To do this, you have to have a system that is instrumented well enough for you to know when it’s under stress. Queue sizes are measured. Failure and abandon rates can be read and interpreted. Hardware metrics on the server, e.g., CPU and RAM levels, are exposed. Once you have that in place, you start at a part of the day when actual load is low and you remove some of your production capacity, e.g., by adjusting the load balancer. Then you watch your numbers as traffic grows over the course of the day.
Once your instrumentation tells you that user performance is starting to degrade to unacceptable levels, you flip your load balancer configuration back so you are, again, at full capacity. After this exercise, you have an understanding of system performance under load that you can be much more confident in.
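The decision of when to flip back can be sketched as a simple rule over the instrumentation readings. Everything here is an assumption made for illustration: the `Sample` fields, the threshold values in `Limits`, and the rule of waiting for three consecutive stressed readings so that a single blip does not end the exercise.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int          # minutes since capacity was reduced
    queue_depth: int     # requests waiting to be served
    abandon_rate: float  # fraction of requests abandoned or failed
    cpu_percent: float   # server CPU utilization


@dataclass
class Limits:
    # Hypothetical thresholds; real values depend on the system.
    max_queue_depth: int = 100
    max_abandon_rate: float = 0.02
    max_cpu_percent: float = 85.0


def is_stressed(sample, limits):
    """True if any metric in the sample exceeds its limit."""
    return (
        sample.queue_depth > limits.max_queue_depth
        or sample.abandon_rate > limits.max_abandon_rate
        or sample.cpu_percent > limits.max_cpu_percent
    )


def restore_point(samples, limits, consecutive=3):
    """Return the sample at which capacity should be restored, or None."""
    streak = 0
    for sample in samples:
        streak = streak + 1 if is_stressed(sample, limits) else 0
        if streak == consecutive:
            return sample
    return None


# Hypothetical readings taken as traffic grows over the day.
readings = [
    Sample(0, 10, 0.001, 40.0),
    Sample(1, 120, 0.001, 55.0),   # brief queue spike, then recovery
    Sample(2, 30, 0.002, 60.0),
    Sample(3, 110, 0.010, 80.0),
    Sample(4, 150, 0.030, 88.0),
    Sample(5, 200, 0.050, 93.0),
]
decision = restore_point(readings, Limits())
print("restore capacity at minute:", decision.minute if decision else None)
```

The traffic level at the restore point, measured against the reduced capacity, is what gives you a per-server capacity figure grounded in real usage.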
No approach to performance or load testing is foolproof. You have to resign yourself to battle scars that will inform future system design and implementation, but it all starts with the awareness and understanding that performance and load testing are two very different things.