Why Static Content is Just So Much Static
Cloud Lab Grid Automation Comes to EC2 and IBM Smart Cloud
Everybody remembers when the concept of grid computing first became popular. It was the early 2000s, when projects like SETI@home and the Human Genome Project harnessed the power of thousands of computers to work on complex problems. A grid can also be constructed on the LAN, and in fact a private cloud is really just that. But there was another concept that emerged at the same time called “peer-to-peer”, immortalized by the short-lived Napster project (which was file-sharing only) and made smashingly successful by Skype with its collaborative peer-to-peer model. Both grid computing and collaborative peer-to-peer have found their way into a new cloud-based platform called Cloud Lab Grid Automation.
Level Three Software Installation Nirvana
A Royal Pain of a Web site Crash
This news item tells us how the Royal Wedding brought the BBC website to its knees - didn't the Knights always have to approach the Queen on bended knee? Nowadays I guess the same rule applies to websites! We're thinking of offering Her Majesty and the Royal Family a free load test in exchange for a photo op at the Palace that we can use on our web site. Unfortunately she is not on LinkedIn or Facebook.
Kaavo Application Management and Security on the Cloud
A Ferrari on the Wide Open Web
CapCal Announces Support for Rackspace Cloud
We are proud to announce support for the Rackspace Cloud, which offers a number of advantages to our customers. It's the same ultra-lightweight Debian Linux agent as before, but Rackspace offers more choices in terms of memory and CPU, including a 250MB version that rents for 3 cents per hour - one-tenth of what CapCal agents on the Internet were paid before the cloud came along!
Please see our blog post on the Rackspace Tools site - we are excited about joining the Rackspace Cloud Tools program as a partner and customer.
Web Site Crash Related to World Cup Soccer?
“The crash was caused by our own bespoke visitor tracking software which was not able to cope with the load (slow write times to the database).”

Interestingly, website crashes ordinarily do not lead to travel companies explaining the inner workings of their systems to the outside world. But Waite continues:
“We had to disable some of the visitor tracking data to get the site back online. Internally we call this the ‘Go Faster Button’, it strips away all the nice-to-have data (logging etc) to make the site go faster but leaves the customer experience unchanged apart from being faster.”
“It seems our database server wasn’t able to cope with the number of visits per second when inserting rows in to the visit detail table. We’ve now partitioned this functionality across two servers, so we should be ready for at least double the load now.”

My response to that is THANK YOU, MR. WAITE, for publishing this little jewel and letting people know how you fixed it. If everybody whose website crashed, for whatever reason, would tell the whole world about it, we would be much better armed in our quest for higher and higher scalability! Yes, partitioning among servers is usually the answer, but you might want to run some tests just to be sure. (Use code CUPCRASH to get a free CapCal Crash Test!)
Agile Testing is Collaborative Testing
This workflow for testing has been followed for decades in most companies:
Dev....QA....Ops....QA
It takes place serially, in other words. The developers do unit testing and pass off to QA. QA does functional testing and passes off to Ops (or sometimes back to Dev) for load testing. Oftentimes load testing is done by a third party because of the resources and know-how required. In any case, this “waterfall” or tag-team approach presents both interpersonal and logistical challenges that can be counter-productive at worst, time-consuming at best. Finger pointing and blaming are common, as are delays and miscommunications.
Collaborative testing, on the other hand, looks more like this:
Dev....
QA ....
Ops....
It happens at the same time, or in parallel if you will. Not only is it true that two heads are better than one, but it often requires everybody on the team to identify, track down and fix certain kinds of issues. There is plenty of finger-pointing going on, but fingers are pointed where they need to be (i.e., at the bugs, bottlenecks and other issues that are always lurking below the surface). Finding and fixing these becomes a team exercise that can even be fun, a word that is rarely associated with testing. Not only that, but it can be done in hours instead of days or weeks.
Agile testing is a critical component of agile development but just like agile development it requires collaboration to be done successfully. For CapCal load tests we use Skype or GotoMeeting chat windows to facilitate the collaborative process in which everybody puts on their QA hat to run tests, analyze results, make the necessary changes and repeat the process as many times as necessary. What may seem like an enormously expensive and resource-intensive process involving anywhere from three to five people is actually a time- and money-saving procedure that reduces the development cycle by an order of magnitude. Not only that, the chat log contains a record of everything that transpired and can be used as a point of reference going forward. Finally, since geographically dispersed teams are the norm rather than the exception nowadays, it is a necessity and not just a convenience.
Our new product, CloudLab, includes a built-in chat component that accomplishes the same thing and we are very excited about it – check back soon to find out more!
Why "Statistical Regression Testing" Matters
In a recent project we had the load balancer configured for the default client header size of 2k bytes, which is fine in most cases but in this particular case there was a small percentage of headers that exceeded 2k because of cookie size. The result was an HTTP error that never seemed to occur in manual testing simply because it was statistically uncommon. Only by load testing were we able to consistently generate these errors and eventually discover their cause and fix them by raising the maximum header size in the load balancer.
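Whatever load balancer you run, the change itself is usually a couple of configuration lines. As an illustration, here is how the equivalent limits are raised in nginx (these are real nginx directives and their documented defaults; the sizes shown are arbitrary, so tune them to your own cookie traffic):

    http {
        client_header_buffer_size   4k;     # nginx default is 1k
        large_client_header_buffers 8 16k;  # nginx default is 4 8k
    }

Most other load balancers expose an equivalent knob - the point is that the default was chosen for typical traffic, not necessarily for yours.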
This is a perfect example of why load testing is critical, not just for measuring performance but for uncovering the kinds of problems that require a large statistical sampling of client instances. The objective is not to cause stress on the system but to throw enough variations at it to make sure bugs like this are not lurking below the surface. We’ve named this “statistical regression testing” because it is a cross between functional and load testing designed to uncover issues that are statistically uncommon.
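Here is a toy sketch in Python of what such a test looks like - not the CapCal test itself, and the URL and cookie name are placeholders. The trick is simply to vary the cookie size at random across thousands of requests so the statistically rare oversized-header case shows up reliably:

    import random
    import string
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://staging.example.com/"   # placeholder target

    def one_request(_):
        # Cookie sizes vary from harmless to oversized, so a small percentage of
        # requests will cross whatever header limit the load balancer enforces.
        blob = "".join(random.choices(string.ascii_letters, k=random.randint(100, 4096)))
        return requests.get(URL, cookies={"session_blob": blob}).status_code

    with ThreadPoolExecutor(max_workers=50) as pool:
        codes = list(pool.map(one_request, range(2000)))

    failures = sum(1 for c in codes if c >= 400)
    print(f"{failures} of {len(codes)} requests came back with an error status")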
I’m sure if you think about it you’ll come up with a lot of examples in your own career where statistical regression testing either did help or would have helped. Maybe you let a functional test run all night or got several people to bang on their keyboards at the same time. Or maybe you WERE running a load test when the issue popped up and blamed it on the load testing tool until you discovered otherwise (and yes, you know who you are)!
One reason that load testing is normally done at the end of a sprint or development cycle is that there’s not much point in stressing a system that doesn’t work to begin with. The Catch-22, as you can see, is that you may need to run a load test or a whole lot of functional tests before you can say that it works well enough to stress it. In any case, “statistical regression testing” is just as important as functional and load testing, and the risk of releasing bad code goes way down if you employ all three.
We’ll continue to revisit this topic and find more examples as we go along. The statisticians among us will discover that their knowledge is critical in determining the kinds of tests to run, and I am not exactly a statistician. But I do admire them, and hope that my understanding of this field will grow over time.
How Fast is YOUR Upload Speed?
We have recently come across a couple of instances where a web app uploads huge chunks of data in a POST form and the user is forced to wait until it finishes. When examining the application at the HTTP level, we see that the huge chunk of data is downloaded as part of the HTML of the previous page. While the download takes less than a second, the upload takes anywhere from 20 to 40 seconds, even on this blazing fiber connection of mine!
This is a perfect example of why testing inside the firewall over a 1 Gbps Ethernet connection can mask problems that end users “out there” are going to experience. In the second case that we recently encountered, the graph of a single user test looked like this:
The long red bars show the time it takes to upload the form which it just finished downloading. I have to assume that it's not an exact copy of the data it downloaded but it sure looks the same! My recommendation is to find a way to eliminate the redundancy somehow, so that only the changes are uploaded instead of the whole block.
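One hedged way to do that, sketched below in Python with toy data (the endpoint and field names are made up): diff the submitted values against the block that came down with the page, and POST only the fields that actually changed.

    import requests

    def changed_fields(original: dict, edited: dict) -> dict:
        """Return only the keys whose values the user actually changed."""
        return {k: v for k, v in edited.items() if original.get(k) != v}

    # Toy stand-ins for the huge block the page downloaded and the user's edited
    # copy; in the real app both would come from the page itself.
    original = {"name": "Jane", "notes": "x" * 1_000_000, "phone": "555-0100"}
    edited = dict(original, phone="555-0199")

    delta = changed_fields(original, edited)               # just {"phone": "555-0199"}
    requests.post("https://example.com/save", json=delta)  # hypothetical endpoint

The server then merges the delta into its stored copy, and the 20-to-40-second upload shrinks to whatever the changed fields actually weigh.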
By the way, here are the results of a speed test of my fiber connection from www.speakeasy.net - fast for sure, but MUCH slower than a local Ethernet connection on uploads:
Here endeth today's lesson in uploads vs downloads. Class dismissed!
The Value of a Good "Death Test"
Where Less is More and Free is Costly
So the process of load testing should really be focused on finding and fixing the problems encountered along the way and NOT on the load testing itself. The degree to which your load testing tool slows this iterative process down or complicates it is the measure of what it is REALLY costing you - most often there are three to five people involved in a load testing project, an expense that can dwarf the cost of just about any tool after a day or two of testing. And, paradoxically, it is the open source tools that are "free" which often take the most time to learn, master and use.
And herein lies the paradox of pricing for testing services like CapCal - less is really more when it comes to load testing, and charging customers for the amount of testing they do can actually be self-defeating and counter-productive in the long run. Nobody knows in advance how much testing will be required for a given website, and it can vary substantially even between releases of the same website. Fortunately the pricing of Linux instances on EC2 is low enough to make it affordable to most companies. But if the real advantage of a tool like CapCal is how much time it saves you, does it not seem counter-intuitive to charge for the amount of testing you do?
It does, and that is why we lean towards a monthly pricing model that allows unlimited testing with the customer paying the AWS charges. The real ROI is in the time saved creating, modifying and running tests, which not only squeezes time out of the development process but also decreases time to market, which results in lower costs and higher revenue. Lowering costs, reducing risks and increasing revenue are what CapCal is aiming for and our customers are telling us that it works.
So if you are in the market for a load testing service and have received a quote from SOASTA or one of the others, you may discover you are paying more for less if you don't check out CapCal first!
Testing - the "Suite Spot" on the Cloud
“According to IBM research, the average enterprise devotes up to 50 percent of its entire technology infrastructure to development and testing, but typically up to 90 percent of the test infrastructure remains idle. Like many other cloud startups going after the Dev/Test market, IBM found that taking advantage of cloud computing within development and testing environments can help reduce IT labor costs by 50 percent, improve quality, and drastically reduce time to market”.
What's a Few Million Among Friends?
Meanwhile, SOASTA has announced a million user load test using 587 EC2 instances, a feat that is quite remarkable considering that it would require each instance to generate almost 2,000 users. It's possible to do this if you make the time between pages VERY long, because normally the upper limit is half of that or even less due to bandwidth issues. The real challenge for such a massively large test is getting the instances to begin with - it's not that you can't, but you have to get permission and often schedule it in advance. Here at CapCal we've run tests with as many as 420 instances, and lately we ran a four million user test over a two-hour period.
But since we can no longer get our name in the Guinness Book for the first million user test I sent an email to Brad Johnson at SOASTA congratulating them on the accomplishment. He replied cordially and said that it's an exciting time to be in the testing business, to which I heartily agree! Cloud computing is the ultimate game changer in this space, and the field is wide open. It's good to have competitors like SOASTA because it makes us aim even higher to be the best in the eyes of our customers.
Tiger Woods Mistress Beauty Pageant crashes Howard Stern's website
This appeared in today's Detroit Examiner about how Howard Stern's website was crashed by the "Tiger Woods Mistress Beauty Pageant".
According to Liz Brown:
Howard Stern commented Thursday morning: "Doug Herwitz who runs the website, he said 'sure enough, the whole thing went down.' He says he can't even gauge the volume of traffic. Everything's blown out...And we have like a really big server. It's really rare that we can crash this thing."
Sure Doug, you may have a "really big server" but apparently not big enough, because not only size but numbers matter a LOT when hosting content that literally millions of guys might be interested in. I suggest that the next time you try something like this you have a load-balanced web farm on EC2, maybe 10 Apache Linux instances to start with, along with autoscaling that will add more as traffic increases.
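For what it's worth, here is a rough sketch of that setup using boto3; the AMI, group and ELB names are placeholders, and the instance counts are just the ones suggested above:

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Launch configuration: which image and instance size new web servers use.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="stern-web-lc",
        ImageId="ami-00000000",        # placeholder Apache/Linux AMI
        InstanceType="m1.small",
    )

    # The group starts with 10 instances behind a load balancer and can grow to 50.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="stern-web-asg",
        LaunchConfigurationName="stern-web-lc",
        MinSize=10,
        MaxSize=50,
        DesiredCapacity=10,
        AvailabilityZones=["us-east-1a", "us-east-1b"],
        LoadBalancerNames=["stern-web-elb"],   # an existing ELB, created separately
    )

Hook a scale-out policy to a CPU or traffic alarm and the farm grows on its own the next time a beauty pageant shows up.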
If Howard or Doug are out there listening and want to try a free CapCal Crash Test to see at what point their "really big server" starts crapping out, just send an email to "info at capcal.com" and we'll schedule it!
Don't Shoot the Messenger - PLEASE!
This is human nature, of course, just like the glee we feel when the test results come back positive. I’ve worked with developers who LOVE CapCal when it gives them the kind of results they expect but are quick to call it into question when it doesn’t. I’m happy to say that the times when there really is a problem with CapCal are becoming more and more infrequent, but that doesn’t keep me from assuming that there is one (or at least could be) until I can prove otherwise. In the court of testing, the tool is guilty of malfunction until proven innocent. Most of the time, proving it innocent means finding and fixing a problem in the application.
Using software to test software is like using a diamond to cut diamonds (except for the word “soft”, which spoils the whole analogy if you dwell on it much). If the drill bit breaks while you are cutting a diamond you just have to replace it with a harder one and keep working. With CapCal this kind of breakage is normally due to an exotic combination of things that rarely occur simply because they are so exotic. I’d love to give you an example but you might work for a competitor and if that's the case you’ll just have to figure it out for yourself! :-)
Dealing With Surly and Downright Rude Websites!
My favorite of all the HTTP return codes is 403, which reads like this:
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request should NOT be repeated.
The server understood the request but is refusing to fulfill it? That's it, no explanation given? If that’s not an example of surly, rude behavior I don’t know what is.
Or, at a lower level, there is the famous “connection refused” message that tells you nothing at all, except perhaps that the owner of the website wants nothing to do with you. That’s the cyber equivalent of a slap in the face the way I see it!
From my experience, connections most often get refused when there just aren’t enough of them to go around, and that is often because of an arbitrary and artificial limitation on the load balancer. For the nginx load balancer, for example, the default is 1024 – a lovely, round number with a venerable history for sure, but WAY too low for a majority of websites.
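If nginx happens to be what's in front of your site, turning the number up is a few lines of configuration (the values here are illustrative, so size them to your own traffic and file-descriptor limits):

    worker_processes  auto;
    worker_rlimit_nofile  20480;      # let each worker actually open that many descriptors

    events {
        worker_connections  10240;    # the shipped default is 1024
    }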
Remember, these connections have a life of their own that leaves them hanging around, sometimes for up to 3 minutes. So you don’t have to have 1024 simultaneous connections to hit the limit, which makes it even more of a ridiculous number.
Operations folks typically see a number like that and think it was defined that way for a good reason. Because of that, they are reluctant to change it the same way a plumber is reluctant to turn a faucet all the way up and leave it. But failure to do so will result in a cyber face-slapping for your users, who may assume the problem is at their end and start haranguing their ISP. Hopefully their ISP will keep them on the phone long enough for you to get the problem fixed!
So only YOU can prevent the rude and surly treatment of your users by making sure the “connection faucet” is turned up all the way and left there!
EOR (End of Rant)
Walmart's Site Down on Black Friday Again
This appeared on the Marketing Vox site today (along with several others) about the Walmart Black Friday website crash that occurred on the busiest shopping day of the year. Faithful readers of this blog will remember this posting about another Black Friday meltdown at Walmart a few years back. Since a visitor from Walmart showed up on the blog today, I do hope he or she will return because we are offering them a free CapCal Crash Test with up to 200,000 users if they are up to it! Unfortunately I don't know anybody there but if anyone out there does, please pass this along!
But hurry because this offer is only good until December 22, 2012 (and that's only because the world will end that day according to the ancient Mayan prophecies)!
Stanford Uses CapCal with EC2 for Student Portal
Coalition Networks is a consulting firm in the Bay Area that was contracted by Stanford University to calibrate the performance of their Student Housing Portal, a web-based application that all students use to select their residence options and apply for a residence. As might be expected, this can often lead to a usage peak in the hours leading up to the deadline, and Stanford RD&E IT wanted to make sure their servers were adequate for supporting up to a couple thousand simultaneous users.
According to Akin Ajiboye with CNI:
Our test plan called for generating a load of up to 2,000 users, which even at 100 users per computer would require 20 machines. Fortunately, Coalition Networks partnered with Aligned Technology and CapCal to get the job done. With CapCal running on the Amazon EC2 cloud we were able to fire up as many servers as we needed in less than a minute. While the test was running we gathered all the relevant network and database statistics and were able to form a complete picture of the application's performance. We recommend CapCal with EC2 as a great way to get excellent results quickly.
As a CapCal partner, Hiroaki Ajari of Aligned Technology had this to say:
We’ve worked with CapCal since the earliest versions so we were very excited to see it become available on the Amazon cloud. Of course it was a great honor and a privilege to work with one of the finest universities in the world, right here at the center of the technology universe. This was our first time to see and use EC2 and experience scalability as required. It's such a luxury to utilize servers on demand, minimizing waste - resources, cost, total footprint. Amazon's EC2 is perfect for the cyclical nature of testing; especially to handle performance testing's environmental needs for generating and distributing load from the test servers.
Our joint efforts with Amazon, Aligned Technology and Coalition Networks provided Stanford a quantitative way to measure risk, triggering mitigation strategies that allow them to maintain first class service to their students and administrators.
CapCal Demo Video Now Available Online
We recently completed an 8 minute Camtasia video showing how CapCal works with Amazon EC2. Please have a look and give us your feedback - we still want to make a few tweaks but it's 90% there!
Interview in Software Test & Performance Magazine!
Hope to see you there, and thanks again to Andrew Muns with ST&P for the honor of being in print in such a classy publication!
A Very Delicate Load Balancing Act
Everyone knows what a load balancer is but exactly what it does and how it goes about doing it are often mysterious. Since every single page request goes through the load balancer, how it is configured and what its capacity is can have everything to do with how well an application performs under load. For example, the maximum number of connections is a configurable parameter that is often very low in its default setting.
A current project for a cruise ship booking engine is a case in point. This particular application is being moved to the cloud for all the right reasons, namely, to have the extra capacity when needed to handle sudden usage spikes without bogging down or crashing. Our initial tests showed that the bottleneck was the database server, which is often the case. So we beefed up the database server quite a bit and saw dramatically better results. But we also saw reams of errors start to appear once we crossed a magic threshold that had to do either with the configuration or the capacity of the load balancer.
In the first case, it was exactly what I referred to earlier - if the load balancer is artificially limited to 100 connections and you try to open 200 or 2,000, every request that exceeds the limit will receive an error of some kind, depending on the load balancer. In this case it was a 502 (Bad Gateway), but I've also seen 404 (page not found) errors or just plain timeouts while making a connection.
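Where that limit lives depends on which load balancer you run. Assuming something like HAProxy purely for illustration, it is a handful of maxconn settings; a sketch of raising them (all numbers and addresses below are made up):

    global
        maxconn 20000                 # process-wide connection ceiling

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        maxconn 10000                 # per-frontend ceiling
        default_backend app

    backend app
        server app1 10.0.0.11:8080 maxconn 2500 check
        server app2 10.0.0.12:8080 maxconn 2500 check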
Once we solved that, we tried again and hit a limit at about 1,000 users. This time we determined that the load balancer itself was being maxed out so we beefed up its CPU and memory (which is a snap on the cloud) and tried again. Now we got to 2,500 users before we started seeing any errors or delays. Our customer had this to say about the experience in a review on Amazon.com:
"With the CapCal CloudBurst delivery system we were able to effectively simulate thousands of instantaneous/continuous users hitting our development environment. The after-test reports provided us with the guidance we needed to streamline our code. I highly recommend CapCal to anyone looking for a fast, affordable performance testing solution." - John Hill, President, HIL-TEC
Thank you, John!
We'll be publishing some actual results from these tests in future blog posts. But load balancing is such a fundamental part of the scalability and performance picture that it deserves to be studied and analyzed on its own, so that's what we'll be doing!
Blastoff Leaves the Launch Pad!!
Blastoff belongs in the category of "Wow, why didn't I think of that?" They've teamed up with hundreds of retailers to offer discounts to all their members, and to me it makes perfect sense. Unlike companies that push their members to sell for them, Blastoff just asks you to buy the things you would normally buy but for less money. Oh, and you get paid every time one of your friends buys something!
Pay less for what you buy and get paid when your friends buy for less - that's hard to beat! If this takes off the way I think it will, there will be another household name in the online retailing space before long!
And don't forget you heard it here first, on the CapCal Blog!
Barclay's Online Banking Takes a Dive (Again)
Either the bank has been plagued with problems or a disproportionate number of Reg readers use the bank's online services. In July the service went down after a new version of the site, and associated marketing push, was launched. In June it lost its ATM network and watched its website crash twice. It had similar problems in October of last year too.
Of course, this kind of thing happens all the time; there's no use in singling out Barclays. A marketing push is usually preceded by a new version of the site, which ends up being a double whammy: if the marketing push is successful, it can often overwhelm the servers and thereby nullify the entire campaign.
First of all, they should know that a successful marketing campaign can result in way more activity than normal, and in fact that's the whole point! This is where EC2 comes in, with load balancing and autoscaling that will handle whatever comes their way.
Secondly, even with the full power of Amazon's data centers behind them, if the app itself isn't scalable they will still have problems. This blog beats these two points like a dead horse, and yet they can't be overemphasized - cloud computing + cloud testing = success. Forgo either one and you are asking for trouble!
Finally, this is a BANK, not just a place to buy the cute little coats worn by the Obama girls or to check out the latest swimsuit fashions. When people can't access their money they tend to get very nervous (or at least I do).
So I hope the CIO of Barclay's is entering the right phrases into his or her search engine of choice. If so, this page will likely turn up along with an offer for a free CapCal Crash Test of up to 10,000 users on Amazon EC2!
Sounds like a jolly good idea to me!
Your Servers Autoscale But What About The Rest?
Have a look at the 2 minute CapCal test above (click on it for a better view) that attempts to reach 1,000 virtual users but at about 700 begins generating thousands of 503 (out of resources) errors. The green bars that show the ever-increasing bandwidth also drop precipitously when the errors start because only error headers are being returned instead of content.
Is this a site that expects to have more than 700 people online at any given time? Try 7,000 or even 70,000! Could this test be run in the lab using a tool like JMeter? Maybe at 1,000 virtual users, but at higher loads it just isn't practical or feasible because of the number of computers that would be required.
This is an example of a scalability test done against a single static page that doesn't even begin to stress the servers and yet it shows a scalability limit that required all of 2 minutes to uncover. Proof once again that performance rocks but scalability rules!
CapCal's Debut in the Amazon Solution Catalog!
Handling the Dirty Work of Dynamic Content
There are two types of dynamic content, server-side and client-side. An example of server-side content is online news, which changes constantly but is stored on the server and thus can be used for creating performance tests. Client-side dynamic content comes from AJAX or web services calls and shows up only in the browser, like a list of tickets available for a sporting event or concert. This particular site "sports" a rich client interface on the front end and uses Amazon EC2 on the back end. When you click on an event, the list of available seats is totally dynamic - it isn't stored anywhere except in the browser at that moment. Like an airline flight, it can disappear in an instant if the event gets sold out.
One of the companies we are working with is dealing with server-side dynamic content (online news) and another is dealing with the client-side content described above (online tickets). In the first case, what we envision is fairly straightforward - a small script will query the server database and generate a comma delimited text file to fill in the parameters of "template sessions" on the CapCal server, then call another script that automatically uploads it and kicks off the test. This will generate thousands of unique test cases that can be run by thousands of virtual users against a staging site before moving new content or application changes into production. Not only will the most current and dynamic content be tested, it will be tested in enough combinations and at high enough load levels to flush out any bugs or bottlenecks that might be lurking undetected.
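Here is a minimal sketch of that first script in Python. The schema, file names and the capcal_upload.py runner are hypothetical stand-ins for the customer's own database and for CapCal's uploader, but the shape of the workflow is as described above:

    import csv
    import sqlite3
    import subprocess

    # Pull the freshest content out of the site's own article database.
    conn = sqlite3.connect("news.db")   # stand-in for the real database connection
    rows = conn.execute(
        "SELECT url, article_id FROM articles ORDER BY published_at DESC LIMIT 1000"
    )

    # Write a comma-delimited data set whose columns match the template session's parameters.
    with open("session_params.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "article_id"])
        writer.writerows(rows)

    # Hand the fresh data set to the (hypothetical) script that uploads it and kicks off the test.
    subprocess.run(["python", "capcal_upload.py", "session_params.csv"], check=True)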
In the case of the online ticket site and the countless other similar applications, we are at work on a solution that will be quick and easy for users while not so trivial for us developers (as it should be, of course)! Basically it will be an extension to the browser add-on in the CapCal client that allows fields or links in the browser to be identified and grouped so they can be dynamically accessed at runtime by each virtual user and manipulated by a new command called "Click-On". I can't divulge the magical powers of this new command until they fully exist and have been confirmed by users. But suffice it to say that it will easily solve the seat selection problem and that's what matters most right now!
I'm especially excited about this because it is true automation, with no manual intervention at all other than writing the extraction query or identifying the dynamic fields in the browser and recording the session template. Add to that the enormous economies of scale made possible by the cloud and all of a sudden something that has never been done effectively (if at all, as in the case of dynamic client content) can suddenly be done very thoroughly and cost effectively with no manual intervention.
Simple, powerful and clean - like soap and AJAX!
How Much Did Michael Jackson Rock the Web?
This story in today's New York Times gave some meat to the rumor that Michael Jackson's death wreaked considerable havoc on the Wild Wild Web. One of the metrics that stood out for me was that Yahoo experienced 800,000 clicks in the first 10 minutes, breaking their previous record!
This is copied from the Yahoo corporate blog :
The passing of the King of Pop set multiple records across Yahoo!. On our front page, the story “Michael Jackson rushed to hospital” was the highest clicking story in our history. It generated a whopping 800,000 clicks within 10 minutes and news of his death saw 560,000 clicks in 10 minutes. Also, the news area on our front page experienced five times the amount of traffic it normally receives.
Yahoo! News set an all-time record in unique visitors with 16.4 million people, surpassing our previous record of 15.1 million visitors on election day. Four million people visited the site between 3-4pm Pacific time, setting an hourly record. We also recorded 175 million page views yesterday, our fourth highest after Inauguration Day, the day after the Inauguration, and Hurricane Ike.
Michael Jackson's hospital visit generated more traffic than the election of the first black president in history! (Or should that be HIStory?) Wow. I'm thinking of offering an "800,000 clicks in 10 minutes" test and naming it after the King of Pop - what do you think? There would be the Michael Jackson test, the Inauguration Day test and the Hurricane Ike test. Take your pick.
Why Self-Service and On-Demand are Critical
Within an hour and a half we had negotiated the business terms and he had been trained on CapCal and was building and running his own tests! To make sure each user and session was unique, he created a few thousand test accounts and generated a CapCal replacement data set with user id, password and so forth. That took a couple of hours, and then, to make a short story even shorter, a series of 5,000-user load tests using 10 Amazon instances uncovered a bug in their PHP framework that was causing the errors. Problem solved! (After hours of re-coding and unit testing, that is.)
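For anyone who wants to do the same thing, here is a minimal sketch of generating such a replacement data set in Python; the column names are assumptions, so match them to whatever your session template expects:

    import csv
    import secrets

    with open("test_accounts.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "password"])
        for i in range(5000):
            # Each virtual user gets its own credentials so every session is unique.
            writer.writerow([f"loadtest_{i:05d}", secrets.token_urlsafe(12)])

The matching accounts still have to exist in the application itself, of course - created through its signup API or a bulk insert before the test runs.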
They'll be spending a week or two doing more testing now that they've solved their immediate problem. They want to reach 150,000 simultaneous users while maintaining sub-second response times with no errors, which is pretty awesome in itself. What's even more awesome is how quickly and easily that kind of load is generated by CapCal using the Amazon cloud!
Click on the chart above and see how quickly it ramps from 1 to 5,000 in the Users column with excellent response times and zero errors. But at about 3 minutes into the test, server errors begin showing up and rapidly climb. Response times hardly degrade at all, which once again reminds us that scalability and performance go hand in hand but scalability is king!
I hope to get permission to publish their name and get a quote for the blog but I have to say that this particular developer is extremely bright and caught on to CapCal faster than practically anyone I've ever seen. But even mere mortals are capable of getting the same results in the same amount of time since there's no programming involved.
The buzz around here now is about moving CapCal to EC2 and running load tests on itself to test the autoscaling and load balancing features. Stay tuned!
How to Crash an Airline Site in 3 Minutes or Less
This is the second time on this blog where the words "airline" and "crash" are used together but fortunately we are referring to websites, not airplanes (see Pet Airways Crashes on Opening Day). The above graph shows the results of a CapCal Crash Test on a new promotional site for one of the major airlines (sorry, I can't use their name without legal approval).
There are only two pages in this test, the first one going to the main page and the next one submitting a registration form. The objective of a crash test is not actually to crash the site but to find the point at which performance starts taking a nosedive (there's that airplane thing again). "Good" response time is less than a second, where "bad" starts at about 2 seconds and goes up. (These aren't subjective measures as much as the result of having run thousands of tests and seeing the same pattern over and over - usually once it reaches 2 seconds it goes up rapidly from there.)
This one was able to retain good response time until it reached around 400 users. At that point you can see the "hockey stick" effect in the red bars on the right that reach 6 seconds at about 1,340 users. At 2,000 users, most people would think the site was offline and go somewhere else. For all intents and purposes it has crashed, even though it is still handling requests. Since the average response time is 6 seconds, some of the pages are probably taking more than 10. Life is much too short for slow web apps.
However, have a look at the Site Errors column to the left of the green bars - at a little over 1,000 users the server starts generating errors, and those can be even worse than slow response times. Don't you just hate it when you get a "500 server error" after you've patiently filled in all the fields in a form? Wouldn't you hate it even more if you were the airline or the company that put together the web site? That's why scalability and performance go hand in hand but scalability is king!
The site we tested was a staging site that mirrors the production site so this is what they can realistically expect to see in production. This test used five small Amazon instances so the cost was negligible. Not too shabby for something that could prevent a million dollar disaster if the promotional campaign itself flopped because of a glitch in the web site!
Integrated Performance Testing in Action
Automated functional testing, for someone who is watching it for the first time, is quite amazing. What would take a person hours to do by hand flashes before you on the screen within minutes. It's that phenomenal time savings and increased test coverage that makes it worthwhile, but the "while" part is the rub - if you end up spending all your time maintaining the test scripts as the application changes, the time savings starts to degrade to the point of ever diminishing returns.
So obviously making the tests easier to create and maintain is a huge leap forward, but if you have to turn around and create performance tests using a DIFFERENT tool you are faced with the same dilemma if not worse. Take the HP Mercury suite as an example - QuickTest Professional uses a scripting engine based on VBScript, while LoadRunner uses the C language as its basis. In other words, maintaining a load testing script can very often require more technical expertise than it took to write the application itself!
One of the major differences between functional and performance testing is that functional testing can be done manually while it is extremely difficult and unwieldy to do performance testing that way. That doesn't mean that people haven't tried it, and the thought of coordinating dozens of people to perform a load test would be funny if it weren't so absurd - in fact it was only last year that I heard of a case where this was being done at a major corporation for lack of a suitable load testing tool.
But even with a load testing tool at your disposal, if you've already invested in functional test automation doesn't it seem bizarre that you would have to turn around and duplicate all that cost and effort just to do performance testing? Sure, there are some major differences between functional and performance testing but that doesn't mean that an entirely different tool with its own language is necessary. It's evolved that way for historical reasons that have a lot more to do with the search for more revenue by test tool vendors than the search for more efficient testing methodologies.
If you could use a subset of your functional tests for performance testing not only would it save time and money but you could be many times more productive and efficient in your testing cycle. That's what we call "integrated performance testing" and it involves capturing an automated functional test while it is running and then immediately using it to create and execute a performance test. Not only have you saved the time and money involved in writing and maintaining a separate set of tests but you've also made sure that both kinds of testing get done at the same time.
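To make the idea concrete, here is a toy sketch in Python - not CapCal's mechanism, just the record-then-replay pattern described above. The functional test records every URL it touches, and the same list is then replayed by many concurrent virtual users:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    recorded = []   # filled in while the automated functional test runs

    def get(url):
        """Wrapper the functional test calls, so every request is also recorded."""
        recorded.append(url)
        return requests.get(url)

    # ... the automated functional test runs here, fetching its pages through get() ...

    def virtual_user(urls):
        session = requests.Session()
        timings = []
        for url in urls:
            start = time.time()
            session.get(url)
            timings.append(time.time() - start)
        return timings

    # Replay the recorded script with 100 concurrent virtual users.
    with ThreadPoolExecutor(max_workers=100) as pool:
        results = list(pool.map(virtual_user, [recorded] * 100))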
Here you can see a screen recording of an automated Worksoft Certify test being run with CapCal Integrated Performance Testing. The demo shows batch command execution because that is how these tests will be integrated into the nightly build cycle. However, we've also tested with HP QuickTest Pro, Compuware QA Tester and Automated QA Test Complete. For a proof of concept in your own environment with whatever tool you are using, just write to info@capcal.com and someone will get right back to you!
What the Cloud Really Looks Like
As it turns out, nowhere is latency more critical than in financial trading, where milliseconds can make the difference between a million dollar trade won or lost. One of the data points that intrigued me the most was this one:
Latency concerns are not limited to Wall Street; it is estimated that a 100-millisecond delay reduces Amazon’s sales by 1 percent.
One percent of total sales for 100 milliseconds?
Let that sink in for a moment.
That just boggles the mind. Do you suppose that performance testing is practiced as it should be at Amazon? I certainly hope so!
Another interesting tidbit was that Amazon Web Services now uses more bandwidth than Amazon's massive retailing operations. If that doesn't point to where the cloud is heading, I don't know what does!
On Finding a Home in the Cloud
Load testing is obviously a perfect fit for the Amazon Cloud - need 1,000 computers for a big test? Just spin them up, use them, and tear them down. To deliver that same capability, CapCal once had over 10,000 people around the world running our agent for $0.30 per hour of testing. A small Linux instance on Amazon EC2 costs $0.12 per hour - well under half the cost! We didn't pay for bandwidth with those volunteer agents, though, and that was the main attraction. But they weren't as secure and reliable as the Amazon agents, and that makes all the difference!
Load testing is also called stress testing, performance testing, volume testing, scalability testing and so forth, all of which refer more to the objective than the means. But what about other kinds of testing, like functional and regression testing? Do they have a Home in the Cloud as well?
Of course they do! Actually, anything except unit testing and manual testing can be done on the cloud - and should be, if you read the post below this one. Running functional and regression tests takes time, and time can be squeezed out of the cycle if you can throw more computers at it. Generally people do unit testing, then functional and regression testing, and then performance testing. But the ugly truth is that performance testing either a) doesn't get done at all or b) gets done at a point where it's too late to make changes.
Why is that? We'll be answering that question and showing some examples of a different approach in the next few posts so y'all come back now!
The Test Lab is Dead - Long Live the Test Lab!
A test lab, as I define it, is a room with nothing but computers which may or may not include desk space for testers. I’ve seen more than I can count and you have too if you have been doing test automation for any length of time.
Earlier test labs had a person at each computer, which was even more wasteful – not only are you taking up space and filling it with expensive hardware, but you’ve got people doing mind-numbing, repetitive work. Nowadays test labs are used only for automation if they are used correctly – manual and acceptance testing can be done at the user desktop instead of the lab.
Cloud computing is on track to replace ALL those machines and free up ALL the space they occupy, a forward leap just as huge as the leap of automation; first we replaced people, now we are replacing machines. It makes perfect sense if you think about it, and yet it is just starting to dawn on people that we can do this.
The cost and space savings combined with the dramatic increase in productivity and throughput is astounding. So if nothing else, software testing is a “killer app” or “poster child” for cloud computing; nowhere else are the benefits so obvious and immediate.
But the tools haven’t caught up yet for the most part, with a couple notable exceptions. CapCal is one, of course, and we've seen SOASTA CloudTest. I'm sure other companies have projects in the works and announcements will probably be forthcoming. But what kinds of things can you do right now that take advantage of the cloud to increase your testing coverage and reduce your costs at the same time?
For one thing, virtualization already provides a huge reduction in the amount of hardware needed for a test lab and that’s good. So maybe instead of an entire room, just a corner or a wall might be used. Then it’s simply a matter of doing the math to see if the cloud is cheaper and usually it turns out to be. In general terms, if a computer is not being operated by a human being it should also not take up space.
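Doing that math can be as simple as a few lines; every number below is an assumption, so plug in your own lab and instance costs:

    # Back-of-the-envelope comparison only - all figures are assumptions.
    lab_machines = 40
    cost_per_machine = 1200            # purchase price in dollars
    amortized_years = 3
    lab_cost_per_year = lab_machines * cost_per_machine / amortized_years

    busy_hours_per_year = 500          # hours the lab is actually running tests
    instance_price = 0.12              # dollars per instance-hour on demand
    cloud_cost_per_year = lab_machines * busy_hours_per_year * instance_price

    print(f"lab:   ${lab_cost_per_year:,.0f} per year")
    print(f"cloud: ${cloud_cost_per_year:,.0f} per year")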
The cloud will bring up some interesting licensing challenges once people realize they can install software on one instance and duplicate that instance as many times as they want. Windows itself is covered, of course, since Amazon pays Microsoft for those licenses. Everyone else has to trust their users to abide by the same restrictions that apply to making physical copies, even though these copies are virtual.
In this series, we’re going to explore some real life cases of testing as a service in the cloud. So keep on coming back!