Why Static Content is Just So Much Static
Cloud Lab Grid Automation Comes to EC2 and IBM Smart Cloud
Everybody remembers when the concept of grid computing first became popular. It was the early 2000s, when projects like SETI@home and the Human Genome Project harnessed the power of thousands of computers to work on complex problems. A grid can also be constructed on the LAN, and in fact a private cloud is really just that. But there was another concept that emerged at the same time called “peer-to-peer”, immortalized by the short-lived Napster project (which was file-sharing only) and made smashingly successful by Skype with its collaborative peer-to-peer model. Both grid computing and collaborative peer-to-peer have found their way into a new cloud-based platform called Cloud Lab Grid Automation.
Level Three Software Installation Nirvana
A Royal Pain of a Web site Crash
This news item tells us how the Royal Wedding brought the BBC website to its knees - didn't the Knights always have to approach the Queen on bended knee? Nowadays I guess the same rule applies to websites! We're thinking of offering Her Majesty and the Royal Family a free load test in exchange for a photo op at the Palace that we can use on our web site. Unfortunately she is not on LinkedIn or Facebook.
Kaavo Application Management and Security on the Cloud
A Ferrari on the Wide Open Web
CapCal Announces Support for Rackspace Cloud
We are proud to announce support for the Rackspace Cloud, which offers a number of advantages to our customers. It's the same ultra-lightweight Debian Linux agent as before, but Rackspace offers more choices in terms of memory and CPU, including a 250MB version that rents for 3 cents per hour - one-tenth of what CapCal agents on the Internet were paid before the cloud came along!
Please see our blog post on the Rackspace Tools site - we are excited about joining the Rackspace Cloud Tools program as a partner and customer.
Web Site Crash Related to World Cup Soccer?
“The crash was caused by our own bespoke visitor tracking software which was not able to cope with the load (slow write times to the database).”

Interestingly, website crashes ordinarily do not lead to travel companies explaining the inner workings of their systems to the outside world. But Waite continues:
“We had to disable some of the visitor tracking data to get the site back online. Internally we call this the ‘Go Faster Button’, it strips away all the nice-to-have data (logging etc) to make the site go faster but leaves the customer experience unchanged apart from being faster.”
“It seems our database server wasn’t able to cope with the number of visits per second when inserting rows in to the visit detail table. We’ve now partitioned this functionality across two servers, so we should be ready for at least double the load now.”

My response to that is THANK YOU, MR. WAITE, for publishing this little jewel and letting people know how you fixed it. If everybody whose website crashed, for whatever reason, would tell the whole world about it, we would be much better armed in our quest for higher and higher scalability! Yes, partitioning among servers is usually the answer, but you might want to run some tests just to be sure. (Use code CUPCRASH to get a free CapCal Crash Test!)
Agile Testing is Collaborative Testing
This workflow for testing has been followed for decades in most companies:
Dev....QA....Ops....QA
It takes place serially, in other words. The developers do unit testing and pass off to QA. QA does functional testing and passes off to Ops (or sometimes back to Dev) for load testing. Oftentimes load testing is done by a third party because of the resources and know-how required. In any case, this “waterfall” or tag-team approach presents both interpersonal and logistical challenges that can be counter-productive at worst, time-consuming at best. Finger pointing and blaming are common, as are delays and miscommunications.
Collaborative testing, on the other hand, looks more like this:
Dev....
QA ....
Ops....
It happens at the same time, or in parallel if you will. Not only is it true that two heads are better than one, but it often requires everybody on the team to identify, track down and fix certain kinds of issues. There is plenty of finger-pointing going on, but fingers are pointed where they need to be (i.e., at the bugs, bottlenecks and other issues that are always lurking below the surface). Finding and fixing these becomes a team exercise that can even be fun, a word that is rarely associated with testing. Not only that, but it can be done in hours instead of days or weeks.
Agile testing is a critical component of agile development but just like agile development it requires collaboration to be done successfully. For CapCal load tests we use Skype or GotoMeeting chat windows to facilitate the collaborative process in which everybody puts on their QA hat to run tests, analyze results, make the necessary changes and repeat the process as many times as necessary. What may seem like an enormously expensive and resource-intensive process involving anywhere from three to five people is actually a time- and money-saving procedure that reduces the development cycle by an order of magnitude. Not only that, the chat log contains a record of everything that transpired and can be used as a point of reference going forward. Finally, since geographically dispersed teams are the norm rather than the exception nowadays, it is a necessity and not just a convenience.
Our new product, CloudLab, includes a built-in chat component that accomplishes the same thing and we are very excited about it – check back soon to find out more!
Why "Statistical Regression Testing" Matters
In a recent project we had the load balancer configured for the default client header size of 2k bytes, which is fine in most cases but in this particular case there was a small percentage of headers that exceeded 2k because of cookie size. The result was an HTTP error that never seemed to occur in manual testing simply because it was statistically uncommon. Only by load testing were we able to consistently generate these errors and eventually discover their cause and fix them by raising the maximum header size in the load balancer.
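Whatever load balancer you run, the change itself is usually a couple of configuration lines. As an illustration, here is how the equivalent limits are raised in nginx (these are real nginx directives and their documented defaults; the sizes shown are arbitrary, so tune them to your own cookie traffic):

    http {
        client_header_buffer_size   4k;     # nginx default is 1k
        large_client_header_buffers 8 16k;  # nginx default is 4 8k
    }

Most other load balancers expose an equivalent knob - the point is that the default was chosen for typical traffic, not necessarily for yours.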
This is a perfect example of why load testing is critical, not just for measuring performance but for uncovering the kinds of problems that require a large statistical sampling of client instances. The objective is not to cause stress on the system but to throw enough variations at it to make sure bugs like this are not lurking below the surface. We’ve named this “statistical regression testing” because it is a cross between functional and load testing designed to uncover issues that are statistically uncommon.
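Here is a toy sketch in Python of what such a test looks like - not the CapCal test itself, and the URL and cookie name are placeholders. The trick is simply to vary the cookie size at random across thousands of requests so the statistically rare oversized-header case shows up reliably:

    import random
    import string
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://staging.example.com/"   # placeholder target

    def one_request(_):
        # Cookie sizes vary from harmless to oversized, so a small percentage of
        # requests will cross whatever header limit the load balancer enforces.
        blob = "".join(random.choices(string.ascii_letters, k=random.randint(100, 4096)))
        return requests.get(URL, cookies={"session_blob": blob}).status_code

    with ThreadPoolExecutor(max_workers=50) as pool:
        codes = list(pool.map(one_request, range(2000)))

    failures = sum(1 for c in codes if c >= 400)
    print(f"{failures} of {len(codes)} requests came back with an error status")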
I’m sure if you think about it you’ll come up with a lot of examples in your own career where statistical regression testing either did help or would have helped. Maybe you let a functional test run all night or got several people to bang on their keyboards at the same time. Or maybe you WERE running a load test when the issue popped up and blamed it on the load testing tool until you discovered otherwise (and yes, you know who you are)!
One reason that load testing is normally done at the end of a sprint or development cycle is that there’s not much point in stressing a system that doesn’t work to begin with. The Catch-22, as you can see, is that you may need to run a load test or a whole lot of functional tests before you can say that it works well enough to stress it. In any case, “statistical regression testing” is just as important as functional and load testing, and the risk of releasing bad code goes way down if you employ all three.
We’ll continue to revisit this topic and find more examples as we go along. The statisticians among us will discover that their knowledge is critical in determining the kinds of tests to run, and I am not exactly a statistician. But I do admire them, and hope that my understanding of this field will grow over time.
How Fast is YOUR Upload Speed?
We have recently come across a couple of instances where a web app uploads huge chunks of data in a POST form and the user is forced to wait until it finishes. When examining the application at the HTTP level, we see that the huge chunk of data is downloaded as part of the HTML of the previous page. While the download takes less than a second, the upload takes anywhere from 20 to 40 seconds, even on this blazing fiber connection of mine!
This is a perfect example of why testing inside the firewall over a 1 Gbps Ethernet connection can mask problems that end users “out there” are going to experience. In the second case that we recently encountered, the graph of a single user test looked like this:
The long red bars show the time it takes to upload the form which it just finished downloading. I have to assume that it's not an exact copy of the data it downloaded but it sure looks the same! My recommendation is to find a way to eliminate the redundancy somehow, so that only the changes are uploaded instead of the whole block.
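One hedged way to do that, sketched below in Python with toy data (the endpoint and field names are made up): diff the submitted values against the block that came down with the page, and POST only the fields that actually changed.

    import requests

    def changed_fields(original: dict, edited: dict) -> dict:
        """Return only the keys whose values the user actually changed."""
        return {k: v for k, v in edited.items() if original.get(k) != v}

    # Toy stand-ins for the huge block the page downloaded and the user's edited
    # copy; in the real app both would come from the page itself.
    original = {"name": "Jane", "notes": "x" * 1_000_000, "phone": "555-0100"}
    edited = dict(original, phone="555-0199")

    delta = changed_fields(original, edited)               # just {"phone": "555-0199"}
    requests.post("https://example.com/save", json=delta)  # hypothetical endpoint

The server then merges the delta into its stored copy, and the 20-to-40-second upload shrinks to whatever the changed fields actually weigh.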
By the way, here are the results of a speed test of my fiber connection from www.speakeasy.net - fast for sure, but MUCH slower than a local Ethernet connection on uploads:
Here endeth today's lesson in uploads vs downloads. Class dismissed!
The Value of a Good "Death Test"
Where Less is More and Free is Costly
So the process of load testing should really be focused on finding and fixing the problems encountered along the way and NOT on the load testing itself. The degree to which your load testing tool slows this iterative process down or complicates it is the measure of what it is REALLY costing you - most often there are three to five people involved in a load testing project, an expense that can dwarf the cost of just about any tool after a day or two of testing. And, paradoxically, it is the open source tools that are "free" which often take the most time to learn, master and use.
And herein lies the paradox of pricing for testing services like CapCal - less is really more when it comes to load testing, and charging customers for the amount of testing they do can actually be self-defeating and counter-productive in the long run. Nobody knows in advance how much testing will be required for a given website, and it can vary substantially even between releases of the same website. Fortunately the pricing of Linux instances on EC2 is low enough to make it affordable to most companies. But if the real advantage of a tool like CapCal is how much time it saves you, does it not seem counter-intuitive to charge for the amount of testing you do?
It does, and that is why we lean towards a monthly pricing model that allows unlimited testing with the customer paying the AWS charges. The real ROI is in the time saved creating, modifying and running tests, which not only squeezes time out of the development process but also decreases time to market, which results in lower costs and higher revenue. Lowering costs, reducing risks and increasing revenue are what CapCal is aiming for and our customers are telling us that it works.
So if you are in the market for a load testing service and have received a quote from SOASTA or one of the others, you may discover you are paying more for less if you don't check out CapCal first!
Testing - the "Suite Spot" on the Cloud
“According to IBM research, the average enterprise devotes up to 50 percent of its entire technology infrastructure to development and testing, but typically up to 90 percent of the test infrastructure remains idle. Like many other cloud startups going after the Dev/Test market, IBM found that taking advantage of cloud computing within development and testing environments can help reduce IT labor costs by 50 percent, improve quality, and drastically reduce time to market”.
What's a Few Million Among Friends?
Meanwhile, SOASTA has announced a million user load test using 587 EC2 instances, a feat that is quite remarkable considering that it would require each instance to generate almost 2,000 users. It's possible to do this if you make the time between pages VERY long, because normally the upper limit is half of that or even less due to bandwidth issues. The real challenge for such a massively large test is getting the instances to begin with - it's not that you can't, but you have to get permission and often schedule it in advance. Here at CapCal we've run tests with as many as 420 instances, and lately we ran a four million user test over a two-hour period.
But since we can no longer get our name in the Guinness Book for the first million user test I sent an email to Brad Johnson at SOASTA congratulating them on the accomplishment. He replied cordially and said that it's an exciting time to be in the testing business, to which I heartily agree! Cloud computing is the ultimate game changer in this space, and the field is wide open. It's good to have competitors like SOASTA because it makes us aim even higher to be the best in the eyes of our customers.
Tiger Woods Mistress Beauty Pageant crashes Howard Stern's website
This appeared in today's Detroit Examiner about how Howard Stern's website was crashed by the "Tiger Woods Mistress Beauty Pageant".
According to Liz Brown:
Howard Stern commented Thursday morning: "Doug Herwitz who runs the website, he said 'sure enough, the whole thing went down.' He says he can't even gauge the volume of traffic. Everything's blown out...And we have like a really big server. It's really rare that we can crash this thing."
Sure Doug, you may have a "really big server" but apparently not big enough, because not only size but numbers matter a LOT when hosting content that literally millions of guys might be interested in. I suggest that the next time you try something like this you have a load-balanced web farm on EC2, maybe 10 Apache Linux instances to start with, along with autoscaling that will add more as traffic increases.
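For what it's worth, here is a rough sketch of that setup using boto3; the AMI, group and ELB names are placeholders, and the instance counts are just the ones suggested above:

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Launch configuration: which image and instance size new web servers use.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="stern-web-lc",
        ImageId="ami-00000000",        # placeholder Apache/Linux AMI
        InstanceType="m1.small",
    )

    # The group starts with 10 instances behind a load balancer and can grow to 50.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="stern-web-asg",
        LaunchConfigurationName="stern-web-lc",
        MinSize=10,
        MaxSize=50,
        DesiredCapacity=10,
        AvailabilityZones=["us-east-1a", "us-east-1b"],
        LoadBalancerNames=["stern-web-elb"],   # an existing ELB, created separately
    )

Hook a scale-out policy to a CPU or traffic alarm and the farm grows on its own the next time a beauty pageant shows up.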
If Howard or Doug are out there listening and want to try a free CapCal Crash Test to see at what point their "really big server" starts crapping out, just send an email to "info at capcal.com" and we'll schedule it!
Don't Shoot the Messenger - PLEASE!
This is human nature, of course, just like the glee we feel when the test results come back positive. I’ve worked with developers who LOVE CapCal when it gives them the kind of results they expect but are quick to call it into question when it doesn’t. I’m happy to say that the times when there really is a problem with CapCal are becoming more and more infrequent, but that doesn’t keep me from assuming that there is one (or at least could be) until I can prove otherwise. In the court of testing, the tool is guilty of malfunction until proven innocent. Most of the time, proving it innocent means finding and fixing a problem in the application.
Using software to test software is like using a diamond to cut diamonds (except for the word “soft”, which spoils the whole analogy if you dwell on it much). If the drill bit breaks while you are cutting a diamond you just have to replace it with a harder one and keep working. With CapCal this kind of breakage is normally due to an exotic combination of things that rarely occur simply because they are so exotic. I’d love to give you an example but you might work for a competitor and if that's the case you’ll just have to figure it out for yourself! :-)
Dealing With Surly and Downright Rude Websites!
My favorite of all the HTTP return codes is 403, which reads like this:
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request should NOT be repeated.
The server understood the request but is refusing to fulfill it? That's it, no explanation given? If that’s not an example of surly, rude behavior I don’t know what is.
Or, at a lower level, there is the famous “connection refused” message that tells you nothing at all, except perhaps that the owner of the website wants nothing to do with you. That’s the cyber equivalent of a slap in the face the way I see it!
From my experience, connections most often get refused when there just aren’t enough of them to go around, and that is often because of an arbitrary and artificial limitation on the load balancer. For the nginx load balancer, for example, the default is 1024 – a lovely, round number with a venerable history for sure, but WAY too low for a majority of websites.
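If nginx happens to be what's in front of your site, turning the number up is a few lines of configuration (the values here are illustrative, so size them to your own traffic and file-descriptor limits):

    worker_processes  auto;
    worker_rlimit_nofile  20480;      # let each worker actually open that many descriptors

    events {
        worker_connections  10240;    # the shipped default is 1024
    }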
Remember, these connections have a life of their own that leaves them hanging around, sometimes for up to 3 minutes. So you don’t have to have 1024 simultaneous connections to hit the limit, which makes it even more of a ridiculous number.
Operations folks typically see a number like that and think it was defined that way for a good reason. Because of that, they are reluctant to change it the same way a plumber is reluctant to turn a faucet all the way up and leave it. But failure to do so will result in a cyber face-slapping for your users, who may assume the problem is at their end and start haranguing their ISP. Hopefully their ISP will keep them on the phone long enough for you to get the problem fixed!
So only YOU can prevent the rude and surly treatment of your users by making sure the “connection faucet” is turned up all the way and left there!
EOR (End of Rant)
Walmart's Site Down on Black Friday Again
This appeared on the Marketing Vox site today (along with several others) about the Walmart Black Friday website crash that occurred on the busiest shopping day of the year. Faithful readers of this blog will remember this posting about another Black Friday meltdown at Walmart a few years back. Since a visitor from Walmart showed up on the blog today, I do hope he or she will return because we are offering them a free CapCal Crash Test with up to 200,000 users if they are up to it! Unfortunately I don't know anybody there but if anyone out there does, please pass this along!
But hurry because this offer is only good until December 22, 2012 (and that's only because the world will end that day according to the ancient Mayan prophecies)!
Stanford Uses CapCal with EC2 for Student Portal
Coalition Networks is a consulting firm in the Bay Area that was contracted by Stanford University to calibrate the performance of their Student Housing Portal, a web-based application that all students use to select their residence options and apply for a residence. As might be expected, this can often lead to a usage peak in the hours leading up to the deadline, and Stanford RD&E IT wanted to make sure their servers were adequate for supporting up to a couple thousand simultaneous users.
According to Akin Ajiboye with CNI:
Our test plan called for generating a load of up to 2,000 users, which even at 100 users per computer would require 20 machines. Fortunately, Coalition Networks partnered with Aligned Technology and CapCal to get the job done. With CapCal running on the Amazon EC2 cloud we were able to fire up as many servers as we needed in less than a minute. While the test was running we gathered all the relevant network and database statistics and were able to form a complete picture of the application's performance. We recommend CapCal with EC2 as a great way to get excellent results quickly.
As a CapCal partner, Hiroaki Ajari of Aligned Technology had this to say:
We’ve worked with CapCal since the earliest versions so we were very excited to see it become available on the Amazon cloud. Of course it was a great honor and a privilege to work with one of the finest universities in the world, right here at the center of the technology universe. This was our first time to see and use EC2 and experience scalability as required. It's such a luxury to utilize servers on demand, minimizing waste - resources, cost, total footprint. Amazon's EC2 is perfect for the cyclical nature of testing; especially to handle performance testing's environmental needs for generating and distributing load from the test servers.
Our joint efforts with Amazon, Aligned Technology and Coalition Networks provided Stanford a quantitative way to measure risk, triggering mitigation strategies that allow them to maintain first class service to their students and administrators.
CapCal Demo Video Now Available Online
We recently completed an 8 minute Camtasia video showing how CapCal works with Amazon EC2. Please have a look and give us your feedback - we still want to make a few tweaks but it's 90% there!
Interview in Software Test & Performance Magazine!
Hope to see you there, and thanks again to Andrew Muns with ST&P for the honor of being in print in such a classy publication!
A Very Delicate Load Balancing Act
Everyone knows what a load balancer is but exactly what it does and how it goes about doing it are often mysterious. Since every single page request goes through the load balancer, how it is configured and what its capacity is can have everything to do with how well an application performs under load. For example, the maximum number of connections is a configurable parameter that is often very low in its default setting.
A current project for a cruise ship booking engine is a case in point. This particular application is being moved to the cloud for all the right reasons, namely, to have the extra capacity when needed to handle sudden usage spikes without bogging down or crashing. Our initial tests showed that the bottleneck was the database server, which is often the case. So we beefed up the database server quite a bit and saw dramatically better results. But we also saw reams of errors start to appear once we crossed a magic threshold that had to do either with the configuration or the capacity of the load balancer.
In the first case, it was exactly what I referred to earlier - if the load balancer is artificially limited to 100 connections and you try to open 200 or 2,000, every request that exceeds the limit will receive an error of some kind, depending on the load balancer. In this case it was a 502 (Bad Gateway), but I've also seen 404 (page not found) errors or just plain timeouts while making a connection.
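Where that limit lives depends on which load balancer you run. Assuming something like HAProxy purely for illustration, it is a handful of maxconn settings; a sketch of raising them (all numbers and addresses below are made up):

    global
        maxconn 20000                 # process-wide connection ceiling

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        maxconn 10000                 # per-frontend ceiling
        default_backend app

    backend app
        server app1 10.0.0.11:8080 maxconn 2500 check
        server app2 10.0.0.12:8080 maxconn 2500 check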
Once we solved that, we tried again and hit a limit at about 1,000 users. This time we determined that the load balancer itself was being maxed out so we beefed up its CPU and memory (which is a snap on the cloud) and tried again. Now we got to 2,500 users before we started seeing any errors or delays. Our customer had this to say about the experience in a review on Amazon.com:
"With the CapCal CloudBurst delivery system we were able to effectively simulate thousands of instantaneous/continuous users hitting our development environment. The after-test reports provided us with the guidance we needed to streamline our code. I highly recommend CapCal to anyone looking for a fast, affordable performance testing solution." - John Hill, President, HIL-TEC
Thank you, John!
We'll be publishing some actual results from these tests in future blog posts. But load balancing is such a fundamental part of the scalability and performance picture that it deserves to be studied and analyzed on its own, so that's what we'll be doing!
Blastoff Leaves the Launch Pad!!
Blastoff belongs in the category of "Wow, why didn't I think of that?" They've teamed up with hundreds of retailers to offer discounts to all their members, and to me it makes perfect sense. Unlike companies that push their members to sell for them, Blastoff just asks you to buy the things you would normally buy but for less money. Oh, and you get paid every time one of your friends buys something!
Pay less for what you buy and get paid when your friends buy for less - that's hard to beat! If this takes off the way I think it will, there will be another household name in the online retailing space before long!
And don't forget you heard it here first, on the CapCal Blog!
Barclay's Online Banking Takes a Dive (Again)
Either the bank has been plagued with problems or a disproportionate number of Reg readers use the bank's online services. In July the service went down after a new version of the site, and associated marketing push, was launched. In June it lost its ATM network and watched its website crash twice. It had similar problems in October of last year too.
Of course, this kind of thing happens all the time; there's no use in singling out Barclays. A marketing push is usually preceded by a new version of the site, which ends up being a double whammy: if the marketing push is successful, it can often overwhelm the servers and thereby nullify the entire campaign.
First of all, they should know that a successful marketing campaign can result in way more activity than normal, and in fact that's the whole point! This is where EC2 comes in, with load balancing and autoscaling that will handle whatever comes their way.
Secondly, even with the full power of Amazon's data centers behind them, if the app itself isn't scalable they will still have problems. This blog beats these two points like a dead horse, and yet they can't be overemphasized - cloud computing + cloud testing = success. Forgo either one and you are asking for trouble!
Finally, this is a BANK, not just a place to buy the cute little coats worn by the Obama girls or to check out the latest swimsuit fashions. When people can't access their money they tend to get very nervous (or at least I do).
So I hope the CIO of Barclay's is entering the right phrases into his or her search engine of choice. If so, this page will likely turn up along with an offer for a free CapCal Crash Test of up to 10,000 users on Amazon EC2!
Sounds like a jolly good idea to me!
Your Servers Autoscale But What About The Rest?
Have a look at the 2 minute CapCal test above (click on it for a better view) that attempts to reach 1,000 virtual users but at about 700 begins generating thousands of 503 (out of resources) errors. The green bars that show the ever-increasing bandwidth also drop precipitously when the errors start because only error headers are being returned instead of content.
Is this a site that expects to have more than 700 people online at any given time? Try 7,000 or even 70,000! Could this test be run in the lab using a tool like JMeter? Maybe at 1,000 virtual users, but at higher loads it just isn't practical or feasible because of the number of computers that would be required.
This is an example of a scalability test done against a single static page that doesn't even begin to stress the servers and yet it shows a scalability limit that required all of 2 minutes to uncover. Proof once again that performance rocks but scalability rules!
CapCal's Debut in the Amazon Solution Catalog!
Handling the Dirty Work of Dynamic Content
There are two types of dynamic content, server-side and client-side. An example of server-side content is online news, which changes constantly but is stored on the server and thus can be used for creating performance tests. Client-side dynamic content comes from AJAX or web services calls and shows up only in the browser, like a list of tickets available for a sporting event or concert. This particular site "sports" a rich client interface on the front end and uses Amazon EC2 on the back end. When you click on an event, the list of available seats is totally dynamic - it isn't stored anywhere except in the browser at that moment. Like an airline flight, it can disappear in an instant if the event gets sold out.
One of the companies we are working with is dealing with server-side dynamic content (online news) and another is dealing with the client-side content described above (online tickets). In the first case, what we envision is fairly straightforward - a small script will query the server database and generate a comma delimited text file to fill in the parameters of "template sessions" on the CapCal server, then call another script that automatically uploads it and kicks off the test. This will generate thousands of unique test cases that can be run by thousands of virtual users against a staging site before moving new content or application changes into production. Not only will the most current and dynamic content be tested, it will be tested in enough combinations and at high enough load levels to flush out any bugs or bottlenecks that might be lurking undetected.
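Here is a minimal sketch of that first script in Python. The schema, file names and the capcal_upload.py runner are hypothetical stand-ins for the customer's own database and for CapCal's uploader, but the shape of the workflow is as described above:

    import csv
    import sqlite3
    import subprocess

    # Pull the freshest content out of the site's own article database.
    conn = sqlite3.connect("news.db")   # stand-in for the real database connection
    rows = conn.execute(
        "SELECT url, article_id FROM articles ORDER BY published_at DESC LIMIT 1000"
    )

    # Write a comma-delimited data set whose columns match the template session's parameters.
    with open("session_params.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "article_id"])
        writer.writerows(rows)

    # Hand the fresh data set to the (hypothetical) script that uploads it and kicks off the test.
    subprocess.run(["python", "capcal_upload.py", "session_params.csv"], check=True)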
In the case of the online ticket site and the countless other similar applications, we are at work on a solution that will be quick and easy for users while not so trivial for us developers (as it should be, of course)! Basically it will be an extension to the browser add-on in the CapCal client that allows fields or links in the browser to be identified and grouped so they can be dynamically accessed at runtime by each virtual user and manipulated by a new command called "Click-On". I can't divulge the magical powers of this new command until they fully exist and have been confirmed by users. But suffice it to say that it will easily solve the seat selection problem and that's what matters most right now!
I'm especially excited about this because it is true automation, with no manual intervention at all other than writing the extraction query or identifying the dynamic fields in the browser and recording the session template. Add to that the enormous economies of scale made possible by the cloud and all of a sudden something that has never been done effectively (if at all, as in the case of dynamic client content) can suddenly be done very thoroughly and cost effectively with no manual intervention.
Simple, powerful and clean - like soap and AJAX!
How Much Did Michael Jackson Rock the Web?
This story in today's New York Times gave some meat to the rumor that Michael Jackson's death wreaked considerable havoc on the Wild Wild Web. One of the metrics that stood out for me was that Yahoo experienced 800,000 clicks in the first 10 minutes, breaking their previous record!
This is copied from the Yahoo corporate blog :
The passing of the King of Pop set multiple records across Yahoo!. On our front page, the story “Michael Jackson rushed to hospital” was the highest clicking story in our history. It generated a whopping 800,000 clicks within 10 minutes and news of his death saw 560,000 clicks in 10 minutes. Also, the news area on our front page experienced five times the amount of traffic it normally receives.
Yahoo! News set an all-time record in unique visitors with 16.4 million people, surpassing our previous record of 15.1 million visitors on election day. Four million people visited the site between 3-4pm Pacific time, setting an hourly record. We also recorded 175 million page views yesterday, our fourth highest after Inauguration Day, the day after the Inauguration, and Hurricane Ike.
Michael Jackson's hospital visit generated more traffic than the election of the first black president in history! (Or should that be HIStory?) Wow. I'm thinking of offering an "800,000 clicks in 10 minutes" test and naming it after the King of Pop - what do you think? There would be the Michael Jackson test, the Inauguration Day test and the Hurricane Ike test. Take your pick.
Why Self-Service and On-Demand are Critical
Within an hour and a half we had negotiated the business terms and he had been trained on CapCal and was building and running his own tests! To make sure each user and session was unique, he created a few thousand test accounts and generated a CapCal replacement data set with user id, password and so forth. That took a couple of hours, and then, to make a short story even shorter, a series of 5,000-user load tests using 10 Amazon instances uncovered a bug in their PHP framework that was causing the errors. Problem solved! (After hours of re-coding and unit testing, that is.)
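For anyone who wants to do the same thing, here is a minimal sketch of generating such a replacement data set in Python; the column names are assumptions, so match them to whatever your session template expects:

    import csv
    import secrets

    with open("test_accounts.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "password"])
        for i in range(5000):
            # Each virtual user gets its own credentials so every session is unique.
            writer.writerow([f"loadtest_{i:05d}", secrets.token_urlsafe(12)])

The matching accounts still have to exist in the application itself, of course - created through its signup API or a bulk insert before the test runs.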
They'll be spending a week or two doing more testing now that they've solved their immediate problem. They want to reach 150,000 simultaneous users while maintaining sub-second response times with no errors, which is pretty awesome in itself. What's even more awesome is how quickly and easily that kind of load is generated by CapCal using the Amazon cloud!
Click on the chart above and see how quickly it ramps from 1 to 5,000 in the Users column with excellent response times and zero errors. But at about 3 minutes into the test, server errors begin showing up and rapidly climb. Response times hardly degrade at all, which once again reminds us that scalability and performance go hand in hand but scalability is king!
I hope to get permission to publish their name and get a quote for the blog but I have to say that this particular developer is extremely bright and caught on to CapCal faster than practically anyone I've ever seen. But even mere mortals are capable of getting the same results in the same amount of time since there's no programming involved.
The buzz around here now is about moving CapCal to EC2 and running load tests on itself to test the autoscaling and load balancing features. Stay tuned!
How to Crash an Airline Site in 3 Minutes or Less
This is the second time on this blog where the words "airline" and "crash" are used together but fortunately we are referring to websites, not airplanes (see Pet Airways Crashes on Opening Day). The above graph shows the results of a CapCal Crash Test on a new promotional site for one of the major airlines (sorry, I can't use their name without legal approval).
There are only two pages in this test, the first one going to the main page and the next one submitting a registration form. The objective of a crash test is not actually to crash the site but to find the point at which performance starts taking a nosedive (there's that airplane thing again). "Good" response time is less than a second, where "bad" starts at about 2 seconds and goes up. (These aren't subjective measures as much as the result of having run thousands of tests and seeing the same pattern over and over - usually once it reaches 2 seconds it goes up rapidly from there.)
This one was able to retain good response time until it reached around 400 users. At that point you can see the "hockey stick" effect in the red bars on the right that reach 6 seconds at about 1,340 users. At 2,000 users, most people would think the site was offline and go somewhere else. For all intents and purposes it has crashed, even though it is still handling requests. Since the average response time is 6 seconds, some of the pages are probably taking more than 10. Life is much too short for slow web apps.
However, have a look at the Site Errors column to the left of the green bars - at a little over 1,000 users the server starts generating errors, and those can be even worse than slow response times. Don't you just hate it when you get a "500 server error" after you've patiently filled in all the fields in a form? Wouldn't you hate it even more if you were the airline or the company that put together the web site? That's why scalability and performance go hand in hand but scalability is king!
The site we tested was a staging site that mirrors the production site so this is what they can realistically expect to see in production. This test used five small Amazon instances so the cost was negligible. Not too shabby for something that could prevent a million dollar disaster if the promotional campaign itself flopped because of a glitch in the web site!
Integrated Performance Testing in Action
Automated functional testing, for someone who is watching it for the first time, is quite amazing. What would take a person hours to do by hand flashes before you on the screen within minutes. It's that phenomenal time savings and increased test coverage that makes it worthwhile, but the "while" part is the rub - if you end up spending all your time maintaining the test scripts as the application changes, the time savings starts to degrade to the point of ever diminishing returns.
So obviously making the tests easier to create and maintain is a huge leap forward, but if you have to turn around and create performance tests using a DIFFERENT tool you are faced with the same dilemma if not worse. Take the HP Mercury suite as an example - QuickTest Professional uses a scripting engine based on VBScript, while LoadRunner uses the C language as its basis. In other words, maintaining a load testing script can very often require more technical expertise than it took to write the application itself!
One of the major differences between functional and performance testing is that functional testing can be done manually while it is extremely difficult and unwieldy to do performance testing that way. That doesn't mean that people haven't tried it, and the thought of coordinating dozens of people to perform a load test would be funny if it weren't so absurd - in fact it was only last year that I heard of a case where this was being done at a major corporation for lack of a suitable load testing tool.
But even with a load testing tool at your disposal, if you've already invested in functional test automation doesn't it seem bizarre that you would have to turn around and duplicate all that cost and effort just to do performance testing? Sure, there are some major differences between functional and performance testing but that doesn't mean that an entirely different tool with its own language is necessary. It's evolved that way for historical reasons that have a lot more to do with the search for more revenue by test tool vendors than the search for more efficient testing methodologies.
If you could use a subset of your functional tests for performance testing not only would it save time and money but you could be many times more productive and efficient in your testing cycle. That's what we call "integrated performance testing" and it involves capturing an automated functional test while it is running and then immediately using it to create and execute a performance test. Not only have you saved the time and money involved in writing and maintaining a separate set of tests but you've also made sure that both kinds of testing get done at the same time.
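To make the idea concrete, here is a toy sketch in Python - not CapCal's mechanism, just the record-then-replay pattern described above. The functional test records every URL it touches, and the same list is then replayed by many concurrent virtual users:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    recorded = []   # filled in while the automated functional test runs

    def get(url):
        """Wrapper the functional test calls, so every request is also recorded."""
        recorded.append(url)
        return requests.get(url)

    # ... the automated functional test runs here, fetching its pages through get() ...

    def virtual_user(urls):
        session = requests.Session()
        timings = []
        for url in urls:
            start = time.time()
            session.get(url)
            timings.append(time.time() - start)
        return timings

    # Replay the recorded script with 100 concurrent virtual users.
    with ThreadPoolExecutor(max_workers=100) as pool:
        results = list(pool.map(virtual_user, [recorded] * 100))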
Here you can see a screen recording of an automated Worksoft Certify test being run with CapCal Integrated Performance Testing. The demo shows batch command execution because that is how these tests will be integrated into the nightly build cycle. However, we've also tested with HP QuickTest Pro, Compuware QA Tester and Automated QA Test Complete. For a proof of concept in your own environment with whatever tool you are using, just write to info@capcal.com and someone will get right back to you!
What the Cloud Really Looks Like
As it turns out, nowhere is latency more critical than in financial trading, where milliseconds can make the difference between a million dollar trade won or lost. One of the data points that intrigued me the most was this one:
Latency concerns are not limited to Wall Street; it is estimated that a 100-millisecond delay reduces Amazon’s sales by 1 percent.
One percent of total sales for 100 milliseconds?
Let that sink in for a moment.
That just boggles the mind. Do you suppose that performance testing is practiced as it should be at Amazon? I certainly hope so!
Another interesting tidbit was that Amazon Web Services now uses more bandwidth than Amazon's massive retailing operations. If that doesn't point to where the cloud is heading, I don't know what does!
On Finding a Home in the Cloud
Load testing is obviously a perfect fit for the Amazon Cloud - need 1,000 computers for a big test? Just spin them up, use them, and tear them down. To deliver that same capability, CapCal once had over 10,000 people around the world running our agent for $0.30 per hour of testing. A small Linux instance on Amazon EC2 costs $0.12 per hour - well under half the cost! We didn't pay for bandwidth with those volunteer agents, though, and that was the main attraction. But they weren't as secure and reliable as the Amazon agents, and that makes all the difference!
Load testing is also called stress testing, performance testing, volume testing, scalability testing and so forth, all of which refer more to the objective than the means. But what about other kinds of testing, like functional and regression testing? Do they have a Home in the Cloud as well?
Of course they do! Actually, anything except unit testing and manual testing can be done on the cloud - and should be, if you read the post below this one. Running functional and regression tests takes time, and time can be squeezed out of the cycle if you can throw more computers at it. Generally people do unit testing, then functional and regression testing, and then performance testing. But the ugly truth is that performance testing either a) doesn't get done at all or b) gets done at a point where it's too late to make changes.
Why is that? We'll be answering that question and showing some examples of a different approach in the next few posts so y'all come back now!
The Test Lab is Dead - Long Live the Test Lab!
A test lab, as I define it, is a room with nothing but computers which may or may not include desk space for testers. I’ve seen more than I can count and you have too if you have been doing test automation for any length of time.
Earlier test labs had a person at each computer, which was even more wasteful – not only are you taking up space and filling it with expensive hardware, but you’ve got people doing mind-numbing, repetitive work. Nowadays test labs are used only for automation if they are used correctly – manual and acceptance testing can be done at the user desktop instead of the lab.
Cloud computing is on track to replace ALL those machines and free up ALL the space they occupy, a forward leap just as huge as the leap of automation; first we replaced people, now we are replacing machines. It makes perfect sense if you think about it, and yet it is just starting to dawn on people that we can do this.
The cost and space savings combined with the dramatic increase in productivity and throughput is astounding. So if nothing else, software testing is a “killer app” or “poster child” for cloud computing; nowhere else are the benefits so obvious and immediate.
But the tools haven’t caught up yet for the most part, with a couple notable exceptions. CapCal is one, of course, and we've seen SOASTA CloudTest. I'm sure other companies have projects in the works and announcements will probably be forthcoming. But what kinds of things can you do right now that take advantage of the cloud to increase your testing coverage and reduce your costs at the same time?
For one thing, virtualization already provides a huge reduction in the amount of hardware needed for a test lab and that’s good. So maybe instead of an entire room, just a corner or a wall might be used. Then it’s simply a matter of doing the math to see if the cloud is cheaper and usually it turns out to be. In general terms, if a computer is not being operated by a human being it should also not take up space.
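Doing that math can be as simple as a few lines; every number below is an assumption, so plug in your own lab and instance costs:

    # Back-of-the-envelope comparison only - all figures are assumptions.
    lab_machines = 40
    cost_per_machine = 1200            # purchase price in dollars
    amortized_years = 3
    lab_cost_per_year = lab_machines * cost_per_machine / amortized_years

    busy_hours_per_year = 500          # hours the lab is actually running tests
    instance_price = 0.12              # dollars per instance-hour on demand
    cloud_cost_per_year = lab_machines * busy_hours_per_year * instance_price

    print(f"lab:   ${lab_cost_per_year:,.0f} per year")
    print(f"cloud: ${cloud_cost_per_year:,.0f} per year")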
The cloud will bring up some interesting licensing challenges once people realize they can install software on one instance and duplicate that instance as many times as they want. Windows itself is covered, of course, since Amazon pays Microsoft for those licenses. Everyone else has to trust their users to abide by the same restrictions that apply to making physical copies, even though these copies are virtual.
In this series, we’re going to explore some real life cases of testing as a service in the cloud. So keep on coming back!