A Very Delicate Load Balancing Act


Everyone knows what a load balancer is, but exactly what it does and how it does it are often a mystery. Since every single page request goes through the load balancer, its configuration and capacity can have everything to do with how well an application performs under load. For example, the maximum number of connections is a configurable parameter whose default setting is often very low.

A current project for a cruise ship booking engine is a case in point. This particular application is being moved to the cloud for all the right reasons, namely to have extra capacity on hand to handle sudden usage spikes without bogging down or crashing. Our initial tests showed that the bottleneck was the database server, which is often the case. So we beefed up the database server quite a bit and saw dramatically better results. But we also saw reams of errors start to appear once we crossed a magic threshold that had to do with either the configuration or the capacity of the load balancer.

In the first case, it was exactly what I referred to earlier: if the load balancer is artificially limited to 100 connections and you try to open 200 or 2,000, every request that exceeds the limit will receive an error of some kind, depending on the load balancer. In this case it was a 502 (out of resources), but I've also seen 404 (page not found) or just plain timeouts while making a connection.
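To make the connection cap concrete, here is a minimal sketch in Python. It is a toy model, not the configuration or API of any particular load balancer: the ConnectionLimiter class and the simulate_burst function are hypothetical names invented for this illustration. It only shows that once the cap is reached, every additional connection attempt is turned away.

```python
import threading


class ConnectionLimiter:
    """Toy stand-in for a load balancer's connection cap (hypothetical, illustration only)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._active = 0
        self._lock = threading.Lock()

    def try_open(self):
        """Return True if a connection slot was granted, False if the cap is already reached."""
        with self._lock:
            if self._active >= self.max_connections:
                return False
            self._active += 1
            return True

    def close(self):
        """Release one slot so that a later attempt can succeed."""
        with self._lock:
            if self._active > 0:
                self._active -= 1


def simulate_burst(limit, attempts):
    """Try to open `attempts` connections at once, closing none; return (accepted, rejected)."""
    limiter = ConnectionLimiter(limit)
    accepted = sum(1 for _ in range(attempts) if limiter.try_open())
    return accepted, attempts - accepted


if __name__ == "__main__":
    for attempts in (200, 2000):
        accepted, rejected = simulate_burst(100, attempts)
        print(f"{attempts} attempts: {accepted} accepted, {rejected} rejected")
```

With a cap of 100, a burst of 200 attempts leaves 100 rejected and a burst of 2,000 leaves 1,900 rejected. What a real load balancer sends back for those rejected attempts (an error status or a timeout) depends on the product and its settings.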

Once we solved that, we tried again and hit a limit at about 1,000 users. This time we determined that the load balancer itself was being maxed out, so we beefed up its CPU and memory (which is a snap in the cloud) and tried again. Now we got to 2,500 users before we started seeing any errors or delays. Our customer had this to say about the experience in a review on Amazon.com:

With the CapCal CloudBurst delivery system we were able to effectively simulate thousands of instantaneous/continuous users hitting our development environment. The after-test reports provided us with the guidance we needed to streamline our code. I highly recommend CapCal to anyone looking for a fast, affordable performance testing solution. John Hill, President HIL-TEC

Thank you, John!

We'll be publishing some actual results from these tests in future blog posts. But load balancing is such a fundamental part of the scalability and performance picture that it deserves to be studied and analyzed on its own, so that's what we'll be doing!
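As a taste of that kind of analysis, here is a rough sketch in Python of how one might find the user level at which errors start to matter in a ramp-up test. The function name first_failing_level, the 1% error threshold, and the sample numbers are all assumptions made up for illustration; they are not measurements from the tests described in this post.

```python
def first_failing_level(samples, max_error_rate=0.01):
    """Return the lowest user count whose error rate exceeds max_error_rate, or None.

    samples is a list of (users, requests, errors) tuples in any order.
    """
    for users, requests, errors in sorted(samples):
        if requests and errors / requests > max_error_rate:
            return users
    return None


if __name__ == "__main__":
    # Made-up numbers for illustration only, not real test results.
    samples = [
        (500, 5000, 0),
        (1000, 10000, 12),
        (2500, 25000, 900),
        (2000, 20000, 40),
    ]
    print(first_failing_level(samples))
```

Running a check like this after each configuration or capacity change shows whether the threshold actually moved or the bottleneck simply shifted somewhere else.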

3 comments:

  1. Did you typo the 503 error? 503 normally means that one of the servers in the chain was down and/or overloaded. But I have never seen it happen in the frontend/load balancer.

  2. Yes I did as a matter of fact! We did get a couple of 502 errors but the majority were "out of resources", which I think is 502. Thanks for pointing that out!

  3. Also called “Geo” load balancing and “Site”. Used by companies with applications running at two or more sites, or companies with applications running in the company’s data center and in a hosting center. The Network load balancer will distribute users to multiple sites, as well as distribute users across multiple internet providers for site and ISP high availability. This functionality can be found in separate appliances or integrated into a Server Load Balancer.

    http://www.kemptechnologies.com/?utm_source=blog&utm_medium=pv&utm_content=zs&utm_campaign=home