“The crash was caused by our own bespoke visitor tracking software which was not able to cope with the load (slow write times to the database).”
Interestingly, website crashes ordinarily do not lead to travel companies explaining the inner workings of their systems to the outside world. But Waite continues:
“We had to disable some of the visitor tracking data to get the site back online. Internally we call this the ‘Go Faster Button’, it strips away all the nice-to-have data (logging etc) to make the site go faster but leaves the customer experience unchanged apart from being faster.”
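A "Go Faster Button" of this kind is essentially a kill switch for non-essential work. Here is a minimal sketch of the idea in Python, assuming a single in-process flag and an in-memory list standing in for the visit detail table. The names `VisitTracker`, `go_faster` and `handle_request` are made up for illustration and are not taken from the company's actual system.

```python
class VisitTracker:
    """Hypothetical visitor tracker with a 'go faster' kill switch."""

    def __init__(self):
        self.go_faster = False  # flip to True to shed nice-to-have writes
        self.rows = []          # stand-in for the visit detail table

    def record(self, visitor_id, page):
        """Store one visit unless tracking has been switched off."""
        if self.go_faster:
            return False        # skip the slow database write entirely
        self.rows.append((visitor_id, page))
        return True


def handle_request(tracker, visitor_id, page):
    """Serve a page; tracking is best-effort and never changes the response."""
    tracker.record(visitor_id, page)
    return f"<html>{page}</html>"
```

The point of the design is that the customer gets the same page whether the switch is on or off; only the bookkeeping behind the scenes is dropped.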
“It seems our database server wasn’t able to cope with the number of visits per second when inserting rows in to the visit detail table. We’ve now partitioned this functionality across two servers, so we should be ready for at least double the load now.”
My response to that is THANK YOU MR. WAITE for publishing this little jewel and letting people know how you fixed it. If everybody whose website crashed, for whatever reason, told the whole world about it, we would all be much better armed in our quest for ever-higher scalability! Yes, partitioning among servers is usually the answer, but you might want to run some tests just to be sure. (Use code CUPCRASH to get a free CapCal Crash Test!)
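For the curious, here is a rough sketch of what partitioning visit writes across two servers can look like. Waite does not say how the split was actually done, so this assumes one common approach: hash the visitor ID and use the result to pick a server. The class `ShardedVisitLog` and its in-memory lists are hypothetical stand-ins for real database servers.

```python
import zlib


class ShardedVisitLog:
    """Hypothetical visit log that spreads inserts across several stores."""

    def __init__(self, shard_count=2):
        # Each inner list stands in for one database server.
        self.shards = [[] for _ in range(shard_count)]

    def shard_for(self, visitor_id):
        """Pick a shard from a stable hash, so a visitor always lands on the same one."""
        return zlib.crc32(visitor_id.encode("utf-8")) % len(self.shards)

    def insert(self, visitor_id, page):
        """Append the visit to its shard and return the shard index used."""
        index = self.shard_for(visitor_id)
        self.shards[index].append((visitor_id, page))
        return index


if __name__ == "__main__":
    log = ShardedVisitLog(shard_count=2)
    for n in range(1000):
        log.insert(f"visitor-{n}", "/deals")
    # Print how many rows each stand-in server received.
    print([len(shard) for shard in log.shards])
```

Two servers only buy you double the capacity if the rows really do split evenly between them, which is exactly the sort of thing a load test will tell you.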