When you even contemplate bringing an old legacy system into a large-scale web project, you should do load testing on that system as part of the feasibility process before you ever write a line of production code, because if those old servers can’t handle the load, your whole project is dead in the water if you are forced to rely on them. There are no easy fixes for the fact that a 30 year old mainframe can not handle thousands of simultaneous queries. And upgrading all the back-end systems is a bigger job than the web site itself. Some of those systems are still there because attempts to upgrade them failed in the past. Too much legacy software, too many other co-reliant systems, etc. So if they aren’t going to handle the job, you need a completely different design for your public portal.

A lot of focus has been on the front-end code, because that’s the code that we can inspect, and it’s the code that lots of amateur web programmers are familiar with, so everyone’s got an opinion. And sure, it’s horribly written in many places. But in systems like this the problems that keep you up at night are almost always in the back-end integration.

The root problem was horrific management. The end result is a system built incorrectly and shipped without doing the kind of testing that sound engineering practices call for. These aren’t ‘mistakes’, they are the result of gross negligence, ignorance, and the violation of engineering best practices at just about every step of the way.

The title is adapted from Peter Drucker, "All business failures are management failures." The quoted piece is from a comment that I can't link to directly, so go to Arnold Kling on the problems with the health insurance exchanges and look for Dan Hanson on October 25, 2013 at 2:13pm.