Continuous Availability for Your Enterprise Applications.

May 1, 2010

Mandy Gee


Regardless of the business application, by taking a server-based approach to any availability or disaster recovery project, IT administrators add considerable complexity to their designs.

This silo-based methodology almost always results in the use of a number of different availability/DR technologies from an array of vendors, with considerably different designs and capabilities and limited or no integration points. For example, an online web ordering system could use network load balancing for the front-end web servers, some form of data mirroring or clustering for the back-end database servers, and a third-party availability alternative for the middleware. Point-of-sale solutions, CRM tools, and even BlackBerry messaging environments follow a similar prescription, using a different technology for every layer in the application stack.
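
To make the silo concrete, here is a minimal sketch (in Python) of what watching such a stack tends to look like: each tier ends up with its own style of health check because each tier relies on a different availability mechanism. The host names, ports, and URL below are placeholders for illustration, not drawn from any real deployment.

```python
# Minimal sketch: one check style per tier of a silo-based stack.
# All host names, ports, and URLs are hypothetical placeholders.
import socket
import urllib.request

def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_check(url: str, timeout: float = 3.0) -> bool:
    """Return True if the load-balanced web front end answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    # The web tier sits behind a network load balancer, the database tier
    # behind a cluster listener, and the middleware tier behind a
    # third-party availability product -- hence three different checks.
    checks = {
        "web (load balanced)":       lambda: http_check("http://web-vip.example.local/"),
        "database (clustered)":      lambda: tcp_check("db-cluster.example.local", 1433),
        "middleware (3rd-party HA)": lambda: tcp_check("mq-ha.example.local", 5672),
    }
    for tier, check in checks.items():
        print(f"{tier}: {'UP' if check() else 'DOWN'}")
```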

Taking such an approach to implementing an availability solution for your business application ecosystem has several drawbacks. First, one must consider the cost implications of using assorted technologies within an availability or DR architecture. The most apparent cost is the capital outlay for the hardware and software alone. By choosing (or being forced) to deploy different solutions from different vendors, there is no opportunity to leverage economies of scale. Most hardware and software companies offer volume-based pricing incentives for larger order sizes, but this option is obviously lost when a mix of alternatives from different vendors is used.

Additionally, if each solution relies on different underlying hardware, disk, or OS technologies, the total cost of ownership climbs even higher. Of course, cost extends beyond hardware and software to include implementation, training, and ongoing management. Consider deploying even a relatively basic, three-tier application architecture. In the online web ordering example described previously, one would need to take on the somewhat unnerving task of mastering not only the intricacies of SQL clustering, but also the deployment and management of network load balancing and any middleware components required. Each time a new version of any of these solutions is released, there is the additional cost of relearning a different technology.

Then contemplate the complexity of integrating differing availability technologies from numerous vendors. Are they guaranteed to interoperate with one another? Is such interoperability built in (not likely), or will some amount of customization and manual scripting (very probably) be needed so that each tier can communicate with the others? If custom scripting is necessary, what happens when even a single part of the availability architecture changes? Will extra, custom consulting work be needed to develop and re-test the existing scripts? Last but not least, if and when something breaks down, whose responsibility is it to determine the root cause? With an array of solutions from different vendors, one must be wary of the inevitable finger-pointing that results when things go wrong.
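
As an illustration of the glue scripting in question, here is a hedged sketch in Python. Every command-line tool it invokes (db-cluster-ctl, mw-ha-ctl, nlb-ctl) is a made-up stand-in for some vendor's own utility, not a real interface; the point is only to show how the tiers end up chained together by hand.

```python
# Hypothetical glue script coordinating a database failover across tiers
# from different vendors. All CLI names and arguments are placeholders.
import subprocess
import sys

def run(cmd: list) -> bool:
    """Run one vendor-specific command; return True on success."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"step failed: {' '.join(cmd)}\n{result.stderr}", file=sys.stderr)
    return result.returncode == 0

def fail_database_over_to_standby() -> bool:
    # Placeholder for the database vendor's clustering/mirroring CLI.
    return run(["db-cluster-ctl", "failover", "--to", "standby-node"])

def repoint_middleware() -> bool:
    # Placeholder for the middleware availability product's CLI.
    return run(["mw-ha-ctl", "set-db-endpoint", "standby-node.example.local"])

def drain_and_redirect_web_tier() -> bool:
    # Placeholder for the network load balancer's management CLI.
    return run(["nlb-ctl", "drain", "web-pool", "--then-redirect"])

if __name__ == "__main__":
    # Each step depends on the previous one; if any vendor changes its CLI,
    # the whole chain has to be rewritten and re-tested.
    for step in (fail_database_over_to_standby,
                 repoint_middleware,
                 drain_and_redirect_web_tier):
        if not step():
            sys.exit(f"failover halted at: {step.__name__}")
```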

Obviously, one alternative is simply not to integrate the solutions; after all, as long as every piece is doing its job, isn't it safe to assume that the total system is operational? Not really. Consider, for example, the deployment of a multi-tier, distributed architecture across physical sites for DR purposes. If the entire primary production site fails, will the servers start up in the right order and fashion at the remote site, or will some level of intervention be required from an administrator?
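
The ordering problem can be pictured roughly as follows. The service names, hosts, ports, and the use of systemctl are assumptions for illustration only; a real site would substitute its own service manager and readiness checks.

```python
# Hypothetical DR-site bring-up: start each tier in dependency order
# (database, then middleware, then web) and wait for it to become reachable
# before starting the next. Names, hosts, and ports are placeholders.
import socket
import subprocess
import time

def wait_for_port(host: str, port: int, timeout: float = 300.0) -> bool:
    """Poll host:port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(5)
    return False

# Each entry: (service name on the DR host, host to probe, port that proves it is up)
START_ORDER = [
    ("dr-database",   "dr-db.example.local",  1433),
    ("dr-middleware", "dr-mq.example.local",  5672),
    ("dr-web",        "dr-web.example.local",  443),
]

if __name__ == "__main__":
    for service, host, port in START_ORDER:
        # Placeholder start command; substitute the site's own tooling.
        subprocess.run(["systemctl", "start", service], check=True)
        if not wait_for_port(host, port):
            raise SystemExit(f"{service} did not come up; aborting the rest of the bring-up")
        print(f"{service} is up on {host}:{port}")
```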

Now imagine the more probable type of failure, when just one component rather than an entire site fails. Unless you've deployed a combined HA and DR solution, chances are that only the single failed component will resume operations at the DR site. But in most cases, the latency between sites will be too high for a multi-tier application to function properly. In this scenario, it is best to fail all of the components over to the remote site as a single, cohesive unit. But once again, how does this coordination take place? Either we are back to scripting the failover in some manner, or some hands-on administrator involvement is required. When that happens, recovery times inevitably increase; when recovery times increase, so does the bottom-line cost of the outage to the business.
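
One way to picture the "fail over as a unit" policy is the rough sketch below, where move_tier_to() stands in for whatever per-tier mechanism (clustering, mirroring, load balancing) actually performs the relocation; the tier names and the trigger condition are assumptions for illustration.

```python
# Hypothetical group-failover policy: if any tier is unhealthy, relocate
# every tier to the DR site together, so no tier is left talking to its
# peers across a high-latency WAN link.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    site: str      # "primary" or "dr"
    healthy: bool

def move_tier_to(tier: Tier, site: str) -> None:
    # Placeholder for the vendor-specific failover action for this tier.
    tier.site = site
    print(f"moved {tier.name} to {site}")

def fail_over_as_group(tiers: list) -> None:
    """Relocate the whole stack as one cohesive unit on any tier failure."""
    if any(not t.healthy for t in tiers):
        for t in tiers:
            move_tier_to(t, "dr")

if __name__ == "__main__":
    stack = [
        Tier("web",        "primary", healthy=True),
        Tier("middleware", "primary", healthy=True),
        Tier("database",   "primary", healthy=False),  # single-component failure
    ]
    fail_over_as_group(stack)
```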