“Bring Out Your Dead!”

Enterprise applications in the public cloud era

Enterprise companies represent a huge and lucrative market for the computing industry, as they need to run many business-critical applications and are willing to pay large sums of money to do so reliably and at scale. For the past few decades, vendors have been selling enterprise companies hardware and software solutions for the implementation of those workloads.

These solutions often need to be installed and maintained by an IT organization on-premises and are almost always “bespoke” — in other words, no two deployments are the same. The resulting ecosystem is fragile and, unbeknownst to the IT organization, often has more security holes than a piece of Swiss cheese.

There are two types of applications at play here: so-called “shrink wrapped” software (such as an email or CRM solution) and home-grown applications written by developers working for the enterprise company itself (such as trading applications at a Wall Street firm). In both cases, inertia, regulatory requirements, and lack of detailed knowledge of the environment and the code base frequently result in stagnation and obsolete solutions that actually hinder an enterprise as it tries to achieve its business goals.

If I may steal a quote from the classic 1970s movie, I think every enterprise should follow Eric Idle’s advice (as the village undertaker) in Monty Python and the Holy Grail and tell its developers and IT personnel to “Bring out your dead!” at least once every three years. Even in medieval times, people understood that keeping “dead bodies” around is not conducive to a healthy environment. I’m sure many would reply “I’m not dead yet!” just like in the movie, but clearing out the architectural and implementation cobwebs will benefit everyone involved.

Try comparing, for example, your email installation with that of your competitor across the street. He has Exchange 2016 SP3 running on a 4-node cluster of Dell servers with Windows Server 2008 SP2 and EMC Symmetrix for storage. He uses Active Directory for identity and a Digital Rights Management add-on for better security. He is running this solution on VMware vSphere 6.0 for server consolidation, OpenStack for management, and Symantec NetBackup 6.3 for archival. He is also using F5 appliances for load balancing and Cisco ASA as a firewall. Except for that one Business Unit that came through an acquisition and has been running a different version of Exchange on HP servers with NetApp filers and HP OpenView…

Your setup, let’s just say, is ever so slightly different.

These are just a dozen or so of the variables in the hardware and software soup (dare I say cesspool?) that is running in your data center. There are hundreds, if not thousands, of such variables in each data center running on-prem “Enterprise class” software. Each of the aforementioned components has gone through rigorous testing by the vendor, but I can pretty much guarantee they have never been put together in exactly the combination that you have chosen.

Any wonder none of those vendors can replicate your problem when you call them at 3:00 AM Saturday morning complaining that you can’t restore a daily backup or when the firmware on your storage subsystem throws an exception? Every time you change a single variable, you double the complexity and the testing matrix for the companies involved.
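To put rough numbers on that testing matrix, here is a minimal sketch in Python. The layer names and the option counts are purely illustrative assumptions on my part, not vendor data, and `matrix_size` and `with_extra_layer` are hypothetical helpers. The only point is that the number of end-to-end combinations is the product of the choices at each layer, so one more two-option variable doubles it.

```python
from math import prod

# Hypothetical number of supported choices at each layer of the stack.
# These counts are illustrative assumptions, not real vendor data.
STACK_OPTIONS = {
    "server hardware": 3,
    "operating system": 4,
    "hypervisor": 3,
    "storage": 4,
    "backup": 3,
    "load balancer": 2,
    "firewall": 2,
}


def matrix_size(options):
    """Number of distinct end-to-end combinations that could exist in the field."""
    return prod(options.values())


def with_extra_layer(options, name, choices):
    """Return a copy of the stack with one more variable added."""
    grown = dict(options)
    grown[name] = choices
    return grown


if __name__ == "__main__":
    base = matrix_size(STACK_OPTIONS)
    grown = matrix_size(with_extra_layer(STACK_OPTIONS, "identity system", 2))
    print(f"combinations today: {base}")
    print(f"after adding one two-option variable: {grown}")
```

Even with these modest made-up counts, seven layers already produce well over a thousand possible combinations, and no vendor lab is going to have yours.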

The example above covers just email. Add to that HR and CRM and Finance and a dozen other stacks, as well as the home-grown apps that every enterprise depends on. Every one of those solutions, I claim, offers worse reliability, availability, security, performance, manageability, and overall TCO than any of the current generation of public clouds and comparable SaaS solutions.

IT organizations often claim their hands are tied, that they have regulatory requirements (for example, in the financial and health sectors) or that Business Units within the company write the home-grown apps and IT personnel only have a say in the underlying infrastructure. It’s so much easier to just order a few more servers, another SAN array from the same vendor, and a few more top-of-rack switches than to re-architect an application.

Over time, most IT organizations also make compromises based on budget constraints, time constraints, political constraints, new industry trends, M&A activity, and often even the whims of their personnel: HP for a few years, then Dell, then Cisco UCS. Windows for a while, then Red Hat. EMC Symmetrix for a few years because it’s the Cadillac solution and we need the best, followed by NetApp filers, then Pure Storage. Oracle for the database while it’s hot, followed by SQL Server to save money, followed by Postgres since it’s free. A cool startup solution for load balancing followed by a more conservative approach when that startup goes out of business. This is how you end up with [architectural] spaghetti in your data center.

At best, you end up building a fragile environment that works well during normal operations but falls apart as soon as any single component hits a problem. Any such deployment doesn’t just have a Single Point of Failure; it has many Points of Failure. Compare that to the current generation of best-of-breed public clouds, which are not just designed from the ground up for redundancy, availability, and maintainability; they’re designed for Failure.
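A back-of-the-envelope sketch of why many Points of Failure hurt so much: if every component in the chain has to be up for the service to work, the end-to-end availability is the product of the individual availabilities. The component count and the 99.9% figure below are hypothetical, and the calculation assumes failures are independent, which is a simplification.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24


def serial_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Assumes components fail independently of one another.
    """
    return prod(component_availabilities)


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical stack: 12 components, each 99.9% available on its own.
    stack = [0.999] * 12
    end_to_end = serial_availability(stack)
    print(f"end-to-end availability: {end_to_end:.2%}")
    print(f"expected downtime: {downtime_hours_per_year(end_to_end):.0f} hours/year")
```

Under these assumptions, a dozen components that each look like "three nines" on the data sheet add up to a service that is below 99% available. Designing for failure means adding redundancy so that no single component sits in that serial chain.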

Worse, most IT organizations can’t keep track of required security patches for this dizzying array of software and hardware, resulting in embarrassing front-page news and executive resignations when hackers break in through obscure channels, e.g., the HVAC system at Target!

Here’s a simple analogy to drive the point home: If I ask you to take me from point A to point B, would you take out your phone and call for an Uber or would you start ordering vehicle parts so you can assemble a car to fulfill the request? Even if you choose to take the latter route, I bet you wouldn’t order the chassis from Toyota, the engine from Honda, and the steering wheel from GM! Then why are you doing that when you want to run the most critical business apps, the ones that your company depends on?

On-prem software was once a necessity. Today, given the evolution of the internet and the availability of fat pipes, it’s just a recipe for disaster. Private clouds and Hyper-converged Infrastructure solve only a part of the problem but don’t address the fundamental software integration issues I’ve described above. Hybrid clouds? Those don’t even really exist given the architectural differences between private and public cloud deployments.

To be sure, moving to the cloud doesn’t address all the issues — but it does at least replace an outdated service delivery model with a modern one and puts operational responsibility in the hands of those who know best how to do it.

At the end of the day, every piece of software you install and maintain in that ecosystem requires plugins and patches and “agents” and “connectors” in order to integrate with other pieces of software. The complexity increases exponentially every time you add one more variable, all the way from hardware to operating system to applications to storage system to firewall to identity system to backup system to management solution to … you name it.

And you end up being the system integrator. I guarantee no one else in the world is running the exact same mix of hardware and software that you are. The situation is even worse in the case of home-grown applications written decades ago. The developers are long gone and the organization is afraid to touch the deployment.

The right long-term solution is to re-architect all these applications for the cloud, not to use a forklift to move the legacy monolithic applications (be they shrink-wrapped or home-grown) to the cloud because your IT department is “comfortable with the current tools”.

It’s hard to let go of legacy but I argue it’s always better to understand the core requirements for that application and find the closest commercially available SaaS solution on the market. Such an answer is always the right one in the long run based not just on CapEx and OpEx savings but also in terms of overall service availability and reduced attack surface.

But the IT department is usually the last one to tell you that; their jobs are not best served by that answer. Nor are the hardware vendors. Nor are the database companies. Nor are the operating system companies. Nor are the application providers. Nor are the management solution providers.

The promise of the cloud is obvious. Utility computing. Simplicity. Fewer variables. We standardize on one type of hardware, one operating system, one set of management tools. And, for all intents and purposes, we will always run the latest version of software. And we will offer you an SLA, which means we have to constantly monitor service levels, something your IT department is probably not doing. And we will do immediate postmortems and Root Cause Analysis in the case of service failure and share the findings with the public. In such a world, the fewer variables the better. Choice is the enemy of simplicity and reliability.

Seems obvious. Yet, a lot of people are still hanging on to the old on-prem delivery model. Every excuse is used to perpetuate the old world: compatibility, regulatory compliance, security concerns, training costs, budgetary constraints, etc.

That model (bespoke stacks running in your data center) made sense ten or twenty years ago when we didn’t have public clouds offering utility computing, when high-speed connectivity didn’t exist, when we didn’t have such demanding service availability requirements. It makes no sense in the new world.

Re-architecting an enterprise application is an arduous multi-year journey. Companies typically tackle one application at a time and they do so infrequently — for good reasons. If I’m looking to upgrade my aging email infrastructure, for example, I will want to look into all the new architectures that have come along since Exchange was designed over twenty years ago.

The right answer is not to perpetuate the old model but rather to cap investment in existing on-prem solutions while re-architecting those applications for the public cloud if they are core to your business or outsourcing them to SaaS providers if they’re not.

Seen through this lens, a whole slew of infrastructure provider companies are doomed in the long term — unless they reinvent themselves. It is just as hard for these companies to do so as it is for their enterprise customers to move off existing infrastructure and solutions. Too much inertia, too many engineers and executives who are happy making incremental improvements to existing products rather than rethinking their value prop and business model.

The journey to the cloud needs to happen application by application, not company by company. The software companies that will survive in this new generation will be the ones that help you re-architect your application so it fits the cloud world better, not the ones that keep selling you solutions for the old world.

Former {CTO at VMware, VP at Microsoft, Head of Engineering & Cloud Ops at Cloudflare}. Recovering long distance runner, avid cyclist, newly minted grandpa.
