The Enterprise Journey to the Public Cloud
“I consider the bicycle to be the most dangerous thing to life and property ever invented. The gentlest of horses are afraid of it.” — Samuel G. Hough, General Manager of the Monarch Line of steam ships. July 14, 1881.
The Death (and Re-Birth) of Infrastructure: The Rise of Public Cloud
“To really understand something is to be liberated from it.” — Dominic Frisby. The Four Horsemen.
At the heart of this discussion is the question of the public cloud’s readiness for taking on the most demanding workloads for the largest enterprises, financial institutions, healthcare providers, and government agencies.
I’ve been getting a lot of feedback, mostly positive, both in person and in writing, so I was somewhat surprised recently when I met with a group of Enterprise IT practitioners who pushed back on the transition to the cloud, still convinced it’s not ready for their workloads.
Several of the comments stayed with me; hence this blog, in which I hope to answer some of the points raised by the attendees (in italics below). I’ve also added a few additional questions/comments that I’ve received through other forums.
Overall, there was general agreement in the room that the cloud was the future. The only question was: When? They all seemed to think the time to move to the cloud was still many years (“decades”, even) away for them!
I’ve spent my entire 35-year career working on enterprise software, operating systems, infrastructure, and distributed systems. For the past ten years or so, I’ve also spent a lot of time on cloud. I was taken aback when I heard from the audience some of the very same points that had been raised by IT pros against the cloud for the past decade. I thought we were past all this, but I guess not. Not yet, anyway.
The journey to the cloud happens one application at a time; and there’s no single universal solution to the needs of those applications. It would be a mistake to try to address all those needs at the bit, block, or port level — the level at which most IT organizations play today.
At the very least, you need to make an inventory of all your enterprise applications and then decide, on a case by case basis, which ones to move to the cloud. I argue that the time to move is now for the vast majority of those applications. And, as you do so, don’t just move your current stack to the cloud. Rethink the requirements at the application level. Chances are, there are plenty of better solutions available now to address those same needs.
We’re not like Netflix. If their application crashes because of an AWS outage, the worst that happens is you lose your place in the video you’re watching. [General laughter around the room] If our infrastructure crashes, it costs the company millions of dollars per hour of outage.
The implication here is that the public cloud is good enough for consumer brands like Netflix because there is no critical business data at risk when an outage happens. But that obviously wouldn’t work for us: “We are the financials, we are the healthcare industry, we can’t afford an outage even for an instant.”
This argument seems to ignore the fact that “consumer” companies like Netflix and Apple and Facebook are, routinely, serving hundreds of millions of customers simultaneously with higher availability and better performance than any Enterprise company I can think of. Remember that those enterprise companies (let’s say a financial institution or a hospital) typically only have to deliver their services to thousands or at most tens of thousands of customers at a time. Not millions. Meanwhile, the architectural solutions currently deployed by those same enterprise organizations are already buckling at the knees and coming apart at the seams trying to handle even that load.
Delivering cloud based services to millions of people is, in fact, the best way to make a technology bullet-proof for the enterprise. If you don’t believe me, just go take a look at the history of Gmail vs. Exchange. The larger the install base of customers (ahem… testers) and the fewer the knobs (ahem… test combinatorics), the more quickly you get to maturity in implementation — and you do so at scale. Once you can reliably support millions of users, you can confidently turn around and sell the same solution to enterprises for their (much smaller) needs.
For every single vertical you care to name, I bet I can name a “cloud native” company that is delivering better quality of service to its customers than any traditional enterprise company using an on-prem data center and applications. And I also bet the cloud native company is innovating more quickly and on a more scalable backend than any on-prem solution already stretched to its extremes through twenty years of architectural contortions.
But [the big cloud providers] won’t indemnify us. What if they have a major outage? It’s going to cost us millions of dollars in business.
I wish I’d had the quick wit to retort: “… as opposed to what the corporation is getting from its IT department today for management of on-prem infrastructure?!?” If you have a major outage in your private data center, some small subset of the IT department may get yelled at; at worst, one or two people may get fired. But, chances are, you will get an even larger budget next year to “fix” the problem.
So, where and how exactly does the corporation get “indemnification” from its current IT organization for an outage? And why should they expect the same when they outsource the infrastructure to the public cloud? You didn’t have it until now so why did it suddenly become a requirement?
I would urge you, IT Pro, to look back at the last year or two of outages you’ve had in your data center(s). I bet most of you don’t even have them documented, nor did you fully resolve the root causes afterward. How many post-mortems did you do and what were the outcomes? Compare that to the public clouds, which routinely publish root cause analysis reports on each of their outages and take corrective architectural actions to ensure that entire class of failure doesn’t happen again. Which operational model do you prefer? Which one do you think your CEO would prefer?
What about security? We can’t have our sensitive data sitting in the cloud.
The public clouds are, in fact, much more secure than any on-prem solution patched together from dozens of vendors and managed by IT professionals who don’t have access to the source code and don’t understand the various gaping holes in their perimeter defenses. Google’s approach to this problem, BeyondCorp, is in fact decades ahead of any perimeter-based defense in depth based on firewalls and ACLs.
What if they have a total data center outage in the cloud? It’s a big headache to redesign and re-implement all our failover capabilities. They’re currently implemented at the infrastructure level and the apps just fail-over with the hardware. We need the cloud guys to offer us the same set of failover capabilities that we now have on-prem (Windows clustering, Stretched SAN, Dark fiber, etc) before we can trust them with our apps.
Any properly architected enterprise application should have a redundancy strategy and a plan for failover and failback, in the case of full data center outages, that does not depend on “magical” infrastructure features underneath. The fact that your current generation of enterprise apps are not designed that way and are instead dependent on resilient “gold plated infrastructure” is not a feature, it’s a bug.
As I said in my earlier blog post, you need to “bring out your dead” (enterprise apps) once in a while and revisit them — architecturally speaking. If you are still running business critical apps that can’t withstand an entire data center outage, you have bigger problems. If you’re still running Exchange Server inside a four node Windows Cluster VM on top of a vSphere hypervisor and your plan is to recreate that exact environment in the cloud so you can get failover support for your email, perhaps it’s time to rethink your strategy. At the application level.
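To make the “redundancy belongs in the application” point concrete, here is a minimal sketch of an application-level failover policy. The endpoint names and the health-probe mechanics are invented for illustration; the point is only that the app itself decides which region to talk to, with no stretched SAN or clustered hypervisor assumed underneath.

```python
# Toy application-level failover: the application picks the first
# healthy region, instead of relying on "magical" infrastructure
# failover below it. Endpoint URLs are hypothetical.

ENDPOINTS = [
    "https://api.us-east.example.com",  # primary region
    "https://api.eu-west.example.com",  # independent secondary region
]

def first_healthy(endpoints, probe):
    """Return the first endpoint whose health probe succeeds.

    `probe` is any callable taking an endpoint URL and returning
    True/False; in real code it would issue an HTTP health check
    with a short timeout.
    """
    for url in endpoints:
        if probe(url):
            return url
    raise RuntimeError("all regions are unreachable")

# Example: the primary is down, so traffic fails over to the secondary.
down = {"https://api.us-east.example.com"}
active = first_healthy(ENDPOINTS, lambda url: url not in down)
```

Ten lines of policy, owned by the application team, replaces an entire rack of “gold plated infrastructure” whose failover semantics the app never controlled in the first place.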
Every few years, Silicon Valley gets enamored with another new technology or framework. Last year, you industry pundits were telling us about how wonderful OpenStack would be. Look at how far that got us. Why should we believe you now that you are preaching cloud?
You’re comparing apples and oranges. OpenStack is an open source community effort. You, Mr. Enterprise IT Guy or Gal, are signing up for being part of the community as it evolves. With OpenStack, you get to play System Integrator. Because it’s an open system with Swiss Army knife connectors for everything from block storage to image management to networking to security to patching to whatever.
Why on Earth did you think that path would lead to reduced IT expenditure or would even converge quickly? The cloud is the opposite of that path. It says, Mr. Enterprise IT Guy, please stop getting in the middle of that level of infrastructure integration. Let us hide all that complexity behind an API and an SLA. Stop trying to “innovate” in infrastructure. Go up the stack, young man!
We don’t control the architecture anyway. The Business Units hold the purse strings and they get to make these kinds of big strategic decisions. And they don’t have any stomach for big upheavals. They just want the current stuff to keep working.
Yup. And those are the same BUs who have developers writing cloud native apps right now. Because they’ve given up on Central IT’s ability to help them in any timely manner. I never said it wouldn’t hurt to rip off the bandaid. It will require alignment from the top levels of the organization and you will get pushback from all the BUs: They just want to get their jobs done. They don’t want to deal with infrastructure.
Sooner or later, some startup will offer the same services you’re offering in your data center (be it block storage or compute or higher level services like database and firewalling and intrusion detection and load balancing or even entire applications).
Sooner or later, you will acquire another company with a more progressive cloud based approach to infrastructure delivery, and you will find that some critical part of your infrastructure is already dependent on the public cloud anyway. You can either be a passive and resistant party to this journey or you can take the lead. It’s up to you.
It can be quite expensive. The prices for public cloud based services are still too high.
Yes, of course. They will charge what the market bears. It’s an open economy. I suspect they will continue to drop their prices over the next few years as their platforms mature further and as hardware gets even cheaper.
The rate of architectural enhancements made to next generation cloud architectures is an order of magnitude faster than that of on-prem infrastructure hardware and software. You’re welcome to continue to invest in the old generation but, I promise, it’ll be for diminishing returns.
When you compare the costs, be honest and include all the hidden charges that go along with on-prem infrastructure. That’s not just a SAN box you ordered last year. This year, you already have to order the clustered upgrade to improve availability. Next year, you’ll have to invest in the management console and the snapshot provider and the backup adaptor as well.
Not to mention the Enterprise support agreement and salaries for your own operations team that has to learn how to manage it (compared to the other three SAN solutions they’ve inherited over the past decade). And that’s just the SAN box. Of course, you also just finished M&A of a rival which came with its own NAS based strategy and associated hardware and software. And we’re only talking storage so far. I can keep going but you get the idea. Let’s be honest and compare apples and apples.
Yes, it’s expensive. But the cost will go down as more and more companies adopt the cloud. And as applications become “self-aware” and gain more control of the infrastructure underneath them programmatically.
It’s a virtuous cycle that can’t be duplicated in the complexity of on-prem plug-n-play architectures of yesteryear. My argument is, at heart, an architectural one. We’ve learned a lot about distributed systems architecture in the past ten or twenty years. It’s practically impossible to retrofit those learnings into old monolithic operating system and application stacks like the ones currently running in every enterprise data center in the world.
We don’t want to be locked in to a single cloud infrastructure provider.
Smart move. There are any number of PaaS offerings you can use to avoid that problem. After all, PaaS is just a new-fangled way of saying “operating system for cloud environments”. Pivotal’s CloudFoundry and RedHat’s OpenShift are just two of the better known examples. Build your applications on the PaaS instead of directly calling the cloud provider’s APIs for various services and the whole environment can easily be moved to another cloud later if needed.
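The lock-in question is really about where your code’s dependency surface sits. A minimal sketch of the idea, with class and method names invented for illustration: the application codes against a small portable interface, and only the binding to a concrete backend changes when you move clouds.

```python
from abc import ABC, abstractmethod

class BlobStore(ABC):
    """Portable storage interface. Application code depends on this,
    never on a specific cloud provider's SDK."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(BlobStore):
    """Stand-in backend for local development and tests; a real
    deployment would bind a provider-backed implementation of the
    same interface instead."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

def save_invoice(store: BlobStore, invoice_id: str, pdf: bytes) -> None:
    # Business logic only ever sees the portable interface.
    store.put(f"invoices/{invoice_id}", pdf)
```

A PaaS does essentially this for you wholesale: the platform, not your codebase, owns the provider-specific bindings.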
We already have a private cloud that delivers everything you get from a public cloud. And it lets us keep our data on-prem where it’s safe.
Does it? Does it, really? Or does it just perpetuate the problem by wrapping your existing infrastructure with a virtualization layer? Does it offer all the services you can get from a public cloud with just an API call or does it lock you into your current hardware-based nightmare even further?
Can you spin up thousands of VMs and containers in seconds — or does that still take weeks because of all the other requirements that are still not properly virtualized and automated? “Private cloud” is just a euphemism for “IT org that doesn’t want to give up control”.
We have a hybrid cloud strategy that lets us run exactly the same workloads on-prem or in the public cloud.
Great. That will help with the truly legacy apps that you don’t want to touch with a ten foot pole. But it also encourages bad behavior by using a “forklift upgrade” model on apps that really should be re-architected. Your hybrid cloud strategy should include not just outsourcing of hardware but also re-architecting of the apps themselves.
We have learned so much, architecturally, about building secure, highly distributed, highly available services at scale in the past ten or twenty years that it would be negligent to simply fork-lift the existing environment onto the public cloud instead of revisiting some of those old IT decisions.
But we have stringent security and compliance requirements. We can’t possibly move our data to the cloud.
If it’s good enough for the three letter agencies of the US government, it’s probably good enough for you. Besides, the physical location of the data has little to do with its security. Go read up on how Target was hacked (through their HVAC system) and how Google’s ‘BeyondCorp’ works before deciding that your patchwork of twenty year old on-prem architectures is more ‘secure’.
In fact, it’s likely to have by far more gaping holes in its perimeter defenses, more arcane firewall settings than your IT personnel even know about, and more insecure pieces of software residing in your internal network than even the most shoddily designed public cloud.
As a first step, please show me all your firewall rules and the current version of all firmware running on your NAS and SAN boxes. Good luck spending the next month gathering that data. We’ll get to the routers and switches, the firewalls and load balancers, and your OS and database versions later. We won’t have any time to review the HVAC systems, unfortunately.
Now… Do you really still want to argue that you have tight control over your on-prem installation and that it’s more secure than the cloud?
But all of our IT guys are already trained on the existing infrastructure and tools. It would be too expensive to retrain them all.
You’re on a journey that will take years. You will need those skills for the foreseeable future. But by avoiding investing in the new tools, you’re just delaying the inevitable. Sooner or later, you have to retrain at least some subset of your admins. The sooner, the better.
What about data gravity? We have tons of data sitting in our data centers. We can’t possibly move all this data to the cloud.
No one is asking you to move all your existing data to the cloud. By the way, there are many ways to do that if you choose to do so. But the key facts to realize are: (a) Much of that data is historical and hardly, if ever, accessed; (b) Assuming your company is successful in the future, you will be generating many orders of magnitude more data than what you have today.
So, leave the old data where it is, leave the old apps alone for legacy and compatibility reasons, and architect the next generation of your apps from the ground up to be “cloud native”. Using this strategy, your on-prem footprint will stop growing and will, in fact, naturally shrink over time as you retire old apps and data.
We have spent years cultivating a great relationship with our current infrastructure and application vendors. They’re even adding features we’ve requested that are critical for our business. We’d never get the public cloud vendors to implement the same features for us.
Yup, and you should be thankful for that. The reality is that those one-off features will cause more problems than they fix. Believe me, I was on the other side of the fence implementing some of those features. They’re often implemented begrudgingly, are tested haphazardly (often you become the guinea pig), will break every time the vendor releases a new upgrade or patch, and will result in your on-prem installation becoming a bespoke one-off environment which will only slow you down in the long run. There’s a reason that cloud vendors insist on simplicity and uniform infrastructure.
The journey to the public cloud starts one application at a time. The sooner you start, the sooner you’ll get there. Stop fighting the future and, instead, start figuring out which applications you’ll move to the cloud and in what order. The less hardware and software you have running on-prem, the better.
Digging in your heels and continuing to invest in on-prem infrastructure is morally equivalent to digging a well in your backyard because you don’t trust the quality of the water coming through your pipes. Sure, you can do it but, in this day and age, I wouldn’t recommend it.
At the end of the day, I walked away even more convinced that we will see a massive shift to the public cloud for enterprise companies over the coming years, at least the ones that want to not just survive, but also thrive, in the long run.