Moving AI Out of the Shadows Means No More Model Debt

Tony Paikeday and Cyxtera Technologies • October 15, 2020 • 7 minute read

AI/ML, Enterprise Bare Metal in Data Centers




If you’ve spent any time in IT, you’ve heard the term “shadow” applied to instances where employees have gone rogue and are utilizing systems and software that they (or their department) have purchased without IT’s blessing. Shadow IT is the sprawl of innovation silos within an enterprise, often led by well-meaning business units. Today, shadow IT appears most commonly as shadow artificial intelligence (AI) when teams are engaged in AI development and adding platforms and infrastructure to build AI applications outside of IT. This inadvertently runs up costs and siphons capital and financial resources that otherwise would have gone to the IT department, which, in turn, can delay, if not derail, the goals you’ve been tasked with achieving.

With the growing ubiquity of AI in every industry, shadow AI is making another AI adoption problem worse: model debt. Model debt refers to the phenomenon where money and intellectual capital get sunk into models that never get deployed. It’s a symptom of failing to understand the differences between how AI models and conventional enterprise software are developed, and the unique demands AI workloads place on IT infrastructure. It’s a serious, recurring problem for many enterprises as they scramble to keep up with a fast-evolving business climate. Luckily, there’s a solution.

Buzz, hype, and rock stars

Given the hype about AI today, you can’t swing a stick without hitting a story about robotic butlers or flying cars, all made possible by the wonders of AI. But for all the buzz, the question remains whether AI actually lives up to expectations. Thankfully, numerous organizations are already doing impactful things closer to home, with AI applications ranging from enhancing customer experiences and AI-guided radiology in healthcare to uses a little further afield in oil refineries and agriculture.

Data is the source code of the modern enterprise, powering AI insight. If you’re continually processing millions of rows of data in an accounting ledger hoping to catch that one fraudulent transaction, it’s highly unlikely a person will find that needle in a haystack, but AI will see it sticking out like a sore thumb. Similarly, AI is incredibly good at finding that one microscopic crack among hours and hours of high-definition video footage filmed by a drone flying around smokestacks and pipes. Not only are companies saving hundreds of millions of dollars in industrial site inspection costs with AI, but they also no longer have to send people into fairly dangerous settings and hope that, during a fixed window of time, they’re able to discover a weakness that could bring a smokestack down.

AI has an incredible leveling capability in terms of enabling small organizations to act like very large ones and helping very large organizations to personalize their customer interactions the way a smaller one might. Despite these benefits, AI also presents considerable challenges.

AI is more than data science artistry or having great algorithms and models, and it’s nothing like conventional software. It’s developed by data scientists – the new rock stars of enterprise – who are steeped in algorithms, statistical methods, and experimentation but lack DevOps rigor. Remember model debt? Part of the problem is that data scientists might not know how to engineer for scale, design robust platforms, or build data pipelines. That work is typically done by data engineers and others specializing in platform and infrastructure. So, while data scientists excel in the creative application of algorithms and experimentation, they aren’t the people who ultimately deploy these AI models in a production setting.

Improving the odds for AI success

If you surveyed today’s business leaders, it’s likely that eight out of 10 executives know their future intimately depends on AI and that five years from now they won’t be around if they don’t have an AI-guided or AI-powered business. But, survey those same executives and probably three-quarters of them will say they don’t know what kind of platform and infrastructure they need. Somewhere between inception and production, their companies’ initiatives are hitting stumbling blocks of how to scale AI innovation in an enterprise setting.

Necessity might be the mother of invention, but budget constraints are the father of shadow AI. With many IT departments slow to embrace AI infrastructure and too many data centers lacking the GPU compute resources that power AI development, some departments go rogue and spin up an instance in the cloud. It’s this short-term thinking that leads to long-term problems, and almost inevitably, organizations will find themselves in the model debt cul-de-sac.

Moving from shadow AI to AI center of excellence

A lot of departments within an organization find themselves standing up innovation silos unbeknownst to each other and all because IT doesn’t have a shared, centralized infrastructure that excels at unifying people, process and technology. Some organizations have multiple silos under one roof, all doing their own thing. It’s easy to see how a lot of money can be spent acquiring data-science talent and platforms in a short amount of time. And, often, because of the ease and ubiquity of cloud, people go straight there. Unfortunately, cloud is not the hammer for every AI nail.

Organizations that have successfully solved the model debt problem have a platform-first mindset. They are skilled at putting all the architectural elements in place to deploy AI, and not just deploy it, but deploy it at scale. In fact, one of the biggest causes of model debt comes down to the fact that organizations don’t know how to scale. We see that day in and day out.

So, how do you go from ideation to prototype to production? How do you ensure that more of your great models actually get deployed? It’s about having an end-to-end platform approach for AI development that’s optimized for the unique demands of AI workflow and offers the right resources to practitioners, as needed.

Centralizing that capability in an AI center of excellence, on the right platform, brings together expertise, data science workflow and tooling, and purpose-built infrastructure. This can provide the environment where organizations move concepts to prototypes to production applications quickly, with the same DevOps rigor that manages conventional IT application development and deployment.

Doing this can create a flywheel of sorts as your team gains the muscle memory of deploying AI applications, with more speed and at a lower cost every time. From a platform perspective, you enjoy the lowest total cost of ownership and the fastest ROI when people aren’t stovepiping platform silos that are overbought and ultimately underutilized.

The inevitable question that follows is, “Can’t I just do all of this in cloud? Do I need an on-prem solution? We don’t even have a data center anymore!” To be sure, there is room for both cloud and dedicated infrastructure, and it’s important not to get stuck in the false dichotomy that it’s one or the other. Instead, look at what best fits your workload and business outcomes and seek out architecture that lets you use the IT resource delivery model that fits each stage of your AI development journey. Early experimentation and temporal needs are often well-addressed with one approach, while persistent high-product model prototyping and training justifies a different one. Consider the mantra “train where your data lands” to ensure that whatever you do, you’re not spending escalating money and time pushing large datasets from your data lake to your compute instance.

AI centers of excellence speed business transformation

No one wants to be positioned as a cost center; everyone wants to be seen as enabling business transformation and top-line revenue growth. To facilitate this, it’s important that rather than spawning silos across an organization, IT leads the charge to centralize and build communities of expertise, wherein the AI talent pipeline can be groomed so that the data science expertise you need to succeed can come from within your own ranks. Additionally, you’ll see more effective utilization of assets and lower project TCO, and more importantly less model debt and faster AI-powered transformation.

AI is so critical to the success of an organization today that we would encourage IT leaders and those who influence IT decisions to think about how they can lead the conversation and bring an IT-led infrastructure and strategy to the table. With an AI center of excellence, you can scale data-science expertise, facilitate the sharing of best practices, and, subsequently, reduce the time and money it takes to go from great AI ideas to exceptional deployed models. It means not looking at AI as just another thing to shove in the cloud, but as something around which you need a purpose-built structure.

Getting there with Cyxtera and NVIDIA DGX A100

For many, moving AI out of the shadows begins with solving the platform and infrastructure dilemma that stifles innovation, drives costs up, and ultimately delays the ROI of AI. Together, Cyxtera and NVIDIA have built an AI compute-as-a-service offering that helps IT teams bridge the divide separating those who need the convenience of a simple OpEx-based infrastructure consumption model from those who need the deterministic performance of dedicated infrastructure, without requiring them to own a data center. With AI compute-as-a-service built on the NVIDIA DGX A100 system, operationalized in Cyxtera’s world-class data center facilities, IT and the data science teams they support get the best of both, without sacrificing either.

As IT people, it’s not often we get to not only say the words “business transformation” but actually be the protagonists of it, the ones championing the business strategy around AI deployment. Isn’t it time that IT steps out of the shadows and into the light?


Take The First Step Out Of The Shadows

Learn more about Cyxtera’s landmark Artificial Intelligence/Machine Learning (AI/ML) compute as a service offering powered by NVIDIA DGX A100

AI/ML Compute as a Service



Views and opinions expressed in our blog posts are those of the employees who made them and do not necessarily reflect the views of the Company. A reader should not unduly rely on any statements made therein.



Tony Paikeday Senior Director, AI Systems, NVIDIA
