Many customers purchase AI systems originally designed for research labs and attempt to integrate them into production environments, only to end up with severe AI buyer’s remorse.
Once these boxes land in the data center, the AI team or data science team may be thrilled, but I can’t say the same for IT operations. Research hardware may look like a production platform at first glance, but that impression collapses as soon as real workloads appear.
A big part of the problem is expectations.
Enterprise buyers see the word “appliance” and assume it means a feature-rich, production-hardened, automated platform as capable and comprehensive as VMware Cloud Foundation (VCF). However, these research devices were never designed to support long-running, business-critical applications. Basic capabilities required for production inference workloads are either missing or immature: patch and lifecycle management, observability, firewalls, high availability, automation and logging, role-based access controls, auditing, basic backup and restore, and much more. When it comes to production workloads, the GPU is the easy part.
Operational pain
You see it when teams attempt their first real operational task. For pure R&D, you don’t worry about logging or change control. But as soon as someone tries to patch software, update firmware, or manage dependencies, the illusion shatters. They suddenly realize they need another tool for something that should have been integrated.
From there, the operational pain grows. IT must add backup, high availability, monitoring, security, and compliance tools after the fact. The economics deteriorate just as quickly. Utilization of these appliances often stagnates between 40 and 60 percent because the hardware footprint is oversized for the inference work being performed and the automation tools that could raise utilization are either absent or immature. In contrast, virtualized platforms typically achieve utilization of 80 percent or more. That gap of roughly 30 points is significant: it translates into roughly 30 percent fewer servers, 30 percent less networking hardware, and 30 percent less power and cooling. The real penalty is the cascading cost of buying the wrong device, and it’s much more than the sticker price.
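To make that concrete, here is a rough back-of-the-envelope sketch of how the utilization gap translates into server count. The demand level, GPUs per server, and utilization figures are illustrative assumptions, not measurements from any real fleet.

```python
import math

def servers_needed(gpu_demand: float, gpus_per_server: int, utilization: float) -> int:
    """Servers required to satisfy a steady-state GPU demand when only a
    fraction of each server's capacity is effectively used."""
    return math.ceil(gpu_demand / (gpus_per_server * utilization))

# Hypothetical fleet: demand equivalent to 400 fully busy GPUs, 8 GPUs per server.
demand, per_server = 400, 8

appliances = servers_needed(demand, per_server, 0.50)   # appliance stuck near 50% utilization
virtualized = servers_needed(demand, per_server, 0.80)  # virtualized platform at ~80%

print(f"Bare-metal appliances: {appliances} servers")
print(f"Virtualized platform:  {virtualized} servers")
print(f"Fleet reduction:       {1 - virtualized / appliances:.0%}")
```

The exact saving depends on where in that 40 to 60 percent range the appliances actually land, but the direction is always the same: every idle GPU drags servers, networking, power, and cooling along with it.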
Misaligned incentives make the problem worse. GPU sales teams are paid on volume, not fit. Server vendors want to sell the largest configuration possible. Meanwhile, many customers have misread the moment and assume they need training infrastructure, even though most companies should rely on existing foundation models and focus on inference. So they buy devices designed for training AI models and quickly discover that they can’t handle production inference workloads. All the things that “just work” in VCF simply aren’t there.
Last year’s push at board level to “do something with AI” has accelerated this trend. Some teams didn’t even know what their use cases were yet: they simply purchased the hardware. Now, CIOs fear being held accountable for oversizing data centers or purchasing colocation capacity based on poor assumptions. Much of this scaling was driven by sales teams motivated by the number of GPUs they could move.
Another source of confusion comes from the “AI factory” talk. Vendors present AI nodes as token-generating factories, as if generating tokens were all the work. But businesses need much more: secure token pipelines, logging, observability, availability, change control, and full lifecycle management. None of that comes with research devices, and once the money for the device has been spent, all the missing operational requirements begin to pile up.
Course corrections
Some organizations are discovering this the hard way. A large European research institute built its sovereign cloud on VCF, then purchased AI factory appliances thinking that was simply what you are supposed to do for AI. They immediately ran into integration problems. Their enterprise automation could not interoperate with the appliances, and because those appliances were now a sunk cost, they had to build an entire parallel management plane to support them. Going forward, their AI inference infrastructure will be based on VCF.
Others have already corrected their trajectory. One of the world’s largest manufacturers, after years of bare metal, wanted to be leaner, more agile, and more cost-optimized. They are in the process of moving almost half of their HPC fleet (around 10,000 cores) to VCF, and will move all of their AI workloads there as well. The alternative was to continue with bare-metal “AI factory” footprints, but they understood the additional costs those architectures carry. A major regional cloud provider we work with has reached the same conclusion and is rejecting bare-metal AI factories after discovering their hidden operational expenses.
The industry is also moving away from the belief that every company should build its own foundation models. Inference workloads far outnumber training workloads, yet many organizations continue to purchase training hardware as if building models were their job. Only a handful of companies should build large foundation models. Most companies don’t need to create models: they need to optimize them, and open source offers many powerful options, along with commercial models you can run on your own infrastructure.
Many assumptions still trip people up. Buyers assume that lifecycle management, role-based access control, high availability, secure pipelines, and version control automatically come with the appliance. They assume they can patch and update systems the way they always have. They assume the main metric that matters is the number of tokens the system can generate. None of this reflects what is actually required in production.
Production AI needs observability, log management, workload scheduling, firewalls, encryption, maintenance workflows, and intelligent GPU placement. Scheduling behavior matters: some schedulers place workloads essentially at random and leave you with fragmented GPUs. Firewalls matter. Encryption matters. These are everyday business problems, and AI is no exception. You cannot base architectural decisions solely on quantity (i.e. token generation) without giving equal weight to quality (e.g. availability, security, and compliance). The two go hand in hand.
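To illustrate the scheduling point, here is a minimal sketch, not taken from any particular scheduler, contrasting random placement with a simple best-fit packing policy. The fleet size, job mix, and policies are assumptions for illustration only.

```python
import random

random.seed(0)
SERVERS, GPUS_PER_SERVER = 8, 8

def place(jobs, pick_server):
    """Place each job (a GPU count) on a single server using the given policy;
    return the free-GPU count per server afterwards."""
    free = [GPUS_PER_SERVER] * SERVERS
    for need in jobs:
        candidates = [s for s in range(SERVERS) if free[s] >= need]
        if candidates:
            free[pick_server(candidates, free)] -= need
    return free

def whole_servers_free(free):
    return sum(f == GPUS_PER_SERVER for f in free)

small_jobs = [random.choice([1, 2]) for _ in range(24)]  # a mix of small inference jobs

# Random placement scatters small jobs across the fleet.
scattered = place(small_jobs, lambda c, free: random.choice(c))

# Best-fit packing fills the fullest server that still has room.
packed = place(small_jobs, lambda c, free: min(c, key=lambda s: free[s]))

print("Random placement:", scattered, "-> whole servers free:", whole_servers_free(scattered))
print("Best-fit packing:", packed, "-> whole servers free:", whole_servers_free(packed))
```

A placement-aware scheduler keeps whole servers open for the next large job; a naive one leaves GPUs stranded in fragments too small to use.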
The Private AI response
There are three questions that help prevent buyer’s remorse:
- Can I use this solution with the tools of my choice? If not, I’ll need to hire different people.
- Can I run other AI models and software on this? If not, I may end up with many disparate “AI islands” that have to be operated separately, at higher cost.
- What is the real total cost once tooling, utilization, and power are taken into account? (A rough sketch of that math follows this list.)
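That last question lends itself to simple arithmetic. The sketch below uses purely illustrative placeholder prices, power draw, and staffing figures, none of them from a real quote, to show the kind of three-year comparison worth running before signing anything.

```python
def three_year_cost(servers, price_per_server, kw_per_server, power_rate_kwh,
                    tooling_per_year, ops_ftes, fte_cost, years=3):
    """Very rough total cost: hardware + power + bolt-on tooling + operations staff."""
    hardware = servers * price_per_server
    power = servers * kw_per_server * 24 * 365 * years * power_rate_kwh
    tooling = tooling_per_year * years
    ops = ops_ftes * fte_cost * years
    return hardware + power + tooling + ops

# Appliance fleet: more servers at lower utilization, extra tooling and staff bolted on.
appliance = three_year_cost(servers=100, price_per_server=300_000, kw_per_server=10,
                            power_rate_kwh=0.12, tooling_per_year=500_000,
                            ops_ftes=4, fte_cost=180_000)

# Virtualized fleet: fewer servers at higher utilization, operational tooling in the platform.
virtualized = three_year_cost(servers=63, price_per_server=300_000, kw_per_server=10,
                              power_rate_kwh=0.12, tooling_per_year=150_000,
                              ops_ftes=2, fte_cost=180_000)

print(f"Three-year appliance cost:   ${appliance:,.0f}")
print(f"Three-year virtualized cost: ${virtualized:,.0f}")
```

A real comparison would add platform licensing, facilities, and whatever else applies; the point is to force every hidden line item into the open before the purchase, not after.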
This brings us to the solution. VCF on compatible systems such as HGX can virtualize capacity, unlock stranded resources, and restore the enterprise-grade controls customers expected in the first place.
Broadcom anticipated this shift years ago. We believed that AI models would move to where the data resides and that inference, not training, would become the dominant use case in business. The capabilities we built at the time may have seemed boring, but the strategy was right. Private AI is no longer just a Broadcom term; the industry has adopted it. Indeed, without automation, resilience, privacy, and compliance, AI is not production AI; it is a research project. Together, we can do better than that.