NVIDIA’s AI Factories and Agentic Software Development


I normally look at large language model (LLM) software tools that I can test right away, but this article started because of software I wasn’t able to test. I’ve been following how Solver, a so-called “standalone” AI coding tool, has fared since our article a year ago. And I noticed it was gone… sold to NVIDIA. Solver was already using an agent-based approach last year, so it was ahead of its time. But what does NVIDIA want with it?

“NVIDIA is a hardware company” would normally have been enough to dampen my enthusiasm for delving any further. But of course, NVIDIA’s chips are at the heart of the AI revolution and, as a result, it was the first company in the world to be worth $4 trillion. So it’s worth looking into what it does, even if your interests lie in the world of software development.

NVIDIA has made a lot of purchases recently, including the type of investments in its own ecosystem that inflate stock prices without the revenue to truly support them. Further muddying the waters, NVIDIA also announced that it would be pumping $100 billion into OpenAI. But what is the different world it is betting on? The clue lies in the dusty term “vertical integration.”

Understanding Vertical Integration in AI

Let’s look at the typical agentic CLI setup that I review regularly. I start with a terminal (usually Warp), then install the target agentic CLI application (e.g. Claude Code) there. During execution, it may be querying the LLM Claude Sonnet, communicating with Anthropic, which runs its models on AWS. This treats everything as a link in a chain. At each link, you hope you get the chance to use the best in class, or at least the most affordable. The argument usually made is that once a model is trained, it can be run on cheaper chips and a commodity cloud, giving us a horizontal or modular market.
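To make that chain concrete, here is roughly what the single link between the CLI agent and the vendor looks like in code, using Anthropic’s Python SDK. The model ID and prompt are illustrative; the terminal above and the AWS hardware below are invisible to this call:

```python
# A minimal sketch of one link in the horizontal chain: the agent calling
# the model vendor's API. Everything else in the chain -- the terminal
# above it, the cloud hardware below it -- is opaque to this code.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID; check current docs
    max_tokens=1024,
    messages=[{"role": "user", "content": "Refactor this function to remove duplication."}],
)
print(response.content[0].text)
```

Each link here — the terminal, the CLI, the SDK, the vendor, the cloud — is a separate business, which is exactly what vertical integration collapses.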

In comparison, vertical integration is what Apple does. It makes its own hardware, its own chips, its own operating system, and reluctantly allows users to write software applications for its own ecosystem. At the very least, this improves reliability and security.

What’s wrong with the horizontal AI market? In the age of the agent, we no longer ask a chatbot one-off questions. Several agents work in parallel, which complicates the relationship with the LLM(s). As NVIDIA founder and CEO Jensen Huang says, the real resource limitation is no longer cost per chip; it’s how many tokens you can generate per kilowatt, because the limiting factor now is power. No matter how you provide electricity, just pumping it consistently across a large site is a challenge. Soon, the limiting factor may well be water.
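A back-of-the-envelope calculation shows what that metric looks like. All the numbers here are invented for illustration, not measurements from any real rack:

```python
# Back-of-envelope: tokens per kilowatt-hour as the scarce resource.
# Both numbers are illustrative assumptions, not vendor benchmarks.
throughput_tokens_per_s = 500_000   # assumed aggregate output of one rack
rack_power_kw = 120                 # assumed sustained draw of that rack

tokens_per_kwh = throughput_tokens_per_s * 3600 / rack_power_kw
print(f"{tokens_per_kwh:,.0f} tokens per kWh")  # 15,000,000 tokens per kWh
```

Under a fixed power budget, the only lever left is raising the numerator, which is where owning the chip, the interconnect and the serving software together starts to pay.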

The NVIDIA advantage: the AI Factory model

Therein lies NVIDIA’s advantage: having a range of the most powerful chips, the ones that generate the most tokens per kilowatt. To answer many types of queries (short or deep), NVIDIA has a software layer that optimizes how AI models run on potentially hundreds or even thousands of GPUs, dynamically allocating resources based on workload needs in its AI factories, or data centers.
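We don’t get the details of that software layer here (NVIDIA ships serving stacks such as Triton and Dynamo in this space), but the basic scheduling idea can be sketched as a toy. The pool sizes and the token threshold below are invented for illustration:

```python
# Toy sketch of workload-aware GPU allocation: short queries go to a small
# pool, deep queries to the big one. Pool sizes and the threshold are
# invented; real schedulers also handle batching, KV-cache placement
# and preemption.
from dataclasses import dataclass

@dataclass
class GpuPool:
    name: str
    free_gpus: int

pools = {
    "light": GpuPool("light", free_gpus=8),
    "heavy": GpuPool("heavy", free_gpus=64),
}

def allocate(estimated_output_tokens: int) -> GpuPool:
    """Route a request to a pool based on a rough cost estimate."""
    tier = "light" if estimated_output_tokens < 1_000 else "heavy"
    pool = pools[tier]
    if pool.free_gpus == 0:  # fall back if the preferred tier is full
        pool = pools["heavy" if tier == "light" else "light"]
    pool.free_gpus -= 1
    return pool

print(allocate(200).name)     # "light": a quick chat-style query
print(allocate(50_000).name)  # "heavy": a deep agentic run
```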

But we don’t really need to go into detail, because Apple’s MacBook is demonstration enough. The MacBook is simply more efficient because the operating system doesn’t need to make educated guesses about the hardware. It can shut down and restart instantly, without mysterious “sleep” or “hibernation” mechanisms that work inconsistently. Ask any PC gamer: they never know how well their latest game will run on their machine until they run it.

NVIDIA’s AI factories are intended for industrial deployment. The idea is that they can be trained for a specific industry and then handle its inference requests. But let’s imagine that a suitable factory eventually becomes available to downstream developers.

The future of agent startups and the role of NVIDIA

So how would a future agentic CLI work in an AI factory?

You’d probably have an account at your local “AI factory”, or perhaps a more comprehensive offering through NVIDIA. And there would be only one invoice, whose size could be accurately tracked (or predicted) in real time.

If your code were in a repository visible to NVIDIA, this would allow analysis of the project and isolated branches for parallel execution (as in the sketch below). This is a benefit Microsoft (and by extension, OpenAI) gets directly from owning GitHub. In any case, the project would either be scanned locally and sent to the “AI factory” or read via a shared repository. One of the benefits of vertical integration is that your query passes through fewer hands. You obviously have to trust NVIDIA, but potentially no one else. NVIDIA needs to decide how much this matters to its customers.
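Git already has the right primitive for those isolated branches: worktrees. Here is a minimal sketch of how a service might fan one repository out into separate working copies, one per agent; the agent/&lt;n&gt; naming scheme is my own invention:

```python
# Sketch: give each parallel agent its own isolated branch and working
# copy using `git worktree`. The agent/<n> naming scheme is invented for
# illustration; run this from inside an existing repository.
import subprocess

def spawn_agent_workspaces(n: int) -> list[str]:
    paths = []
    for i in range(n):
        path = f"../agent-{i}"
        # Creates a new branch agent/<i> checked out in its own directory,
        # so agents can edit files in parallel without clobbering each other.
        subprocess.run(
            ["git", "worktree", "add", "-b", f"agent/{i}", path],
            check=True,
        )
        paths.append(path)
    return paths

print(spawn_agent_workspaces(3))
```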

NVIDIA may offer different service tiers depending on the range of chips available for your account. The wider the range offered, the easier it would be to route shallower agent tasks to cheaper LLMs, which could allow the user to realize some of the savings.
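That kind of tiered routing is simple to picture in code. In this sketch, the tier names, prices and the shallow-task heuristic are all invented:

```python
# Toy model router: send shallow agent tasks to a cheap tier, deep ones
# to a premium tier. Tier names, prices and the heuristic are invented;
# a real factory would presumably expose something like this via its API.
PRICE_PER_MTOK = {"cheap": 0.25, "premium": 3.00}  # illustrative $/1M tokens

def pick_tier(task: str, context_tokens: int) -> str:
    """Cheap tier only for small, mechanical tasks; everything else is premium."""
    shallow = context_tokens < 2_000 and task in {"rename", "format", "summarize"}
    return "cheap" if shallow else "premium"

tier = pick_tier("summarize", context_tokens=800)
print(tier, PRICE_PER_MTOK[tier])  # cheap 0.25
```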

Now, I would expect an API to be available, but the temptation to design (or buy) a Claude Code-style agentic CLI would be too great. Of course, that’s where we started, talking about Solver.

Conclusion

What happens behind the scenes shouldn’t matter much to the software developer, but the benefits of the agentic era are closely related to the speed of token generation and parallel execution.

At some point, the initial bloom must reach winter. When that happens, NVIDIA will certainly be one of the companies best positioned to pick up the pieces. Of course, it doesn’t have a relationship with the development community beyond the specialist CUDA platform sector. So if things played out this way, I would expect to see beta tools released while the agentic era is still enjoying its summer.

