Blog | Embedl

The cost of running frontier AI models

Written by Embedl | Jan 14, 2026 8:09:32 AM

Research groups pushing the limits of artificial intelligence are running into a new kind of barrier. The economic cost and the energy needed to develop and run state-of-the-art AI models have become one of the most pressing issues in the field. Model development is not only expensive; after training, inference and serving expenses grow quickly once the model is deployed at scale.

For instance, Kimi K2 Thinking from Moonshot AI, a firm backed by Alibaba, was reported to cost $4.6 million to train. That is still far less than most leading-edge models, but depending on usage, the inference costs for such a model can quickly add up, making it infeasible to run locally. Part of this cost comes from building and running large datacenters that consume huge amounts of power and require substantial upfront investment. OpenAI, for example, may have spent almost 50% of its revenue on inference in Q1 2025. Similar figures from Anthropic, and the rising subscription prices of AI coding tools like Cursor, show how inference costs can quickly add up.

Meanwhile, leaders in the AI industry have begun to acknowledge just how extreme these costs have become. Dario Amodei of Anthropic has noted that a single cutting-edge training effort this year could approach a billion dollars, with several more projects in the near future potentially multiplying that figure to over $10 billion. If this trajectory continues, only organizations with extraordinary financial resources will be able to attempt the most ambitious forms of AI research, especially those that rely on training models at unprecedented scale.

Progress in modern AI, therefore, seemingly depends on huge amounts of computation, driven by training ever-larger systems on massive datasets.


How Does Running Frontier-Scale LLMs In-House Affect Cost?

 

The Cost of Training

If current spending patterns hold, the most expensive publicly disclosed AI system around early 2027 will require roughly one billion dollars to train. Since 2016, the combined hardware and energy expenses of training top-tier models have risen at an average pace of about 2.5x per year. This estimate treats cost as amortized long-term hardware investment plus the power needed to run it. The steady climb in these expenses shows how quickly money is pouring into advanced AI development.
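To make the growth trend concrete, here is a back-of-the-envelope sketch (illustrative arithmetic only, not Embedl data): starting from the $4.6 million Kimi K2 training figure quoted earlier and compounding at the roughly 2.5x-per-year rate cited above, how long until a run of that class reaches the billion-dollar scale?

```python
import math

# All inputs are figures quoted in the article; the calculation itself
# is a rough extrapolation, not a forecast.
base_cost = 4.6e6   # reported Kimi K2 training cost, USD
target = 1e9        # projected frontier-run training cost, USD
growth = 2.5        # average yearly growth factor since 2016

# Solve base_cost * growth^years = target for years.
years = math.log(target / base_cost) / math.log(growth)
print(f"{years:.1f} years")  # roughly 5.9 years at this compounding rate
```

At a 2.5x annual rate, a two-hundred-fold cost increase takes under six years, which is why the gap between today's cheapest and most expensive frontier runs can close so quickly.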

 

The Cost of Inference Will Keep Rising

The cost curve of running your own AI models is not fully visible during training. It manifests fully once you deploy the model and real-time interaction with users begins. Every inference requires a computation that takes place silently in the backend of the application, whether it is answering a question, generating text, retrieving information, or completing a task, and all of it adds up to the inference bill.

When usage is low, the costs are modest, almost invisible, but as usage rises and thousands of queries roll into the application every minute, the spending curve climbs much faster than anticipated. Inference is then no longer just another operating expense to be managed but a growing subtraction from the bottom line with every success.
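A minimal cost model shows why the curve climbs with usage. All the numbers below are hypothetical placeholders, not vendor pricing: the point is that spend is linear in query volume, so traffic growth translates directly into bill growth.

```python
# Back-of-the-envelope inference spend: tokens processed per month
# times a per-million-token price. Every input here is an assumption
# chosen only to illustrate the scaling.
def monthly_inference_cost(queries_per_minute: float,
                           tokens_per_query: float,
                           usd_per_million_tokens: float) -> float:
    minutes_per_month = 60 * 24 * 30
    tokens = queries_per_minute * minutes_per_month * tokens_per_query
    return tokens / 1e6 * usd_per_million_tokens

# 10x the traffic means 10x the bill: the cost curve is linear in
# usage, but revenue per query often is not.
low = monthly_inference_cost(100, 1_000, 5.0)     # modest traffic
high = monthly_inference_cost(1_000, 1_000, 5.0)  # after a 10x spike
print(f"${low:,.0f}/mo vs ${high:,.0f}/mo")
```

This is why a product that is cheap to serve at launch can become a major cost center the moment adoption takes off.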

A figure from Microsoft puts this reality into context. For the first three quarters of last year, OpenAI reportedly spent $8.7 billion on Azure inference, an expenditure that did not include any model training or research and development. That spending went to the mundane but necessary work of simply keeping the models serving.

 

Why AI Inference Is More Expensive Than It Looks

Inference is often viewed as the cost-effective part of the AI ecosystem, but that notion tends to falter once the systems are actually used. As users adopt your AI model, compute expenditure escalates with usage: intense GPU utilization, large memory requirements, and auto-scaling driven by traffic volume rather than by budget. On top of the rising compute demands, "tech debt" begins to build from issues with data quality, model retraining, and adapting processes to the output of an AI solution.

Running these systems entails a continuous overhead of MLOps, monitoring, debugging, and compliance work that continues long after the first deployment. Costs also arise from data storage, data transfer, pipeline management, and governance, compounded by the scarcity of qualified AI expertise.

 

AI Accelerator Chips Account for Almost 50% of the Amortized Hardware Cost

When you look at how long-term hardware spending and electricity use are distributed in large-scale AI training, nearly half of that money ends up going toward the accelerator chips that carry out the actual computation. The rest is split among several other pieces of the stack.

The servers that hold those chips, along with vendor margins, make up roughly 29%. High-speed networking across the cluster takes another sizable share, around 17%. The remaining slice belongs to electricity, at roughly 9%, a share that is rising quickly as each generation of models demands more power.
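The shares above can be collected into a quick sanity check. The ~45% accelerator figure is inferred from "nearly half"; the rest are the article's numbers.

```python
# Amortized hardware-and-energy cost breakdown for large-scale AI
# training, as described in the text. Accelerator share is an
# inference from "nearly half"; the others are quoted percentages.
breakdown = {
    "accelerator chips": 0.45,
    "servers + vendor margin": 0.29,
    "cluster networking": 0.17,
    "electricity": 0.09,
}
total = sum(breakdown.values())
print(f"total share: {total:.0%}")  # the shares sum to 100%
```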

 

The Cost of Acquiring the Hardware

When discussing the price of the machines used to train advanced models, it helps to separate two very different cost figures. One is the long-term, spread-out cost of using the hardware over its effective lifetime. The other is the cash required to buy the equipment in the first place.

Which number matters depends on what you are trying to evaluate. The amortized figure is useful for understanding how expensive it is to run and maintain models over time. The acquisition figure reveals the financial hurdles and risks a developer faces before training can even begin. Once training is complete, inference cost dominates, and optimization can play a huge role in reducing it.

Because the purchase price reflects the full upfront cost of building out the cluster, it is vastly higher than the amortized figure. In many cases the gap is enormous. The machines behind GPT-4, for example, likely cost close to $800 million to procure, even though the portion attributed to the training run itself, spread over the hardware's operating life and including power usage, was estimated at $40 million.
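The mechanics of that gap can be sketched in a few lines. Only the $800 million acquisition figure comes from the text; the hardware lifetime and run duration below are assumptions chosen purely for illustration.

```python
# Why amortized cost is a small fraction of acquisition cost: a
# training run only "uses up" the slice of the cluster's lifetime
# it occupies. Lifetime and run length are assumed, not sourced.
acquisition_cost = 800e6      # upfront cluster purchase, USD (from article)
hardware_lifetime_yrs = 4.0   # assumed useful life of the cluster
training_run_yrs = 0.25       # assumed ~3-month training run

amortized = acquisition_cost * (training_run_yrs / hardware_lifetime_yrs)
print(f"amortized hardware share: ${amortized/1e6:.0f}M")
```

Under these assumptions the run is charged about $50 million of hardware, the same order of magnitude as the $40 million estimate quoted above, while the developer still had to put up the full $800 million before training could start.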

 

How Significant are R&D Costs in Running Frontier AI Models?

When you look at the full life cycle of building a frontier-scale model, the people doing the research and engineering represent a major share of the overall expense. Once stock compensation is counted, labor accounts for roughly 30% to 50% of the total cost, depending on the system. Without equity included, that slice becomes smaller but still significant.

How Does Embedl SDK Help Address this Cost Barrier?

Embedl SDK reduces the cost of building and deploying AI by making models more efficient and easier to run on existing hardware. It optimizes networks in a hardware-aware way, shrinking memory needs, cutting energy use, and reducing computation without hurting accuracy. This means teams can reduce costs during inference by compressing advanced models, and for small to medium-sized models, Embedl’s optimization can enable AI to run on edge devices, further reducing reliance on power-hungry datacenter GPUs.

The SDK also accelerates development by automating tasks that would otherwise require months of manual tuning. Instead of weeks or months, engineers can have optimized models in hours, reducing labor costs and shortening the path to deployment.

Embedl SDK creates deployment-ready executables that run efficiently across CPUs, GPUs, embedded processors, and AI accelerators, allowing companies to provide high-performance AI without having to invest in expensive new hardware. Overall, it lowers upfront costs and operating costs, making sophisticated AI more accessible to teams at companies without billion-dollar budgets.