A new paper¹ from a group at Stanford introduces a new metric for evaluating the energy efficiency of AI models: intelligence per watt. How much performance do AI models deliver per unit of energy consumed? The authors define variants of the metric tailored to both edge and cloud devices.
They evaluate the performance of models and hardware across various query sets, ranging from knowledge question answering to reasoning tasks. For a given model and hardware pair (m, h) and query q, they measure the accuracy of the response, the average power drawn while producing it, and the total energy consumed.
Given these measurements, they define a power-based metric, which measures efficiency relative to instantaneous power draw: the accuracy achieved on the query divided by the average power drawn while answering it, i.e. intelligence per watt.
They also define an energy-based metric, which measures efficiency relative to the total energy consumed per query: the accuracy achieved divided by the energy used to produce the answer.
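To make the definitions concrete, here is a minimal sketch of how the two metrics could be computed from per-query measurements. The data structure, field names, and example numbers are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class QueryMeasurement:
    """Measurements for one (model, hardware, query) triple (illustrative)."""
    accuracy: float   # task accuracy on the query (0.0 - 1.0)
    power_w: float    # average instantaneous power draw, watts
    energy_j: float   # total energy consumed for the query, joules

def intelligence_per_watt(m: QueryMeasurement) -> float:
    """Power-based metric: accuracy per watt of instantaneous draw."""
    return m.accuracy / m.power_w

def intelligence_per_joule(m: QueryMeasurement) -> float:
    """Energy-based metric: accuracy per joule consumed for the query."""
    return m.accuracy / m.energy_j

# Example: a hypothetical local model answering one query on an edge device.
q = QueryMeasurement(accuracy=0.82, power_w=35.0, energy_j=420.0)
print(f"intelligence per watt:  {intelligence_per_watt(q):.4f}")
print(f"intelligence per joule: {intelligence_per_joule(q):.5f}")
```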
They ask three questions:

1. How rapidly is intelligence efficiency improving over time, and what drives the improvement?
2. How much of a real-world query workload can small local models on edge devices handle at frontier-model quality?
3. How much energy, compute, and cost can be saved by routing queries appropriately between local and cloud deployments?
For the first question, the conclusion is that intelligence efficiency is improving rapidly and predictably! From 2023 to 2025 there was a 5.3× overall increase, with model improvements yielding a 3.1× gain in accuracy per watt and accelerator improvements providing a further 1.7× gain (3.1 × 1.7 ≈ 5.3).
For the second question, the study shows that local models on edge devices are fast catching up to frontier models in the cloud: 88.7% of queries could be handled successfully by small local models as of October 2025, with coverage varying by domain, exceeding 90% for creative tasks (e.g., Arts & Media) but dropping to 68% for technical fields (e.g., Architecture & Engineering). Longitudinal analysis shows consistent improvement: the best local LM matched frontier-model quality on 23.2% of queries in 2023, 48.7% in 2024, and 71.3% in 2025, a 3.1× increase over two years.
For the third question, the study shows that substantial savings can be achieved by routing queries appropriately between local and cloud infrastructure: such routing can reduce energy consumption by 80.4%, compute usage by 77.3%, and cost by 73.8% compared to a cloud-only deployment. Moreover, the routing need not be perfect to realize substantial savings while maintaining task quality: a routing system with 80% accuracy (correctly assigning 80% of queries to local vs. cloud) captures 80% of the theoretical maximum gains, achieving a 64.3% energy reduction, a 61.8% compute reduction, and a 59.0% cost reduction with no degradation in answer quality.
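To see why imperfect routing still captures most of the benefit, the sketch below simply scales the reported maximum savings by the router's accuracy. This linear-scaling assumption is only an illustration of the arithmetic, not the paper's exact methodology.

```python
# Reported maximum savings when every local-capable query is routed locally
# (from the study): energy, compute, and cost reductions vs. cloud-only.
MAX_SAVINGS = {"energy": 0.804, "compute": 0.773, "cost": 0.738}

def expected_savings(router_accuracy: float) -> dict[str, float]:
    """Assume savings scale linearly with the fraction of queries the
    router assigns correctly -- a simplifying assumption for illustration."""
    return {k: router_accuracy * v for k, v in MAX_SAVINGS.items()}

for acc in (1.0, 0.8):
    s = expected_savings(acc)
    print(f"router accuracy {acc:.0%}: "
          f"energy -{s['energy']:.1%}, compute -{s['compute']:.1%}, cost -{s['cost']:.1%}")
# router accuracy 100%: energy -80.4%, compute -77.3%, cost -73.8%
# router accuracy 80%:  energy -64.3%, compute -61.8%, cost -59.0%
```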
For the moment, cloud devices still maintain an advantage over edge devices in terms of intelligence per watt, thanks to specially optimized hardware. However, this efficiency disadvantage is offset by complementary system-level benefits: local deployment avoids data center infrastructure costs and lets the 88.7% of queries that local models can handle avoid cloud compute entirely, yielding 60-80% resource reductions.
Moreover, as we continue to optimize edge devices via Embedl’s technologies, the gap is likely to shrink rapidly. Overall, the study shows that deploying AI on edge devices will become increasingly important for businesses seeking substantial savings in energy, compute, and infrastructure costs. In future posts, we will discuss some of the innovations that Embedl’s research is bringing to efficient edge AI.
¹ https://arxiv.org/pdf/2511.07885