2025 is shaping up to be the year of agentic AI. Identified as one of the most transformative technology shifts on the horizon, agentic AI is pushing artificial intelligence beyond reactive assistance and into purposeful action.1 These systems introduce a new level of agency – AI that not only responds to prompts, but actively perceives, plans, and takes action to accomplish defined goals.
Powered by large language models (LLMs) as their cognitive engine, so-called AI agents are gaining the ability to interact with their environment in increasingly human-like ways.2 With the introduction of communication enablers like the Model Context Protocol (MCP)3, their ability to interact with tools and systems is accelerating rapidly.
However, running such intelligence at the edge remains a significant technical challenge – and its resolution will shape the future of agentic edge AI.
The Transition From Generative AI to AI Agents
Although generative AI and LLMs have captured much of the spotlight, they still operate within clear boundaries: they generate responses, summarize documents, and answer questions – but only when prompted. The gap between these capabilities and truly autonomous behavior remains wide. Agentic AI, on the other hand, refers to goal-driven software entities that are capable of adapting to their context and acting autonomously with minimal human supervision. Such agents are therefore capable of executing tasks in complex environments.1 This dramatically increases AI's potential4 and bridges the gap between generative AI and real-world action.
Figure 1: The AI Agency Gap.1
Rather than replacing generative AI, AI agents can leverage it. Thanks to the rapid advancement of LLMs, a growing number of LLM-based agents have emerged.2 These agents use LLMs as their brains to plan, remember, and explore possibilities, giving them human-like qualities. They have an added layer that enables them to interact with tools, applications, other models, and network systems in order to achieve specific goals.4
For example, an AI agent tasked with booking transport for a vacation could start by identifying key travel requirements – such as dates, destination, group size, and budget. It would then use a generative model in conjunction with internet browsing to explore available options, compare availability and pricing against those preferences, and select the most suitable option. Once this reasoning process is complete, the agent would carry out the booking autonomously: connecting to the user's calendar, querying airline APIs, and updating travel routes in real time – all without requiring human intervention.
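To make this concrete, below is a deliberately simplified sketch of the plan–act–observe loop such an agent runs. Every name in it is a hypothetical placeholder: plan_next_step stands in for the LLM "brain", and the two tools stand in for live airline APIs.

```python
# Minimal sketch of an agent's plan-act-observe loop. All names are
# hypothetical placeholders, not a real agent framework API.

def search_flights(dates: str, destination: str) -> list[dict]:
    # Placeholder: a real agent would query airline APIs here.
    return [{"flight": "XY123", "price": 420}, {"flight": "XY456", "price": 310}]

def book_flight(flight: str) -> str:
    # Placeholder: a real agent would complete the booking here.
    return f"Booked {flight}"

TOOLS = {"search_flights": search_flights, "book_flight": book_flight}

def plan_next_step(goal: str, history: list) -> dict:
    # Stand-in for the LLM "brain": given the goal and what has happened
    # so far, decide which tool to call next, or stop.
    if not history:
        return {"tool": "search_flights",
                "args": {"dates": "2025-07-01", "destination": "Lisbon"}}
    if history[-1]["tool"] == "search_flights":
        cheapest = min(history[-1]["result"], key=lambda f: f["price"])
        return {"tool": "book_flight", "args": {"flight": cheapest["flight"]}}
    return {"tool": None, "args": {}}  # goal reached, stop

def run_agent(goal: str) -> list:
    history = []
    while True:
        step = plan_next_step(goal, history)
        if step["tool"] is None:
            return history
        result = TOOLS[step["tool"]](**step["args"])  # act
        history.append({**step, "result": result})   # observe / remember

print(run_agent("Book the cheapest flight to Lisbon in July"))
```

In a real deployment, the planning stub would be a call to an LLM and the loop would carry conversation state, but the control flow – plan, act on a tool, fold the observation back into memory – is the same.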
How MCP Powers Actionable Intelligence
In order to act effectively, AI agents need to integrate with their environment. This requires infrastructure that allows them to understand context, access tools, and communicate with other systems. One of the key developments accelerating this is MCP. Even the most sophisticated models have long been limited by their isolation from external data. Information silos and outdated systems have meant that each new data source required a custom integration, making it difficult to scale truly connected AI systems.5 MCP addresses this by introducing an open, lightweight protocol that standardizes how AI models communicate with external resources and how applications deliver context to LLMs – enabling agents to be implemented as an extension of the LLM.3
Figure 2: MCP introduced to simplify communication between AI models and data sources
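To give a concrete taste of the protocol, the sketch below uses the FastMCP helper from the official MCP Python SDK (the mcp package) to expose a single tool; the inventory tool itself is a made-up example.

```python
# A minimal MCP server built with the FastMCP helper from the official
# Python SDK (pip install mcp). The tool's data is a made-up example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-server")

@mcp.tool()
def get_stock_level(item: str) -> int:
    """Return how many units of the given item are in stock."""
    # Placeholder data source; a real server would query a database
    # or a sensor here.
    stock = {"milk": 2, "eggs": 12}
    return stock.get(item, 0)

if __name__ == "__main__":
    # Serve over stdio so any MCP-capable host can launch and use it.
    mcp.run()
```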
MCP can be compared to a USB-C port for AI applications. Just as USB-C offers a universal way to connect devices to peripherals and accessories, MCP provides a standardized interface for connecting AI models to diverse tools and data sources.3 In this way, MCP becomes a game-changing enabler for agentic AI, accelerating its practical deployment.
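On the other side of that "port", an agent host connects to such a server, discovers its tools, and invokes them. A minimal sketch, again assuming the official Python SDK and assuming the server above is saved as inventory_server.py:

```python
# Sketch of the host/agent side using the official MCP Python SDK.
# It launches the server above as a subprocess and calls its tool.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(command="python", args=["inventory_server.py"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # discover capabilities
            result = await session.call_tool("get_stock_level", {"item": "milk"})
            print(tools, result)

asyncio.run(main())
```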
What Keeps Intelligent Agents Away From the Edge?
Despite their accelerating potential, LLM-based AI agents still face significant challenges before they can be widely deployed at the edge. In practice, these agents are often bound to inference servers, as their tasks typically rely on real-time information retrieval through external APIs. Furthermore, the computational demands of their LLM brains often exceed the capabilities of edge devices, which are limited in compute power, storage, and memory, and often lack the energy capacity required to run large LLMs.6 As a result, LLM-based agents remain primarily in the cloud rather than on the device, invoking on-device functions through tools exposed via MCP.
There are strong advantages to deploying agentic models directly on edge devices. By running LLMs locally, data can be processed immediately, eliminating the latency typically introduced by relying on cloud infrastructure. This is particularly important for applications that demand real-time decision-making and responsiveness, such as autonomous vehicles or IoT systems.
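To make "locally" concrete: one common way to run a compact model entirely on-device is through the llama-cpp-python bindings. The sketch below is purely illustrative; the GGUF file name is a placeholder for whatever quantized model actually fits the device.

```python
# Illustrative only: running a small quantized LLM fully on-device
# with llama-cpp-python (pip install llama-cpp-python). The model
# path is a placeholder for any GGUF model that fits the hardware.
from llama_cpp import Llama

llm = Llama(model_path="models/small-model.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the sensor alert."}],
    max_tokens=64,
)
# The request, the model, and the answer never leave the device.
print(out["choices"][0]["message"]["content"])
```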
Edge-based processing also plays a critical role in preserving privacy. For example, consider an AI nurse assisting during a confidential consultation. If the model can process and act on the conversation locally – scheduling a follow-up, for instance – there is no need to transmit sensitive data to a remote server, significantly reducing privacy risks. Similarly, in security-sensitive contexts where network transmission might expose vulnerabilities, local inference enables agents to act independently while minimizing the risk of data leakage or corruption in transit. However, this presents a fundamental tension: although deploying agentic AI at the edge offers clear benefits in speed, autonomy, and privacy, current models are often too large and resource-intensive to fit within the constraints of edge hardware.
A Different Vision for Running Agentic AI
So how can AI agents be brought to the edge? One possible approach is to move away from a full LLM-based agent hosted on a centralized inference server – handling all perception, reasoning, and action remotely. Instead, imagine a decentralized setup: small, specialized agents and LLMs deployed directly on edge devices, each tightly coupled to the tools and data available locally, and able to communicate with LLMs coupled to other devices.
For example, imagine a smart camera equipped with its own local vision-language model. This lightweight model can answer questions like "how much of item A is left in the freezer?". A separate agent, perhaps on a local home server, queries the camera's model, interprets the results, and uses that information to compile a shopping list. Another agent connected to a phone could then take over and place an order via a delivery service. This replaces the need for one massive LLM agent, tied to an inference server, that must handle the entire process: image interpretation, inventory analysis, planning, and execution.
Figure 3: The difference between centralized models and decentralized edge-ready setups
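To ground the camera example, here is a hypothetical sketch of the camera side of such a setup: the device wraps its local vision-language model in an MCP server so that other agents on the network can query it. ask_camera_model is a stand-in for whatever on-device VLM runtime is used.

```python
# Hypothetical sketch: a smart camera exposing its local vision-language
# model as an MCP tool. ask_camera_model stands in for real on-device
# VLM inference over the latest camera frame.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("freezer-camera")

def ask_camera_model(question: str) -> str:
    # Placeholder for local VLM inference on the current frame.
    return "Roughly two packs of item A remain."

@mcp.tool()
def query_freezer(question: str) -> str:
    """Answer a natural-language question about the freezer contents."""
    return ask_camera_model(question)

if __name__ == "__main__":
    mcp.run()
```

The home-server agent would connect to this exactly as in the client sketch shown earlier, and could in turn expose its shopping-list capability as a tool for the phone-side agent.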
By distributing and deploying small, specialized agents and models directly on edge servers and devices, and enabling them to collaborate with one another, truly scalable agentic edge systems could be unlocked – realizing the benefits of running agentic AI at the edge.
Enabling Agentic Edge AI and Embedl’s Role
Distributing and integrating LLM-based agents into existing edge computing infrastructure presents several technical challenges. Key issues include ensuring compatibility with various operating systems and communication protocols, as well as optimizing the models enough to fit on edge devices without compromising performance.6 The development of MCP directly addresses the integration challenge: by providing a universal "language" for these interactions, it lets each device retrieve information via APIs, invoke local tools, and communicate with other agents seamlessly and efficiently.
With the integration challenge addressed, the main remaining obstacle is model efficiency and size. This is where Embedl comes in – another critical enabler for making this decentralized agentic AI vision possible. By compressing and optimizing AI models to run efficiently on edge devices, Embedl reduces memory footprint, power consumption, and latency without compromising performance – a crucial step toward realizing agentic edge AI.
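How models are shrunk to this degree is Embedl's proprietary expertise; as a generic illustration of one standard ingredient of model compression (not Embedl's method), the sketch below applies PyTorch's post-training dynamic quantization, which stores linear-layer weights as 8-bit integers while keeping the model's interface unchanged.

```python
# Generic illustration (not Embedl's pipeline): post-training dynamic
# quantization in PyTorch converts linear-layer weights to 8-bit ints,
# cutting memory footprint. Accuracy must be re-validated afterwards.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller weights
```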
Embedl is not just solving a technical challenge – we are redefining the edge! Our vision is to empower the deployment of any model, on any hardware, anywhere. By bridging the gap between cutting-edge AI capabilities and the realities of edge environments, Embedl is building the foundation for a future where intelligent agents operate independently, responsibly, and everywhere they’re needed.
References
1: https://www.gartner.com/doc/reprints?id=1-2K8Y7LEY&ct=250212&st=sb
2: https://ieeexplore.ieee.org/document/10748304
3: https://modelcontextprotocol.io/introduction
4: https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality
5: https://www.anthropic.com/news/model-context-protocol
6: https://ieeexplore.ieee.org/document/10569285