By Ricardo Rosales

Demystifying AI Agents: A guide to understanding and building them

Updated: Feb 29





What are AI Agents?


An agent is an application designed to understand, interpret, and engage with its surroundings independently in a goal-driven manner. The decision to build an application as an agent stems from the need for a system that's autonomous, flexible, and adept at navigating the complexity and dynamism of real-world environments.


Leveraging advancements in Generative AI, particularly large language models (LLMs), paves the way for the development of smarter agents. These agents boast the ability to reason, make decisions, and act towards achieving specific objectives with sufficient accuracy for production use across various domains.


Agents have made significant impacts across industries.

  • In customer service, chatbots and virtual assistants provide real-time, personalized responses.

  • In finance, algorithmic trading agents analyze vast amounts of market data to execute trades at optimal times.

  • In transportation, autonomous vehicles rely on agents to navigate and make split-second decisions in unpredictable traffic conditions.

These examples show how agents are outperforming traditional software approaches in environments that require quick adaptation, strategic thinking, and ongoing learning.


LLMs allow agents to digest and interpret large volumes of unstructured and structured data, engage in complex dialogues with users, and produce content or decisions that resonate with human intuition. Integrating language models into agents signals a significant pivot towards building more intelligent, versatile, and human-like applications. These applications have the ability to perform tasks that require deep understanding and creativity, thereby opening up new avenues for automation and AI assistance across various fields.


The concept of an agent is not new, but developments in generative AI have made agents more autonomous and intelligent than ever. What used to require intricately crafted logic can now be handled by LLMs capable of tackling a wide range of tasks.


How to build an Agent: Key Components & Considerations


An agent primarily consists of two fundamental elements: a reasoning engine for processing and decision-making, and a suite of tools for interacting with its environment.



Elements that help agents think and operate


Reasoning Engine


The reasoning engine is the “brain” of the agent, enabling the functionalities that make the agent intelligent. This engine consists of two distinct layers:

  1. The metacognition layer controls how the agent thinks and acts, structuring the reasoning the agent performs. This could be hardcoded logic that describes a list of sequential steps, or a prompt that tells an LLM how to approach problems.

  2. The cognition layer is the core of the agent's thought process, where the actual reasoning occurs; it might look like a call to the OpenAI API with a prompt designed to analyze data, as in the sketch below.
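As a rough sketch of how these two layers might fit together, the snippet below hardcodes the metacognition layer as a system prompt and delegates cognition to a chat-completion call via the openai Python client; the prompt, model name, and analyze function are illustrative assumptions rather than part of any particular framework.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Metacognition layer: a prompt that structures *how* the agent reasons.
METACOGNITION_PROMPT = (
    "You are a data-analysis agent. For every request: "
    "1) restate the goal, 2) list the observations you need, "
    "3) reason step by step, 4) state your conclusion."
)

def analyze(observation: str) -> str:
    """Cognition layer: the actual reasoning, delegated to an LLM call."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": METACOGNITION_PROMPT},
            {"role": "user", "content": observation},
        ],
    )
    return response.choices[0].message.content

print(analyze("Monthly revenue dropped 12% while traffic grew 8%."))
```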


Here, we explore various implementations of the reasoning engine, highlighting the strengths and limitations of common approaches.


Finite state machines

An FSM defines actions based on the current state, accounting for all possible states. With each new observation, the state is updated. Using an LLM to generate reasoning traces that interpret observations reduces the dimensionality of potential states, making the complexity more manageable.


Strengths:

  • Can restrict behavior to a defined set of states and actions

  • Can easily codify fallback logic when observations cannot be interpreted

  • Can execute tool runs in parallel

Drawbacks:

  • Incorrectly classified observations can lead to nonsensical actions

  • Coordinating asynchronous tool runs is complex since you have to implement waiting states and handle them appropriately


The FSM implementation is well suited for use cases that have a well-defined set of actions and observations but fuzzy inputs. It is particularly useful when the cost of a bad response is high. For example, a customer service agent might be designed to support a few types of questions very well, with specialized tooling for each question type, while still needing to respond to natural-language inputs that vary widely across users.
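A minimal sketch of this pattern is shown below; the states, the classify_intent stub (which stands in for an LLM call), and the per-state tools are hypothetical and only illustrate the shape of an FSM-based customer service agent.

```python
from enum import Enum, auto

class State(Enum):
    BILLING = auto()
    SHIPPING = auto()
    FALLBACK = auto()

def classify_intent(user_text: str) -> State:
    """Stub for an LLM call that interprets the observation.
    In practice this would prompt a model to pick one of the known states."""
    text = user_text.lower()
    if "invoice" in text or "charge" in text:
        return State.BILLING
    if "delivery" in text or "tracking" in text:
        return State.SHIPPING
    return State.FALLBACK

# Each state maps to a specialized tool; FALLBACK codifies the safety net.
HANDLERS = {
    State.BILLING: lambda q: f"[billing tool] looking up charges for: {q}",
    State.SHIPPING: lambda q: f"[shipping tool] checking tracking for: {q}",
    State.FALLBACK: lambda q: "I'm not sure I understood -- routing you to a human agent.",
}

def handle(user_text: str) -> str:
    state = classify_intent(user_text)  # observation -> state transition
    return HANDLERS[state](user_text)

print(handle("Why was I charged twice on my last invoice?"))
```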


ReAct model

ReAct is a prompting method for language models that blends reasoning and action. It helps these models think and plan actions in an interleaved fashion, improving their ability to answer questions, make decisions, and take actions. For more information, refer to the original ReAct paper (Yao et al., 2022).

ReAct agents require a well-engineered system prompt, designed to take turns emitting reasoning traces based on observations, and actions based on those reasoning traces. A convergence criterion and the corresponding response format are also defined in the system prompt.
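The sketch below shows what such a loop might look like, assuming the openai Python client; the system prompt, model name, search stub, and convergence check are illustrative assumptions, not a reference implementation of the paper.

```python
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

SYSTEM_PROMPT = """Answer the question by alternating between:
Thought: reason about what to do next
Action: search[query]
Observation: the result of the action (added by the system)
When you are confident, finish with:
Final Answer: <answer>"""

TOOLS = {
    # Stub tool; a real agent would call a search API here.
    "search": lambda q: f"(top search results for '{q}')",
}

def call_llm(transcript: str) -> str:
    """One ReAct turn: the model continues the transcript with Thought/Action."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": transcript}],
        stop=["Observation:"],  # the system, not the model, supplies observations
    )
    return response.choices[0].message.content

def react(question: str, max_turns: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = call_llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:                      # convergence criterion
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match:
            tool, arg = match.groups()
            observation = TOOLS.get(tool, lambda _: "unknown tool")(arg)
            transcript += f"Observation: {observation}\n"
    return "No final answer within the turn limit."

print(react("What is the tallest mountain in Europe?"))
```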


Strengths:

  • Do not have to explicitly define the full set of possible states and transitions

Drawbacks:

  • Requires user query to have a well-defined goal

  • Lacks the ability to explain the model used to arrive at decisions

  • Observations and reasoning traces must be processed sequentially


The ReAct implementation is well suited for use cases that do not have a well-defined set of actions and observations. This implementation is particularly useful when you want to address a wide variety of use cases using the same agent. For example, a research assistant agent can be designed to answer general questions about any topic and have generalized tools for extracting information from research articles and the internet.


More innovation is required in the implementation of reasoning engines for agents, given that the two typical approaches are not suitable for all use cases. The Aineko framework provides us with a powerful tool for experimenting with new approaches, and we look forward to sharing our discoveries on this front.


Tools


Tools equip agents with the ability to engage with their environment. Linked directly to the reasoning engine, they serve as the agent's “actuators”. Below are typical tools integrated into agents.


  • Web search - This allows the agent to query the internet for information it might not know.

  • API connections - This allows for programmatic access and interaction with other services across the internet.

  • Data integrations - This allows the agent to read and write data from data sources such as vector databases, CRM record entries, customer data platforms, etc.

  • Other Agents - This allows the agent to interact with other agents that are specially designed for a specific task.


Many more tools are being created as builders in the space aim to make data and services easily accessible. Implementations for tools can range in complexity from a simple API call to stateful, long-running interactions with agents. The Aineko framework gives us the flexibility to implement any tool we can build in Python and to chain tools together in a way that is both scalable and resilient.
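As one illustration of how tools might be exposed to a reasoning engine, the sketch below registers a couple of stubbed tools behind a small dispatch function; the Tool dataclass, tool names, and stub bodies are assumptions for the example rather than Aineko APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str          # surfaced to the reasoning engine in its prompt
    run: Callable[[str], str]

def web_search(query: str) -> str:
    """Stub: in practice this would call a search API and return snippets."""
    return f"(top results for '{query}')"

def crm_lookup(customer_id: str) -> str:
    """Stub: in practice this would read a record from a CRM or data platform."""
    return f"(record for customer {customer_id})"

REGISTRY = {
    t.name: t
    for t in [
        Tool("web_search", "Query the internet for information.", web_search),
        Tool("crm_lookup", "Fetch a customer record by ID.", crm_lookup),
    ]
}

def run_tool(name: str, argument: str) -> str:
    tool = REGISTRY.get(name)
    return tool.run(argument) if tool else f"Unknown tool: {name}"

print(run_tool("web_search", "latest LLM benchmarks"))
```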


What to Consider for a “Production-Ready” Agent?


Given that agents rely on LLMs for critical functionality, deploying agents to production environments requires model responses to fulfill metrics along the following dimensions:


  • Performance - How well does the model address a broad spectrum of prompts?

  • Reliability - Can the model reliably provide satisfactory answers to repeated prompts? (See the sketch after this list.)

  • Governance - Is there enough oversight and understanding of the model to ensure its trustworthiness?


From experiments and discussions with developers, we’ve found performance to be the most pressing issue. Performant LLM interactions can be the difference between making a use case viable and rendering an application unusable.


An upcoming article will focus on performance — exploring considerations and outlining strategies to enhance the efficiency of LLMs within a production environment.



We welcome you to share your insights with us, and if you're seeking solutions, we’d love to hear from you. Contact us at support@aineko.dev

