Building a minimal AI agent from scratch

A first prototype in 50 lines of code

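Let's get started. From a high level, an AI agent is just one big loop: you start with a prompt, the agent proposes an action, you execute the action, you tell the LM the output, and then you repeat. To keep track of what has happened, we keep appending to a list of messages.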

Pseudocode:

messages = [{"role": "user", "content": "Help me fix the ValueError in main.py"}]
while True:
    lm_output = query_lm(messages)
    print("LM output", lm_output)
    messages.append({"role": "assistant", "content": lm_output})  # remember what the LM said
    action = parse_action(lm_output)  # separate the action from output
    print("Action", action)
    if action == "exit":
        break
    output = execute_action(action)
    print("Output", output)
    messages.append({"role": "user", "content": output})  # send command output back

So to make this work, we only need to implement three things:

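Querying the LM API (this can get a bit annoying if you want to support every LM or want detailed cost information, but it is simple if you already know which model you want).
Parsing the action (parse_action). You don't need this if you use the tool-calling feature your LM supports, but that is more provider-specific, so we won't cover it in detail in this guide (don't worry, performance doesn't suffer for it).
Executing the action (very simple: in our case we just run whatever action the LM proposes as a bash command in the terminal).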
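
Before implementing any of the three pieces for real, it can help to dry-run the loop with stand-ins and watch how the message list grows. The sketch below is only an illustration: `scripted_replies` and the fake `query_lm`, `parse_action`, and `execute_action` bodies are hypothetical placeholders, not the implementations developed in the rest of this guide.

```python
# Dry run of the agent loop with stand-in components (no API calls, no shell).
scripted_replies = iter(["ls", "exit"])  # hypothetical LM replies, one per turn


def query_lm(messages):
    """Stand-in for the LM: return the next scripted reply."""
    return next(scripted_replies)


def parse_action(lm_output):
    """Stand-in parser: treat the whole reply as the action."""
    return lm_output.strip()


def execute_action(action):
    """Stand-in executor: pretend the command produced some output."""
    return f"(pretend output of `{action}`)"


def run(task):
    messages = [{"role": "user", "content": task}]
    while True:
        lm_output = query_lm(messages)
        messages.append({"role": "assistant", "content": lm_output})
        action = parse_action(lm_output)
        if action == "exit":
            break
        output = execute_action(action)
        messages.append({"role": "user", "content": output})
    return messages


history = run("List the files in the current directory")
for message in history:
    print(message["role"], "->", message["content"])
```

The history alternates between assistant turns (what the LM said) and user turns (what the command printed), which is exactly the shape the chat APIs expect.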

Querying the LM

We usually use openai. Anthropic, OpenRouter, LiteLLM (which supports most LMs), and GLM are alternatives.

`pip install openai`

Here is the minimal code to query the API:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here"
)  # or set OPENAI_API_KEY env var

def query_lm(messages):
    response = client.responses.create(
        model="gpt-5.1",
        input=messages
    )
    return response.output_text

Parsing the action

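Now let's parse the action. There are two simple ways for the LM to "encode" the action (again, you don't need this if you use tool calling, but we keep things simple in this tutorial):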

Triple backticks:

Some thoughts of the LM explaining the action and the action below

```bash-action
cd /path/to/project && ls
```

XML style:

Some thoughts of the LM explaining the action and the action below

<bash_action>cd /path/to/project && ls</bash_action>

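Both work well for most models, and we recommend triple backticks. However, some models (especially small or open-source ones) are less flexible, so it is worth trying both. Here is a quick regular expression to parse the action: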

Triple backticks:

import re

def parse_action(lm_output: str) -> str:
    """Take LM output, return action"""
    matches = re.findall(
        r"```bash-action\s*\n(.*?)\n```", 
        lm_output, 
        re.DOTALL
    )
    return matches[0].strip() if matches else ""

XML style:

import re

def parse_action(lm_output: str) -> str:
    """Take LM output, return action"""
    matches = re.findall(
        r"<bash_action>(.*?)</bash_action>", 
        lm_output, 
        re.DOTALL
    )
    return matches[0].strip() if matches else ""
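
As a quick sanity check, both parsers can be run against sample replies. This is a sketch: the function names `parse_backtick_action` and `parse_xml_action` are made up here so the two variants can live side by side, and the three backticks are built in a `FENCE` variable only so that this example nests cleanly inside a code block.

```python
import re

FENCE = "`" * 3  # three backticks, built here so this example nests cleanly


def parse_backtick_action(lm_output: str) -> str:
    """Return the first bash-action fenced block, or "" if there is none."""
    pattern = FENCE + r"bash-action\s*\n(.*?)\n" + FENCE
    matches = re.findall(pattern, lm_output, re.DOTALL)
    return matches[0].strip() if matches else ""


def parse_xml_action(lm_output: str) -> str:
    """Return the first <bash_action> element, or "" if there is none."""
    matches = re.findall(r"<bash_action>(.*?)</bash_action>", lm_output, re.DOTALL)
    return matches[0].strip() if matches else ""


backtick_reply = (
    "Let me look around first.\n\n"
    f"{FENCE}bash-action\ncd /path/to/project && ls\n{FENCE}\n"
)
xml_reply = (
    "Let me look around first.\n\n"
    "<bash_action>cd /path/to/project && ls</bash_action>"
)

print(parse_backtick_action(backtick_reply))
print(parse_xml_action(xml_reply))
print(repr(parse_backtick_action("No action in this reply.")))
```

Note that a reply without any action silently yields an empty string with this version, so the agent would end up running an empty command.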

Executing the action

Executing the action is actually very simple: we can just use Python's subprocess module (or even os.system, though that is generally less recommended).

import subprocess
import os

def execute_action(command: str) -> str:
    """Execute action, return output"""
    result = subprocess.run(
        command,
        shell=True,
        text=True,
        env=os.environ,
        encoding="utf-8",
        errors="replace",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=30,
    )
    return result.stdout

This has a few limitations:

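The agent cannot change to a different working directory (a cd does not carry over to the next command)
The agent cannot easily persist environment variables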

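In practice, however, we found these limitations not to be very restrictive. In fact, reducing hidden state and forcing the agent to use absolute paths helps the language model in many cases. Claude Code is similar (it can change directories, but it cannot persist environment variables, because it also runs commands in subshells).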
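
The limitations above are easy to observe directly. The following sketch assumes a POSIX shell and uses a throwaway directory and a made-up variable name (`DEMO_VAR`); it shows that state set in one call is gone in the next, while chaining with `&&` inside a single command works.

```python
import subprocess
import tempfile


def execute_action(command: str) -> str:
    """Run a command in a fresh subshell and return its combined output."""
    result = subprocess.run(
        command,
        shell=True,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=30,
    )
    return result.stdout


scratch_dir = tempfile.mkdtemp()  # hypothetical scratch directory for the demo

# Each call gets its own subshell, so neither `cd` nor `export` carries over.
execute_action(f"cd '{scratch_dir}'")
dir_after_cd = execute_action("pwd").strip()
execute_action("export DEMO_VAR=hello")
var_after_export = execute_action('echo "${DEMO_VAR:-unset}"').strip()

# Chaining inside ONE command keeps the state for the rest of that command.
chained_dir = execute_action(f"cd '{scratch_dir}' && pwd").strip()
chained_var = execute_action('DEMO_VAR=hello && echo "$DEMO_VAR"').strip()

print("after separate cd:    ", dir_after_cd)
print("after separate export:", var_after_export)
print("chained cd:           ", chained_dir)
print("chained variable:     ", chained_var)
```

This is why the LM ends up writing commands like `cd /path/to/project && ls` or using absolute paths throughout.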

Adding a system prompt

We also need to tell the LM more about how it should behave:

messages = [{
    "role": "system", 
    "content": "You are a helpful assistant. When you want to run a command, wrap it in ```bash-action\n<command>\n```. To finish, run the exit command."
}
]

Let's put it all together and run it!

You should now have code similar to this (this example uses litellm + triple backticks):

import re
import subprocess
import os
from litellm import completion

def query_lm(messages: list[dict[str, str]]) -> str:
    response = completion(
        model="openai/gpt-5.1",
        messages=messages
    )
    return response.choices[0].message.content

def parse_action(lm_output: str) -> str:
    """Take LM output, return action"""
    matches = re.findall(
        r"```bash-action\s*\n(.*?)\n```", 
        lm_output, 
        re.DOTALL
    )
    return matches[0].strip() if matches else ""

def execute_action(command: str) -> str:
    """Execute action, return output"""
    result = subprocess.run(
        command,
        shell=True,
        text=True,
        env=os.environ,
        encoding="utf-8",
        errors="replace",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=30,
    )
    return result.stdout

# Main agent loop
messages = [{
    "role": "system", 
    "content": "You are a helpful assistant. When you want to run a command, wrap it in ```bash-action\n<command>\n```. To finish, run the exit command."
}, {
    "role": "user", 
    "content": "List the files in the current directory"
}]

while True:
    lm_output = query_lm(messages)
    print("LM output", lm_output)
    messages.append({"role": "assistant", "content": lm_output})  # remember what the LM said
    action = parse_action(lm_output)  # separate the action from output
    print("Action", action)
    if action == "exit":
        break
    output = execute_action(action)
    print("Output", output)
    messages.append({"role": "user", "content": output})  # send command output back

Let's make it more robust

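The following sections are tweaks that improve performance. Nothing fancy, just making sure the agent doesn't get stuck and can cope when things go wrong. This part is slightly more advanced. Rather than showing the complete code at the end, we encourage you to look at the source code of our mini agent; it contains all of these features and almost nothing else (see also the next section for how to start reading the code).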

Handling exceptions in the control flow

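The idea here is: whenever a known exception occurs (timeout, format error, and so on), we tell the LM and let it deal with it. This means tweaking our while loop slightly: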

while True:
    try:
        ...  # previous loop body goes here
    except Exception as e:
        messages.append({"role": "user", "content": str(e)})

That's it!

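For example, if the agent does something silly (such as calling vim) and the command times out (subprocess.run raises subprocess.TimeoutExpired in that case), the error message is appended to the messages, and the LM can pick up from there and hopefully realize what it did wrong.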

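However, we might want to restrict this behavior to a few known problems, or add more information to the message. In that case we can be more specific, for example: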

import subprocess

class OurTimeoutError(RuntimeError): ...

def execute_action(action: str) -> str:
    try:
        ...  # as before
    except subprocess.TimeoutExpired as e:  # what subprocess.run raises on timeout
        raise OurTimeoutError("Your last command timed out, you might want to ...") from e

And just like that, we have given the LM more information.
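
For reference, here is a runnable version of that idea. It is a sketch under a few assumptions: the `timeout` parameter and the wording of the hint are made up for illustration, and the timeout is caught as `subprocess.TimeoutExpired`, which is the exception `subprocess.run` raises when the time limit is exceeded.

```python
import subprocess


class OurTimeoutError(RuntimeError):
    """Raised when a command exceeds its time budget."""


def execute_action(command: str, timeout: float = 30) -> str:
    """Run a command; turn a timeout into an error message meant for the LM."""
    try:
        result = subprocess.run(
            command,
            shell=True,
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as e:
        # Hypothetical wording; tune the hint to your own agent.
        raise OurTimeoutError(
            f"Your last command timed out after {timeout} seconds. "
            "Avoid interactive programs and long-running commands."
        ) from e
    return result.stdout


try:
    execute_action("sleep 1", timeout=0.2)
except OurTimeoutError as e:
    feedback = str(e)
    print(feedback)
```

The string in `feedback` is what the loop would append to the messages as the next user turn.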

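You can also be more specific about which exceptions are handed to the LM and which simply crash the program. In that case it can make sense to define a custom exception class and catch only that one in the while loop: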

class NonterminatingException(RuntimeError): ...
class OurTimeoutError(NonterminatingException): ...

while True:
    try: 
        ...
    except NonterminatingException as e:
        ...

mini-swe-agent also defines a TerminatingException class that replaces the if action == "exit" mechanism and stops the while loop gracefully:

class TerminatingException(RuntimeError): ...
class Submitted(TerminatingException): ...  # agent wants to stop

def execute_action(action: str) -> str:
    if action == "exit":
        raise Submitted("LM requested to quit")
    ...

while True:
    try:
        ...
    except NonterminatingException as e:
        ...
    except TerminatingException as e:
        print("Stopping because of ", str(e))
        break
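
To see the two exception families working together, the loop can again be dry-run with stand-ins. Everything that talks to the outside world is faked in this sketch: `scripted_replies`, the fake `query_lm`, and the fake `execute_action` are hypothetical placeholders, and the whole reply is treated as the action to keep the example short.

```python
class NonterminatingException(RuntimeError): ...  # reported to the LM, loop continues
class OurTimeoutError(NonterminatingException): ...
class TerminatingException(RuntimeError): ...  # ends the loop
class Submitted(TerminatingException): ...  # agent wants to stop


scripted_replies = iter(["vim main.py", "cat main.py", "exit"])  # hypothetical


def query_lm(messages):
    """Stand-in for the LM: return the next scripted reply."""
    return next(scripted_replies)


def execute_action(action):
    """Stand-in executor that simulates a timeout for interactive editors."""
    if action == "exit":
        raise Submitted("LM requested to quit")
    if action.startswith("vim"):
        raise OurTimeoutError("Your last command timed out; avoid interactive editors.")
    return f"(pretend output of `{action}`)"


messages = [{"role": "user", "content": "Help me fix the ValueError in main.py"}]
stop_reason = None
while True:
    try:
        lm_output = query_lm(messages)
        messages.append({"role": "assistant", "content": lm_output})
        output = execute_action(lm_output.strip())
        messages.append({"role": "user", "content": output})
    except NonterminatingException as e:
        # Recoverable: hand the error text to the LM and keep going.
        messages.append({"role": "user", "content": str(e)})
    except TerminatingException as e:
        stop_reason = str(e)
        break

print("Stopped because:", stop_reason)
```

Any exception outside these two families (a plain KeyError from a bug, say) is not caught and still crashes the program, which is usually what you want during development.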

Handling malformed output

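Sometimes the LM (especially a weaker one) will not format its action correctly. In that case it is good to remind it of the correct format. Now that we have generic exception handling, this is very simple: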

incorrect_format_message = """Your output was malformatted.
Please include exactly 1 action formatted as in the following example:

```bash-action
ls -R
```
"""
class FormatError(RuntimeError): ...

def parse_action(lm_output: str) -> str:
    matches = ...  # same regex search as before
    if len(matches) != 1:
        raise FormatError(incorrect_format_message)
    ...
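
Filled in with the triple-backtick regex from earlier, the strict parser might look like the sketch below. The constant name `INCORRECT_FORMAT_MESSAGE` and the `FENCE` helper are choices made for this example only (the latter just keeps literal triple backticks out of the code block).

```python
import re

FENCE = "`" * 3  # three backticks, built here so this example nests cleanly

INCORRECT_FORMAT_MESSAGE = (
    "Your output was malformatted.\n"
    "Please include exactly 1 action formatted as in the following example:\n\n"
    f"{FENCE}bash-action\nls -R\n{FENCE}\n"
)


class FormatError(RuntimeError):
    """Raised when the LM reply does not contain exactly one action."""


def parse_action(lm_output: str) -> str:
    """Return the single action in the reply, or raise FormatError."""
    pattern = FENCE + r"bash-action\s*\n(.*?)\n" + FENCE
    matches = re.findall(pattern, lm_output, re.DOTALL)
    if len(matches) != 1:
        raise FormatError(INCORRECT_FORMAT_MESSAGE)
    return matches[0].strip()


good_reply = f"Listing files.\n{FENCE}bash-action\nls -R\n{FENCE}\n"
two_actions = (
    f"{FENCE}bash-action\nls\n{FENCE}\n"
    f"{FENCE}bash-action\npwd\n{FENCE}\n"
)

print(parse_action(good_reply))
for bad_reply in ["I forgot the action.", two_actions]:
    try:
        parse_action(bad_reply)
    except FormatError as e:
        print("FormatError:", str(e).splitlines()[0])
```

For the main loop to feed this message back to the LM instead of crashing, FormatError has to be one of the exception types the loop catches.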

Environment variables

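We can set a few environment variables to disable interactive elements in command-line tools and keep the agent from getting stuck (you can see them being set in mini-swe-agent's SWE-bench config):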

env_vars = {
    "PAGER": "cat",
    "MANPAGER": "cat",
    "LESS": "-R",
    "PIP_PROGRESS_BAR": "off",
    "TQDM_DISABLE": "1",
}

# ...

def execute_action(command: str) -> str:
    # ...
    result = subprocess.run(
        command,
        # ...
        env=os.environ | env_vars,  # dict union, requires Python 3.9+
        # ...
    )
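
A complete version of this function is short enough to check by hand. The sketch below merges the two mappings with dict unpacking, which behaves like the `|` union but also works on Python versions before 3.9; keys from `env_vars` win over inherited ones.

```python
import os
import subprocess

env_vars = {
    "PAGER": "cat",
    "MANPAGER": "cat",
    "LESS": "-R",
    "PIP_PROGRESS_BAR": "off",
    "TQDM_DISABLE": "1",
}


def execute_action(command: str) -> str:
    """Run a command with the non-interactive overrides applied."""
    result = subprocess.run(
        command,
        shell=True,
        text=True,
        env={**os.environ, **env_vars},  # later keys win, so env_vars override
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=30,
    )
    return result.stdout


pager = execute_action('echo "$PAGER"').strip()
tqdm_flag = execute_action('echo "$TQDM_DISABLE"').strip()
print("PAGER inside the subshell:", pager)
print("TQDM_DISABLE inside the subshell:", tqdm_flag)
```

The overrides only apply to the child process; the environment of the agent's own Python process is left untouched.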

mini-swe-agent

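mini-swe-agent is built exactly according to the blueprint of this tutorial, so you should find its source code easy to follow. The only significant difference is that it is more modular, so every component can be swapped out.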

The Agent class (full code) contains the big while loop in its run function:

class Agent:
    def __init__(self, model, environment):
        self.model = model
        self.environment = environment
        ... 

    def run(self, task: str):
        while True:
            ...

The Model class (litellm example) handles the different LMs:

class Model:
    def query(self, messages: list[dict[str, str]]):
        ...

The Environment class (local environment) executes actions:

class Environment:
    def execute(self, command: str):
        ...

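mini-swe-agent ships with different environment classes, which for example allow actions to be executed in a Docker container rather than directly in the local environment. Is that much more complicated? Not really: all we do is switch from subprocess.run to calling docker exec.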
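
To make the swap concrete, here is a rough sketch of two interchangeable environment classes. It is not mini-swe-agent's actual code: the class names, the `build_argv` helper, and the container name are invented for illustration, and the Docker variant assumes a container that is already running and has bash installed.

```python
import subprocess


class LocalEnvironment:
    """Runs commands in a local subshell."""

    def execute(self, command: str) -> str:
        result = subprocess.run(
            command,
            shell=True,
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=30,
        )
        return result.stdout


class DockerEnvironment:
    """Runs commands inside an already running container via `docker exec`."""

    def __init__(self, container_id: str):
        self.container_id = container_id  # hypothetical: supplied by the caller

    def build_argv(self, command: str) -> list:
        # docker exec CONTAINER COMMAND [ARG...]; bash -c runs the command string
        return ["docker", "exec", self.container_id, "bash", "-c", command]

    def execute(self, command: str) -> str:
        result = subprocess.run(
            self.build_argv(command),
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=30,
        )
        return result.stdout


print(LocalEnvironment().execute("echo hello from the local shell").strip())
print(DockerEnvironment("my-container").build_argv("ls /workspace"))
```

Because both classes expose the same `execute(command)` method, the agent loop does not need to know which one it was given.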

#DeepLearning #LargeModels

https://lijianxiong.space/2026/20260217/

Author

LJX

Published

February 17, 2026
