AEnvironment Python SDK Guide#
In reinforcement learning (RL) training phases, the environment is an indispensable key factor. High-quality, rapidly extensible environments can significantly improve the efficiency and quality of RL training. To help developers quickly build required environments, we provide a complete development SDK that enables efficient environment creation and management.
Environment Concept: An environment is a containerized sandbox that can integrate any form of tools or code, providing standardized tool and reward services externally to achieve rapid delivery of production-grade environments.
Quick Start#
Install SDK#
pip install aenvironment
Verify Installation#
import aenv
print(aenv.__version__)
Run Environment Instance#
Runtime Mode Selection#
Local Mode#
In local mode, you need to start the environment execution sandbox via aenv run and set environment variables:
export DUMMY_INSTANCE_IP=http://localhost
Remote Mode#
In remote mode, you need to deploy components first, obtain the service address and configure:
export AENV_SYSTEM_URL=http://service
Start Environment Instance#
mini-terminal code example
import asyncio
from aenv import Environment
async def main():
# Create environment instance
mini_terminal = Environment("mini-terminal@1.0.1", ttl="60m")
try:
# Get available tools list
tools = await mini_terminal.list_tools()
print("Successfully obtained tool list:", tools)
assert tools is not None
except Exception as e:
print(
"Test completed - environment created successfully, but tool list may be empty:",
str(e),
)
# Interactive command execution
while True:
try:
user_input = input(">>> ").strip()
if user_input.lower() in ("exit", "quit"):
print("Exiting interactive mode")
break
# Call remote tool to execute command
result = await mini_terminal.call_tool(
"mini-terminal@1.0.1/execute_command",
{"command": user_input, "timeout": 5}
)
print("Execution result:\n", result)
print("-" * 60)
except KeyboardInterrupt:
print("\nInterrupt detected, enter 'exit' to quit or continue input")
await mini_terminal.release()
print("Environment successfully released")
if __name__ == "__main__":
asyncio.run(main())
Environment Execution Demo#

Environment Development#
Tool Registration#
Use the @register_tool decorator to quickly register existing Python methods as environment tools. Registered tools will ultimately be provided as MCP Server services.
How It Works#
Start corresponding containers when creating environment instances
Launch FastMCP Server within containers
Register all tools to MCP Server
Unified invocation through MCP standard protocol
Development Example#
Example code:
from typing import Dict, Any
from aenv import register_tool
@register_tool
def my_custom_echo_env(content: str) -> Dict[str, Any]:
"""
Custom environment tool description.
This tool receives any string content and returns it as-is output, used to verify environment integration effects.
Args:
content: Any content to echo
Returns:
Dictionary containing results in format {"result": "original content"}
"""
return {"result": f"{content}"}
Parameter Specification#
Input Parameters#
Input parameter Schema must be consistent with MCP Tool input Schema, mainly supporting basic data types and avoiding complex nested structures.
Supported Types:
Basic types:
str,int,float,boolLists:
list[str],list[int], etc.Simple Pydantic models
Optional types:
Optional[str]Default parameter values
Not Recommended or Limited Types:
Deep nested structures
Complex generic types
Overly complex custom validation logic
Dynamically generated models
Example:
from aenv import register_tool
mcp = FastMCP("Example")
# ✅ Recommended type usage
@register_tool
def simple_types(
name: str, # string
count: int, # integer
price: float, # float
enabled: bool, # boolean
tags: list[str] # string list
) -> str:
return "OK"
# ✅ Using Pydantic models
class UserQuery(BaseModel):
query: str
max_results: int = 10
include_details: bool = False
@register_tool
def search_users(params: UserQuery) -> str:
return f"Searching: {params.query}"
Return Value Processing#
When calling environment instances, pay attention to correct parsing of return values. We have transformed the original mcp.types.CallToolResult type, retaining only core data.
Data Transformation Logic:
# MCP original result format
class CallToolResult(Result):
is_error: Boolean = False
content: List[Dict[str, Any]]
# AEnvironment transformed format
class ToolResult:
is_error: Boolean = False
content: List[Dict[str, Any]]
# Transformation process
content: List[Dict[str, Any]] = []
for item in mcp_call_results.content:
if hasattr(item, "text") and item.text:
content.append({"type": "text", "text": item.text})
elif hasattr(item, "type") and hasattr(item, "data"):
content.append({"type": item.type, "data": item.data})
else:
content.append({"type": "text", "text": str(item)})
return ToolResult(content=content, is_error=result.isError)
Calling Example:
result = await mini_terminal.call_tool(
"env_name@1.0.1/execute_command",
{"input": input, "timeout": 5}
)
# Execution log
20251212-14:47:57.174 aenv.environment INFO: [ENV:mini-terminal-f3e28p] Executing tool: execute_command in environment mini-terminal@1.0.1
# Return result
Execution result:
ToolResult(content=[{
'type': 'text',
'text': '{"output":["helloworld\n",true],"returncode":0}'
}], is_error=False)
Tool Discovery#
Query available tools in the current environment via the list_tools interface:
# 1. Get all tools
tools = await mini_terminal.list_tools()
# 2. Tool details example
[
{
"name": "mini-terminal@1.0.1/execute_command",
"description": null,
"inputSchema": {
"properties": {
"command": {"type": "string"},
"timeout": {"default": 60, "title": "Timeout"}
},
"required": ["command"],
"type": "object"
}
}
]
Reward Functions#
Reward functions are mainly used in reinforcement learning training processes to evaluate whether the impact of agent actions on environment states meets expectations. They are typically divided into positive rewards and negative rewards.
Since they need to perceive environmental changes, reward functions are closely coupled with the environment and therefore implemented as part of the environment. Use the @register_reward decorator to register Python methods as reward functions.
from aenv import register_reward
@register_reward
def evaluate_code_quality(code: str, test_results: dict) -> dict:
"""
Reward function for evaluating code quality.
Comprehensive scoring based on test results, code length, and code standards.
Args:
code: Code string to evaluate
test_results: Test result dictionary containing pass/fail statistics
Returns:
Dictionary containing score, feedback, and detailed information
"""
score = 0.0
feedback = []
# Score based on test results
if test_results.get("passed", 0) > 0:
score += 0.5
feedback.append("Tests passed")
# Score based on code length
if len(code) < 1000:
score += 0.3
feedback.append("Code concise")
# Score based on code standards
if "def " in code and "import " in code:
score += 0.2
feedback.append("Good structure")
return {
"score": score,
"feedback": "; ".join(feedback),
"details": test_results
}
# Use reward function
reward = await env.call_reward({
"code": "def hello(): return 'world'",
"test_results": {"passed": 5, "failed": 0}
})
Function Registration#
Functions serve as extension points for the environment. Use the @register_function decorator to register any Python function to the environment. Registered functions essentially become endpoints of HTTP services within environment-associated containers, providing services externally via HTTP.
from aenv import register_function
@register_function
def custom_endpoint(data: dict) -> dict:
"""Custom HTTP endpoint function"""
return {"status": "success", "data": data}
Health Checks#
You can customize environment health check logic according to specific scenarios. Use the @register_health decorator to implement.
from aenv import register_health
@register_health
def system_health_check() -> dict:
"""
System health check function.
Check CPU, memory, disk usage, and system running status.
Returns:
Dictionary containing various health indicators
"""
import psutil
return {
"status": "healthy",
"cpu_percent": psutil.cpu_percent(),
"memory_percent": psutil.virtual_memory().percent,
"disk_usage": psutil.disk_usage("/").percent,
"uptime": "Running normally"
}
# Execute health check
health = await env.check_health()
Environment Usage#
Basic Usage#
import asyncio
from aenv import Environment
async def basic_example():
# Create environment instance
env = Environment("my-python-env")
# Initialize environment
await env.initialize()
# Use environment
tools = await env.list_tools()
print(f"Available tools: {len(tools)}")
# Destroy environment
await env.destroy()
# Run example
asyncio.run(basic_example())
Execution Flow#
Instantiate Environment
User requests to create specific type environment (e.g., “trading-env”)
Environment scheduler starts corresponding Docker containers based on preset image mapping
FastMCP service automatically starts within containers, loading environment-specific tool sets and reward functions
Tool Invocation
User calls via environment instance:
env.call_tool("analyze_market", data)
Internal Routing
Environment manager forwards requests to container corresponding endpoints (e.g.,
http://localhost:8000/tools/analyze_market)
Service Execution
FastMCP service within container receives and parses requests
Execute corresponding tool logic or reward calculation
Result Return
Execution results return via HTTP response
Constructor Parameters#
Parameter Name |
Type |
Default |
Description |
Use Case |
|---|---|---|---|---|
|
str |
Required |
Environment name |
Identify environment instance |
|
str |
|
Data source path |
Mount data volumes |
|
str |
|
Lifecycle |
Auto-destruction time |
|
dict |
|
Environment variables |
Configure runtime environment |
|
list |
|
Startup arguments |
Container startup parameters |
|
str |
|
Service address |
AEnvironment platform |
|
float |
|
Request timeout |
Network request timeout |
|
int |
|
Max retries |
Failure retry count |
|
str |
|
API key |
Authentication |
Recommended Usage: Context Manager#
Using context managers ensures proper initialization and automatic cleanup of environment resources:
import asyncio
from aenv import Environment
async def recommended_usage():
# Recommended: automatic initialization and destruction
async with Environment("safe-env") as env:
# Environment automatically initialized
tools = await env.list_tools()
# Execute tools
result = await env.call_tool("python", {
"code": "print('Hello from AEnvironment!')"
})
print(result.content)
# Automatically destroys environment on exit
asyncio.run(recommended_usage())
Advanced Configuration#
Complete Configuration Example#
env = Environment(
env_name="data-analysis-env", # Environment name
datasource="/data/workspace", # Data source path
ttl="2h", # Lifecycle 2 hours
environment_variables={
"PYTHONPATH": "/app",
"MODEL_PATH": "/models/bert",
"API_KEY": "sk-xxx"
},
arguments=["--verbose", "--gpu"], # Startup arguments
aenv_url="http://localhost:8080", # AEnvironment service address
timeout=60.0, # Timeout (seconds)
max_retries=5, # Max retry count
api_key="your-api-key" # API key
)
Dataset Specification#
env = Environment(
env_name="swe-env",
datasource="/path/to/dataset"
)
Environment Variable Injection#
Pass environment variables to containers:
env = Environment(
env_name="my-env",
environment_variables={
"API_KEY": "secret-key",
"DEBUG": "true",
"LOG_LEVEL": "info"
}
)
Startup Parameter Configuration#
Specify startup arguments for custom environments:
env = Environment(
env_name="my-env",
arguments=["--config", "/app/config.yaml", "--verbose"]
)
Lifecycle Management#
Environment instances have a default lifecycle of 30 minutes, after which the system automatically recycles them. Can be configured as follows:
1. Parameter Configuration (Highest Priority)#
# 30 minutes (default)
env = Environment("my-env", ttl="30m")
# 2 hours
env = Environment("my-env", ttl="2h")
# 1 day
env = Environment("my-env", ttl="24h")
2. Global Configuration#
Configure uniformly in config.json, affecting all instances created by this environment:
{
"deployConfig": {
"ttl": "2h"
}
}
Configuration Priority: Parameter Configuration > Global Configuration > System Default
FAQ#
Q1: Do environment instances need to be actively released after creation?#
Recommendation: Environment instances should be actively released after use to avoid resource waste.
Best Practices:
Short-term tasks: Set reasonable TTL for automatic recycling
Long-term tasks: Use context managers to ensure resource cleanup
Interactive use: Explicitly call
await env.release()
Evaluation Principle: Assess reasonable TTL values based on specific usage scenarios, balancing resource utilization and user experience.