AEnvironment Architecture

AEnvironment Architecture#

AEnvironment is a distributed platform for managing AI agent environments at scale. This document provides a comprehensive overview of the system architecture, core components, sandbox engines, and how they work together.

System Architecture#

The AEnvironment platform is divided into two main domains: Development Side and Traffic Side. The Development Side handles environment creation and metadata storage, while the Traffic Side handles runtime execution and sandbox management.

Development Side#

        graph LR
    subgraph "Development Side"
        Developer[Environment Developer]
        CLI[AEnv CLI]
        EnvHub[envhub]
        CI[CI Optional]
        Redis[(redis)]
    end

    Developer -->|1. edit env| CLI
    CLI -->|2. push env| EnvHub
    EnvHub -.->|3. trigger CI| CI
    EnvHub -->|4. store env meta| Redis

The Development Side handles environment definition and metadata management:

Components:

Environment Developer: Developer who creates and manages environments
AEnv CLI: Command-line interface for pushing environment configurations
envhub: Central registry service (EnvHub) that stores environment definitions and metadata
CI (Optional): Optional Continuous Integration pipeline for building and testing environments
redis: Metadata storage for environment configurations

Workflow:

Edit Environment: Developer edits environment configuration using AEnv CLI
Push Environment: Developer pushes environment configuration to envhub via push env command
CI Integration (Optional): envhub can optionally trigger CI pipeline for building and testing
Metadata Storage: envhub stores environment metadata in redis for fast retrieval

Traffic Side#

        graph TB
    subgraph "Traffic Side"
        RL[RL/Agent]
        SDK[AEnv SDK]
        APIService[api service]
    end

    subgraph "K8S Engine"
        Controller[Controller]
        K8S[K8S]
    end

    subgraph "Other Engine"
        FutureEngine[Future Engines]
    end

    subgraph "Sandbox"
        SDKInSandbox[AEnv SDK]
        MCP[MCP Tools/Functions]
    end

    RL -->|1. create env instance| SDK
    SDK -->|2. request| APIService
    APIService -->|3. create sandbox| Controller
    APIService -.->|pull env meta| EnvHub
    Controller -->|4. schedule| K8S
    K8S -->|5. deploy| Sandbox
    APIService -->|6. create sandbox| FutureEngine
    FutureEngine -.->|deploy| Sandbox
    SDK -->|7. call tool/function| APIService
    APIService -->|8. proxy| SDKInSandbox
    SDKInSandbox -->|9. execute| MCP
    MCP -->|10. result| SDKInSandbox
    SDKInSandbox -->|11. result| APIService
    APIService -->|12. tool result| SDK

The Traffic Side handles runtime execution and sandbox orchestration:

Components:

RL/Agent: Reinforcement Learning agents or AI agents that use environments
AEnv SDK: Client SDK for interacting with the platform
api service: API gateway for environment instance management
K8S Engine: Kubernetes orchestration layer containing:
- Controller: Manages sandbox lifecycle and scheduling
- K8S: Kubernetes cluster for pod management
Other Engine: Placeholder for future sandbox engines (extensible architecture)
Sandbox: Running environment instance containing:
- AEnv SDK: Runtime SDK within the sandbox
- MCP Tools/Functions: Model Context Protocol tools and functions

Workflow:

Create Environment Instance: RL/Agent requests to create environment instance via AEnv SDK
API Request: AEnv SDK sends request to api service
Metadata Retrieval: api service pulls environment metadata from envhub
Sandbox Creation: api service requests Controller (K8S Engine) or Future Engine to create sandbox
Scheduling: Controller schedules pod creation via K8S (Controller only supports K8S)
Sandbox Deployment: K8S or Future Engine deploys the sandbox instance
Tool Execution: AEnv SDK calls tools/functions via api service (acting as proxy)
Proxy Communication: api service proxies the request to AEnv SDK within the sandbox
MCP Execution: AEnv SDK in sandbox executes MCP Tools/Functions
Result Return: Results flow back through api service proxy to the client AEnv SDK

Sandbox Engine Architecture#

AEnvironment is designed to support multiple sandbox engines through a pluggable architecture. The API Service adapts to different sandbox engines, while the Controller only supports Kubernetes.

Sandbox Engine Interface#

The API Service implements adapters for different sandbox engines:

// EnvInstanceService defines sandbox crud interfaces
type EnvInstanceService interface {
   GetEnvInstance(id string) (*models.EnvInstance, error)
   CreateEnvInstance(req *backend.Env) (*models.EnvInstance, error)
   DeleteEnvInstance(id string) error
   ListEnvInstances(envName string) ([]*models.EnvInstance, error)
   Warmup(req *backend.Env) error
   Cleanup() error
}

Current Implementation: Kubernetes#

Kubernetes is the current production sandbox engine. It provides:

Container Orchestration: Native Kubernetes pod management
Resource Management: CPU, memory, GPU allocation
Scaling: Horizontal and vertical scaling capabilities
High Availability: Built-in redundancy and failover
Networking: CNI-based networking and service mesh support

Architecture:

Kubernetes Sandbox
├── Pod Management      # Create, get, list, delete pods
├── Template System     # Pod templates for consistent configs
├── State Cache         # In-memory pod state cache
└── Watch API           # Real-time pod state updates

see customize sandbox engine integrating for further usage

Core Components#

API Service#

The API Service is the entry point for all client requests, providing a unified interface to the platform.

Key Features:

REST API: Standard HTTP endpoints for environment management
MCP Gateway: Model Context Protocol support for tool communication
Sandbox Engine Adaptation: Adapts to different sandbox engines (K8S via Controller, ASandbox, E2B, Ray, etc.)
Proxy Communication: Acts as proxy between client AEnv SDK and AEnv SDK in sandbox
Authentication: API key and token-based authentication
Rate Limiting: Request throttling and quotas
Instance Management: Environment lifecycle operations

Architecture:

api-service/
├── controller/     # Request handlers (env instances, MCP proxy)
├── middleware/     # Auth, logging, rate limiting, metrics
├── models/         # Data models
└── service/        # Business logic (backend client, scheduler client)

Endpoints:

POST /env-instance - Create a new environment instance
GET /env-instance/:id/list - List instances for a user
GET /env-instance/:id - Get instance details
DELETE /env-instance/:id - Delete an instance
GET /health - Health check
GET /metrics - Prometheus metrics

Ports:

:8080 - Main API server
:8081 - MCP Gateway

Middleware Stack:

Authentication middleware (token validation with caching)
Rate limiting middleware (QPS throttling)
Instance limit middleware (per-user limits)
Logging middleware (structured logging)
Metrics middleware (Prometheus metrics)

Controller#

The Controller manages environment lifecycle and interacts with the Kubernetes sandbox engine. Note: The Controller only supports Kubernetes engine. For other sandbox engines (ASandbox, E2B, Ray, etc.), the API Service directly adapts to them.

Key Features:

Kubernetes Integration: Direct integration with Kubernetes for pod management
Pod Management: Create, get, list, and delete pods (Kubernetes only)
Pod Template Management: Load and apply pod templates
State Tracking: Cache and track instance status
Leader Election: High availability with Kubernetes leader election

Architecture:

controller/
├── cmd/               # Entry points
├── controllers/       # Kubernetes controllers
├── pkg/
│   ├── aenvhub_http_server/  # HTTP server and pod handlers
│   ├── http_server/          # Pod template utilities
│   └── model/                # Data models
└── deploy/
    └── pod_templates/ # Pod template definitions

Endpoints:

POST /pods - Create a new pod
GET /pods - List pods
GET /pods/{podName} - Get pod details
DELETE /pods/{podName} - Delete a pod

Port: :8080

Kubernetes Integration:

The Controller directly uses the Kubernetes client-go library for pod management:

// Controller only supports Kubernetes
type Controller struct {
    clientset kubernetes.Interface
    podCache  *PodCache
    templates *TemplateManager
}

func (c *Controller) CreatePod(spec *PodSpec) (*Pod, error) {
    // Create pod via Kubernetes API
    pod, err := c.clientset.CoreV1().Pods(spec.Namespace).Create(ctx, spec.PodSpec, metav1.CreateOptions{})
    if err != nil {
        return nil, err
    }

    // Track in cache
    c.podCache.Add(pod)

    return pod, nil
}

Note: The Controller is Kubernetes-specific. For other sandbox engines (ASandbox, E2B, Ray, etc.), the API Service directly integrates with them and adapts to their specific APIs.

EnvHub#

EnvHub is the central registry for environment definitions and container images.

Key Features:

Environment Registry: Store and retrieve environment definitions
Image Management: Container image storage and distribution
Version Control: Semantic versioning and version history
Discovery: Search and browse available environments

Architecture:

envhub/
├── controller/    # API handlers
├── service/       # Registry operations
├── models/        # Data models
└── clients/       # External service clients

API Endpoints:

GET /api/v1/environments - List environments
GET /api/v1/environments/{name} - Get environment details
POST /api/v1/environments - Push new environment
DELETE /api/v1/environments/{name} - Delete environment
GET /api/v1/environments/{name}/versions - Get version history

Data Flow#

Development Side Flow#

        sequenceDiagram
    participant Developer as Environment Developer
    participant CLI as AEnv CLI
    participant EnvHub as envhub
    participant CI as CI Optional
    participant Redis as redis

    Developer->>CLI: edit env
    CLI->>EnvHub: push env
    EnvHub-.->>CI: trigger CI (optional)
    CI-->>EnvHub: CI results (optional)
    EnvHub->>Redis: store env meta
    EnvHub-->>CLI: Environment Registered

Traffic Side Flow#

        sequenceDiagram
    participant RL as RL/Agent
    participant SDK as AEnv SDK
    participant API as api service
    participant EnvHub as envhub
    participant Controller as Controller
    participant K8S as K8S
    participant Sandbox as Sandbox

    RL->>SDK: create env instance
    SDK->>API: Create Environment Instance Request
    API->>EnvHub: pull env meta
    EnvHub-->>API: Environment Metadata
    API->>Controller: create sandbox
    Controller->>K8S: schedule
    K8S->>Sandbox: deploy
    Sandbox-->>K8S: Sandbox Ready
    K8S-->>Controller: Pod Running
    Controller-->>API: Instance Created
    API-->>SDK: Environment Ready
    SDK-->>RL: Instance Ready
    RL->>SDK: call tool/function
    SDK->>API: call tool/function
    API->>Sandbox: proxy to AEnv SDK in sandbox
    Sandbox->>Sandbox: Execute MCP Tools/Functions
    Sandbox-->>API: tool result
    API-->>SDK: tool result
    SDK-->>RL: Tool Result

Tool Execution Flow#

        sequenceDiagram
    participant Agent
    participant Env as Environment Instance
    participant MCP as MCP Server
    participant Tool

    Agent->>Env: call_tool("search", {...})
    Env->>MCP: tools/call
    MCP->>Tool: Execute function
    Tool-->>MCP: Result
    MCP-->>Env: Tool result
    Env-->>Agent: ToolResult

Sandbox Engine Configuration#

Kubernetes Configuration#

Pod Templates:

The Controller supports two pod template types: Single Container (default) and Dual Container. Templates are configured in deploy/controller/values.yaml under the podTemplates section.

Single Container Template (Default):

The single container template is the default configuration. It creates pods with a single container running the environment:

# From deploy/controller/values.yaml - podTemplates.singleContainer
apiVersion: v1
kind: Pod
metadata:
  name: placeholder
  namespace: {{ .Values.sandboxNamespace }}
  labels:
    app.kubernetes.io/name: aenv-sandbox
    template-type: singleContainer
spec:
  containers:
  - name: main
    image: weather:v0.1.0
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: "2"
        memory: "4Gi"
        ephemeral-storage: "10Gi"
      requests:
        cpu: "1"
        memory: "2Gi"
        ephemeral-storage: "5Gi"
    livenessProbe:
      httpGet:
        path: /health
        port: 8081
        scheme: HTTP
    env:
    - name: AENV_TYPE
      value: "singleContainer"
  restartPolicy: Never

Dual Container Template:

The dual container template creates pods with two containers: a main container and a sidecar container. This is useful for scenarios requiring additional services or shared storage:

# From deploy/controller/values.yaml - podTemplates.dualContainer
apiVersion: v1
kind: Pod
metadata:
  name: placeholder
  namespace: {{ .Values.sandboxNamespace }}
  labels:
    app.kubernetes.io/name: aenv-sandbox
    template-type: dualContainer
spec:
  containers:
  - name: main
    image: aenv-base:latest
    # Main container configuration
    volumeMounts:
    - mountPath: /shared
      name: shared-data
  - name: second
    image: aenv-sidecar:latest
    # Sidecar container configuration
    volumeMounts:
    - mountPath: /shared
      name: shared-data
  volumes:
  - name: shared-data
    emptyDir: {}
  restartPolicy: Never

Template Selection:

By default, the single container template is used
The template type can be specified in the environment creation request, which comes from the aenvironment config.json file
Templates are loaded from the ConfigMap mounted at /etc/aenv/pod-templates/

Controller Configuration:

controller \
  --namespace aenv \
  --server-port 8080 \
  --metrics-addr :8088 \
  --leader-elect true \
  --qps 50 \
  --burst 100

Deployment Modes#

AEnvironment supports multiple deployment modes for different use cases:

Kubernetes Mode#

Production deployment on Kubernetes clusters with high availability and scalability.

Key Features:

High availability with leader election
Horizontal scaling support
Production-ready configuration
Helm chart-based deployment

For detailed deployment instructions, see Deployment Guide.

Quick Start:

helm install aenv-platform ./deploy \
  --namespace aenv \
  --create-namespace \
  --wait \
  --timeout 10m

Docker Compose Mode (TODO)#

Single-node deployment for development and testing using Docker Compose.

Planned Features:

Quick local setup
All components in one stack
Suitable for development and testing
Easy to start/stop

Other Deployment Modes (TODO)#

Additional deployment options planned for future releases:

Cloud Provider Managed Services: AWS ECS, Azure Container Instances, GCP Cloud Run
Serverless Mode: Function-as-a-Service deployment
Edge Deployment: Lightweight deployment for edge computing scenarios

Scalability#

Horizontal Scaling#

        graph LR
    LB[Load Balancer]

    subgraph API Tier
        API1[API 1]
        API2[API 2]
        API3[API N]
    end

    subgraph Controller Tier
        C1[Controller 1<br/>Leader]
        C2[Controller 2<br/>Standby]
    end

    subgraph Sandbox Tier
        R1[Node 1]
        R2[Node 2]
        R3[Node N]
    end

    LB --> API1
    LB --> API2
    LB --> API3
    API1 --> C1
    API2 --> C1
    API3 --> C2
    C1 --> R1
    C1 --> R2
    C2 --> R3

Security#

Network Isolation#

        graph TB
    subgraph Public
        Client[Client]
    end

    subgraph DMZ
        GW[API Gateway]
    end

    subgraph Private
        API[API Service]
        Ctrl[Controller]
    end

    subgraph Sandbox
        Env1[Env 1]
        Env2[Env 2]
    end

    Client --> GW
    GW --> API
    API --> Ctrl
    Ctrl --> Env1
    Ctrl --> Env2
    Env1 -.- Env2

Authentication & Authorization#

API Keys: Service-to-service authentication
Namespace Isolation: Multi-tenant separation

Observability#

Metrics#

All components expose Prometheus metrics:

Component	Key Metrics
API Service	Request rate, latency, error rate, active instances
Controller	Pod operations, cache size, sandbox engine calls
EnvHub	Push/pull duration, cache hits, registry errors

High Availability#

Controller Leader Election#

        graph TB
    subgraph Controller HA
        C1[Controller 1<br/>Leader<br/>HTTP Server Active]
        C2[Controller 2<br/>Standby]
        C3[Controller 3<br/>Standby]
    end

    subgraph Leader Election
        K8sAPI[Kubernetes API<br/>Lease Resource]
    end

    subgraph API Service
        API[API Service]
    end

    C1 --> K8sAPI
    C2 --> K8sAPI
    C3 --> K8sAPI
    API --> C1

Features:

Leader election via Kubernetes Lease resource
Automatic failover when leader becomes unavailable
Multiple replicas for redundancy
Only leader instance serves HTTP requests

Future Enhancements#

Planned Sandbox Engines#

The architecture supports adding additional sandbox engines such as:

ASandbox: High-performance container runtime for fast startup and execution
E2B: Secure cloud development environments with isolated sandboxes
Ray: Distributed computing framework for scalable AI workloads
Container Runtimes: Docker, containerd, CRI-O
VM-based Sandboxes: Firecracker, Kata Containers

Each sandbox engine implements the SandboxEngine interface and can be configured via the API Service. The API Service adapts to different engines, while the Controller only supports Kubernetes.

Sandbox Engine Features#

Auto-scaling: Dynamic scaling based on demand
Resource Optimization: Intelligent resource allocation
Multi-tenancy: Enhanced isolation between tenants

Future Plans#

The following features are planned for future releases:

Observability Enhancements#

Logging#

Real-time log streaming from sandbox execution environments:

Features:

Stream Logs: Real-time log streaming from running sandbox instances
Structured Logging: JSON-formatted logs with correlation IDs for request tracing
Log Aggregation: Centralized log collection from all sandbox instances
Log Query: Search and filter logs by instance ID, correlation ID, timestamp, and log level

Log Format:

{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "info",
  "message": "Tool execution completed",
  "instance_id": "env-instance-123",
  "env_name": "swe-env",
  "sandbox_engine": "kubernetes",
  "correlation_id": "abc-123",
  "tool_name": "search",
  "duration_ms": 150,
  "stdout": "Search results: ...",
  "stderr": ""
}

API Endpoints:

GET /env-instance/:id/logs - Stream logs from a specific sandbox instance
GET /env-instance/:id/logs?follow=true - Follow logs in real-time (streaming)
GET /env-instance/:id/logs?tail=100 - Get last N lines of logs

Distributed Tracing#

OpenTelemetry support for end-to-end request tracing across all components and sandbox engines.

Authentication & Authorization Enhancements#

JWT Tokens: User authentication support
RBAC: Role-based access control for fine-grained permission management

AEnvironment Architecture

Contents

AEnvironment Architecture#

System Architecture#

Development Side#

Traffic Side#

Sandbox Engine Architecture#

Sandbox Engine Interface#

Current Implementation: Kubernetes#

Core Components#

API Service#

Controller#

EnvHub#

Data Flow#

Development Side Flow#

Traffic Side Flow#

Tool Execution Flow#

Sandbox Engine Configuration#

Kubernetes Configuration#

Deployment Modes#

Kubernetes Mode#

Docker Compose Mode (TODO)#

Other Deployment Modes (TODO)#

Scalability#

Horizontal Scaling#

Security#

Network Isolation#

Authentication & Authorization#

Observability#

Metrics#

High Availability#

Controller Leader Election#

Future Enhancements#

Planned Sandbox Engines#

Sandbox Engine Features#

Future Plans#

Observability Enhancements#

Logging#

Distributed Tracing#

Authentication & Authorization Enhancements#