# Tau2 RL
Train Your Agent with AReaL & AEnvironment
This example shows how to run a TAU2 task inside AEnvironment and expose a single
entrypoint (`run_agent_return_reward`) that AReaL can use as an RL
trajectory-and-reward function.

The implementation lives in `aenv/examples/tau2_rl/agent.py`.
## What this script does
`agent.py` runs a loop:

1. Create a TAU2 environment (`tau2-env@1.0.0`) via `aenv.core.environment.Environment`
2. Fetch the TAU2 system prompt and available tools from the environment
3. Run an OpenAI Agents SDK agent turn by turn (tools are invoked automatically)
4. Send the agent output back into the environment
5. When the episode ends, call `env.call_reward({})` and return a scalar reward
This makes it suitable as the “episode runner” used in agentic RL.
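A minimal sketch of that loop is below. Only `Environment`, the `tau2-env@1.0.0`
identifier, `env.call_reward({})`, and the `ModelSettings` values come from this
page; every other method name (`get_system_prompt`, `get_tools`, `reset`, `step`,
`done`) is a hypothetical placeholder, not the actual AEnvironment API.

```python
# Hypothetical sketch of the episode loop in agent.py.
# Only Environment, "tau2-env@1.0.0", call_reward({}), and the ModelSettings
# values come from this page; the other environment method names are placeholders.
from agents import Agent, ModelSettings, Runner  # OpenAI Agents SDK
from aenv.core.environment import Environment

async def run_episode(domain: str, task_id: str) -> float:
    env = Environment("tau2-env@1.0.0")  # placeholder constructor signature
    agent = Agent(
        name="tau2-agent",
        instructions=env.get_system_prompt(),  # placeholder accessor
        tools=env.get_tools(),                 # placeholder accessor
        model_settings=ModelSettings(
            temperature=1.0,
            top_p=1.0,
            extra_args={"max_completion_tokens": 8192},
        ),
    )

    observation = env.reset(domain=domain, task_id=task_id)  # placeholder
    while not env.done():                                    # placeholder
        result = await Runner.run(agent, observation)  # one agent turn; tools auto-invoked
        observation = env.step(result.final_output)    # placeholder: send output back
    return env.call_reward({})  # scalar reward at episode end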
## Sampling temperature (important for RL)
Following the AReaL agentic RL tutorial, temperature is an important knob for controlling exploration during data collection.
In `aenv/examples/tau2_rl/agent.py`, the sampling parameters are configured here:

```python
ModelSettings(temperature=1.0, top_p=1.0, extra_args={"max_completion_tokens": 8192})
```
Recommended practice:

- Use a higher temperature (e.g., 0.8 ~ 1.2) to collect diverse trajectories.
- Use a lower temperature (e.g., 0.0 ~ 0.3) during evaluation to reduce variance.
If you plan to run large-scale training, consider making temperature a configurable
argument (so AReaL can sweep it via config).
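One way to do that is shown below; the flag names are assumptions for
illustration, not the script's actual arguments.

```python
# Hypothetical sketch: expose sampling parameters as CLI arguments so a
# trainer like AReaL can sweep them via config. Flag names are assumptions.
import argparse

from agents import ModelSettings

parser = argparse.ArgumentParser()
parser.add_argument("--temperature", type=float, default=1.0)
parser.add_argument("--top_p", type=float, default=1.0)
args = parser.parse_args()

model_settings = ModelSettings(
    temperature=args.temperature,
    top_p=args.top_p,
    extra_args={"max_completion_tokens": 8192},
)
```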
## Running the agent locally (smoke test)
Install dependencies for this example:

```bash
uv pip install -r aenv/examples/tau2_rl/requirements.txt
```

Run a single episode:

```bash
python aenv/examples/tau2_rl/agent.py --domain telecom --task_id <TASK_ID>
```
Optional environment variables (for using your own LLM in TAU2):

- `TAU2_USER_LLM`
- `TAU2_USER_LLM_API_BASE`
- `TAU2_USER_LLM_API_KEY`
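For example (all values below are placeholders; substitute your own model
name, endpoint, and key):

```bash
export TAU2_USER_LLM="<MODEL_NAME>"                 # placeholder model name
export TAU2_USER_LLM_API_BASE="https://your-llm-host/v1"  # placeholder endpoint
export TAU2_USER_LLM_API_KEY="<API_KEY>"            # placeholder key
```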
## Using this with AReaL (agentic RL)
In AReaL, you typically configure:

- a reward/rollout function path (a Python import path)
- agent sampling parameters (e.g., `temperature`, `max_tokens`, `n_samples`)
For this repository, the reward/rollout entrypoint is
`aenv.examples.tau2_rl.agent.run_agent_return_reward`.
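Before wiring this into AReaL, you can sanity-check the entrypoint directly.
The snippet below is a sketch: the `(domain, task_id)` signature is an
assumption inferred from the CLI flags above, and if the function is a
coroutine you would wrap the call in `asyncio.run`.

```python
# Hypothetical smoke test of the AReaL entrypoint.
# The (domain, task_id) signature is assumed from the CLI flags above.
from aenv.examples.tau2_rl.agent import run_agent_return_reward

reward = run_agent_return_reward(domain="telecom", task_id="<TASK_ID>")  # use a real task id
print(f"episode reward: {reward}")
```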
Example AReaL-style config snippet:
```yaml
# Pseudocode example to mirror AReaL's OpenAI Agents tutorial
reward_fn_path: "aenv.examples.tau2_rl.agent.run_agent_return_reward"
gconfig:
  n_samples: 4
  max_tokens: 8192
  temperature: 1.0
```
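How a trainer turns the `reward_fn_path` string into a callable is
framework-internal; a minimal sketch of the usual dotted-path import
(the helper name is hypothetical, and AReaL's actual loader may differ):

```python
import importlib

# Hypothetical helper: resolve a dotted "module.attr" path from the config
# above into the actual callable.
def load_entrypoint(path: str):
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

reward_fn = load_entrypoint("aenv.examples.tau2_rl.agent.run_agent_return_reward")
```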