# Debugging Guide
Here's how to debug AReaL training applications, focusing on `RolloutWorkflow` implementations and custom RL algorithms.
## Debugging `RolloutWorkflow` with a Persistent Inference Server
The trick is to launch a standalone, persistent inference server for your agent’s generation logic. This way, you can test repeatedly without restarting the server each time.
Why this works well:

- **Lightweight**: Your debug program only needs a CPU, while inference runs on the GPU.
- **IDE friendly**: Works perfectly with VS Code's Python debugger.
- **Fast iterations**: No need to restart servers between debugging sessions.
### 1. Launch the Standalone SGLang Server
First, start your SGLang server with an inference-only `allocation_mode` such as `sglang.d4p1t1` (four data-parallel inference workers, no pipeline or tensor parallelism):
```bash
nohup python -m areal.launcher.local examples/gsm8k_grpo.py \
    --config examples/configs/gsm8k_grpo.yaml \
    allocation_mode=sglang.d4p1t1 > llm_server.log 2>&1 &
```
**Note:** For debugging purposes, only the `allocation_mode` and `sglang` configs matter. You can ignore everything else in the example YAML file.
Once it's running, you'll find the server address in the log:

```
LLM inference server launched at: AREAL_LLM_SERVER_ADDRS=127.0.0.1:20082
```
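To confirm the server is reachable before starting a debug session, you can grep the log for the address and probe the server directly. The `/health` route below is an assumption about your SGLang version; use whichever health or model-info endpoint it exposes:

```bash
# Find the address the launcher printed
grep AREAL_LLM_SERVER_ADDRS llm_server.log

# Probe the server (assumes a /health endpoint; adjust for your SGLang version)
curl http://127.0.0.1:20082/health
```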
### 2. Run Your Debug Program
Create a debug script (e.g., `agent_debug.py`) with your custom workflow implementation (a minimal workflow skeleton is sketched after the script):
```python
# agent_debug.py (excerpt)

# Create dataset and dataloaders
train_dataset = get_custom_dataset(...)
# Select a small subset of the dataset for debugging
train_dataset = train_dataset.select(range(config.train_dataset.batch_size))
train_dataloader = StatefulDataLoader(...)

# Initialize the inference engine; it reads server addresses from the
# AREAL_LLM_SERVER_ADDRS environment variable
rollout = RemoteSGLangEngine(config.rollout)
rollout.initialize(None, None)

# Create the rollout workflow
workflow = MyWorkflow(...)

# Generate a single batch of rollouts
dump_dir = os.path.join(
    StatsLogger.get_log_path(config.stats_logger), "generated"
)
os.makedirs(dump_dir, exist_ok=True)
data_generator = itertools.cycle(train_dataloader)
generated_data = rollout.rollout_batch(next(data_generator), workflow=workflow)

# Save generated data for later use
torch.save(generated_data, os.path.join(dump_dir, "batch_data.pt"))

rollout.destroy()
```
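If you are still sketching your workflow, a minimal skeleton might look like the following. The `RolloutWorkflow` base class, its import path, and the async `arun_episode(engine, data)` hook are assumptions about the AReaL API; check your installed version for the exact interface.

```python
# Sketch only -- the import path and method signature below are assumptions;
# consult the RolloutWorkflow base class in your AReaL installation.
from areal.api.workflow_api import RolloutWorkflow


class MyWorkflow(RolloutWorkflow):
    async def arun_episode(self, engine, data):
        # Ask the remote inference engine to generate completions for `data`,
        # compute rewards, and return the packed trajectory for training.
        ...
```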
Now run your debug script, passing the server address through the environment:
```bash
AREAL_LLM_SERVER_ADDRS=127.0.0.1:20082 \
    python agent_debug.py --config agent_debug.yaml \
    rollout.enable_rollout_tracing=True
```
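After a successful run, the batch is saved as `batch_data.pt` under the `generated` directory in your stats-logger log path. You can sanity-check it from any Python shell before moving on to RL debugging; the path below is a placeholder for your actual dump directory:

```python
import torch

# Placeholder path -- point this at the dump_dir used by agent_debug.py
batch = torch.load("path/to/generated/batch_data.pt", weights_only=False)
print(type(batch))        # typically a dict / TensorDict of tensors
print(list(batch.keys()))
```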
## Debugging Custom RL Algorithms
If you’re using existing AReaL algorithms like GRPO, you can skip this section.
For custom RL algorithms, you can debug them just like offline training (think SFT) by using pre-generated data instead of running inference.
This approach is great because:

- **No inference servers**: You don't need to manage any servers.
- **Faster iterations**: Skip the expensive data collection step.
- **Reproducible**: Use the same data across debugging sessions.
- **Isolated testing**: Focus purely on your RL logic.
### 1. Configure Allocation Mode
First, turn off SGLang inference in your config:
```yaml
allocation_mode: d4p1t1
```
### 2. Create Your RL Debug Script
Then create a debug script that loads the pre-generated data:
```python
# Create dataset and dataloaders
train_dataset = get_custom_dataset(...)
train_dataloader = StatefulDataLoader(train_dataset, ...)

# Configure tokenizer stop tokens
if tokenizer.pad_token_id not in config.gconfig.stop_token_ids:
    config.gconfig.stop_token_ids.append(tokenizer.pad_token_id)
if tokenizer.eos_token_id not in config.gconfig.stop_token_ids:
    config.gconfig.stop_token_ids.append(tokenizer.eos_token_id)

# Load the previously generated data
dump_dir = os.path.join(
    StatsLogger.get_log_path(config.stats_logger), "generated"
)
batch = torch.load(os.path.join(dump_dir, "batch_data.pt"), weights_only=False)

# Prepare the batch for training
batch = batch.to('cuda')
dist.barrier(device_ids=[actor.device.index])
torch.cuda.synchronize()

# Your custom algorithm logic here
...
```
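What replaces the final ellipsis depends on your algorithm. As a purely illustrative sketch, the step might recompute log-probabilities, estimate advantages, and run a policy update on the loaded batch; the `actor` method names below are hypothetical placeholders modeled on the GRPO example flow, not a guaranteed API:

```python
# Hypothetical method names -- adapt to whatever your actor/trainer exposes.
logp = actor.compute_logp(batch)   # recompute log-probs under the current policy
batch["prox_logp"] = logp          # keep them as the proximal reference
actor.compute_advantages(batch)    # e.g., group-normalized rewards for GRPO
stats = actor.ppo_update(batch)    # run the PPO/GRPO update on the pre-generated batch
actor.step_lr_scheduler()
print(stats)
```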