Configurations#
This page provides a comprehensive reference for all configuration parameters available in AReaL’s command-line interface. These parameters are defined using dataclasses and can be specified in YAML configuration files or overridden via command line arguments.
Usage#
Configuration files are specified using the `--config` parameter:

```shell
python -m areal.launcher --config path/to/config.yaml
```

You can override specific parameters from the command line:

```shell
python -m areal.launcher --config path/to/config.yaml actor.lr=1e-4 seed=42
```
For detailed examples, see the experiment configurations in the examples/ directory.
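The dotted override keys map directly onto the nested YAML structure. A minimal sketch, using only the fields shown in the override example above (all other field names in a real config come from the tables below):

```yaml
# Hypothetical minimal config fragment; the nesting mirrors the CLI
# overrides `actor.lr=1e-4 seed=42` shown above.
seed: 42
actor:
  lr: 1.0e-4
```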
Table of Contents#
- Core Experiment Configurations
- Training Configurations
- Inference Configurations
- Dataset
- System and Cluster Configurations
- Logging and Monitoring
- Others
BaseExperiment Configuration#
Base configuration class for all experiment types with common settings.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Name of the experiment (no `_` or `/`). |
| - | string | Required | Name of the trial (no `-` or `/`). |
| - | - | Required | Cluster specification. Mainly used by Slurm. |
| - | string | - | Pattern-based GPU parallel strategy allocation mode. |
| - | integer | - | Random seed for reproducibility. |
| - | boolean | - | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| - | integer | - | Total number of epochs to train the model. |
| - | integer \| None | - | Terminate training after this number of steps. For benchmarking only; None means normal training. |
| - | integer \| None | - | Terminate training after consuming this number of samples. For benchmarking only; None means normal training. |
| - | string | - | Path to the tokenizer. |
| - | - | - | Performance tracer configuration. None means disabled. |
GRPO Configuration#
A placeholder GRPO configuration kept for backward compatibility.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Name of the experiment (no `_` or `/`). |
| - | string | Required | Name of the trial (no `-` or `/`). |
| - | - | Required | Cluster specification. Mainly used by Slurm. |
| - | string | - | Pattern-based GPU parallel strategy allocation mode. |
| - | integer | - | Random seed for reproducibility. |
| - | boolean | - | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| - | integer | - | Total number of epochs to train the model. |
| - | integer \| None | - | Terminate training after this number of steps. For benchmarking only; None means normal training. |
| - | integer \| None | - | Terminate training after consuming this number of samples. For benchmarking only; None means normal training. |
| - | string | - | Path to the tokenizer. |
| - | - | - | Performance tracer configuration. None means disabled. |
PPO Configuration#
Configuration for Proximal Policy Optimization (PPO) reinforcement learning experiments.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Name of the experiment (no `_` or `/`). |
| - | string | Required | Name of the trial (no `-` or `/`). |
| - | - | Required | Cluster specification. Mainly used by Slurm. |
| - | string | - | Pattern-based GPU parallel strategy allocation mode. |
| - | integer | - | Random seed for reproducibility. |
| - | boolean | - | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| - | integer | - | Total number of epochs to train the model. |
| - | integer \| None | - | Terminate training after this number of steps. For benchmarking only; None means normal training. |
| - | integer \| None | - | Terminate training after consuming this number of samples. For benchmarking only; None means normal training. |
| - | string | - | Path to the tokenizer. |
| - | - | - | Performance tracer configuration. None means disabled. |
RW Configuration#
Configuration for Reward Model (RW) training experiments.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Name of the experiment (no `_` or `/`). |
| - | string | Required | Name of the trial (no `-` or `/`). |
| - | - | Required | Cluster specification. Mainly used by Slurm. |
| - | string | - | Pattern-based GPU parallel strategy allocation mode. |
| - | integer | - | Random seed for reproducibility. |
| - | boolean | - | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| - | integer | - | Total number of epochs to train the model. |
| - | integer \| None | - | Terminate training after this number of steps. For benchmarking only; None means normal training. |
| - | integer \| None | - | Terminate training after consuming this number of samples. For benchmarking only; None means normal training. |
| - | string | - | Path to the tokenizer. |
| - | - | - | Performance tracer configuration. None means disabled. |
SFT Configuration#
Configuration for Supervised Fine-Tuning (SFT) experiments.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Name of the experiment (no `_` or `/`). |
| - | string | Required | Name of the trial (no `-` or `/`). |
| - | - | Required | Cluster specification. Mainly used by Slurm. |
| - | string | - | Pattern-based GPU parallel strategy allocation mode. |
| - | integer | - | Random seed for reproducibility. |
| - | boolean | - | Whether to enable training offload using torch_memory_saver. Requires setting up the environment for TMS (e.g., via `LD_PRELOAD`). |
| - | integer | - | Total number of epochs to train the model. |
| - | integer \| None | - | Terminate training after this number of steps. For benchmarking only; None means normal training. |
| - | integer \| None | - | Terminate training after consuming this number of samples. For benchmarking only; None means normal training. |
| - | string | - | Path to the tokenizer. |
| - | - | - | Performance tracer configuration. None means disabled. |
FSDPEngine Configuration#
Configuration for Fully Sharded Data Parallel (FSDP) training backend.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | - | - | FSDP wrap policy, specifying which model layers to wrap. |
| - | boolean | - | Whether to offload FSDP parameters to CPU. |
FSDPWrapPolicy#
Policy configuration for FSDP model layer wrapping. None defaults to wrapping transformer decoder layers defined by transformers.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | list of string \| None | - | A list of transformer layer names for FSDP to wrap. |
MicroBatch Specification#
Specification for splitting micro-batches during training.
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_mbs | integer \| None | - | Number of micro-batches (or the minimum number if max_tokens_per_mb is set). |
| - | integer | - | Granularity of each micro-batch. Adjacent sequences are grouped by this size when dividing micro-batches. |
| max_tokens_per_mb | integer \| None | - | Maximum tokens per micro-batch for each forward pass. When set, n_mbs becomes the minimum number of micro-batches. |
| - | integer | - | Divisor for the number of micro-batches. The final number of micro-batches is adjusted to be divisible by this value. |
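The interaction between `n_mbs` and `max_tokens_per_mb` can be sketched as follows. This is an illustrative re-implementation of the splitting rule described above, not AReaL's actual code:

```python
# Sketch of micro-batch splitting: with no token cap, produce exactly n_mbs
# micro-batches; with a cap, pack sequences greedily so no micro-batch
# exceeds max_tokens_per_mb tokens.
def split_microbatches(seq_lens, n_mbs=1, max_tokens_per_mb=None):
    """Return a list of micro-batches, each a list of sequence indices."""
    if max_tokens_per_mb is None:
        # Fixed count: round-robin indices into exactly n_mbs micro-batches.
        return [list(range(i, len(seq_lens), n_mbs)) for i in range(n_mbs)]
    mbs, cur, cur_tokens = [], [], 0
    for i, length in enumerate(seq_lens):
        if cur and cur_tokens + length > max_tokens_per_mb:
            mbs.append(cur)  # current micro-batch is full
            cur, cur_tokens = [], 0
        cur.append(i)
        cur_tokens += length
    if cur:
        mbs.append(cur)
    return mbs
```

When `max_tokens_per_mb` is set, a real implementation additionally enforces `n_mbs` as a minimum count and rounds the result up to the configured divisor; those refinements are omitted here.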
Norm Configuration#
Configuration for reward/advantage normalization.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string \| None | - | Mean level for normalization. None for no mean normalization. Choices: - |
| - | boolean | - | Whether to use the leave-one-out average. |
| - | string \| None | - | Standard-deviation level for normalization. None for no std normalization. Choices: - |
| - | boolean | - | Whether to use unbiased standard-deviation computation. Defaults to True (changed from False in v0.3.4). |
| - | float | - | Epsilon added to the standard deviation to avoid numerical issues when dividing. |
| - | integer | - | Group size for group-level normalization. |
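As a concrete reference for these knobs, a hedged sketch of group-level normalization (mean and std both at the group level); this is illustrative, not AReaL's implementation:

```python
# Group-level reward normalization: each contiguous group of `group_size`
# rewards is shifted by its mean and divided by its std plus `eps`.
import statistics

def normalize_rewards(rewards, group_size, eps=1e-5, unbiased=True):
    out = []
    for g in range(0, len(rewards), group_size):
        group = rewards[g:g + group_size]
        mean = sum(group) / len(group)
        if len(group) > 1:
            # `unbiased` selects the sample (n-1) vs population (n) estimator.
            std = statistics.stdev(group) if unbiased else statistics.pstdev(group)
        else:
            std = 0.0
        out.extend((r - mean) / (std + eps) for r in group)
    return out
```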
Optimizer Configuration#
Configuration for model optimization during training.
| Parameter | Type | Default | Description |
|---|---|---|---|
| optimizer_type | string | - | Optimizer type. adam_bf16 is currently only supported by the FSDP engine. Choices: - |
| - | float | - | Learning rate. |
| - | float | - | Weight decay. |
| - | float | - | Adam beta1 parameter. Only effective when optimizer_type is adam/adam_bf16. |
| - | float | - | Adam beta2 parameter. Only effective when optimizer_type is adam/adam_bf16. |
| - | float | - | Adam epsilon parameter. Only effective when optimizer_type is adam/adam_bf16. |
| - | float | - | Minimum learning-rate ratio after annealing. |
| - | string | - | Learning-rate scheduler type. Choices: - |
| - | float | - | Proportion of training steps used for warmup. |
| - | boolean | - | Enable optimizer state offloading. |
| - | float | - | Initial loss-scaling factor. |
| - | float | - | Minimum loss-scaling factor. |
| - | float | - | Window size for loss-scaling adjustment. |
| - | integer | - | Hysteresis (scaling factor) for loss scaling. |
| - | float | - | Gradient clipping threshold. |
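The warmup proportion and minimum learning-rate ratio combine in the usual warmup-then-anneal pattern. A sketch under the assumption of linear warmup followed by cosine annealing (the actual scheduler depends on the configured scheduler type):

```python
# Illustrative warmup + cosine schedule with a minimum-LR-ratio floor.
import math

def lr_at(step, total_steps, lr, warmup_proportion=0.03, min_lr_ratio=0.0):
    warmup_steps = max(1, int(total_steps * warmup_proportion))
    if step < warmup_steps:
        return lr * (step + 1) / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # decays 1 -> 0
    # After annealing, the LR bottoms out at lr * min_lr_ratio.
    return lr * (min_lr_ratio + (1 - min_lr_ratio) * cosine)
```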
PPOActor Configuration#
Configuration for PPO actor model, a subclass of a TrainEngine.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | - | Path to the HuggingFace checkpoint. |
| - | string | - | Attention implementation for the HuggingFace transformers model. Choices: - |
| - | boolean | - | Initialize model weights randomly. |
| - | boolean | - | Whether to use a critic/reward model. |
| - | float | - | Temperature during generation. |
| - | - | Required | - |
| - | boolean | - | Whether to pad each micro-batch to the length upper bound specified by mb_spec. Can reduce memory fragmentation but slows down training. |
| - | boolean | - | Disable dropout layers during training. |
| - | boolean | - | Enable gradient checkpointing. |
| - | string | - | Parameter data type. |
| - | string | - | Gradient reduction data type. |
| - | - | - | Optimizer configuration. None means no training. |
| - | string | - | Weight update backend type. Choices: - |
| - | - | Required | - |
| - | - | Required | - |
| - | boolean | - | Whether to use LoRA. Only supported with FSDP; must be enabled together with vLLM/SGLang LoRA. |
| - | integer | - | LoRA rank. |
| - | integer | - | LoRA alpha. |
| - | list of string | Required | LoRA target modules. |
| - | string | - | PEFT method type. Only LoRA is supported for now. |
| - | - | Required | Train engine scheduling specs. Accepts one or two SchedulingSpec entries: with one, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the TrainController. |
| - | - | Required | The scheduling strategy of this TrainEngine, either separation or colocation. Currently only used by the TrainController. |
| - | integer | - | Number of sequences in each group. |
| - | integer | - | Number of minibatches for each PPO update. |
| eps_clip | float | - | Clipping factor for the policy ratio. |
| eps_clip_higher | float \| None | - | Upper clipping factor for the policy ratio. Default is None. When eps_clip_higher is set (decoupled clipping), eps_clip is used as the lower value. |
| - | float \| None | - | Dual clipping factor for the policy ratio; must be > 1.0. None disables dual clipping. |
| - | float \| None | - | The second-momentum threshold for M2PO. |
| - | - | - | Normalization configuration for rewards. |
| - | float | - | Reward scaling factor. |
| - | float | - | Reward bias. |
| - | float | - | Maximum absolute value for reward clipping. |
| - | boolean | - | Penalty for overlong sequences. Used within DAPO. |
| - | integer \| None | - | Number of tokens in the tail that receive a penalty. |
| - | float \| None | - | Penalty factor for tokens in the tail. |
| - | boolean | - | Mask truncated generations (no EOS token) and exclude them from training. |
| - | float | - | Discount factor for future rewards. |
| - | float | - | Lambda parameter for GAE. |
| - | - | - | Normalization configuration for advantages. |
| - | float | - | KL divergence coefficient. |
| - | string | - | KL divergence estimator. Choices: - |
| recompute_logprob | boolean | - | Recompute log probabilities and replace those returned by inference. |
| use_decoupled_loss | boolean | - | Use the decoupled loss. Implicitly enables recompute_logprob. |
| behav_imp_weight_cap | float \| None | - | Filter out tokens whose behav_imp_weight exceeds behav_imp_weight_cap when computing the loss. Must be > 1.0; use_decoupled_loss must be true. |
| - | string | - | Level at which importance-sampling ratios are computed. 'token': per-token ratios (standard PPO); 'sequence': sequence-level geometric mean of per-token ratios (GSPO). Choices: - |
| - | string | - | Method for computing proximal-policy log-probabilities in decoupled PPO. Only effective when use_decoupled_loss=True. 'recompute' (default): standard decoupled PPO, recompute the proximal policy via a forward pass; 'loglinear': approximate the proximal policy by log-linear interpolation (skips the forward pass); 'metrics': like 'recompute', but also computes approximation metrics for evaluation. Choices: - |
| - | boolean | - | Enable dynamic sampling (within DAPO). If enabled, groups in which all samples receive the same reward are masked out. Note that enabling this option leads to variable batch sizes; if you want a constant batch size with dynamic filtering, you should use the - instead. |
| - | boolean | - | Log statistics for agent trajectories. |
| - | list of string | Required | Keys for logging agent trajectory statistics. |
| - | integer | - | Maximum number of new tokens to generate. |
PPOCritic Configuration#
Configuration for PPO critic model, a subclass of a TrainEngine.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | - | Path to the HuggingFace checkpoint. |
| - | string | - | Attention implementation for the HuggingFace transformers model. Choices: - |
| - | boolean | - | Initialize model weights randomly. |
| - | boolean | - | Whether to use a critic/reward model. |
| - | float | - | Temperature during generation. |
| - | - | Required | - |
| - | boolean | - | Whether to pad each micro-batch to the length upper bound specified by mb_spec. Can reduce memory fragmentation but slows down training. |
| - | boolean | - | Disable dropout layers during training. |
| - | boolean | - | Enable gradient checkpointing. |
| - | string | - | Parameter data type. |
| - | string | - | Gradient reduction data type. |
| - | - | - | Optimizer configuration. None means no training. |
| - | string | - | Weight update backend type. Choices: - |
| - | - | Required | - |
| - | - | Required | - |
| - | boolean | - | Whether to use LoRA. Only supported with FSDP; must be enabled together with vLLM/SGLang LoRA. |
| - | integer | - | LoRA rank. |
| - | integer | - | LoRA alpha. |
| - | list of string | Required | LoRA target modules. |
| - | string | - | PEFT method type. Only LoRA is supported for now. |
| - | - | Required | Train engine scheduling specs. Accepts one or two SchedulingSpec entries: with one, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the TrainController. |
| - | - | Required | The scheduling strategy of this TrainEngine, either separation or colocation. Currently only used by the TrainController. |
| - | integer | - | Number of minibatches for each PPO update. |
| - | float | - | Clipping factor for the value loss. |
| - | boolean | - | Mask truncated generations (no EOS token) and exclude them from training. |
TrainEngine Configuration#
Core configuration for model training, including optimization and backend settings.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | - | Path to the HuggingFace checkpoint. |
| - | string | - | Attention implementation for the HuggingFace transformers model. Choices: - |
| - | boolean | - | Initialize model weights randomly. |
| - | boolean | - | Whether to use a critic/reward model. |
| - | float | - | Temperature during generation. |
| - | - | Required | - |
| - | boolean | - | Whether to pad each micro-batch to the length upper bound specified by mb_spec. Can reduce memory fragmentation but slows down training. |
| - | boolean | - | Disable dropout layers during training. |
| - | boolean | - | Enable gradient checkpointing. |
| - | string | - | Parameter data type. |
| - | string | - | Gradient reduction data type. |
| - | - | - | Optimizer configuration. None means no training. |
| - | string | - | Weight update backend type. Choices: - |
| - | - | Required | - |
| - | - | Required | - |
| - | boolean | - | Whether to use LoRA. Only supported with FSDP; must be enabled together with vLLM/SGLang LoRA. |
| - | integer | - | LoRA rank. |
| - | integer | - | LoRA alpha. |
| - | list of string | Required | LoRA target modules. |
| - | string | - | PEFT method type. Only LoRA is supported for now. |
| - | - | Required | Train engine scheduling specs. Accepts one or two SchedulingSpec entries: with one, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the TrainController. |
| - | - | Required | The scheduling strategy of this TrainEngine, either separation or colocation. Currently only used by the TrainController. |
GenerationHyperparameters#
Controls text generation behavior for rollout.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | integer | - | Number of sequences to generate per prompt. |
| - | integer | - | Maximum number of tokens to generate. |
| - | integer | - | Minimum number of tokens to generate. |
| - | integer | - | Maximum number of tokens, including prompt and generated tokens. |
| - | boolean | - | Whether to use greedy decoding (maximum probability). |
| - | float | - | Nucleus sampling probability threshold, in (0.0, 1.0]. |
| - | integer | - | Number of highest-probability tokens to consider. |
| - | float | - | Sampling temperature. Higher values increase diversity. |
| - | list of integer | Required | Stop generation when encountering these token IDs. |
| - | list of string \| None | - | One or more stop words. Generation stops if one of these words is sampled. |
| - | float | - | Penalizes tokens based on their frequency in the generation so far. Must be between -2 and 2; negative values encourage repetition. |
| - | string | - | LoRA name to use for this generation. |
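To clarify how the temperature, top-k, and top-p knobs interact, here is a pure-Python sketch of the standard filtering pipeline (real inference engines do this on GPU tensors; this is illustrative only):

```python
# Temperature scaling, then top-k truncation, then nucleus (top-p)
# truncation, then renormalization of the surviving probabilities.
import math

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: renormalized probability} after filtering."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k > 0:
        order = order[:top_k]  # keep only the k most likely tokens
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= top_p:  # smallest set whose mass reaches top_p
            break
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}
```

Greedy decoding corresponds to the degenerate case of keeping only the single most likely token.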
InferenceEngine Configuration#
Configuration for inference servers, including offpolicyness control.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string \| None | - | - |
| - | string \| None | - | - |
| - | integer \| None | - | Maximum number of concurrent rollouts to the inference engine. Defaults to consumer_batch_size. |
| - | integer \| None | - | Input/output queue size for async rollout. |
| consumer_batch_size | integer | - | Batch size for consuming rollouts from the queue. |
| - | integer | - | Maximum off-policyness for the head. If a rollout's weight version is more than this many versions behind the current version, the request will not be accepted. |
| - | boolean | - | Whether to output verbose tracing messages for each generation request. |
| - | boolean | - | Whether to check the format of trajectories produced by a customized workflow. Useful when debugging the workflow in isolation; should be False during RL training. |
| - | string | - | Request scheduling policy. Choices: - |
| - | float | - | Timeout in seconds for connecting to remote servers or launching local servers. |
| - | float | - | Timeout for HTTP requests. |
| - | integer | - | Number of retries for failed requests. |
| - | float | - | Grace period after calling /pause_generation; wait until all requests have been dropped. |
| - | - | Required | Inference engine scheduling specs. Accepts one or two SchedulingSpec entries: with one, it is used for both the worker and the engine (the engine is embedded in the worker); with two, the first is for the worker and the second for the engine. Currently only used by the RolloutController. |
| - | - | Required | The scheduling strategy of this engine, either separation or colocation. Currently only used by the RolloutController. |
| - | boolean | - | Whether to use LoRA. Should match the actor's LoRA option. |
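The off-policyness control above amounts to a simple admission check. A minimal sketch of that check (a hypothetical helper, not AReaL's API):

```python
# A rollout generated at `sample_version` is admitted only if the current
# weight version has not moved ahead by more than `max_head_offpolicyness`.
def accepts(current_version: int, sample_version: int,
            max_head_offpolicyness: int) -> bool:
    return current_version - sample_version <= max_head_offpolicyness
```

With a maximum off-policyness of 0, training is fully on-policy: only rollouts generated with the latest weights are consumed.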
SGLang Configuration#
Configuration for the SGLang runtime. Refer to the sgl-project/sglang documentation for details.
All fields are passed through to the SGLang server; see the documentation linked above for parameter names, defaults, and descriptions.
vLLM Configuration#
Configuration for the vLLM runtime. Refer to https://docs.vllm.ai/en/stable/api/index.html for detailed documentation.
All fields are passed through to the vLLM runtime; see the documentation linked above for parameter names, defaults, and descriptions.
TrainDataset Configuration#
Configuration for training dataset loading and preprocessing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Path to the dataset. Can be a local path or a HuggingFace dataset name. |
| - | string | Required | Type of training method, e.g., 'sft', 'rl', etc. |
| - | integer | - | Batch size for the dataloader. |
| - | boolean | - | Whether to shuffle the dataset. |
| - | boolean | - | Pin memory for faster data loading (set True for GPU training). |
| - | integer | - | Number of worker processes for data loading. |
| - | boolean | - | Drop the last incomplete batch. |
| - | integer \| None | - | Maximum token length of sequences in the dataset. Longer sequences are filtered out. |
ValidDataset Configuration#
Configuration for validation dataset loading and preprocessing.
It has different default values from TrainDatasetConfig: shuffle and drop_last default to False.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | Path to the dataset. Can be a local path or a HuggingFace dataset name. |
| - | string | Required | Type of training method, e.g., 'sft', 'rl', etc. |
| - | integer | - | Batch size for the dataloader. |
| - | boolean | - | Whether to shuffle the dataset. |
| - | boolean | - | Pin memory for faster data loading (set True for GPU training). |
| - | integer | - | Number of worker processes for data loading. |
| - | boolean | - | Drop the last incomplete batch. |
| - | integer \| None | - | Maximum token length of sequences in the dataset. Longer sequences are filtered out. |
Cluster Specification Configuration#
Configuration for cluster specification and distributed computing setup.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | - | Required | Name-resolving configuration. |
| - | string | - | Name of the cluster. Used to set cluster-specific environment settings. |
| - | string | - | Root directory for logs and checkpoints. Should be available on all nodes. |
| - | integer | - | The size of the cluster. Used to decide the Slurm hostname suffix. |
| - | integer | - | Number of (physical) GPUs per node. |
Launcher Configuration#
Configuration for launching the LLM server and trainer processes.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | integer | - | Number of CPUs allocated per GPU for the inference server. |
| - | integer | - | Memory allocated per GPU for the inference server, in MB. |
| - | integer | - | Number of CPUs allocated per GPU for training. |
| - | integer | - | Memory allocated per GPU for training, in MB. |
| - | string | - | Environment variables for the inference server, separated by commas. Example: 'ENV1=val1,ENV2=val2'. |
| - | string | - | Environment variables for training, separated by commas. Example: 'ENV1=val1,ENV2=val2'. |
| - | - | Required | Slurm launcher configuration. |
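The comma-separated environment variable strings follow the `'ENV1=val1,ENV2=val2'` shape shown above. A sketch of how such a string could be parsed (illustrative only, and naive: it does not support values that themselves contain commas):

```python
# Parse a launcher-style environment string into a dict.
def parse_env_vars(spec: str) -> dict:
    result = {}
    for item in spec.split(","):
        if not item.strip():
            continue  # tolerate trailing commas
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result
```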
NameResolve Configuration#
Configuration for distributed name resolution and service discovery.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | - | Type of the distributed KV store for name resolving. Choices: - |
| - | string | - | Record root for NFS name resolving. Should be available on all nodes. |
| - | string | - | Address of the etcd3 server. |
| - | string | - | Name of the distributed Ray KV store. |
SlurmLauncher Configuration#
Configuration for launching the training jobs with Slurm.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | - | Additional arguments to pass to the srun command. |
| - | list of string \| None | - | Additional bash commands to set up the container before running the torchrun command. |
| - | string | - | Type of containers used in Slurm. Choices: - |
| - | string | - | Mount path for Slurm. |
| - | string \| None | - | Slurm image for trainers. |
| - | string \| None | - | Slurm image for LLM inference. |
Evaluator Configuration#
Configuration for model evaluation scheduling and timing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | Required | - |
| - | integer \| None | - | Trigger frequency in epochs. None disables epoch-based evaluation. |
| - | integer \| None | - | Trigger frequency in steps. None disables step-based evaluation. |
| - | integer \| None | - | Trigger frequency in seconds. None disables time-based evaluation. |
Recover Configuration#
Configuration for experiment recovery and fault tolerance.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | Required | - |
| - | integer \| None | - | Trigger frequency in epochs. None disables epoch-based saving. |
| - | integer \| None | - | Trigger frequency in steps. None disables step-based saving. |
| - | integer \| None | - | Trigger frequency in seconds. None disables time-based saving. |
| - | string | - | Recovery mode for the launcher. Options: 'disabled': never recover from previous runs; 'auto': automatically recover from previous runs if recover info and checkpoints are available; 'fault': only recover from previous runs if the new run fails; 'resume': force resume, raising an error if no recover info is found, and never resuming again after another failure. |
| - | integer | - | Number of recovery retries (auto/fault modes only). |
Saver Configuration#
Configuration for model checkpoint saving scheduling and timing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | Required | - |
| - | integer \| None | - | Trigger frequency in epochs. None disables epoch-based saving. |
| - | integer \| None | - | Trigger frequency in steps. None disables step-based saving. |
| - | integer \| None | - | Trigger frequency in seconds. None disables time-based saving. |
StatsLogger Configuration#
Configuration for experiment statistics logging and tracking services.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | Required | - |
| - | - | Required | Weights & Biases configuration. |
| - | - | Required | SwanLab configuration. |
| - | - | Required | TensorBoard configuration. Only the 'path' field is required. |
Swanlab Configuration#
Configuration for SwanLab experiment tracking and monitoring.
Fields are passed through to SwanLab run initialization; names and defaults are not described in this reference.
TensorBoard Configuration#
Configuration for TensorBoard logging and visualization.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | string \| None | - | - |
WandB Configuration#
Configuration for Weights & Biases experiment tracking.
Fields are passed through to Weights & Biases run initialization; names and defaults are not described in this reference.
DistributedDataParallel Configuration#
Configuration for Megatron’s DistributedDataParallel.
Refer to Megatron-LM documentation for details.
All fields are passed through to Megatron's DistributedDataParallel; see the Megatron-LM documentation for names and defaults.
MegatronEngine Configuration#
Configuration for Megatron-LM training framework.
Refer to Megatron-LM documentation for implementation details.
Most fields are passed through to Megatron-LM; refer to its documentation for names and defaults. The one documented field:
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | integer | - | Virtual pipeline parallel size for Megatron's interleaved schedule. Set to >1 to enable VPP. Default is 1 (disabled). |
PerfTracer Configuration#
Configuration for perf tracer emission.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | Required | - |
| - | string | Required | - |
| - | string | Required | - |
| - | boolean | - | Explicitly enable or disable perf tracing. Set to true to capture perf traces. |
| - | integer | - | Flush trace events to disk every N calls to save(step=...). A value of 1 writes on every step; values <= 0 fall back to 1. |
| - | list of integer \| None | - | List of step numbers at which to capture detailed profiling traces. If None, no detailed profiling traces are captured. |
| - | - | - | Session tracing configuration. |
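The flush cadence described above can be sketched as a small buffer that writes every N calls to save(step=...), with non-positive values treated as 1. Illustrative only, not AReaL's implementation:

```python
# Buffer trace events and flush every `flush_every` save() calls.
class TraceBuffer:
    def __init__(self, flush_every=1):
        self.every = flush_every if flush_every > 0 else 1  # <= 0 -> 1
        self.calls = 0
        self.flushes = 0

    def save(self, step):
        self.calls += 1
        if self.calls % self.every == 0:
            self.flushes += 1  # here: write buffered events to disk
```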
Scheduler Configuration#
Configuration for worker scheduling. Used in the single-controller mode. Experimental.
The fields of this experimental configuration are currently undocumented.
Scheduling Specification#
Configuration class: SchedulingSpec
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | integer | - | Number of CPU cores required. |
| - | integer | - | Number of GPU units required. |
| - | integer | - | Amount of memory (GB) required. |
| - | integer | - | Number of ports to expose. |
| - | string | - | Docker/Singularity container image to use. |
| - | string | - | Task type (e.g., worker, engine). Choices: - |
| - | - | Required | Environment variables for the container. |
| - | string \| None | - | Command to execute inside the container. Defaults to AReaL's RPC server. |
SchedulingStrategy#
Configuration class: SchedulingStrategy
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | string | - | Choices: - |
| - | string \| None | - | The target role to colocate with. |
SessionTracer Configuration#
Configuration for per-session lifecycle tracing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| - | boolean | - | Enable per-session lifecycle tracing alongside perf events. When true, session metadata is captured to sessions.jsonl. |
| - | integer | - | Flush session trace records once this many entries are ready. Values <= 0 fall back to 1. |