Preparation
Last updated: 2025-11-08
With the environment and dependencies installed, the final step is to prepare the necessary assets for training. This guide covers the two main prerequisites: downloading and converting the model weights, and formatting the training dataset.
Download and Merge Model Weights
Our training scripts require model weights in a “merged-expert” format for optimal performance. Before starting, you must download the standard weights and convert them.
Step 1: Download Original Model
We provide a helper script to download the weights from Hugging Face:
# Choose a destination for the original model files
python ./scripts/download_hf_model.py \
--repo_id inclusionAI/LLaDA2.0-mini-preview \
--local_dir /path/to/separate_expert_model
Step 2: Convert to Merged Format
Run the following script to create the merged checkpoint required for training:
# Use the path from the previous step as the source
python scripts/moe_convertor.py \
--input-path /path/to/separate_expert_model \
--output-path /path/to/save/merged_model \
--mode merge
The directory /path/to/save/merged_model is what you will use for the training script. For more details, see MoE Expert Merging and Splitting Utilities.
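After the conversion finishes, it can be worth confirming that the output directory looks like a loadable checkpoint before launching training. The sketch below is a minimal check that assumes a Hugging Face-style layout (a config.json plus one or more *.safetensors shards); the converter's exact output may include additional files, so treat the expected file names as assumptions:

```python
from pathlib import Path

def check_merged_model(model_dir: str) -> list[str]:
    """Lightweight sanity check for a converted checkpoint.

    Assumes a Hugging Face-style layout (config.json plus one or
    more *.safetensors shards). The actual output of
    moe_convertor.py may contain additional files.
    """
    root = Path(model_dir)
    if not (root / "config.json").is_file():
        raise FileNotFoundError(f"missing config.json in {model_dir}")
    shards = sorted(p.name for p in root.glob("*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"no *.safetensors shards in {model_dir}")
    return shards

# Example: check_merged_model("/path/to/save/merged_model")
```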
Prepare Training Data
This tutorial uses the openai/gsm8k dataset and demonstrates how to convert it into the conversational format.
Provided Script
We provide an example script ./scripts/build_gsm8k_dataset.py for this purpose. You can adapt this script or write your own to process other datasets.
The script converts the “question” and “answer” fields into a conversational messages field. The processed dataset is saved to the ./gsm8k_datasets/ directory, split into:
train.jsonl - Training data
test.jsonl - Evaluation data
Run the script:
python ./scripts/build_gsm8k_dataset.py
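To make the conversion concrete, the mapping from gsm8k's "question" and "answer" fields to a chat-style messages record might look like the following sketch. The role names and exact schema here are assumptions about a typical conversational format; the provided build_gsm8k_dataset.py script is authoritative:

```python
import json

def to_messages(example: dict) -> dict:
    # Map gsm8k "question"/"answer" fields onto a chat-style
    # "messages" list. The role names and schema are illustrative
    # assumptions; the provided script may use a different layout.
    return {
        "messages": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }

record = {"question": "What is 2 + 2?", "answer": "2 + 2 = 4. #### 4"}
line = json.dumps(to_messages(record))  # one line of train.jsonl
```

Each converted record is serialized as a single JSON object per line, which is what the .jsonl extension of the output files implies.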