Preparation
Last updated: 2025-11-08
With the environment and dependencies installed, the final step is to prepare the necessary assets for training. This guide covers the two main prerequisites: downloading and converting the model weights, and formatting the training dataset.
Download and Merge Model Weights
Our training scripts require model weights in a “merged-expert” format for optimal performance. Before starting, you must download the standard weights and convert them.
Step 1: Download Original Model
We provide a helper script to download the weights from Hugging Face:
# Choose a destination for the original model files
python ./scripts/download_hf_model.py \
--repo_id inclusionAI/LLaDA2.0-mini-preview \
--local_dir /path/to/separate_expert_model
Step 2: Convert to Merged Format
Run the following script to create the merged checkpoint required for training:
# Use the path from the previous step as the source
python scripts/moe_convertor.py \
--input-path /path/to/separate_expert_model \
--output-path /path/to/save/merged_model \
--mode merge
The directory /path/to/save/merged_model is what you will use for the training script. For more details, see MoE Expert Merging and Splitting Utilities.
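After the conversion finishes, it can be worth confirming that the output directory looks like a loadable checkpoint before launching training. The sketch below is a minimal check that assumes a Hugging Face-style layout (a config.json plus one or more *.safetensors shards); the converter's exact output may include additional files, so treat the expected file names as assumptions:

```python
from pathlib import Path

def check_merged_model(model_dir: str) -> list[str]:
    """Lightweight sanity check for a converted checkpoint.

    Assumes a Hugging Face-style layout (config.json plus one or
    more *.safetensors shards). The actual output of
    moe_convertor.py may contain additional files.
    """
    root = Path(model_dir)
    if not (root / "config.json").is_file():
        raise FileNotFoundError(f"missing config.json in {model_dir}")
    shards = sorted(p.name for p in root.glob("*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"no *.safetensors shards in {model_dir}")
    return shards

# Example: check_merged_model("/path/to/save/merged_model")
```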
Prepare Training Data
This tutorial uses the openai/gsm8k dataset and demonstrates how to convert it into the conversational format.
Provided Script
We provide an example script ./scripts/build_gsm8k_dataset.py for this purpose. You can adapt this script or write your own to process other datasets.
The script converts the “question” and “answer” fields into a conversational messages field. The processed dataset is saved to the ./gsm8k_datasets/ directory, split into:
train.jsonl - Training data
test.jsonl - Evaluation data
Run the script:
python ./scripts/build_gsm8k_dataset.py
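To make the conversion concrete, the mapping from gsm8k's "question" and "answer" fields to a chat-style messages record might look like the following sketch. The role names and exact schema here are assumptions about a typical conversational format; the provided build_gsm8k_dataset.py script is authoritative:

```python
import json

def to_messages(example: dict) -> dict:
    # Map gsm8k "question"/"answer" fields onto a chat-style
    # "messages" list. The role names and schema are illustrative
    # assumptions; the provided script may use a different layout.
    return {
        "messages": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }

record = {"question": "What is 2 + 2?", "answer": "2 + 2 = 4. #### 4"}
line = json.dumps(to_messages(record))  # one line of train.jsonl
```

Each converted record is serialized as a single JSON object per line, which is what the .jsonl extension of the output files implies.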