# End-to-end training tutorial based on the Qwen model

This document describes DPO training using the ChatLearn and DeepSpeed frameworks with the Qwen model.

**The following environment variables are used throughout this tutorial's scripts:**

| ENV | Meaning |
| --- | --- |
| `CHATLEARN` | Location where the ChatLearn code repository is cloned ([https://github.com/alibaba/ChatLearn.git](https://github.com/alibaba/ChatLearn.git)) |
| `DATASET_ROOT` | Root directory where the training datasets are stored |

## Setup: Image, Code, and Data Preparation

### Image / Code

Please refer to [Environment and Code Setup](../installation.md).

### Data

Qwen2 requires data in chatml format:

```
{"type": "chatml", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me something about large language models."}, {"role": "assistant", "content": "Large language models are a type of language model that is trained on a large corpus of text data. They are capable of generating human-like text and are used in a variety of natural language processing tasks..."}], "source": "unknown"}
```

The following script converts `Dahoas/full-hh-rlhf` to chatml format and stores the result in `$DATASET_ROOT/alignment/train.jsonl`:

```bash
cd ${CHATLEARN}/examples/huggingface/
DATASET_ROOT=path-to-dataset-root
python data/preprocess_data_chatml.py $DATASET_ROOT
```

## DPO

Here is an example of DPO training for Qwen2-7B.

In this example, set `policy_model_path` to the path of the initial model checkpoint. Both the Policy model and the Reference model are initialized from this checkpoint.

```bash
export CHATLEARN=path-to-chatlearn
export DATASET_PATH=$DATASET_ROOT/alignment/train.jsonl
export policy_model_path=path-to-qwen2-ckpt
cd ${CHATLEARN}/examples/huggingface/
bash scripts/train_dpo_qwen.sh
```
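
Before launching a long training run, it can be useful to sanity-check the file that `DATASET_PATH` points to. The following is a minimal sketch, assuming each line is a JSON object shaped like the chatml example in the Data section. The helper names `validate_chatml_line` and `validate_file` are hypothetical and not part of ChatLearn, and the set of allowed roles is taken only from that example, so the actual preprocessed data may carry additional fields or roles.

```python
import json

# Roles that appear in the chatml example record; an assumption, not a ChatLearn constant.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chatml_line(line):
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    if record.get("type") != "chatml":
        problems.append('"type" is not "chatml"')

    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append('"messages" is missing or empty')
        return problems

    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems


def validate_file(path, max_report=10):
    """Print up to max_report problem lines and return the number of bad lines."""
    bad = 0
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # skip blank lines
            problems = validate_chatml_line(line)
            if problems:
                bad += 1
                if bad <= max_report:
                    print(f"line {lineno}: {'; '.join(problems)}")
    return bad
```

Calling `validate_file(os.environ["DATASET_PATH"])` (after `import os`) returns the number of lines that failed the checks. A result of `0` only means the records match the shape assumed here, not that they meet every requirement of the training script.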