# Data

This document describes the data preparation process for different stages: SFT, Reward, RLHF, DPO, OnlineDPO and GRPO.


**The following environment variables are used throughout this tutorial:**

| ENV | Explanation |
| --- | --- |
| `CHATLEARN` | The location where the ChatLearn code is cloned [https://github.com/alibaba/ChatLearn.git](https://github.com/alibaba/ChatLearn.git) |
| `DATASET_ROOT` | The root directory for storing the SFT/Reward/RLHF/DPO/OnlineDPO/GRPO training dataset collection. |
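
The preparation commands in the sections below write their outputs to fixed paths under `$DATASET_ROOT`. The following is a hypothetical convenience sketch, not a ChatLearn utility. It reports which of the output files named in this document are still missing. `EXPECTED_FILES` and `missing_files` are illustrative names; the relative paths are the ones listed in the sections below.

```python
import os
from pathlib import Path

# Output files that the preparation commands in this document are described as
# writing, relative to $DATASET_ROOT. Hypothetical helper, not part of ChatLearn.
EXPECTED_FILES = {
    "sft": ["sft/train.jsonl"],
    "reward": ["rm/train.jsonl", "rm/dev.jsonl"],
    "alignment": ["alignment/train.jsonl", "alignment/dev.jsonl"],
    "math": ["math/train.jsonl"],
}


def missing_files(dataset_root=None):
    """Return {stage: [missing relative paths]} for every stage with missing outputs."""
    # Fall back to the DATASET_ROOT environment variable when no root is passed;
    # raises KeyError if neither is available.
    root = Path(dataset_root or os.environ["DATASET_ROOT"])
    missing = {}
    for stage, rel_paths in EXPECTED_FILES.items():
        absent = [p for p in rel_paths if not (root / p).is_file()]
        if absent:
            missing[stage] = absent
    return missing
```

Call `missing_files("/path/to/dataset_root")` with an explicit path. If you rely on the environment variable instead, note that the shell commands below set `DATASET_ROOT=...` without `export`, so it is only visible to Python if you export it first.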


## 1 Prepare SFT Training Data

Organize the question-response pairs of the SFT data into a jsonl file, where each line of the file represents one SFT sample in the following Python dictionary format:

```
{'query': question, 'response': reply}
```

Taking Anthropic's helpful & harmless data as an example, use the following commands to prepare it and store it in `$DATASET_ROOT/sft/train.jsonl`:

```bash
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_sft.py $DATASET_ROOT
```
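
If you bring your own question-response pairs instead of Anthropic's data, the following minimal sketch writes them in the format above. It assumes the file is standard JSON Lines (one JSON object per line with double-quoted keys), which is what `json.dumps` produces. `write_sft_jsonl` and `read_jsonl` are hypothetical helpers, not part of ChatLearn.

```python
import json
from pathlib import Path


def write_sft_jsonl(pairs, path):
    """Write (question, reply) pairs as one {"query", "response"} object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for question, reply in pairs:
            # json.dumps emits valid JSON (double quotes), so every line parses
            # with json.loads even though the format is shown as a Python dict.
            record = {"query": question, "response": reply}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Load every non-empty line of a jsonl file into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `write_sft_jsonl([("What is 2+2?", "4")], "sft/train.jsonl")` produces a single line, `{"query": "What is 2+2?", "response": "4"}`.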

## 2 Prepare Reward Training Data

1. First, pair each question with several candidate responses and organize the results into a jsonl file. Each line of the file represents one Reward model training sample in the following Python dictionary format:

```
{'query': question, 'response': [reply 1, reply 2, ...], 'score': [score1, score2, ...]}
```

Each score indicates the quality of the corresponding response: a higher score means higher quality, that is, closer to human preference.

2. Taking Anthropic's helpful & harmless data as an example, use the following commands to prepare it and store it in `$DATASET_ROOT/rm/train.jsonl` and `$DATASET_ROOT/rm/dev.jsonl`:

```bash
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_reward.py $DATASET_ROOT
```
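
The `score` list only makes sense if it aligns one-to-one with the `response` list. The following sketch checks the fields documented above and shows how the scores order the responses from most to least preferred. `validate_reward_sample` and `rank_responses` are hypothetical helpers for illustration. The sketch does not describe how ChatLearn itself turns the scores into reward-model training signals.

```python
def validate_reward_sample(sample):
    """Raise ValueError if a reward sample does not match the documented format."""
    if not isinstance(sample.get("query"), str):
        raise ValueError("'query' must be a string")
    responses, scores = sample.get("response"), sample.get("score")
    if not isinstance(responses, list) or not isinstance(scores, list):
        raise ValueError("'response' and 'score' must be lists")
    if len(responses) != len(scores):
        raise ValueError("each response needs exactly one score")


def rank_responses(sample):
    """Return the responses ordered from highest to lowest score."""
    validate_reward_sample(sample)
    # Sort on the score only; sorted() is stable, so tied responses keep
    # their original relative order.
    ordered = sorted(zip(sample["score"], sample["response"]),
                     key=lambda pair: pair[0], reverse=True)
    return [response for _, response in ordered]
```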

## 3 Prepare Alignment Training Data

ChatLearn supports multiple alignment methods: RLHF, DPO, OnlineDPO, and GRPO.

1. First, prepare a dataset of instructions to be explored and organize it into a jsonl file. Each line of the file represents one prompt in the following format:

```
{"prompt": prompt}
```

2. Taking Anthropic's helpful & harmless data as an example, use the following code to store the dataset in `$DATASET_ROOT/alignment/train.jsonl` and `$DATASET_ROOT/alignment/dev.jsonl`:

```bash
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_alignment.py $DATASET_ROOT
```
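
If you start from your own list of prompts, the following sketch writes them in the `{"prompt": ...}` format and splits them into `train.jsonl` and `dev.jsonl`. `write_prompt_splits` is a hypothetical helper, and its `dev_fraction` and `seed` defaults are illustrative assumptions. The split used by `prepare_data_alignment.py` is not described in this document.

```python
import json
import random
from pathlib import Path


def write_prompt_splits(prompts, out_dir, dev_fraction=0.05, seed=0):
    """Shuffle prompts and write {"prompt": ...} lines to dev.jsonl and train.jsonl.

    dev_fraction and seed are illustrative defaults, not ChatLearn settings.
    Returns the number of prompts written to each file.
    """
    prompts = list(prompts)
    # Use a seeded RNG so the split is reproducible across runs.
    random.Random(seed).shuffle(prompts)
    n_dev = max(1, int(len(prompts) * dev_fraction)) if prompts else 0
    splits = {"dev.jsonl": prompts[:n_dev], "train.jsonl": prompts[n_dev:]}
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        with (out_dir / name).open("w", encoding="utf-8") as f:
            for prompt in items:
                f.write(json.dumps({"prompt": prompt}, ensure_ascii=False) + "\n")
    return {name: len(items) for name, items in splits.items()}
```

Pointing `out_dir` at `$DATASET_ROOT/alignment` reproduces the file layout that the command above writes.
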

## 4 Prepare Math Training Data

1. First, prepare a math dataset to be explored and organize it into a jsonl file. Each line of the file represents one sample in the following format:

```
{"eval_func": "math_rule", "prompt": prompt, "answer": answer}
```

2. Taking the openai/gsm8k dataset as an example, use the following commands to prepare it and store it in `$DATASET_ROOT/math/train.jsonl`:

```bash
cd ${CHATLEARN}/examples/megatron/
DATASET_ROOT=path-to-dataset-root
python data/prepare_data_math.py $DATASET_ROOT
```
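
GSM8K solutions end with the final answer after a `####` marker. The following sketch builds one math record from a GSM8K-style question and solution. It assumes that the `answer` field should hold only that final answer and that the prompt is the raw question with no template. Both are assumptions, not confirmed behavior of `prepare_data_math.py`. `gsm8k_to_math_record` and `to_jsonl_line` are hypothetical helpers.

```python
import json


def gsm8k_to_math_record(question, solution):
    """Build a {"eval_func", "prompt", "answer"} record from a GSM8K-style sample.

    Assumption: "answer" holds only the final answer, i.e. the text after the
    last '####' marker in the worked solution.
    """
    if "####" not in solution:
        raise ValueError("solution has no '####' final-answer marker")
    final = solution.rsplit("####", 1)[1].strip()
    return {"eval_func": "math_rule", "prompt": question, "answer": final}


def to_jsonl_line(record):
    """Serialize one record as a single JSON line."""
    return json.dumps(record, ensure_ascii=False) + "\n"
```

For a solution ending in `#### 72`, the resulting record has `"answer": "72"` and `"eval_func": "math_rule"`.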