# Configuration Files

## Training Configuration File

Users need a main YAML configuration file to set up the runtime environment, the model configurations, and the RLHF training process. Additionally, users may need a separate configuration file for each model.

The RLHF training configuration consists of three parts:

1. `runtime_env`: Configuration for the runtime environment.
2. `models`: Model configurations. Each model can have its own parameter configuration. Models are distinguished by `model_name`, which corresponds to the `model_name` passed in when defining the model in the main file.
3. `rlhf`: RLHF training configuration.

Below is an example of a training configuration. For detailed explanations of the configuration options, please refer to the [Config API Documentation](api/config.rst).

To make it easier to configure different hyperparameters, parameters can also be read from environment variables. The format is as follows:

```
param: ${env_name:default_value}
```

`param` is the parameter name, `env_name` is the environment variable name, and `default_value` is the default value (optional). In the following example, if the environment variable `ref_generation_batch_size` is set, its value is read and assigned to the `generation_batch_size` of `reference`. If `ref_generation_batch_size` is not set, the default value of 4 is used.

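Ahead of the full configuration example, the resolution rule for a single `${env_name:default_value}` value can be sketched in Python. This is a minimal illustration and not the framework's own parser: the helper name `resolve_env_value` and the regular expression are hypothetical, the result is left as a string without type conversion, and returning `None` for an unset variable with no default is an assumption.

```python
import os
import re

# Matches "${env_name}" or "${env_name:default_value}" (assumed syntax).
_ENV_PATTERN = re.compile(r"^\$\{([^:}]+)(?::([^}]*))?\}$")


def resolve_env_value(raw, environ=None):
    """Resolve one placeholder string; hypothetical helper for illustration."""
    environ = os.environ if environ is None else environ
    match = _ENV_PATTERN.match(raw)
    if match is None:
        return raw  # not a placeholder, keep the value as written
    env_name, default = match.group(1), match.group(2)
    if env_name in environ:
        return environ[env_name]  # the environment variable wins
    return default  # None when no default was given (assumption)


# With the variable unset, the default after the colon is used.
unset_result = resolve_env_value("${ref_generation_batch_size:4}", {})
# With the variable set, its value replaces the default.
set_result = resolve_env_value(
    "${ref_generation_batch_size:4}", {"ref_generation_batch_size": "8"}
)
```

Passing `environ` explicitly keeps the sketch easy to test; by default it reads from `os.environ`.
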
```yaml
runtime_env:
  platform: DLC
  excludes:
    - "*pt"
    - "logs"
    - "tensorboards"
    - ".nfs*"

models:
  policy:
    model_config_file: policy_inference.yaml
    num_gpu: 8
    trainable: False

  reference:
    model_config_file: reference.yaml
    num_gpu: 8
    trainable: False
    generation_batch_size: ${ref_generation_batch_size:4}

  reward:
    model_config_file: reward_inference.yaml
    num_gpu: 8
    trainable: False

  value:
    model_config_file: old_value_inference.yaml
    num_gpu: 8
    trainable: False

  ppo_policy:
    model_config_file: ppo_policy.yaml
    num_gpu: 8
    trainable: True

  ppo_value:
    model_config_file: ppo_value.yaml
    num_gpu: ${num_gpu:16}
    trainable: True

runtime:
  colocation:
    - policy,ppo_policy,reward,reference,value,ppo_value
  generation_batch_size: ${generation_batch_size:4}
  train_micro_batch_size: 2
  train_global_batch_size: ${train_global_batch_size:512}
  num_episode: 200
  sample_per_episode: ${sample_per_episode:1024}
  num_training_epoch: 1
  save_episode_interval: ${save_episode_interval:50}
  data_path: ${data_path}
  eval_episode_interval: ${eval_episode_interval:100}
```

## Model Configuration YAML

This framework supports separate configuration files for each model, which can be used to configure hyperparameters, parallelization strategies, checkpoint initialization, and more for different models. The model configuration file is in YAML format. Here is a simple example of a model configuration:

```yaml
num_layers: 6
hidden_size: 768
num_attention_heads: 12
bf16: True
seq_length: 2048
tensor_model_parallel_size: 8
pipeline_model_parallel_size: 2
load: path-to-ckpt
```

To simplify the sharing of configuration across different models, we have extended the syntax of YAML by introducing the `include` field to inherit configurations from a base configuration file. In the example below, `policy_inference.yaml` and `ppo_policy.yaml` share parameters such as `num_layers` and `hidden_size`, while each model has its own specific `pipeline_model_parallel_size` configuration.

![yaml](../images/yaml.jpg)
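
The `include` inheritance described above can be sketched in Python on already-parsed configurations. This is an illustration under stated assumptions, not the framework's implementation: the function `resolve_includes`, the base file name `base.yaml`, and the `pipeline_model_parallel_size` values are hypothetical, and it is assumed that keys in the including file override keys inherited from the base file.

```python
def resolve_includes(name, files, _seen=()):
    """Return config `name` with its `include` chain merged in (hypothetical)."""
    if name in _seen:
        raise ValueError(f"circular include: {name}")
    config = dict(files[name])  # copy so the input mapping is not mutated
    base_name = config.pop("include", None)
    if base_name is None:
        return config  # no base file, nothing to inherit
    merged = resolve_includes(base_name, files, _seen + (name,))
    merged.update(config)  # assumption: own keys override inherited keys
    return merged


# In-memory stand-ins for parsed YAML files; names and values are illustrative.
files = {
    "base.yaml": {"num_layers": 6, "hidden_size": 768},
    "policy_inference.yaml": {
        "include": "base.yaml",
        "pipeline_model_parallel_size": 1,
    },
    "ppo_policy.yaml": {
        "include": "base.yaml",
        "pipeline_model_parallel_size": 2,
    },
}

policy_config = resolve_includes("policy_inference.yaml", files)
ppo_policy_config = resolve_includes("ppo_policy.yaml", files)
```

Both resolved configurations carry the shared `num_layers` and `hidden_size`, while each keeps its own `pipeline_model_parallel_size`.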