Chinese-Llama-2: A Large Language Model Project That Strengthens Llama-2's Chinese Capabilities

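Chinese-Llama-2 is an open-source project developed by research teams at the University of Macau and Monash University. It aims to improve the Chinese understanding, generation, and translation abilities of Meta AI's Llama-2 large language model. The project releases its training code, datasets, and model weights, and supports three approaches: LoRA fine-tuning, full-parameter instruction fine-tuning, and continued pre-training.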

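Project Overview

Chinese-Llama-2 adapts the Llama-2 model released by Meta AI to Chinese. Llama-2 is a strong open-source large language model, but its Chinese processing ability is limited. The project improves Llama-2's Chinese performance along three technical paths: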

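Parameter-efficient fine-tuning with LoRA: uses low-rank adaptation to improve model performance while training only a small number of additional parameters
Full-parameter instruction fine-tuning: fine-tunes all parameters on Chinese instruction datasets so the model adapts fully to the characteristics of Chinese
Continued pre-training: continues pre-training on large-scale Chinese corpora to capture the nuances of the language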
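
The LoRA idea in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's training code: the layer size (4096 x 4096), the rank r = 8, and the helper names are assumptions chosen for the example. A frozen weight matrix W is combined with a trainable low-rank product B·A, so only r × (d_in + d_out) values are trained instead of d_in × d_out.

```python
# Minimal LoRA sketch in plain Python (illustrative; not the project's code).
# A frozen weight W (d_out x d_in) is augmented with a trainable low-rank
# update B @ A, where A is (r x d_in) and B is (d_out x r).

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_counts(d_in, d_out, r):
    """Return (full-matrix parameter count, LoRA adapter parameter count)."""
    return d_in * d_out, r * (d_in + d_out)

# Assumed sizes, for illustration only.
full, adapter = lora_param_counts(d_in=4096, d_out=4096, r=8)
print(full, adapter, f"{100 * adapter / full:.2f}%")  # 16777216 65536 0.39%

# Tiny forward pass: with B initialised to zeros, the adapted layer
# reproduces the frozen layer exactly.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]        # r = 1, d_in = 2
B_zero = [[0.0], [0.0]]  # d_out = 2, r = 1
x = [1.0, 1.0]
print(lora_forward(W, A, B_zero, x, alpha=16, r=1))  # [3.0, 7.0]
```

Under these assumed sizes the adapter holds well under 1% of the parameters of the matrix it modifies, which is the sense in which LoRA is "parameter-efficient".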

At the time of writing, the project has 443+ stars on GitHub and is one of the notable projects among Chinese open-source LLMs.

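Key Features

Multi-stage training: supports LoRA fine-tuning, full-parameter fine-tuning, and continued pre-training
Complete training code: covers the full pipeline from data preprocessing to model training
Released model weights: Chinese-Llama-2-7B and Chinese-Llama-2-LoRA-7B are available
Chinese instruction datasets: built from public datasets such as BAAI/COIG
Multi-node training: supports distributed training and scales to multi-node environments
DeepSpeed integration: uses DeepSpeed for efficient large-scale model training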

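Tech Stack

Python 3.8+: main development language
PyTorch: deep learning framework
Transformers: Hugging Face model library (the repository includes a custom modified version)
DeepSpeed: Microsoft's distributed training framework
LoRA: low-rank adaptation technique
Flash Attention: memory-efficient attention implementation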

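Installation Guide

Requirements

Python >= 3.8
A CUDA-capable GPU (a multi-GPU setup is recommended)
At least 32 GB of GPU memory for full-parameter fine-tuning, or 16 GB for LoRA fine-tuning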
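
A rough calculation helps put these memory figures in context. The sketch below is a back-of-the-envelope estimate, not a measurement from the project: it assumes a nominal 7e9 parameters, 2 bytes per bf16 value, and the common rule of thumb of 16 bytes per parameter for mixed-precision Adam training. It ignores activations and framework overhead.

```python
# Back-of-the-envelope GPU memory estimate (illustrative assumptions only).

GIB = 2 ** 30  # bytes per GiB

def memory_gib(n_params, bytes_per_param):
    """Memory in GiB for n_params values stored at bytes_per_param bytes each."""
    return n_params * bytes_per_param / GIB

N_PARAMS = 7e9  # nominal size of a "7B" model (assumption)

# bf16 weights only: 2 bytes per parameter.
weights_bf16 = memory_gib(N_PARAMS, 2)

# Rule of thumb for mixed-precision Adam: 2 (bf16 weights) + 2 (bf16 gradients)
# + 12 (fp32 master weights, momentum, variance) = 16 bytes per parameter.
full_finetune_states = memory_gib(N_PARAMS, 16)

print(f"bf16 weights: {weights_bf16:.1f} GiB")                     # 13.0 GiB
print(f"full fine-tuning states: {full_finetune_states:.1f} GiB")  # 104.3 GiB
```

Under these assumptions the frozen bf16 weights alone take about 13 GiB, which is in line with the 16 GB figure for LoRA. The training state for full-parameter fine-tuning is far larger than a single 32 GB card, so it has to be spread across several GPUs; how the project's DeepSpeed configuration partitions it is not stated here.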

Installation Steps

bash

# Clone the repository
git clone https://github.com/longyuewangdcu/chinese-llama-2.git
cd chinese-llama-2

# Install dependencies
pip install -e ./transformers
pip install -r requirements.txt

# Optional: Flash Attention (recommended for full-parameter fine-tuning)
pip install flash-attn==1.0.4

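Model Downloads

LoRA Weights (Parameter-Efficient Fine-Tuning)

Hugging Face: seeledu/Chinese-Llama-2-LoRA-7B
Baidu Netdisk: link (password: zq4r)

Full-Parameter Fine-Tuned Model

Hugging Face: seeledu/Chinese-Llama-2-7B
Baidu Netdisk: link (password: futk)

Continued Pre-Training Model

Chinese-Llama-2-7B-conpre: Weiyun link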

Usage Examples

Inference with the LoRA Model

bash

path=  # project path
model_path=  # path to the original (base) model
lora_model_path=  # path to the LoRA weights

python3 $path/test/inference_lora.py \
    --model-name-or-path $model_path \
    --lora-weights $lora_model_path \
    -t 0.7 \
    -sa 'sample' \
    -i $path/test/test_case.txt \
    -o $path/test/test_case.general-task.txt
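
The command above passes `-t 0.7` and `-sa 'sample'`. The flags are not documented here, so the following sketch assumes that `-t` is a sampling temperature and that `'sample'` selects random sampling rather than greedy decoding. It shows, in plain Python with made-up logits, what temperature-scaled sampling does; it is not the project's inference code, and the function names are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
probs_default = softmax_with_temperature(logits, 1.0)
probs_sharper = softmax_with_temperature(logits, 0.7)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
token = sample_token(logits, 0.7, rng)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharper])
print(token)
```

A temperature below 1 concentrates probability on the highest-scoring token, so the output is less random than at temperature 1 while still not being fully deterministic.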

Inference with the Full-Parameter Fine-Tuned Model

bash

path=  # project path
model_path=  # model path

python3 $path/test/inference.py \
    --model-name-or-path $model_path \
    -t 0.7 \
    -sa 'sample' \
    -i $path/test/test_case.txt \
    -o $path/test/test_case.general-task.txt

Training Guide

LoRA Fine-Tuning

bash

# Set environment variables for multi-node training
export NCCL_DEBUG=INFO
export MASTER_ADDR="${CHIEF_IP:=localhost}"
export MASTER_PORT="${MASTER_PORT:=29500}"

path=  # project path
train_path=$path/train/run_clm_lora.py
model_path=$path/model/llama2-7B-HF
model_save=$path/checkpoint/chinese-llama2-7b-4096-enzh/

torchrun --nnodes 1 --node_rank 0 --nproc_per_node 8 \
    --master_addr $MASTER_ADDR --master_port $MASTER_PORT \
    ${train_path} \
    --deepspeed $path/train/deepspeed_config_bf16.json \
    --model_name_or_path ${model_path} \
    --train_file $path/data/instruction/all_instruction_hf.json \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --use_lora True \
    --lora_config $path/train/lora_config.json \
    --bf16 True \
    --output_dir ${model_save}
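
`torchrun` starts `--nproc_per_node` worker processes on each node and exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to each of them. The sketch below shows how a worker might read those variables and how they relate to the launch flags; it assumes every node runs the same number of processes, and the helper names are hypothetical rather than taken from `run_clm_lora.py`.

```python
import os

def describe_process(env):
    """Read the rank variables that torchrun exports to every worker process."""
    return {
        "rank": int(env.get("RANK", 0)),              # global index of this process
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # index within this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }

def expected_world_size(nnodes, nproc_per_node):
    """Total worker count implied by the launch flags."""
    return nnodes * nproc_per_node

def expected_global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank of a worker, assuming identical process counts per node."""
    return node_rank * nproc_per_node + local_rank

# Outside torchrun the variables are unset, so the defaults describe one process.
print(describe_process(os.environ))

# The single-node LoRA launch above: 1 node x 8 processes.
print(expected_world_size(nnodes=1, nproc_per_node=8))  # 8
```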

Full-Parameter Fine-Tuning

bash

# Flash Attention is used to reduce GPU memory usage
export NCCL_DEBUG=INFO
export MASTER_ADDR="${CHIEF_IP:=localhost}"
export MASTER_PORT="${MASTER_PORT:=29500}"

path=  # project path
train_path=$path/train/run_clm_llms_mem.py
model_path=$path/model/llama2-7B-HF
model_save=$path/checkpoint/llama2-7b-llama2_coig_dt_ca-all/

torchrun --nnodes 2 --node_rank $INDEX --nproc_per_node 8 \
    --master_addr $MASTER_ADDR --master_port $MASTER_PORT \
    ${train_path} \
    --deepspeed $path/train/deepspeed_config_bf16.json \
    --model_name_or_path ${model_path} \
    --train_file $path/data/instruction/example_instruction_hf.json \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --bf16 True \
    --output_dir ${model_save}
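
The two training commands use different per-device batch sizes, accumulation steps, and node counts. A short calculation makes the resulting effective batch size explicit. This is a sketch under the usual data-parallel assumption that every process handles its own batch on every accumulation step; the function name is hypothetical.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, nproc_per_node, nnodes):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * nproc_per_node * nnodes

# LoRA command: batch 2, accumulation 8, 8 processes, 1 node.
lora_batch = effective_batch_size(2, 8, 8, 1)

# Full-parameter command: batch 8, accumulation 2, 8 processes, 2 nodes.
full_batch = effective_batch_size(8, 2, 8, 2)

print(lora_batch, full_batch)  # 128 256
```

With the same learning rate of 2e-5 in both commands, the full-parameter run therefore takes optimizer steps over twice as many samples as the LoRA run.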

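Performance Comparison

The project provides example comparisons against the original Llama-2 7B Chat. The questions were asked in Chinese:

Question	Llama-2 7B Chat	Chinese-Llama-2-LoRA-7B	Chinese-Llama-2-7B
What is a prime number? (素数是什么？)	Answers in English and does not understand "素数" (prime number)	A prime number is an integer that has no positive integer factors	A prime number is a number greater than one that is divisible only by one and itself
What was the imperial examination system like? (科举制度是怎么样的)	No answer	The imperial examination system was a way of appointing officials in ancient China...	Explains in detail how the imperial examination system selected candidates through exams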

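Datasets

The project trains on the following datasets:

Chinese Alpaca instruction dataset: 51K Chinese instruction examples
BAAI/COIG: a large-scale Chinese instruction dataset
Chinese-English document-level translation dataset: used to strengthen translation ability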

Project Links

GitHub repository: https://github.com/longyuewangdcu/Chinese-Llama-2
Hugging Face organization: https://huggingface.co/seeledu
Citation: see the Citation section of the project README

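License

The project code is released under the open-source license stated in the repository. Use of the models must comply with the Llama 2 Community License Agreement.

Contributing and Feedback

Community contributions are welcome. Questions and suggestions can be submitted through GitHub Issues.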
