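Background:

Large language models are now being applied almost everywhere, and the quickest route to an application is to fine-tune a model on your own vertical-domain data. Among the open-source models released in China, chatglm2-6b performs particularly well. This article describes LoRA fine-tuning of chatglm2-6b on vertical-domain data, run on the P40 machines of the group's EA platform.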

I. Introduction to chatglm2-6b

GitHub: https://github.com/THUDM/ChatGLM2-6B

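Compared with the first-generation chatglm, chatglm2-6b improves in several respects:

1. Better performance: the base model of ChatGLM2-6B has been upgraded relative to the first generation, and it scores well across the evaluation datasets;

2. Longer context: the context length of the base model has been extended from 2K in ChatGLM-6B to 32K, and the dialogue stage is trained with a context length of 8K;

3. More efficient inference: thanks to Multi-Query Attention, ChatGLM2-6B infers faster and uses less GPU memory. With the official model implementation, inference is 42% faster than the first generation;

4. More open license: the ChatGLM2-6B weights are fully open for academic research, and free commercial use is also permitted after registering through a questionnaire.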

II. Fine-tuning environment

2.1 Hardware requirements

For inference, chatglm2-6b at fp16 precision needs only about 14 GB of GPU memory, so a P40 (24 GB) can handle it.
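As a rough sanity check on that figure, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 6.2 billion parameters (an approximation, not a number from this article) and counts only weights; activations, the KV cache and CUDA overhead account for the remainder of the ~14 GB.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 6.2e9       # assumption: approximate parameter count of chatglm2-6b
P40_MEMORY_GB = 24.0   # memory size of a single Tesla P40

# Weight-only estimates for a few common precisions.
estimates = {
    name: weight_memory_gb(N_PARAMS, nbytes)
    for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]
}

for name, gb in estimates.items():
    verdict = "fits" if gb < P40_MEMORY_GB else "does not fit"
    print(f"{name}: ~{gb:.1f} GB of weights, {verdict} on a single P40")
```

Under these assumptions the fp16 weights come to about 12.4 GB, which is consistent with the 14 GB figure once runtime overhead is added, while full fp32 weights would already exceed a single card.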

The P40 GPU configuration on EA is as follows: [figure: P40 configuration on EA]

2.2 Image environment

Before fine-tuning, the build environment has to be configured. I load the environment from a Docker image, configured as follows:

FROM base-clone-mamba-py37-cuda11.0-gpu

# mpich (message-passing library used by deepspeed / mpi4py); -y avoids the interactive prompt
RUN yum install -y mpich

# create my own environment
RUN conda create -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ --override-channels --yes --name py39 python=3.9

# display my own environment in Launcher
RUN source activate py39 \
    && conda install --yes --quiet ipykernel \
    && python -m ipykernel install --name py39 --display-name "py39"

# install your own requirement package
RUN source activate py39 \
    && conda install -y -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/ \
    pytorch torchvision torchaudio faiss-gpu \
    && pip install --no-cache-dir --ignore-installed -i https://pypi.tuna.tsinghua.edu.cn/simple \
    protobuf \
    streamlit \
    transformers==4.29.1 \
    cpm_kernels \
    mdtex2html \
    gradio==3.28.3 \
    sentencepiece \
    accelerate \
    langchain \
    pymupdf \
    "unstructured[local-inference]" \
    "layoutparser[layoutmodels,tesseract]" \
    nltk~=3.8.1 \
    sentence-transformers \
    beautifulsoup4 \
    icetk \
    fastapi~=0.95.0 \
    uvicorn~=0.21.1 \
    pypinyin~=0.48.0 \
    click~=8.1.3 \
    tabulate \
    feedparser \
    azure-core \
    openai \
    pydantic~=1.10.7 \
    starlette~=0.26.1 \
    numpy~=1.23.5 \
    tqdm~=4.65.0 \
    requests~=2.28.2 \
    rouge_chinese \
    jieba \
    datasets \
    deepspeed \
    pdf2image \
    urllib3==1.26.15 \
    tenacity~=8.2.2 \
    autopep8 \
    paddleocr \
    mpi4py \
    tiktoken

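If you want to train with deepspeed, note that EA does not ship the mpich message-passing package, so it has to be installed manually (the `yum install mpich` step in the Dockerfile).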
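A quick way to confirm the MPI pieces are in place inside the container is to look for the launcher binaries and the mpi4py package. This is only a sketch: the helper name `check_mpi_setup` is hypothetical, and depending on how mpich was packaged its binaries may sit outside `PATH` until the matching environment module is loaded, so a missing launcher here is a hint rather than proof.

```python
import importlib.util
import shutil


def check_mpi_setup(launchers=("mpirun", "mpiexec")):
    """Report which MPI launchers are on PATH and whether mpi4py is importable."""
    report = {name: shutil.which(name) for name in launchers}  # path or None
    report["mpi4py"] = importlib.util.find_spec("mpi4py") is not None
    return report


report = check_mpi_setup()
for name, value in report.items():
    print(f"{name}: {value}")
```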

2.3 Model download

Hugging Face: https://huggingface.co/THUDM/chatglm2-6b/tree/main

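III. LoRA fine-tuning

3.1 Introduction to LoRA

Paper: https://arxiv.org/pdf/2106.09685.pdf

LoRA (Low-Rank Adaptation of Large Language Models) is a fine-tuning method that freezes the pretrained model weights, adds extra low-rank layers to the model, and trains only the parameters of those added layers.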

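The idea behind LoRA:

- Add a bypass next to the original PLM (Pre-trained Language Model) that projects down to a low dimension and then back up.

- During training, keep the PLM parameters fixed and train only the down-projection matrix A and the up-projection matrix B. The model's input and output dimensions are unchanged; at output time, BA is added to the PLM parameters.

- Initialize A from a random Gaussian distribution and B as a zero matrix, so that at the start of training the bypass BA is still the zero matrix.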
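The three points above can be sketched in a few lines of plain Python. This is a toy illustration and not how peft implements it: the dimensions, the Gaussian standard deviation, the helper names and the `scale` argument are all assumptions chosen for readability (in the LoRA paper the bypass is scaled by alpha / r).

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, r, seed=0):
    """A (r x d_in) is Gaussian, B (d_out x r) is zero, so B @ A starts at zero."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def lora_forward(W0, A, B, x, scale=1.0):
    """h = W0 x + scale * B (A x), with x a d_in x 1 column vector."""
    base = matmul(W0, x)               # frozen pretrained path
    bypass = matmul(B, matmul(A, x))   # down-project with A, up-project with B
    return [[b + scale * p for b, p in zip(rb, rp)] for rb, rp in zip(base, bypass)]


d_in, d_out, r = 6, 4, 2               # toy sizes, purely illustrative
rng = random.Random(1)
W0 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
x = [[rng.uniform(-1, 1)] for _ in range(d_in)]
A, B = init_lora(d_out, d_in, r)

h_base = matmul(W0, x)
h_lora = lora_forward(W0, A, B, x)
same_at_start = h_base == h_lora       # B is zero, so the bypass adds nothing yet

trainable = r * (d_in + d_out)         # parameters in A and B
frozen = d_in * d_out                  # parameters in W0
print(same_at_start, trainable, frozen)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and training only has to move the r * (d_in + d_out) parameters of A and B. The saving becomes substantial once d_in and d_out are in the thousands and r stays small.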

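3.2 Fine-tuning

The peft library from Hugging Face makes it easy to fine-tune a PLM, and it is what I use here to create the LoRA adapters. Note that peft does not appear in the package list in section 2.2, so it presumably has to be installed separately (for example with pip).

peft repository (GitCode mirror): https://gitcode.net/mirrors/huggingface/peft?utm_source=csdn_github_accelerator

Loading the model and applying LoRA: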

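    from peft import LoraConfig, get_peft_model
    from transformers import AutoModel, AutoTokenizer

    # args (parsed command-line arguments) and device are defined elsewhere in the script

    # load the tokenizer and base model from the local model directory;
    # trust_remote_code is needed because ChatGLM2 ships its own modeling code
    tokenizer = AutoTokenizer.from_pretrained(args.model_dir, trust_remote_code=True)
    model = AutoModel.from_pretrained(args.model_dir, trust_remote_code=True)

    print("tokenizer:", tokenizer)

    # LoRA configuration; target_modules is not set, so peft falls back to
    # its built-in default for the model type
    config = LoraConfig(
        r=args.lora_r,       # rank of the low-rank update
        lora_alpha=32,       # the update is scaled by lora_alpha / r
        lora_dropout=0.1,    # dropout applied on the LoRA branch
        bias="none",         # bias terms stay frozen
    )

    # wrap the base model with the LoRA adapters
    model = get_peft_model(model, config)
    # cast to half precision and move to the target device
    model = model.half().to(device)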
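To get a feel for how small the trainable part is, the number of LoRA parameters can be estimated by hand. The sketch below rests on several assumptions that are not stated in this article: that the adapters attach to the fused `query_key_value` projection, that ChatGLM2-6B has 28 layers with hidden size 4096 and a 4608-wide fused projection (as I understand the published model config), and that the rank is 8 (the article takes it from `args.lora_r`).

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Assumed model dimensions (not taken from this article).
HIDDEN_SIZE = 4096    # input width of the attention projection
QKV_OUT = 4608        # fused query/key/value output width under multi-query attention
NUM_LAYERS = 28
RANK = 8              # illustrative value for args.lora_r

per_layer = lora_param_count(HIDDEN_SIZE, QKV_OUT, RANK)
total_trainable = per_layer * NUM_LAYERS
share = total_trainable / 6.2e9   # against an approximate 6.2B-parameter base model

print(f"per layer: {per_layer:,}")
print(f"total trainable: {total_trainable:,}")
print(f"share of base model: {share:.4%}")
```

Under these assumptions only about two million parameters are trained. The count that peft actually attaches can be checked on the wrapped model with `model.print_trainable_parameters()`.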

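One thing to watch out for: when loading a local model, Hugging Face needs a working (cache) directory. On EA there is no permission to create one under .cache, so you have to point it at a work path of your own first.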

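import os
# set these before importing transformers, which reads them at import time
work_dir = os.path.dirname(os.path.abspath(__file__)) + "/work/"
os.makedirs(work_dir, exist_ok=True)  # make sure the directory exists and is writable
os.environ['TRANSFORMERS_CACHE'] = work_dir
os.environ['HF_MODULES_CACHE'] = work_dir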

If you want to train with deepspeed, choose the ZeRO stage you need:

    conf = {"train_micro_batch_size_per_gpu": args.train_batch_size,
            "gradient_accumulation_steps": args.gradient_accumulation_steps,
            "optimizer": {
                "type": "Adam",
                "params": {
                    "lr": 1e-5,
                    "betas": [
                        0.9,
                        0.95
                    ],
                    "eps": 1e-8,
                    "weight_decay": 5e-4
                }
            },
            "fp16": {
                "enabled": True
            },
            "zero_optimization": {
                "stage": 1,
                "offload_optimizer": {
                    "device": "cpu",
                    "pin_memory": True
                },
                "allgather_partitions": True,
                "allgather_bucket_size": 2e8,
                "overlap_comm": True,
                "reduce_scatter": True,
                "reduce_bucket_size": 2e8,
                "contiguous_gradients": True
            },
            "steps_per_print": args.log_steps
            }
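When tuning this config it helps to keep in mind that the batch size the optimizer actually sees is the per-GPU micro batch times the gradient accumulation steps times the number of GPUs. The helpers below are a sketch of that arithmetic; the function names, the accumulation value of 8, the two GPUs and the 10,000-sample dataset are all hypothetical, and the step count ignores details such as dropped last batches.

```python
import math


def effective_batch_size(conf: dict, world_size: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return (conf["train_micro_batch_size_per_gpu"]
            * conf["gradient_accumulation_steps"]
            * world_size)


def optimizer_steps_per_epoch(n_samples: int, conf: dict, world_size: int) -> int:
    """Approximate number of optimizer updates needed to see every sample once."""
    return math.ceil(n_samples / effective_batch_size(conf, world_size))


# Hypothetical values for illustration only.
example_conf = {"train_micro_batch_size_per_gpu": 1, "gradient_accumulation_steps": 8}

print(effective_batch_size(example_conf, world_size=2))
print(optimizer_steps_per_epoch(10_000, example_conf, world_size=2))
```

With a micro batch of 1, as used here, gradient accumulation is the main lever for a larger effective batch without using more GPU memory.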

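Everything else is data processing. The part that deserves attention is how the prompt is constructed: in my view, prompt construction matters a great deal for in-domain fine-tuning and ends up having a large effect on the model.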
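As one possible starting point, a training sample can be rendered in the round-based dialogue format that, as far as I know, ChatGLM2's own tokenizer uses when building chat prompts. This is a sketch and not the prompt used in this article: the `instruction` prefix, the sample question and answer, and the `to_training_sample` helper are hypothetical, and the exact template should be checked against the model's tokenization code.

```python
def build_prompt(query, history=None, instruction=""):
    """Render a query (plus optional earlier turns) in a round-based chat format."""
    history = history or []
    prompt = ""
    for i, (old_query, response) in enumerate(history):
        prompt += "[Round {}]\n\n问：{}\n\n答：{}\n\n".format(i + 1, old_query, response)
    # The optional instruction is a hypothetical domain prefix placed before the query.
    prompt += "[Round {}]\n\n问：{}\n\n答：".format(len(history) + 1, instruction + query)
    return prompt


def to_training_sample(question, answer, instruction=""):
    """Pair the rendered prompt with the target text the model should learn to produce."""
    return {"prompt": build_prompt(question, instruction=instruction), "target": answer}


# Hypothetical domain example.
sample = to_training_sample(
    question="How long is the return window for online orders?",
    answer="Online orders can be returned within 7 days of delivery.",
    instruction="You are a customer-service assistant. ",
)
print(sample["prompt"])
print(sample["target"])
```

Keeping the fine-tuning prompts in the same shape the model sees at inference time is usually the safer choice, since a mismatch between the two is an easy way to lose quality.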

IV. Fine-tuning results

The model is still being fine-tuned, with batch=1 and epoch=3; one epoch has been completed so far.