ChatGLM


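YZXPG Development Notes (ChatGLM-6B local deployment and installation log)

Official GitHub repository: github.com/THUDM/ChatG…
Model at the INT4 quantization level: huggingface.co/THUDM/chatg…

About ChatGLM: ChatGLM is an open-source conversational language model that supports both Chinese and English. It is based on the GLM architecture, can be deployed locally on a GPU (at the INT4 quantization level it needs as little as 6 GB of VRAM), and can be fine-tuned.

Local environment: Windows 11, 2060 graphics card, Python 3.10, CUDA 11.6, cudnn 11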

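After unpacking cuDNN, copy the contents of its bin, include, and lib folders into the folders of the same name inside the CUDA installation directory, overwriting any files with the same name.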
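The copy step can also be scripted. The following is a minimal sketch, assuming cuDNN has been unpacked to a folder that contains bin, include, and lib; the function name merge_cudnn_into_cuda and the example paths are hypothetical and not part of the original notes.

```python
import shutil
from pathlib import Path


def merge_cudnn_into_cuda(cudnn_dir, cuda_dir, subdirs=("bin", "include", "lib")):
    """Copy cuDNN's bin/include/lib contents into the matching CUDA folders.

    Returns the list of subfolder names that were actually copied.
    """
    cudnn_dir, cuda_dir = Path(cudnn_dir), Path(cuda_dir)
    copied = []
    for name in subdirs:
        src = cudnn_dir / name
        if not src.is_dir():
            continue  # this cuDNN package has no such folder; skip it
        # dirs_exist_ok=True (Python 3.8+) merges into the existing CUDA
        # folder and overwrites same-named files; it does not delete the
        # files that CUDA already placed there.
        shutil.copytree(src, cuda_dir / name, dirs_exist_ok=True)
        copied.append(name)
    return copied


# Example call (both paths are assumptions; adjust them to your machine):
# merge_cudnn_into_cuda(
#     r"C:\Downloads\cudnn",
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6",
# )
```

Writing into the CUDA installation directory on Windows usually requires an elevated (administrator) prompt.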

The key part: PyTorch

Use Conda to create a Python 3.10 environment.

Download CUDA: developer.nvidia.com/cuda-toolki…

Download cuDNN: developer.nvidia.com/rdp/cudnn-d…

Downloading cuDNN requires registering, logging in, and filling out a survey.

Download PyTorch: pytorch.org/get-started…

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

Create a folder and clone the repository:

git clone https://github.com/THUDM/ChatGLM-6B.git

Enter the folder and install the dependencies:

pip install -r requirements.txt
pip install gradio
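Before going further, it can help to confirm that the installed PyTorch build actually sees the GPU. The check below is a small sketch rather than part of the original notes; the function name describe_torch_cuda is hypothetical, and it reports "not installed" instead of failing when torch is missing.

```python
import importlib.util


def describe_torch_cuda():
    """Return a small report on whether PyTorch is installed and can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu_name"] = props.name
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If cuda_available comes back False inside the Conda environment, the .cuda() call in web_demo.py will fail, so it is worth fixing the PyTorch install first.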

Edit the code in web_demo.py:

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()

Adjust the quantization level to your GPU memory: with 12 GB of VRAM it can be changed to .quantize(8); with 6 GB use .quantize(4).
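The rule of thumb above can be written down as a small helper. This is only a sketch: the function name pick_quantize_bits is hypothetical, and the 6 GB and 12 GB thresholds are simply the ones stated in these notes.

```python
def pick_quantize_bits(vram_gb):
    """Map available VRAM (in GB) to the bit width suggested in these notes."""
    if vram_gb >= 12:
        return 8  # 12 GB or more: .quantize(8)
    if vram_gb >= 6:
        return 4  # 6 GB up to just under 12 GB: .quantize(4)
    raise ValueError(f"{vram_gb} GB is below the 6 GB minimum mentioned for INT4")


# Usage inside web_demo.py would then look like (shown as a comment, not executed):
# bits = pick_quantize_bits(6)
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(bits).half().cuda()
```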

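If the demo should be reachable from other machines on the local network, the last line can be changed to:

demo.queue().launch(share=True, server_name="0.0.0.0", server_port=9234)

The official README notes: because access to the Gradio network is slow within China, when demo.queue().launch(share=True, inbrowser=True) is enabled all traffic is relayed through the Gradio servers, which greatly degrades the typewriter-style streaming experience. The default launch mode has therefore been changed to share=False; if public access is needed, it can be changed back to share=True.

Start the web demo with python web_demo.py. Once it is running, it can be reached at this machine's IP address on port 9234.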
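To find the address other machines on the network should open, the machine's outgoing IP can be looked up from Python. This is a sketch using a common standard-library trick; the function name lan_url is hypothetical, and on machines with several network interfaces the address it picks may not be the one you expect.

```python
import socket


def lan_url(port=9234):
    """Build the URL other machines on the LAN would use to reach the demo."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # "Connecting" a UDP socket sends no packets; it only makes the OS
        # choose an outgoing interface, whose address is then read back.
        sock.connect(("10.255.255.255", 1))
        ip = sock.getsockname()[0]
    except OSError:
        ip = "127.0.0.1"  # no usable route; fall back to localhost
    finally:
        sock.close()
    return f"http://{ip}:{port}"


if __name__ == "__main__":
    print(lan_url())
```

The port only matches if web_demo.py was launched with server_port=9234, and the Windows firewall may also need to allow inbound connections on that port.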

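After the first run, the model files are usually cached under C:\Users\xxx\.cache\huggingface. Alternatively, download the model from Hugging Face (huggingface.co/THUDM/chatg…), put it in a folder of your choice, and replace "THUDM/chatglm-6b" with that folder's path in the following two lines:

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()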
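One way to switch between a manually downloaded copy and the Hugging Face id without editing the script each time is a small path check. This is a sketch under stated assumptions: the helper name resolve_model_source and the example folder are hypothetical, and the presence of config.json is used only as a simple marker that the folder holds a downloaded model.

```python
from pathlib import Path


def resolve_model_source(local_dir, hub_id="THUDM/chatglm-6b"):
    """Return local_dir if it looks like a downloaded model, else the Hub id."""
    path = Path(local_dir)
    # Assumption: a downloaded model folder contains a config.json file.
    if (path / "config.json").is_file():
        return str(path)
    return hub_id


# Usage in web_demo.py (shown as comments; the folder is a made-up example):
# source = resolve_model_source(r"D:\models\chatglm-6b")
# tokenizer = AutoTokenizer.from_pretrained(source, trust_remote_code=True)
# model = AutoModel.from_pretrained(source, trust_remote_code=True).quantize(4).half().cuda()
```

When the folder is missing or incomplete, the helper falls back to the Hub id, so from_pretrained downloads to the cache as before.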
