In a large neural network, each parameter is stored as a 32-bit or 64-bit floating-point number, which means 4 or 8 bytes of storage per parameter. For a network with 7 billion parameters, that works out to roughly 26 GB or 52 GB respectively.
A model of this size is more than enough to give a single machine a hard time, so for this test we use the smallest LLaMA variant, the 7B model.
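To sanity-check those numbers, here is a quick back-of-the-envelope calculation in Python (the 7-billion parameter count is an approximation):

# Rough storage footprint of a 7B-parameter model at common precisions.
PARAMS = 7_000_000_000  # approximate parameter count of LLaMA 7B

for name, bytes_per_param in [("float64", 8), ("float32", 4), ("float16", 2)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: {gib:.1f} GB")

# float64: 52.2 GB
# float32: 26.1 GB
# float16: 13.0 GB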
Installing the LLaMA Project and Configuring the Model
llama.cpp was adapted for Apple's M-series chips from the very beginning, which is great news for Mac users. First, clone the C++ port of the LLaMA project:
git clone https://github.com/ggerganov/llama.cpp
Then enter the project directory:
cd llama.cpp
Inside the project, create a dedicated folder for models, named models:
mkdir models
Next, download the LLaMA 7B model files from Hugging Face: https://huggingface.co/nyanko7/LLaMA-7B/tree/main
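If you prefer to script the download instead of clicking through the web UI, a minimal sketch using the huggingface_hub package is shown below (this assumes the package is installed and the repo is accessible without authentication; recent versions of huggingface_hub support the local_dir argument):

# Sketch: fetch the LLaMA-7B repo linked above into llama.cpp's models/ folder.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="nyanko7/LLaMA-7B", local_dir="models")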
Then create a 7B subdirectory for the model under models:
mkdir models/7B
Place tokenizer.model and tokenizer_checklist.chk in the models directory itself, at the same level as the 7B folder:
➜  models git:(master) ✗ ls
7B tokenizer.model tokenizer_checklist.chk
Then place checklist.chk, consolidated.00.pth, and params.json inside the 7B directory:
➜  7B git:(master) ✗ ls
checklist.chk consolidated.00.pth params.json
With that, the model files are all in place.
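Before moving on, it can save a failed run to verify that every file landed where llama.cpp expects it. A small sanity-check sketch (the file names are taken from the listings above; this script is not part of llama.cpp):

# Verify the expected models/ layout before running the converter.
import os

expected = [
    "models/tokenizer.model",
    "models/tokenizer_checklist.chk",
    "models/7B/checklist.chk",
    "models/7B/consolidated.00.pth",
    "models/7B/params.json",
]
for path in expected:
    print(("ok      " if os.path.exists(path) else "MISSING ") + path)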
LLaMA Model Conversion
The conversion is performed with a Python script:
python3 convert-pth-to-ggml.py models/7B/ 1
The first argument is the directory containing the model; the second is the floating-point type to use for the conversion. Passing 0 selects float32, which doubles the size of the resulting file; passing 1 selects float16, the default, which is what we use here.
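The factor of two comes purely from element width: float32 stores 4 bytes per weight versus 2 for float16. A tiny NumPy illustration (unrelated to the conversion script itself):

import numpy as np

# A 4096x4096 weight matrix, like a single attention projection below.
w = np.zeros((4096, 4096), dtype=np.float32)
print(w.nbytes // 1024**2)                     # 64 (MB as float32)
print(w.astype(np.float16).nbytes // 1024**2)  # 32 (MB as float16)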
Program output:
➜  llama.cpp git:(master) ✗ python convert-pth-to-ggml.py models/7B/ 1
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': -1}
n_parts = 1
Processing part 0
Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.0.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.1.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.1.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.1.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.1.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.2.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.2.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.2.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.2.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
Converting to float32
Processing variable: layers.3.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.3.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16