# Safetensors passes its security audit and will become the default format

- Source: EleutherAI: Blog
- Published: 2023-05-23 09:00
- AIHOT link: https://aihot.virxact.com/items/cmnxbjhcw004dsln0lnkolclu
- Original link: https://blog.eleuther.ai/safetensors-security-audit

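## AI Summary

Hugging Face, together with EleutherAI and Stability AI, commissioned Trail of Bits to carry out an independent security audit of the Safetensors library. The results confirm that the library is safe and ready to become the default format. The three organizations announced that they will push for Safetensors to become the default format for saving models. The full audit report has been published, and the accompanying blog post explains the library's technical background and the plans for rolling it out.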

## Full text

Audit shows that safetensors is safe and ready to become the default

Hugging Face, in close collaboration with EleutherAI and Stability AI, has ordered an external security audit of the safetensors library, the results of which allow all three organizations to move toward making the library the default format for saved models.

The full results of the security audit, performed by Trail of Bits, can be found here: Report.

The following blog post explains the origins of the library, why these audit results are important, and the next steps.

### What is safetensors?

🐶Safetensors is a library for saving and loading tensors in the most common frameworks (including PyTorch, TensorFlow, JAX, PaddlePaddle, and NumPy).

For a more concrete explanation, we'll use PyTorch.

```python
import torch
from safetensors.torch import load_file, save_file

weights = {"embeddings": torch.zeros((10, 100))}
save_file(weights, "model.safetensors")
weights2 = load_file("model.safetensors")
```

It also has a number of cool features compared to other formats, most notably that loading files is safe, as we'll see later.

When you're using transformers, if safetensors is installed, then those files will already be used preferentially in order to prevent issues, which means that

```
pip install safetensors
```

is likely to be the only thing needed to run safetensors files safely.

Going forward and thanks to the validation of the library, safetensors will now be installed in transformers by default. The next step is saving models in safetensors by default.

We are thrilled to see that the safetensors library is already seeing use in the ML ecosystem, including:

- Civitai
- Stable Diffusion Web UI
- dfdx
- LLaMA.cpp

### Why create something new?

The creation of this library was driven by the fact that PyTorch uses pickle under the hood, which is inherently unsafe. (Sources: 1, 2, video, 3)

With pickle, it is possible to write a malicious file posing as a model that gives full control of a user's computer to an attacker without the user's knowledge, allowing the attacker to steal all their bitcoins 😓.
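To make the risk concrete, here is a minimal and deliberately harmless sketch using only the Python standard library. The `Payload` class and the `PICKLE_DEMO_RAN` environment variable are hypothetical names invented for this illustration. The payload only sets an environment variable, but the same mechanism would let a crafted file run any code.

```python
import os
import pickle


class Payload:
    """Stand-in for a 'model' file crafted by an attacker (illustrative only)."""

    def __reduce__(self):
        # pickle records "call exec(<string>)" and replays that call on load.
        # The statement here is harmless; a real attack could run anything.
        return (exec, ("import os; os.environ['PICKLE_DEMO_RAN'] = '1'",))


blob = pickle.dumps(Payload())   # the bytes that would be shared as a file
result = pickle.loads(blob)      # merely loading the bytes runs the payload

# The variable was set by code hidden inside the pickle, not by this script.
print(os.environ.get("PICKLE_DEMO_RAN"))
```

Nothing in the loading code asks for execution: `pickle.loads` alone triggers it, which is why a format that stores only raw tensor data avoids this class of problem.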

While this vulnerability in pickle is widely known in the computer security world (and is acknowledged in the PyTorch docs), it’s not common knowledge in the broader ML community.

Since the Hugging Face Hub is a platform where anyone can upload and share models, it is important to make efforts to prevent users from getting infected by malware.

We are also taking steps to make sure the existing PyTorch files are not malicious, but the best we can do is flag suspicious-looking files.

Of course, there are other file formats out there, but none seemed to meet the full set of ideal requirements our team identified.

In addition to being safe, safetensors allows lazy loading and generally faster loads (around 100x faster on CPU).

Lazy loading means loading only part of a tensor in an efficient manner. This particular feature enables arbitrary sharding with efficient inference libraries, such as text-generation-inference, to load LLMs (such as LLaMA, StarCoder, etc.) on various types of hardware with maximum efficiency.
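The idea behind lazy loading can be sketched with the standard library alone. The sketch assumes the layout documented for the format: an 8-byte little-endian header length, a JSON header giving each tensor's `dtype`, `shape`, and `data_offsets`, then one contiguous byte buffer. The helpers `write_file` and `read_rows` are hypothetical and are not the library's implementation. The library exposes its own interface for this through `safe_open` and `get_slice`.

```python
import json
import os
import struct
import tempfile


def write_file(path, tensors):
    """Write name -> (shape, float values) using the assumed layout."""
    header, buffer = {}, b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            # Offsets are relative to the start of the byte buffer.
            "data_offsets": [len(buffer), len(buffer) + len(raw)],
        }
        buffer += raw
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(buffer)


def read_rows(path, name, start, stop):
    """Read rows [start, stop) of a 2-D F32 tensor, skipping all other data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        info = json.loads(f.read(header_len))[name]
        _rows, cols = info["shape"]
        begin, _end = info["data_offsets"]
        row_bytes = cols * 4  # 4 bytes per float32
        f.seek(8 + header_len + begin + start * row_bytes)
        raw = f.read((stop - start) * row_bytes)
    flat = struct.unpack(f"<{(stop - start) * cols}f", raw)
    return [list(flat[i * cols:(i + 1) * cols]) for i in range(stop - start)]


path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_file(path, {
    "bias": ([1, 2], [9.0, 9.0]),
    "embeddings": ([4, 3], [float(i) for i in range(12)]),
})

# Only the header and the last two rows of "embeddings" are read from disk.
shard = read_rows(path, "embeddings", 2, 4)
print(shard)
```

Because the header states exactly where each tensor lives, a process that holds one shard of a model can seek straight to its own rows. This is what makes arbitrary sharding cheap.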

Because it loads so fast and is framework agnostic, we can even use the format to load models from the same file in PyTorch or TensorFlow.

### The security audit

Since safetensors' main asset is providing safety guarantees, we wanted to make sure it actually delivered. That's why Hugging Face, EleutherAI, and Stability AI teamed up to get an external security audit to confirm it.

Important findings:

- No critical security flaw leading to arbitrary code execution was found.
- Some imprecisions in the spec format were detected and fixed.
- Some missing validation allowed polyglot files, which was fixed.
- Lots of improvements to the test suite were proposed and implemented.

In the name of openness and transparency, all companies agreed to make the report fully public.

Full report

One important thing to note is that the library is written in Rust. This adds an extra layer of security coming directly from the language itself.

While it is impossible to prove the absence of flaws, this is a major step in giving reassurance that safetensors is indeed safe to use.

### Going forward

For Hugging Face, EleutherAI, and Stability AI, the master plan is to shift to using this format by default.

EleutherAI has added support for evaluating models stored as safetensors in their LM Evaluation Harness and is working on supporting the format in their GPT-NeoX distributed training library.

Within the transformers library we are doing the following:

1. Create safetensors.
2. Verify it works and can deliver on all promises (lazy load for LLMs, single file for all frameworks, faster loads).
3. Verify it's safe. (This is today's announcement.)
4. Make safetensors a core dependency. (This is already done or soon to come.)
5. Make safetensors the default saving format. This will happen in a few months when we have enough feedback to make sure it will cause as little disruption as possible and enough users already have the library to be able to load new models even on relatively old transformers versions.

As for safetensors itself, we're looking into adding more advanced features for LLM training, which has its own set of issues with current formats.

Finally, we plan to release a 1.0 in the near future, with the large user base of transformers providing the final testing step. The format and the lib have had very few modifications since their inception, which is a good sign of stability.

We're glad we can bring ML one step closer to being safe and efficient for all!
