# Foundation Model 开发速查表

- 来源：EleutherAI：Blog
- 发布时间：2024-02-29 17:00
- AIHOT 链接：https://aihot.virxact.com/items/cmnxbjhcv0041sln09f4xepv5
- 原文链接：https://blog.eleuther.ai/fm-dev-cheatsheet

## AI 摘要

发布全新资源 FM Dev Cheatsheet，这是一份面向 Foundation Model 开发的实用速查表。该资源旨在为开发者提供基础模型开发全流程的关键参考信息，涵盖架构设计、训练优化及部署等环节的核心要点，帮助快速查阅技术规范与最佳实践，提升开发效率与项目质量。

## 正文

The pace of foundation model releases and progress has continued to grow rapidly over the past few years, with many new models released from organizations of all kinds worldwide. In addition to releasing models themselves, it's also important to make the tools to create these models - large-scale training libraries, data processing and creation tooling, and more - widely available. In April 2023 we released the Pythia model suite, the first LLMs with a fully released and reproducible technical pipeline from start to finish. We are excited to see other organizations following suit, with the LLM360 project releasing Amber later that year and AI2’s OLMo as fully-transparent artifact releases across the entire language model development process. Additionally, many other orgs have released new tools for underserved aspects of the development pipeline. Without full-pipeline transparency, accountability for undisclosed design decisions is prevented, and independent research and auditing are limited in their ability to draw robust conclusions or accurately assess harms.

As a continuation of EleutherAI’s mission to lower barriers to entry of research and provide mentorship and educational resources about large-scale AI model development, we have collaborated with researchers from MIT, AI2, Hugging Face, Stanford, Princeton, Masakhane, MLCommons, and more to release “The Foundation Model Development Cheatsheet”, a quick-start guide to familiarize new developers with useful tools and resources for developing new open models. The topics covered span the entire model development cycle, from data collection to licensing and release practices, and are aimed to give a jumping-off point and high level survey of all the important steps for responsibly and successfully developing new models. We hope that the Cheatsheet will be a useful learning resource and reference for newer developers to be exposed to not just the technical aspects of model creation, which rightfully receives much attention already, but also the crucially important good practices around responsible development practices and release management.

We hope the Cheatsheet will be a useful entry point into responsible and well-documented model development, and help raise awareness of these crucial issues. You can read the paper for full details, or explore the collection of resources interactively via the interactive website. It is intended as a living resource–all are welcome to submit new resources and be recognized for their contributions!