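Show HN: How Large Language Models Work: An Interactive Visual Guide Based on Karpathy's Lecture

2026-04-24 20:30 · 69 days ago · ynarwal__

AI Summary

An interactive visual guide based on a lecture by AI expert Andrej Karpathy has been released, explaining in detail how large language models (LLMs) work. The guide uses dynamic visualizations to simplify complex concepts such as LLM architecture, training, and inference, making the material easier to learn. It received 103 upvotes on Hacker News, which indicates the level of interest. Readers can try the educational tool directly through an online link to learn about the internal mechanisms of LLMs.

Original text (untranslated)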

How LLMs Actually Work

A complete walkthrough of how large language models like ChatGPT are built — from raw internet text to a conversational assistant. Based on Andrej Karpathy's technical deep dive.

Representative figures from frontier models circa 2024 — exact numbers shift with every release. The scale is the point, not the precision.

Downloading the Internet

The first step is collecting an enormous amount of text. Organizations like Common Crawl have been crawling the web since 2007 — indexing 2.7 billion pages by 2024. This raw data is then filtered into a high-quality dataset like FineWeb.

The goal is a large quantity of high-quality, diverse documents. After aggressive filtering, you end up with about 44 terabytes of text (roughly 10 consumer hard drives' worth), representing ~15 trillion tokens.
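As a rough sanity check on those figures, the arithmetic below divides the dataset size by the token count. It is only a sketch: it assumes decimal terabytes (10^12 bytes) and a hypothetical 4 TB consumer drive, neither of which is stated in the source.

```python
# Back-of-the-envelope check of the dataset figures quoted above.
# Assumptions (not from the source): 1 TB = 10**12 bytes, 4 TB per drive.
DATASET_BYTES = 44 * 10**12
TOKEN_COUNT = 15 * 10**12
ASSUMED_DRIVE_BYTES = 4 * 10**12

bytes_per_token = DATASET_BYTES / TOKEN_COUNT
drives_needed = DATASET_BYTES / ASSUMED_DRIVE_BYTES

print(f"{bytes_per_token:.2f} bytes per token")
print(f"{drives_needed:.0f} drives at the assumed 4 TB each")
```

Under these assumptions a token averages about three bytes of text, which is consistent with tokens being sub-word chunks rather than whole words.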

Tokenization

Neural networks can't process raw text — they need numbers. The solution is tokenization: breaking text into "tokens" (sub-word chunks) and assigning each an ID.
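The simplest possible mapping from text to numbers is to use the raw UTF-8 bytes as token IDs. The sketch below shows only that starting point, not the sub-word tokenizer of any particular model; the helper names are illustrative.

```python
def text_to_byte_ids(text):
    """Encode text as a list of integers in the range 0..255 (its UTF-8 bytes)."""
    return list(text.encode("utf-8"))

def byte_ids_to_text(ids):
    """Invert text_to_byte_ids by decoding the bytes back into a string."""
    return bytes(ids).decode("utf-8")

ids = text_to_byte_ids("hi!")
print(ids)
print(byte_ids_to_text(ids))
```

Byte-level IDs keep the vocabulary tiny (256 symbols) but make sequences long, which is the trade-off that sub-word tokenization addresses.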

GPT-4 uses a vocabulary of 100,277 tokens, built via the Byte Pair Encoding (BPE) algorithm. BPE starts with individual bytes (256 symbols), then iteratively merges the most frequent adjacent pairs — compressing the sequence length while expanding the vocabulary.
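The merge loop described above can be sketched in a few lines. This is a toy illustration of the BPE idea on a short string, not GPT-4's actual tokenizer; the function names and the sample text are assumptions made for the example.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token IDs, or None."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    merged, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))   # base vocabulary: byte values 0..255
    merges = {}                         # (left_id, right_id) -> new token ID
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step             # each merge adds one vocabulary entry
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

text = "aaabdaaabac"
ids, merges = train_bpe(text, num_merges=3)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
print(merges)
```

Each merge shortens the sequence and grows the vocabulary by one entry, which is the compression-versus-vocabulary trade-off the paragraph describes.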

[Interactive figure: BPE in Action]

Training the Neural Network

The Transformer neural network is initialized with random parameters — billions of "knobs". Training adjusts these knobs so the network gets better at predicting the next token in any sequence.

Every training step: sample a window of tokens → feed to network → compare prediction to actual next token → nudge all parameters slightly in the direction that reduces the error. Repeat until the model has seen trillions of tokens.

The loss — a single number measuring prediction error — falls steadily as the model learns the statistical patterns of human language.
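The training loop and the falling loss can be illustrated at toy scale. The sketch below deliberately swaps the Transformer for a much simpler stand-in: a character-level bigram table of logits trained with plain stochastic gradient descent. The text, learning rate, and step count are arbitrary choices for the example.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(token_ids, vocab_size, steps=2000, lr=0.5, seed=0):
    """Train a table of logits with SGD; return it plus the loss history."""
    rng = random.Random(seed)
    # Random initialization: one row of next-token logits per current token.
    logits = [[rng.gauss(0, 1) for _ in range(vocab_size)]
              for _ in range(vocab_size)]
    losses = []
    for _ in range(steps):
        # Sample a (context, target) pair from the training tokens.
        pos = rng.randrange(len(token_ids) - 1)
        context, target = token_ids[pos], token_ids[pos + 1]
        probs = softmax(logits[context])
        # Cross-entropy loss: negative log-probability of the actual next token.
        losses.append(-math.log(probs[target]))
        # The gradient of that loss w.r.t. the logits is probs - one_hot(target).
        for j in range(vocab_size):
            grad = probs[j] - (1.0 if j == target else 0.0)
            logits[context][j] -= lr * grad
    return logits, losses

text = "the cat sat on the mat. the cat sat on the mat. "
vocab = sorted(set(text))
token_ids = [vocab.index(ch) for ch in text]
_, losses = train_bigram(token_ids, len(vocab))
early = sum(losses[:100]) / 100
late = sum(losses[-100:]) / 100
print(f"mean loss, first 100 steps: {early:.3f}")
print(f"mean loss, last 100 steps:  {late:.3f}")
```

A real model differs in architecture and scale, but the loop has the same shape: predict, measure the error as a single loss value, and adjust the parameters against the gradient.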

[Interactive figure: Transformer Architecture]

Inference & Token Sampling

Once trained, the network generates text autoregressively: feed a sequence of tokens → get a probability distribution over all 100K possible next tokens → sample one → append → repeat.
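The generation loop can be sketched as follows. The `toy_next_token_probs` function is a hypothetical stand-in for a trained network (it simply favors the token after the last one), and the tiny vocabulary size is chosen only to keep the output readable.

```python
import random

def toy_next_token_probs(tokens, vocab_size):
    """Hypothetical stand-in for a trained network: favors (last token + 1)."""
    favored = (tokens[-1] + 1) % vocab_size
    probs = [0.1 / (vocab_size - 1)] * vocab_size
    probs[favored] = 0.9
    return probs

def generate(prompt_tokens, num_new_tokens, vocab_size, seed=None):
    """Autoregressive loop: predict a distribution, sample, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        probs = toy_next_token_probs(tokens, vocab_size)
        next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        tokens.append(next_token)   # the sampled token joins the context
    return tokens

print(generate([0, 1, 2], num_new_tokens=8, vocab_size=10, seed=1))
print(generate([0, 1, 2], num_new_tokens=8, vocab_size=10, seed=2))
```

Because each token is sampled rather than chosen deterministically, different seeds may produce different continuations of the same prompt, while a fixed seed reproduces the same one.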

This process is stochastic: the same prompt can generate a different output on each run, because every token is drawn like the flip of a biased coin. Higher-probability tokens are more likely but not guaranteed to be chosen.

Temperature controls randomness. Low temperature (0.1) → the model almost always picks the top token. High temperature (2.0) → the distribution flattens toward uniform and the output turns chaotic. 0.7–1.0 is the sweet spot for coherent-but-creative text.
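One common way to apply temperature is to divide the model's raw scores (logits) by it before the softmax. The sketch below assumes that formulation and uses made-up logits for three candidate tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.0]   # made-up scores for three candidate tokens
for t in (0.1, 0.7, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all probability mass lands on the top token, while a high temperature spreads it more evenly across the candidates without making them exactly equal.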

Hacker News Trending (buzzing.cc Chinese translation)

Read the original: ynarwal.github.io
