GitHub仓库中AI使用特征与演变的实证研究:来自代码注释的证据
阅读原文· arxiv.org研究分析了35,361条明确提及AI的GitHub代码注释及关联代码块,通过开放编码建立AI辅助开发活动分类法,并使用LLM分类器与Dawid-Skene模型标注全量数据。还分析了12,996条后续提交消息,追踪代码演变及2022年12月至2026年3月的时间趋势。结果显示,开发者主要用LLM进行代码实现,其次是增强、调试、文档和测试。后续提交常涉及重构清理、功能集成和Bug修复。随时间推移,AI引用从直接代码生成转向知识支持和代码增强。AI工具正嵌入为协作支持机制。
Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but prior studies often evaluate LLM outputs in isolation rather than examining how developers adapt them in real projects. We analyze 35,361 GitHub code comments that explicitly reference AI use and their associated code blocks. We first open-code 500 unique comments and code blocks to derive a taxonomy of AI-assisted development activities, then annotate the full dataset using two LLM-based classifiers and aggregate predictions with Dawid-Skene expectation-maximization. We also analyze 12,996 subsequent commit messages to study how AI-assisted code evolves after introduction, and examine temporal trends from December 2022 to March 2026. Our results show that developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement. These findings suggest that AI tools are becoming embedded not only as code-generation aids, but also as collaborative support mechanisms whose outputs are refined, extended, and corrected by developers over time.