Over a year and a half has passed since EleutherAI's last retrospective, and a great deal has changed. In its first year, what started off as a Discord server created by some TPU enthusiasts grew into a much larger and more vibrant community. Since then, the EleutherAI collective has gone on to do many things, including becoming an inspirational launch point, stepping stone, and template for both its members and many new organizations.
Given that we have so much to share in a second retrospective, we have condensed the important takeaways and announcements here. We look forward to sharing the full story soon!
Research
from torch.nn import *
def c(h, d, k, p, n):  # hidden channels, depth, kernel size, patch size, classes
    S, C, A = Sequential, Conv2d, lambda x: S(x, GELU(), BatchNorm2d(h))
    R = type('', (S,), {'forward': lambda s, x: s[0](x) + x})  # residual wrapper
    return S(A(C(3, h, p, p)),
             *[S(R(A(C(h, h, k, 1, k // 2, 1, h))), A(C(h, h, 1))) for _ in [0] * d],
             AdaptiveAvgPool2d((1, 1)), Flatten(), Linear(h, n))
EAI setting SotA in real time
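The golfed snippet is hard to read, so here is an ungolfed sketch of what we believe is the same network, written with ordinary PyTorch modules: a patch-embedding convolution, then `d` blocks of depthwise convolution (with a residual connection) followed by a 1x1 pointwise convolution, then global pooling and a linear classifier. The names `Residual`, `conv_act_norm`, and `build_model`, and the example sizes, are hypothetical choices for illustration and are not part of the original snippet.

```python
import torch
from torch import nn


class Residual(nn.Module):
    """Hypothetical helper: wraps a module so forward returns fn(x) + x."""

    def __init__(self, fn: nn.Module):
        super().__init__()
        self.fn = fn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x) + x


def conv_act_norm(conv: nn.Module, h: int) -> nn.Sequential:
    # Hypothetical helper mirroring the golfed lambda A: layer -> GELU -> BatchNorm2d(h)
    return nn.Sequential(conv, nn.GELU(), nn.BatchNorm2d(h))


def build_model(h: int, d: int, k: int, p: int, n: int) -> nn.Sequential:
    """h: hidden channels, d: depth, k: depthwise kernel size,
    p: patch size, n: number of output classes."""
    # Patch embedding: a p x p convolution with stride p (golfed: C(3, h, p, p))
    stem = conv_act_norm(nn.Conv2d(3, h, kernel_size=p, stride=p), h)
    blocks = []
    for _ in range(d):
        blocks.append(nn.Sequential(
            # Depthwise spatial mixing (groups=h) wrapped in a residual connection;
            # padding=k // 2 keeps the spatial size unchanged for odd k
            Residual(conv_act_norm(
                nn.Conv2d(h, h, kernel_size=k, padding=k // 2, groups=h), h)),
            # Pointwise (1x1) channel mixing
            conv_act_norm(nn.Conv2d(h, h, kernel_size=1), h),
        ))
    return nn.Sequential(
        stem,
        *blocks,
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(h, n),
    )


model = build_model(h=64, d=2, k=5, p=4, n=10)
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```

Spelling the residual wrapper out as a named class trades the golfed `type(...)` trick for readability; the layer ordering is intended to match the one-liner.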
EleutherAI members have authored 28 papers, trained dozens of models, and released 10 codebases in the past 18 months. Some notable highlights include:
This paper discusses our work on our largest open-source LLM to date. At the time of its release last February, it was the largest and most performant open-source autoregressive language model.
It took us about a year, but we finally wrote up our OG text-to-image work!
This BigScience-led paper introduced the T0 language model and jumpstarted interest in task-structured data.
This paper, written for the NeurIPS Broadening Research Collaborations Workshop in ML, details our experience doing open collaborative science and gives an inside look into our thinking on an organizational level.
EleutherAI played a minor role in this paper, mostly by supporting the interpretability work and contributing compute and HPC expertise. We are nonetheless very excited about it: it demonstrates both very high-quality interpretability research and the impact that sponsoring relatively small-scale training runs can have.