[JUNE 2022] Aran Komatsuzaki on Scaling, GPT-J and Alignment

The Inside View

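Content provided by Michaël Trazzi. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Michaël Trazzi or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ja.player.fm/legal.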

Duration: 1:17:21


Aran Komatsuzaki is an ML PhD student at GaTech and lead researcher at EleutherAI, where he was one of the authors of GPT-J. In June 2022 we recorded an episode on scaling, following up on the first Ethan Caballero episode (where we mentioned Aran as an influence on how Ethan started thinking about scaling).

Note: For some reason I procrastinated on editing the podcast, then had a lot of in-person podcasts, so I left this one to edit later, until the date was so distant from June 2022 that I thought publishing did not make sense anymore. In July 2023 I'm trying that "one video a day" challenge (well, I missed some days but I'm trying to get back on track), so I thought it made sense to release it anyway. After a second watch, it's interesting to see how excited Aran was about InstructGPT, which turned out to be quite useful for things like ChatGPT.

Outline

(00:00) intro

(00:53) the legend of the two AKs, Aran's arXiv reading routine

(04:14) why Aran expects Alignment to be the same as some other ML problems

(05:44) what Aran means when he says "AGI"

(10:24) what Aran means by "human-level at doing ML research"

(11:31) software improvement happening before hardware improvement

(13:00) is scale all we need?

(15:25) how "Scaling Laws for Neural Language Models" changed the process of doing experiments

(16:22) how Aran scale-pilled Ethan

(18:46) why Aran was already scale-pilled before GPT-2

(20:12) Aran's 2019 scaling paper: "One epoch is all you need"

(25:43) Aran's June 2022 interest: T0 and InstructGPT

(31:33) encoder-decoder performs better than encoder if multi-task-finetuned

(33:30) why the scaling law might be different for T0-like models

(37:15) the story behind GPT-J

(41:40) hyperparameters and architecture changes in GPT-J

(43:56) GPT-J's throughput

(47:17) 5 weeks of training using 256 TPU cores

(50:34) did publishing GPT-J accelerate timelines?

(55:39) how Aran thinks about Alignment, defining Alignment

(58:19) in practice: improving benchmarks, but deception is still a problem

(1:00:49) main difficulties in evaluating language models

(1:05:07) how Aran sees the future: AIs aligning AIs, merging with AIs, Aran's takeoff scenario

(1:10:09) what Aran thinks we should do given how he sees the next decade

(1:12:34) regulating access to AGI

(1:14:50) what might happen: preventing some AI authoritarian regime

(1:15:42) conclusion, where to find Aran


#Tech #Michaël Trazzi