 
Podcasters Assemble! (A Movie Podcast)

We Can Make This Work Probably

Monthly+
 
Podcasters Assemble (Probably) is a crowdsourced hype/re-watch podcast: each season we revisit one movie series at a time, alongside our monthly "Disassembled" episodes with Zack Derby! Check out https://probablywork.com/podcasters-assemble/ for more! Follow the podcast on Twitter and Instagram @CastersAssemble
 
Epik Fails of History

We Can Make This Work Probably

Monthly
 
"Epik Fails of History" is a podcast about the most epic fails of... history! Created, hosted, and edited by Erik Slader (author of the "Epic Fails" history book series), co-hosted with fellow podcasters Chris Carroll and Justin Ache. (Recommended for ages 13 and up due to some mild language, occasional crude humor, and discussions of war, disasters, and historical facts.)
 
Handsome Harry Reviews

We Can Make This Work Probably

Monthly
 
Welcome to the show formerly known as Your Day, Week, Month, and Year Reviews! A podcast where we review games after playing them for 1 day, 1 week, 1 month, and 1 year! Part of the 'We Can Make This Work (Probably)' network.
 
Thoughts Cast

We Can Make This Work Probably

Monthly
 
Welcome to Thoughts Cast, Australia's #1 car-based (usually) podcast, from the great state of West "By-God" Virginia. (The "By-God" is very important, and a lot of people forget about it.) Join Evan, along with occasional guest hosts Arjuna, Troy, Bill, and Tyler, as they navigate the road of life with deep discussions such as the definition of art, road-rage-inducing driving, toilets, life in general, and much more. New episodes every (ERROR). Support this podcast: https://podcasters.spoti ...
 
Audio Only XP

We Can Make This Work Probably

Monthly
 
Sometimes a group of people who are interested in playing videogames, but not interested in showing you what the game actually looks like, decide to make an entire podcast format based on Let's Plays. Bill is one of those people; each week (probably) on Audio Only Experience (AOXP), Bill plays a game and lets you enjoy the pure audio of video game noise and his soothing British voice. If you are tired of all those other audio-only Let's Plays, why not try this one?
 
What started as a love affair with Attack on Titan (Shingeki no Kyojin) has evolved into a more general manga/anime podcast, where Tyler and Bill attempt to coordinate with each other to record despite being on different continents. In Season 1 we covered Attack on Titan, with a few other anime and manga thrown in. Currently we are working our way through Berserk via the 1997 anime and the original manga.
 
The Ta'veren: A Wheel Of Time Podcast

We Can Make This Work Probably

Monthly
 
Three guys spun out by the Wheel to correct the Pattern… or destroy it? Who knows! Follow along as our hosts go through the Wheel of Time, chapter by chapter. Listen in as Bill, Rob, and Rich attempt to bend the Pattern... in their own special way. If you're curious about this incredible book series, this podcast is the perfect complement to listen to while reading the books. Come experience Robert Jordan's fantasy world that is so deep and fleshed out with great locations and greater character ...
 
Cut Throat College Planning

Kayla Record & Hector Lopez

Monthly+
 
Did you know that 80% of high school students nationally choose their college based on where their friends are going? Don't let that be your student. The Cut Throat College Planning podcast is your all-in-one, start-to-finish guide to getting your student on the right path for life after high school. We welcome parents, students, teachers, and any high school staff interested in helping students plan their next step into adulthood. Choosing a college, or whether to go to college at all, can be on ...
 
Learning Machines 101

Richard M. Golden, Ph.D., M.S.E.E., B.S.E.E.

Monthly
 
Smart machines based upon the principles of artificial intelligence and machine learning are now prevalent in our everyday life. For example, artificially intelligent systems recognize our voices, sort our pictures, make purchasing suggestions, and can automatically fly planes and drive cars. In this podcast series, we examine questions such as: How do these devices work? Where do they come from? And how can we make them even smarter and more human-like? These are the questions that wil ...
 
 
Running out of time to catch up with new arXiv papers? We take the most impactful papers and present them as convenient podcasts. If you're a visual learner, we offer these papers in an engaging video format. Our service fills the gap between overly brief paper summaries and time-consuming full paper reads. You gain academic insights in a time-efficient, digestible format. Code behind this work: https://github.com/imelnyk/ArxivPapers Support this podcast: https://podcasters.spotify.com/pod/s ...
 
The Case Interview Podcast

Julio and Bruno from Crafting Cases

Monthly
 
In The Case Interview Podcast, you will learn what it takes to get offers at top consulting firms (such as McKinsey, BCG and Bain). Every Monday morning, we share the hard-earned wisdom and stories from our experience helping thousands of candidates prepare for their management consulting interviews, so you can avoid the mistakes that 99% of candidates make and be among the top 1% who get multiple offers. We are Bruno and Julio, former McKinsey and Bain consultants, former consulting intervi ...
 
Last Week in Tech

Last Week in Tech

Monthly
 
Here in the Popular Science office, we could talk about tech all day. Every week brings new products, events, and controversies to cover. However, our boss says that if we're going to sit around talking technology, we have to "make some content," and thus this tech podcast was born. Every week, we'll take a look at the biggest stories, from new gadget releases to overarching topics like artificial intelligence and virtual reality and beyond. Check out a new episode every Monday and email us ques ...
 
The Football Gods

Voiceworks Sport

Monthly
 
Their names are part of football folklore. They are often turned to in times of need. They can be a fan’s last resort... but who are “The Football Gods”? This podcast gives famous faces the ultimate footballing role: total power over the beautiful game. Listen as they shape football to their whim and ultimately become The Football Gods. Join our hosts, broadcaster Kate Mason and journalist Tim Spiers, as we ponder the all-important questions such as: “Which player would you send to hell?”, “ ...
 
 
This study analyzes layer-wise gradients in LLMs, revealing that slow thinking enhances learning stability and response correctness, while fast thinking shows larger gradient variations. https://arxiv.org/abs//2410.23743 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple…
 
Tokenformer introduces a scalable architecture that enhances Transformers' efficiency by treating model parameters as tokens, allowing for flexible scaling without retraining, significantly reducing computational costs. https://arxiv.org/abs//2410.23168 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple…
 
“We should not look back unless it is to derive useful lessons from past errors, and for the purpose of profiting by dearly bought experience.” - George Washington With the upcoming 2024 Election, Erik and Justin attempt to rank the US Presidents! (Part 1: Washington to McKinley / 1789-1901) You can help support Erik by buying a copy of his book, "…
 
This study investigates optimal initial learning rates for neural networks, finding a narrow range enhances generalization by locating high-quality minima and focusing on relevant features, unlike extreme rates. https://arxiv.org/abs//2410.22113 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcast…
 
The paper introduces a Fourier series-based neural network layer to improve continuous token modeling in decision-making and time series tasks, enhancing performance in various benchmarks. https://arxiv.org/abs//2410.22269 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.app…
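The general idea of a Fourier-series feature layer can be sketched as a toy; this is an illustrative construction, not the paper's implementation, and the exact feature layout here is an assumption:

```python
import math

def fourier_features(x, n_terms=4):
    # Illustrative truncated Fourier basis for a continuous scalar x:
    # [sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x), ...].
    # A learned linear layer over features like these would stand in for a
    # Fourier-series head over continuous tokens (layout is an assumption).
    feats = []
    for k in range(1, n_terms + 1):
        feats.append(math.sin(k * math.pi * x))
        feats.append(math.cos(k * math.pi * x))
    return feats

print(len(fourier_features(0.25)))  # 8 features for 4 frequencies
```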
 
This study analyzes the differences between full fine-tuning and LoRA in large language models, revealing distinct weight matrix structures and generalization behaviors despite similar performance on tasks. https://arxiv.org/abs//2410.21228 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: ht…
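As a rough illustration of why the two regimes can differ structurally, here is a parameter-count sketch of LoRA's low-rank update; the dimensions and rank are illustrative choices, not figures from the paper:

```python
# Full fine-tuning updates every entry of a d x k weight matrix W, while
# LoRA learns a low-rank update W + B @ A with B: d x r and A: r x k.
d, k, r = 512, 512, 8  # illustrative sizes (not from the paper)

full_ft_params = d * k        # 262,144 trainable entries
lora_params = d * r + r * k   # 8,192 trainable entries

assert lora_params < full_ft_params  # LoRA trains ~3% of the parameters here
print(full_ft_params, lora_params)
```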
 
Vision-Language Models show promise in reasoning across text and images but struggle with basic visual concepts, revealing significant gaps in their understanding and generalization abilities. https://arxiv.org/abs//2410.19546 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts…
 
This study investigates the training behavior and computational requirements of Small-scale Large Language Models (SLMs), focusing on hyperparameters and configurations to enhance efficiency and support low-resource AI research. https://arxiv.org/abs//2410.19456 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_pap…
 
This paper introduces a hybrid approach combining physics-informed neural networks and cylindrical approximation to efficiently solve functional differential equations, addressing computational challenges and improving numerical analysis. https://arxiv.org/abs//2410.18153 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/…
 
This paper shows that integrating coherent reasoning in Few-shot Chain-of-Thought prompting enhances transformer performance, revealing sensitivity to errors in intermediate steps and proposing improvements using varied reasoning paths. https://arxiv.org/abs//2410.16540 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@a…
 
"Death has come to your little town, Sheriff." - Dr. Loomis For our very first "Disassembled" Double Feature, Zack and Erik are joined once again by Stephen White from Horror Ramblings to talk about some creepy movies for spooky season... that's right, we're finally watching "Halloween"! Both the John Carpenter classic from 1978 *and* the 2018 lega…
 
LEGO is a novel technique for extracting and recombining small language models from large language models, enhancing efficiency, robustness, and user data privacy while reducing costs. https://arxiv.org/abs//2410.18287 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.c…
 
This study explores knowledge distillation from Llama-3.1-405B to smaller models, demonstrating improved accuracy and efficiency through synthetic data and diverse evaluation methods across various tasks. https://arxiv.org/abs//2410.18588 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: http…
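For context, the classic distillation objective matches a student's softened predictions to a teacher's; this is a minimal sketch of that standard loss, not necessarily this paper's exact setup:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) at temperature T, scaled by T^2 as in
    # classic knowledge distillation (illustrative, not the paper's setup).
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(distill_loss([2.0, 1.0, 0.1], [1.8, 1.1, 0.2]))  # small and non-negative
```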
 
Guest Episode with IT System Administrator, Nathan Jewell! Takeaways Preparation for life after high school is crucial. Many students lack guidance in college applications. Community college can be a stepping stone. Military service offers educational benefits. Work experience is valuable for personal growth. Practical skills can lead to career opp…
 
This paper explores how Rotary Positional Embeddings (RoPE) affect Transformer model dynamics, introducing phase shifts that influence embeddings, information retention, and attention through oscillatory behaviors and frequency components. https://arxiv.org/abs//2410.18067 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com…
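The phase-shift behavior mentioned here can be illustrated with a 2-d toy rotation, a sketch of the standard RoPE construction rather than the paper's analysis: because queries and keys are rotated by position-dependent angles, their dot product depends only on relative position.

```python
import math

def rope_pair(x, pos, freq=1.0):
    # Rotate a 2-d slice by angle pos * freq -- the core RoPE operation
    # applied to one frequency pair (freq choice here is illustrative).
    c, s = math.cos(pos * freq), math.sin(pos * freq)
    return (x[0] * c - x[1] * s, x[0] * s + x[1] * c)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

q, k = (1.0, 0.0), (0.0, 1.0)
# Shifting both positions by +5 leaves the attention score unchanged,
# since only the relative offset (7-3 == 12-8) matters.
s1 = dot(rope_pair(q, 3), rope_pair(k, 7))
s2 = dot(rope_pair(q, 8), rope_pair(k, 12))
assert abs(s1 - s2) < 1e-9
```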
 
ALTA is a new programming language and compiler that maps programs to Transformer weights, enabling loop expression and improved algorithm representation, while providing tools for analyzing training challenges. https://arxiv.org/abs//2410.18077 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcast…
 
This paper introduces UNSTAR, a novel unlearning method for large language models using anti-samples to efficiently and selectively reverse learned associations, enhancing privacy and model modification capabilities. https://arxiv.org/abs//2410.17050 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Po…
 
This paper explores how Knowledge Editing algorithms can unintentionally distort model representations, leading to decreased factual recall and reasoning abilities, a phenomenon termed "representation shattering." https://arxiv.org/abs//2410.17194 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podca…
 
The paper proposes GenRM, a hybrid approach combining RLHF and RLAIF, improving synthetic preference labels' quality and outperforming existing models in both in-distribution and out-of-distribution tasks. https://arxiv.org/abs//2410.12832 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: htt…
 
This paper presents an AI agent for error resolution in computational notebooks, enhancing bug-fixing capabilities while evaluating user experience and collaboration within the JetBrains Datalore service. https://arxiv.org/abs//2410.14393 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: http…
 
This study explores "dark matter" in sparse autoencoders, revealing that much unexplained variance can be predicted and proposing methods to reduce nonlinear error in model activations. https://arxiv.org/abs//2410.14670 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.…
 
The paper analyzes scaling laws in machine learning, providing best practices for estimating model performance using a large dataset of pretrained models and emphasizing the importance of intermediate training checkpoints. https://arxiv.org/abs//2410.11840 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Ap…
 
This paper presents a novel method for image inversion and editing using rectified flow models, achieving superior performance in zero-shot tasks compared to existing diffusion model approaches. https://arxiv.org/abs//2410.10792 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcas…
 
The paper explores whether large language models (LLMs) can introspect, finding that finetuned models can predict their own behavior, suggesting a form of internal knowledge access. https://arxiv.org/abs//2410.13787 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/…
 
The paper proposes a method to enhance LLMs' thinking abilities for better instruction following, improving performance across various tasks without additional human data through iterative search and optimization. https://arxiv.org/abs//2410.10630 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podca…
 
The paper investigates extreme-token phenomena in transformer-based LLMs, revealing mechanisms behind attention sinks and proposing strategies to mitigate their impact during pretraining. https://arxiv.org/abs//2410.13835 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.appl…
 
https://arxiv.org/abs//2410.13720 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/supp…
 