Show HN: Llama-8B Teaches Itself Baby Steps to Deep Research Using RL https://ift.tt/Z29yRzB

March 10, 2025

I've been tinkering with getting Llama-8B to bootstrap its own research skills through self-play. The model generates questions about documents, searches for answers, and then learns from its own successes and failures through RL (I hacked up Unsloth's GRPO code). It started at just 23% accuracy on questions about the Apollo 13 mission report and reached 53% after less than an hour of training. Everything runs locally using open-source models. It's cool to watch the model go from completely botching search queries to iteratively researching its way to the right answer.

https://ift.tt/12czdvT March 10, 2025 at 11:05PM
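The post doesn't show its reward function or training code, so what follows is only a minimal sketch of the core GRPO idea, assuming a simple binary exact-match reward. Several answers (rollouts) are sampled for the same self-generated question. Each rollout's reward is then standardized against the mean and standard deviation of its own group, and the result serves as the advantage, so no learned value model is needed. The names `correctness_reward` and `group_relative_advantages` and the sample answers are hypothetical illustrations. This is not Unsloth's API or the author's code.

```python
import statistics


def correctness_reward(predicted: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the normalized answer matches the reference."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward against its own group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts answering one generated question,
# e.g. "What was the name of Apollo 13's command module?"
answers = ["Odyssey", "odyssey", "Aquarius", "no answer found"]
rewards = [correctness_reward(a, "Odyssey") for a in answers]
advantages = group_relative_advantages(rewards)

print(rewards)                              # [1.0, 1.0, 0.0, 0.0]
print([round(a, 3) for a in advantages])    # [1.0, 1.0, -1.0, -1.0]
```

Correct rollouts receive a positive advantage and incorrect ones a negative advantage, which pushes the policy toward the search behavior that found the answer. A group in which every rollout succeeds (or every one fails) yields zero advantage everywhere and so contributes no learning signal. This is one reason a model starting well below 100% accuracy, like the 23% baseline here, has room to improve through self-play.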
