Show HN: TextPolicy – reinforcement learning for text generation on a MacBook https://ift.tt/l92yfTt

August 30, 2025

I built TextPolicy because I wanted a way to study reinforcement learning for text generation without needing a cluster or cloud GPUs. A MacBook is enough.

The toolkit is simple:
- Implements the GRPO and GSPO algorithms
- Provides a decorator interface for custom reward functions
- Includes LoRA and QLoRA utilities
- Runs on MLX, so it is efficient on Apple Silicon

It is not intended for production. The purpose is learning and experimentation: to understand the algorithms, to test ideas, and to see how reward shaping affects behavior.

Installation is through pip:

pip install textpolicy

There is a minimal example in the README.

I am interested in feedback on the clarity of the API, the usefulness of the examples, and whether this lowers the barrier for people new to RL.

Repository: github.com/teilomillet/textpolicy

https://ift.tt/DYcrNUe
August 30, 2025 at 11:34PM
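The post mentions a decorator interface for reward functions and the GRPO algorithm. The following is not TextPolicy's API; for that, see the README. It is a hypothetical, standard-library-only sketch of the general idea. A decorator registers reward functions. The names `reward`, `REWARDS`, `brevity`, `mentions_keyword`, `total_reward` and `group_advantages` are all illustrative assumptions. Scores are summed per completion. Each completion's total is then normalized against the other completions sampled for the same prompt, which is the group-relative advantage that gives GRPO its name.

```python
# Hypothetical sketch, NOT TextPolicy's actual API: a decorator-based reward
# registry feeding GRPO-style group-relative advantages.
import statistics
from typing import Callable, Dict, List

RewardFn = Callable[[str, str], float]
REWARDS: Dict[str, RewardFn] = {}  # hypothetical global registry


def reward(fn: RewardFn) -> RewardFn:
    """Register a reward function under its own name (hypothetical decorator)."""
    REWARDS[fn.__name__] = fn
    return fn


@reward
def brevity(prompt: str, completion: str) -> float:
    # Toy reward: shorter completions score higher (1 / (1 + word count)).
    return 1.0 / (1 + len(completion.split()))


@reward
def mentions_keyword(prompt: str, completion: str) -> float:
    # Toy reward: 1.0 if the completion contains "answer", else 0.0.
    return 1.0 if "answer" in completion.lower() else 0.0


def total_reward(prompt: str, completion: str) -> float:
    # Reward shaping here is a plain sum of every registered reward.
    return sum(fn(prompt, completion) for fn in REWARDS.values())


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    prompt = "What is 2 + 2?"
    group = ["The answer is 4.", "4", "I think the answer might be four, roughly."]
    rs = [total_reward(prompt, c) for c in group]
    for c, r, a in zip(group, rs, group_advantages(rs)):
        print(f"reward={r:.3f} advantage={a:+.3f} {c!r}")
```

Because advantages are relative within a group, a group whose completions all earn the same reward produces zero advantage and no learning signal. This is one way reward shaping visibly changes what the policy learns.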
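GRPO and GSPO differ mainly in where the importance ratio is applied. GRPO clips a per-token ratio between the new and old policy and averages over tokens. GSPO (Group Sequence Policy Optimization) instead clips one sequence-level ratio per completion. That ratio is the length-normalized likelihood ratio, i.e. the exponential of the mean per-token log-probability difference. The sketch compares the two. It is a simplified illustration, not TextPolicy's implementation: it uses plain Python lists of per-token log-probabilities, omits any KL penalty, and uses a clip value of 0.2 as an illustrative assumption. All function names are hypothetical.

```python
# Simplified, hypothetical comparison of GRPO (token-level ratio) and
# GSPO (sequence-level ratio) clipped objectives. No KL term, no batching.
import math
from typing import List


def clipped_term(ratio: float, adv: float, eps: float) -> float:
    """PPO-style pessimistic term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * adv, clipped * adv)


def grpo_objective(new_logps: List[List[float]], old_logps: List[List[float]],
                   advs: List[float], eps: float = 0.2) -> float:
    # Token-level ratios, averaged over each sequence, then over the group.
    per_seq = []
    for new, old, a in zip(new_logps, old_logps, advs):
        terms = [clipped_term(math.exp(n - o), a, eps) for n, o in zip(new, old)]
        per_seq.append(sum(terms) / len(terms))
    return sum(per_seq) / len(per_seq)


def gspo_objective(new_logps: List[List[float]], old_logps: List[List[float]],
                   advs: List[float], eps: float = 0.2) -> float:
    # One length-normalized ratio per sequence: exp(mean token log-ratio).
    per_seq = []
    for new, old, a in zip(new_logps, old_logps, advs):
        log_ratio = sum(n - o for n, o in zip(new, old)) / len(new)
        per_seq.append(clipped_term(math.exp(log_ratio), a, eps))
    return sum(per_seq) / len(per_seq)


if __name__ == "__main__":
    new = [[0.0, -1.0]]  # log-probs under the current policy
    old = [[-1.0, -1.0]]  # log-probs under the sampling (old) policy
    print("GRPO:", grpo_objective(new, old, [1.0]))
    print("GSPO:", gspo_objective(new, old, [1.0]))
```

Both objectives are maximized. When the policy has not moved (all ratios equal 1), both reduce to the mean advantage of the group. They diverge once individual token ratios vary.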
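LoRA is what makes fine-tuning feasible on a laptop. The base weight W stays frozen and only two small low-rank matrices are trained. The layer computes y = W x + (alpha / r) · B (A x). B is zero-initialized, so training starts exactly from the base model. The following pure-Python sketch illustrates that formula under stated assumptions. The class `LoRALinear`, its parameters and the Gaussian init scale are hypothetical and not TextPolicy's utilities. It also skips the base-weight quantization that distinguishes QLoRA.

```python
# Hypothetical pure-Python LoRA layer: y = W x + (alpha / r) * B (A x).
# W is frozen; only A (r x d_in) and B (d_out x r) would be trained.
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    # Dense matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    def __init__(self, weight: Matrix, r: int = 2, alpha: float = 4.0, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight (d_out x d_in)
        # A: small random init; B: zeros, so the adapter starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # r * (d_in + d_out), versus d_in * d_out for full fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    W = [[float(i + j) for j in range(6)] for i in range(4)]
    layer = LoRALinear(W, r=2)
    print(layer([1.0, 0.5, -1.0, 2.0, 0.0, 3.0]), layer.trainable_params())
```

For a real d × d projection with d in the thousands and r around 8 to 16, r · (d_in + d_out) is a tiny fraction of d_in · d_out. That is where the memory savings come from.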

BlogViral