Show HN: TextPolicy – reinforcement learning for text generation on a MacBook https://ift.tt/l92yfTt

I built TextPolicy because I wanted a way to study reinforcement learning for text generation without needing a cluster or cloud GPUs. A MacBook is enough.

The toolkit is simple:

- Implements GRPO and GSPO algorithms
- Provides a decorator interface for custom reward functions
- Includes LoRA and QLoRA utilities
- Runs on MLX, so it is efficient on Apple Silicon

It is not intended for production. The purpose is learning and experimentation: to understand algorithms, to test ideas, to see how reward shaping affects behavior.

Installation is through pip:

pip install textpolicy

There is a minimal example in the README.

I am interested in feedback on: the clarity of the API, the usefulness of the examples, and whether this lowers the barrier for people new to RL.

Repository: github.com/teilomillet/textpolicy

https://ift.tt/DYcrNUe

August 30, 2025 at 11:34PM
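To make the ideas above concrete, here is a minimal sketch in plain Python of a decorator-registered reward function and the group-relative advantage normalization commonly associated with GRPO. This is not TextPolicy's API: the `reward` decorator, the `REWARDS` registry, the `brevity` function, and `group_relative_advantages` are hypothetical names invented for illustration, and the README remains the reference for the real interface.

```python
from statistics import mean, pstdev

# Hypothetical registry; NOT part of TextPolicy.
REWARDS = {}

def reward(name):
    """Hypothetical decorator that registers a reward function under a name."""
    def register(fn):
        REWARDS[name] = fn
        return fn
    return register

@reward("brevity")
def brevity(prompt: str, completion: str) -> float:
    # Toy reward: completions with fewer words score higher.
    return 1.0 / (1 + len(completion.split()))

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Score a group of completions sampled for the same prompt.
prompt = "Explain RL in one sentence."
completions = [
    "RL learns from rewards.",
    "Reinforcement learning is a method where an agent learns by trial and error.",
]
scores = [REWARDS["brevity"](prompt, c) for c in completions]
advantages = group_relative_advantages(scores)
```

Changing the body of `brevity` and re-running shows how reward shaping shifts which completions receive a positive advantage, which is the kind of experiment the toolkit is meant for.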
