We present Perception-R1, a scalable RL framework using Group Relative Policy Optimization (GRPO) during MLLM post-training. Key innovations: 🎯 Perceptual Perplexity Analysis: We introduce a novel ...
This repository provides a collection of papers and resources focused on long-context LLMs, including architecture, infrastructure, training, and evaluation. For a clear taxonomy and more insights ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results