Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning Paper • 2606.04923 • Published 7 days ago • 37
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders Paper • 2605.27354 • Published 15 days ago • 15
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published Dec 19, 2024 • 39
KoLA: Carefully Benchmarking World Knowledge of Large Language Models Paper • 2306.09296 • Published Jun 15, 2023 • 20