117 articles in total
2026
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs
RelayLLM: Efficient Reasoning via Collaborative Decoding
Text-to-LoRA: Instant Transformer Adaption
2025
A Unified Definition of Hallucination, Or: It's the World Model, Stupid
Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe