79 articles in total
2026
REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS
RAGLens: Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs
RelayLLM: Efficient Reasoning via Collaborative Decoding
Text-to-LoRA: Instant Transformer Adaption
2025
A Unified Definition of Hallucination, Or: It's the World Model, Stupid