317 articles in total
2026
Hallucination Begins Where Saliency Drops
REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS
RAGLens: Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs
RelayLLM: Efficient Reasoning via Collaborative Decoding