214 articles in total
2026
RAGLens: Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs
RelayLLM: Efficient Reasoning via Collaborative Decoding
mHC: Manifold-Constrained Hyper-Connections
Text-to-LoRA: Instant Transformer Adaption