Table of Contents - 小熊的小站

02-09

Hallucination Begins Where Saliency Drops

02-08

ReAct: Synergizing Reasoning and Acting in Language Models

02-08

RAGLens: Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders

02-06

Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

02-06

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

02-06

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

02-05

FRAUDAR: Bounding Graph Fraud in the Face of Camouflage

02-03

DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

02-02

One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs

01-15

RelayLLM: Efficient Reasoning via Collaborative Decoding