AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding