Visual Understanding Benchmarks for Open-World Scenes

  • Vision-Grounded Decision-Making with Human Values: VIVA, VIVA+
  • Humorous Contradictions: YesBut, YesBut-v2
  • Causal Reasoning Evaluation: Causal3D
  • Vision-Language-Action Agent Evaluation: Nebula
Survey of Spatial Intelligence in Vision-Language Models