Blog

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey

A comprehensive survey addressing how VLMs currently lack spatial intelligence, covering recent advances, taxonomies, and evaluations toward building spatially intelligent AI.

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

BARD-GS is a novel approach for robust dynamic scene reconstruction that effectively handles blurry inputs and imprecise camera poses.

Balancing Fidelity and Diversity: Synthetic Data Could Stand on the Shoulder of the Real in Visual Recognition

Investigates how data fidelity and diversity affect recognition performance through synthetic data curation, offering training-free improvements for visual recognition tasks.

Visual Understanding Benchmark for Open-World Scenes

Our lab offers a diverse range of benchmarks, including YesBut, Nebular, Viva, and Causal 3D, focused on advancing open-world visual understanding and robotic interaction.

Latest Posts

GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

We use 3DGS as a persistent spatial memory for embodied navigation, enabling the agent to “hallucinate” optimal views for high-fidelity Vision-Language Model (VLM) reasoning.

Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

We propose Splat2BEV, a Gaussian Splatting-assisted BEV perception framework that learns BEV feature representations that are both semantically rich and geometrically precise.