Blog


GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

We use 3DGS as a persistent spatial memory for embodied navigation, enabling the agent to "hallucinate" optimal views for high-fidelity Vision-Language Model (VLM) reasoning.

Spatial Memory for Long-Horizon Embodied Agents

We are prototyping placeholder systems that combine spatial memory, semantic retrieval, and planning to support embodied agents acting over long horizons.

Visual Understanding Benchmark for Open-World Scenes

We are building a placeholder benchmark suite for evaluating open-world visual understanding across long-tail scene categories, ambiguous contexts, and multimodal evidence.


Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

We propose Splat2BEV, a Gaussian Splatting-assisted BEV perception framework for learning BEV feature representations that are both semantically rich and geometrically precise.
