We utilize 3DGS as a persistent spatial memory for embodied navigation, enabling the agent to ``hallucinate'' optimal views for high-fidelity Vision-Language Model (VLM) reasoning.
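A minimal sketch of this idea follows, assuming hypothetical interfaces (GaussianSplatMemory, hallucinate_best_view, query_vlm) that are not part of any released codebase: the agent renders candidate viewpoints from its 3DGS memory, selects the most informative one, and queries a VLM on the rendered image.

```python
# Sketch only: GaussianSplatMemory, hallucinate_best_view, and query_vlm are
# assumed, illustrative interfaces, not an actual implementation.
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraPose:
    position: np.ndarray   # (3,) camera position in the world frame
    look_at: np.ndarray    # (3,) point the camera is oriented toward

class GaussianSplatMemory:
    """Persistent spatial memory: a 3DGS scene updated as the agent explores."""
    def render(self, pose: CameraPose, hw=(480, 640)) -> np.ndarray:
        # Placeholder: a real implementation would rasterize the Gaussians at `pose`.
        return np.zeros((*hw, 3), dtype=np.uint8)

def hallucinate_best_view(memory: GaussianSplatMemory,
                          candidate_poses: list,
                          score_fn) -> np.ndarray:
    """Render candidate viewpoints from memory and keep the highest-scoring one."""
    renders = [memory.render(p) for p in candidate_poses]
    return max(renders, key=score_fn)

def query_vlm(image: np.ndarray, question: str) -> str:
    # Placeholder for a VLM call (API or local model); the interface is assumed.
    return "stub answer"

if __name__ == "__main__":
    memory = GaussianSplatMemory()
    poses = [CameraPose(np.array([x, 0.0, 1.5]), np.zeros(3)) for x in (-1.0, 0.0, 1.0)]
    view = hallucinate_best_view(memory, poses, score_fn=lambda img: img.mean())
    print(query_vlm(view, "Is the target object visible from here?"))
```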
We are prototyping placeholder systems that combine spatial memory, semantic retrieval, and planning to support embodied agents acting over long horizons.
We are building a placeholder benchmark suite for evaluating open-world visual understanding across long-tail scene categories, ambiguous contexts, and multimodal evidence.
We propose Splat2BEV, a Gaussian Splatting-assisted BEV perception framework designed to learn BEV feature representations that are both semantically rich and geometrically precise.
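The text does not specify the Splat2BEV pipeline; the sketch below only illustrates the general notion of projecting per-Gaussian features into a BEV grid, with all function names, grid parameters, and the mean-pooling choice being assumptions for illustration.

```python
# Sketch only: scatter per-Gaussian semantic features onto a BEV grid (assumed scheme).
import numpy as np

def gaussians_to_bev(centers: np.ndarray,    # (N, 3) Gaussian centers in the ego frame
                     features: np.ndarray,   # (N, C) per-Gaussian semantic features
                     grid_range=50.0,        # BEV covers [-grid_range, grid_range] metres
                     grid_size=200) -> np.ndarray:
    """Accumulate Gaussian features into a (grid_size, grid_size, C) BEV feature map."""
    cell = 2 * grid_range / grid_size
    ix = np.floor((centers[:, 0] + grid_range) / cell).astype(int)
    iy = np.floor((centers[:, 1] + grid_range) / cell).astype(int)
    valid = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)

    bev = np.zeros((grid_size, grid_size, features.shape[1]), dtype=np.float32)
    counts = np.zeros((grid_size, grid_size, 1), dtype=np.float32)
    np.add.at(bev, (ix[valid], iy[valid]), features[valid])
    np.add.at(counts, (ix[valid], iy[valid]), 1.0)
    return bev / np.maximum(counts, 1.0)   # mean-pool features per BEV cell

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bev_map = gaussians_to_bev(rng.uniform(-50, 50, (1000, 3)),
                               rng.normal(size=(1000, 32)))
    print(bev_map.shape)  # (200, 200, 32)
```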