We use 3D Gaussian Splatting (3DGS) as a persistent spatial memory for embodied navigation, enabling the agent to “hallucinate” optimal views for high-fidelity Vision-Language Model (VLM) reasoning.
We propose Splat2BEV, a Gaussian Splatting-assisted bird's-eye-view (BEV) perception framework designed to learn BEV feature representations that are both semantically rich and geometrically precise.
YESBUT studies whether AI systems can understand contradictory humor in comics by reasoning across paired visual situations and resolving the contrast between what appears true and what undermines it.
Expo-GS introduces an exposure-aware signed distance formulation for Gaussian Splatting, targeting robust high dynamic range reconstruction under challenging illumination and exposure variation.