Video-Language Grounding for Open-World Agents

This project examines how agents align visual observations with language over time, particularly when scenes, objects, and goals evolve beyond closed-set assumptions.

We are interested in open-world recognition, grounded language understanding, and long-horizon video reasoning for systems that operate continuously rather than on isolated clips.

This page is a placeholder for future project details, papers, demos, and datasets.
