Jierui Peng
PhD Student
Computer Vision, Robotics AI, Online Learning.
I am a PhD student in Computer Science at Case Western Reserve University, working at the intersection of embodied AI, computer vision, and machine learning. My work focuses on building intelligent systems that can perceive, reason, and act in the real world, with an emphasis on world modeling and efficient real-time inference.
I am particularly interested in bridging the gap between high-level understanding and real-world execution. My research explores how structured reasoning, language grounding, and causal understanding can be leveraged to build embodied systems that are more reliable, interpretable, and generalizable.
Education
- B.S. in Computer Science and Economics, Brandeis University
- M.S. in Computer Science, New York University
- Ph.D. in Computer Science, Case Western Reserve University (current)
Focus Areas
- Embodied Intelligence — perception, reasoning, and action in real-world environments
- World Models & Causal Reasoning — structured representations for prediction and decision-making
- Vision-Language Systems — grounding language in spatial and physical contexts
- Real-Time AI Systems — efficient, deployable models for real-world applications
Selected Work
- RT-LTP: Real-Time Latent Trajectory Prediction via Efficient Online Adaptation (2025)
- NEBULA: A Unified Framework for Evaluating Embodied AI Systems (2025)
- CLAIRE: Causally Explainable AI for EKG-Based Risk Prediction (2025)
Publications
- Spatial Intelligence in Vision-Language Models: A Comprehensive Survey. Disheng Liu, Tuo Liang, Zhe Hu, Jierui Peng, Yiren Lu, Yi Xu, Yun Fu, and Yu Yin. TechRxiv preprint, 2026. PDF: https://www.techrxiv.org/doi/full/10.36227/techrxiv.176231405.57942913/v2 Website: https://dishengll.github.io/Awesome-Spatial-VLMs/
- Nebula: Do we Evaluate Vision-Language-Action Agents Correctly? Jierui Peng, Yanyan Zhang, Yicheng Duan, Tuo Liang, Vipin Chaudhary, and Yu Yin. arXiv preprint arXiv:2510.16263, 2025. PDF: https://arxiv.org/pdf/2510.16263.pdf
- When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning? Tuo Liang, Zhe Hu, Jing Li, Hao Zhang, Yiren Lu, Yunlai Zhou, Yiran Qiao, Disheng Liu, Jierui Peng, Jing Ma, et al. arXiv preprint arXiv:2503.23137, 2025. PDF: https://arxiv.org/pdf/2503.23137.pdf