Jierui Peng

Personal page: https://jerrypeng.com/

Google Scholar: Profile

Location: VU Lab, Case Western Reserve University, Cleveland, USA

I am a PhD student in Computer Science at Case Western Reserve University, working at the intersection of embodied AI, computer vision, and machine learning. My work focuses on building intelligent systems that can perceive, reason, and act in the real world, with an emphasis on world modeling and efficient real-time inference.

I am particularly interested in bridging the gap between high-level understanding and real-world execution. My research explores how structured reasoning, language grounding, and causal understanding can be leveraged to build embodied systems that are more reliable, interpretable, and generalizable.

Education

B.S. in Computer Science and Economics, Brandeis University
M.S. in Computer Science, New York University
Ph.D. in Computer Science, Case Western Reserve University (current)

Focus Areas

Embodied Intelligence — perception, reasoning, and action in real-world environments
World Models & Causal Reasoning — structured representations for prediction and decision-making
Vision-Language Systems — grounding language in spatial and physical contexts
Real-Time AI Systems — efficient, deployable models for real-world applications

Selected Work

RT-LTP: Real-Time Latent Trajectory Prediction via Efficient Online Adaptation (2025)
NEBULA: A Unified Framework for Evaluating Embodied AI Systems (2025)
CLAIRE: Causally Explainable AI for EKG-Based Risk Prediction (2025)

Publications

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey.
Disheng Liu, Tuo Liang, Zhe Hu, Jierui Peng, Yiren Lu, Yi Xu, Yun Fu and Yu Yin.
In TechRxiv, 2026.
```
@article{liu2026spatial,
  title = {Spatial Intelligence in Vision-Language Models: A Comprehensive Survey},
  author = {Liu, Disheng and Liang, Tuo and Hu, Zhe and Peng, Jierui and Lu, Yiren and Xu, Yi and Fu, Yun and Yin, Yu},
  journal = {TechRxiv},
  year = {2026},
  status = {preprint},
  pdf = {https://www.techrxiv.org/doi/full/10.36227/techrxiv.176231405.57942913/v2},
  website = {https://github.com/vulab-AI/Awesome-Spatial-VLMs}
}
```
When 'YES' Meets 'BUT': Can AI Comprehend Contradictory Humor in Comics?
Tuo Liang, Zhe Hu, Jing Li, Hao Zhang, Yiren Lu, Yunlai Zhou, Yiran Qiao, Disheng Liu, Jierui Peng, Jing Ma and Yu Yin.
In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026.
(Impact Factor: 20.4)
```
@article{liang2026yesbut,
  title = {When 'YES' Meets 'BUT': Can AI Comprehend Contradictory Humor in Comics?},
  author = {Liang, Tuo and Hu, Zhe and Li, Jing and Zhang, Hao and Lu, Yiren and Zhou, Yunlai and Qiao, Yiran and Liu, Disheng and Peng, Jierui and Ma, Jing and Yin, Yu},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year = {2026},
  doi = {10.1109/TPAMI.2026.3688191},
  note = {Impact Factor: 20.4},
  status = {accepted},
  pdf = {https://arxiv.org/pdf/2503.23137.pdf},
  website = {/projects/yesbut-v2/},
  data = {https://huggingface.co/datasets/zhehuderek/YESBUT_Benchmark}
}
```
Nebula: Do we Evaluate Vision-Language-Action Agents Correctly?
Jierui Peng, Yanyan Zhang, Yicheng Duan, Tuo Liang, Vipin Chaudhary and Yu Yin.
In arXiv preprint arXiv:2510.16263, 2025.
```
@article{peng2025nebula,
  title = {Nebula: Do we Evaluate Vision-Language-Action Agents Correctly?},
  author = {Peng, Jierui and Zhang, Yanyan and Duan, Yicheng and Liang, Tuo and Chaudhary, Vipin and Yin, Yu},
  journal = {arXiv preprint arXiv:2510.16263},
  year = {2025},
  status = {preprint},
  pdf = {https://arxiv.org/pdf/2510.16263.pdf},
  website = {/projects/nebula-alpha/}
}
```