Yu Yin

Assistant Professor

Computer Vision, 3D Vision, Multimodal Learning, Embodied AI

Email: yu.yin@case.edu

Personal page: https://yin-yu.github.io/

Google Scholar: Profile

Location: Olin 606, CWRU

I am a tenure-track Assistant Professor in the Department of Computer and Data Sciences at Case Western Reserve University, where I lead the VU Lab. My research focuses on computer vision and 3D vision, multimodal large language models (MLLMs), and embodied AI, with the goal of building spatially grounded AI systems that can perceive, reason, and act in complex real-world environments.

At VU Lab, we study 3D vision and spatial representation, including 3D Gaussian splatting (3DGS) and neural radiance fields (NeRF), with representative projects such as Reconstruction Matters, Segment then Splat, BARD-GS, and NeRFInvertor. We also investigate spatial intelligence for vision-language and embodied systems, including GSMem and our Spatial Intelligence in VLM survey. In addition, we develop multimodal methods and benchmarks for human-centered reasoning and decision-making, including VIVA, VIVA+, YesBut, YesBut-V2, and When Words Outperform Vision. Overall, our goal is to make multimodal and embodied AI systems more robust, trustworthy, and effective in real-world settings.

Education

  • Ph.D. in Computer Engineering, Northeastern University, Boston, USA (2019 - 2023)
  • M.S. in Electrical and Computer Engineering, Northeastern University, Boston, USA (2016 - 2018)
  • B.E. in Electrical and Information Engineering, Wuhan University of Technology, Wuhan, China (2012 - 2016)

Teaching

  • CSDS 570 - Deep Generative Models, Case Western Reserve University, USA, 2025-2026 Spring
  • CSDS 465 - Computer Vision, Case Western Reserve University, USA, 2024 Spring & Fall, 2025 Fall
  • CSDS 600 - Special Topics on Generative Models, Case Western Reserve University, USA, 2023 Fall
  • EECE 5642 - Data Visualization, Northeastern University, USA, 2021 Spring

Selected Awards

  • OpenAI Researcher Access Program, 2025
  • Teaching Award, Department of Computer and Data Sciences, Case Western Reserve University, USA, 2024
  • PhD Spotlight, Northeastern University, USA, 2023
  • Semi-finalist, Women Who Empower Innovator Awards, Northeastern University, USA, 2023
  • Dissertation Fellowship, Northeastern University, USA, 2023
  • NSF I-Corps Grant, 2022
  • PhD Network Grant, Northeastern University, USA, 2019, 2023

Selected Academic Activities

  • Area Chair (AC)
    • ACL (2025-2026)
    • ICLR (2026)
    • NeurIPS (2026)
  • Workshop Organizer / Program Chair
    • ICCV - AMFG (2025)
    • ICCV - AMFG (2023)
    • CVPR - AMFG (2021)
    • FG - RFIW (2020)
  • Panel Reviewer
    • NSF/NIH, Smart Health (SCH) Program, 2026
  • Program Committee Member (2019-Now)
    • Journals: TPAMI, TIP, TNNLS, TCyber, TCSVT, IoT, Elsevier
    • Conferences: CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, ACL Rolling Review (ARR), AAAI, IJCAI, ACM MM

Publications

  1. AdvSplat: Adversarial Attacks on Feed-Forward Gaussian Splatting Models.
    Yiran Qiao, Yiren Lu, Yunlai Zhou, Ruize Yang, Lufan Hou, Yu Yin and Jing Ma.
    In arXiv preprint arXiv:2603.23686, 2026.

    @article{qiao2026advsplat,
      title = {AdvSplat: Adversarial Attacks on Feed-Forward Gaussian Splatting Models},
      author = {Qiao, Yiran and Lu, Yiren and Zhou, Yunlai and Yang, Ruize and Hou, Lufan and Yin, Yu and Ma, Jing},
      journal = {arXiv preprint arXiv:2603.23686},
      year = {2026},
      pdf = {https://arxiv.org/pdf/2603.23686.pdf}
    }
    
  2. GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning.
    Yiren Lu, Yi Du, Disheng Liu, Yunlai Zhou, Chen Wang and Yu Yin.
    In arXiv preprint arXiv:2603.19137, 2026.

    @article{lu2026gsmem,
      title = {GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning},
      author = {Lu, Yiren and Du, Yi and Liu, Disheng and Zhou, Yunlai and Wang, Chen and Yin, Yu},
      journal = {arXiv preprint arXiv:2603.19137},
      year = {2026},
      pdf = {https://arxiv.org/pdf/2603.19137.pdf},
      website = {https://yiren-lu.com/project_pages/GSMem/}
    }
    
  3. Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting.
    Yiren Lu, Xin Ye, Burhaneddin Yaman, Jingru Luo, Zhexiao Xiong, Liu Ren and Yu Yin.
    In arXiv preprint arXiv:2603.19193, 2026.

    @article{lu2026reconstruction,
      title = {Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting},
      author = {Lu, Yiren and Ye, Xin and Yaman, Burhaneddin and Luo, Jingru and Xiong, Zhexiao and Ren, Liu and Yin, Yu},
      journal = {arXiv preprint arXiv:2603.19193},
      year = {2026},
      pdf = {https://arxiv.org/pdf/2603.19193.pdf},
      website = {https://yiren-lu.com/project_pages/Splat2BEV/}
    }
    
  4. Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing.
    Meng Wang, Chang Ma, Aoran Jiao, Tuo Liang, Pengfei Lu, Saanvi Hegde, Yu Yin and others.
    In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 19, 2026.

    @inproceedings{wang2026serendipity,
      title = {Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing},
      author = {Wang, Meng and Ma, Chang and Jiao, Aoran and Liang, Tuo and Lu, Pengfei and Hegde, Saanvi and Yin, Yu and others},
      booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
      volume = {40},
      number = {19},
      year = {2026},
      pdf = {https://ojs.aaai.org/index.php/AAAI/article/view/38618}
    }
    
  5. Spatial Intelligence in Vision-Language Models: A Comprehensive Survey.
    Disheng Liu, Tuo Liang, Zhe Hu, Jierui Peng, Yiren Lu, Yi Xu, Yun Fu and Yu Yin.
    In TechRxiv, 2026.

    @article{liu2026spatial,
      title = {Spatial Intelligence in Vision-Language Models: A Comprehensive Survey},
      author = {Liu, Disheng and Liang, Tuo and Hu, Zhe and Peng, Jierui and Lu, Yiren and Xu, Yi and Fu, Yun and Yin, Yu},
      journal = {TechRxiv},
      year = {2026},
      pdf = {https://www.techrxiv.org/doi/full/10.36227/techrxiv.176231405.57942913/v2},
      website = {https://dishengll.github.io/Awesome-Spatial-VLMs/}
    }
    
  6. DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering.
    Yiran Qiao, Yiren Lu, Yunlai Zhou, Ruize Yang, Lufan Hou, Yu Yin and Jing Ma.
    In arXiv preprint arXiv:2602.19323, 2026.

    @article{qiao2026defensesplat,
      title = {DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering},
      author = {Qiao, Yiran and Lu, Yiren and Zhou, Yunlai and Yang, Ruize and Hou, Lufan and Yin, Yu and Ma, Jing},
      journal = {arXiv preprint arXiv:2602.19323},
      year = {2026},
      pdf = {https://arxiv.org/pdf/2602.19323}
    }
    
  7. HugRAG: Hierarchical Causal Knowledge Graph Design for RAG.
    Nian Wang, Tuo Liang, Varun Singh, Chaoda Song, Vivian Yang, Yu Yin, Jing Ma, Jaideep Singh and others.
    In arXiv preprint arXiv:2602.05143, 2026.

    @article{wang2026hugrag,
      title = {HugRAG: Hierarchical Causal Knowledge Graph Design for RAG},
      author = {Wang, Nian and Liang, Tuo and Singh, Varun and Song, Chaoda and Yang, Vivian and Yin, Yu and Ma, Jing and Singh, Jaideep and others},
      journal = {arXiv preprint arXiv:2602.05143},
      year = {2026},
      pdf = {https://arxiv.org/pdf/2602.05143.pdf}
    }
    
  8. Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs.
    Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin and Pan Li.
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025.

    @inproceedings{fu2025quantized,
      title = {Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs},
      author = {Fu, Yao and Long, Xianxuan and Li, Runchao and Yu, Haotian and Sheng, Mu and Han, Xiaotian and Yin, Yu and Li, Pan},
      booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2508.19432}
    }
    
  9. Nebula: Do We Evaluate Vision-Language-Action Agents Correctly?
    Jierui Peng, Yanyan Zhang, Yicheng Duan, Tuo Liang, Vipin Chaudhary and Yu Yin.
    In arXiv preprint arXiv:2510.16263, 2025.

    @article{peng2025nebula,
      title = {Nebula: Do We Evaluate Vision-Language-Action Agents Correctly?},
      author = {Peng, Jierui and Zhang, Yanyan and Duan, Yicheng and Liang, Tuo and Chaudhary, Vipin and Yin, Yu},
      journal = {arXiv preprint arXiv:2510.16263},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2510.16263.pdf}
    }
    
  10. Fix False Transparency by Noise Guided Splatting.
    Aly El Hakie, Yiren Lu, Yu Yin, Michael Jenkins and Yehe Liu.
    In Advances in Neural Information Processing Systems, 2025.

    @article{hakie2025fix,
      title = {Fix False Transparency by Noise Guided Splatting},
      author = {El Hakie, Aly and Lu, Yiren and Yin, Yu and Jenkins, Michael and Liu, Yehe},
      journal = {Advances in Neural Information Processing Systems},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2510.15736},
      website = {https://opsiclear.github.io/ngs/}
    }
    
  11. A Survey on Multi-Robot Collaboration Systems: Architectures, Performances, and Applications.
    Siyuan Zhang, Zhipeng Li, Shasha Xu, Yu Yin, Vipin Chaudhary and Haibin Xu.
    In Authorea Preprints, 2025.

    @article{zhang2025multirobot,
      title = {A Survey on Multi-Robot Collaboration Systems: Architectures, Performances, and Applications},
      author = {Zhang, Siyuan and Li, Zhipeng and Xu, Shasha and Yin, Yu and Chaudhary, Vipin and Xu, Haibin},
      journal = {Authorea Preprints},
      year = {2025},
      pdf = {https://www.techrxiv.org/doi/full/10.36227/techrxiv.176045766.60277537/v2}
    }
    
  12. VIVA+: Human-Centered Situational Decision-Making.
    Zhe Hu, Yixiao Ren, Guang Liu, Jing Li and Yu Yin.
    In Findings of the Association for Computational Linguistics: EMNLP 2025, 2025.

    @article{hu2025vivaplus,
      title = {VIVA+: Human-Centered Situational Decision-Making},
      author = {Hu, Zhe and Ren, Yixiao and Liu, Guang and Li, Jing and Yin, Yu},
      journal = {Findings of the Association for Computational Linguistics: EMNLP 2025},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2509.23698.pdf},
      website = {https://derekhu.com/project_page/viva_plus_website/}
    }
    
  13. Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs.
    Yao Fu, Runchao Li, Xianxuan Long, Haotian Yu, Xiaotian Han, Yu Yin and Pan Li.
    In Findings of the Association for Computational Linguistics: EMNLP 2025, 2025.

    @article{fu2025pruning,
      title = {Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs},
      author = {Fu, Yao and Li, Runchao and Long, Xianxuan and Yu, Haotian and Han, Xiaotian and Yin, Yu and Li, Pan},
      journal = {Findings of the Association for Computational Linguistics: EMNLP 2025},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2509.00096.pdf}
    }
    
  14. Counterfactual Visual Explanation via Causally-Guided Adversarial Steering.
    Yiran Qiao, Disheng Liu, Yiren Lu, Yu Yin, Mengnan Du and Jing Ma.
    In arXiv preprint arXiv:2507.09881, 2025.

    @article{qiao2025counterfactual,
      title = {Counterfactual Visual Explanation via Causally-Guided Adversarial Steering},
      author = {Qiao, Yiran and Liu, Disheng and Lu, Yiren and Yin, Yu and Du, Mengnan and Ma, Jing},
      journal = {arXiv preprint arXiv:2507.09881},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2507.09881.pdf}
    }
    
  15. Towards Open-set Face Anti-spoofing with Unseen Attack Synthesis.
    Chang Liu, Yitian Zhang, Yu Yin and Yun Fu.
    In 2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG), 2025.

    @inproceedings{liu2025opensetface,
      title = {Towards Open-set Face Anti-spoofing with Unseen Attack Synthesis},
      author = {Liu, Chang and Zhang, Yitian and Yin, Yu and Fu, Yun},
      booktitle = {2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG)},
      year = {2025},
      pdf = {https://ieeexplore.ieee.org/abstract/document/11099152}
    }
    
  16. ResSVD: Residual Compensated SVD for Large Language Model Compression.
    Hongyu Bai, Shuo Jian, Tuo Liang, Yu Yin and Huan Wang.
    In arXiv preprint arXiv:2505.20112, 2025.

    @article{bai2025ressvd,
      title = {ResSVD: Residual Compensated SVD for Large Language Model Compression},
      author = {Bai, Hongyu and Jian, Shuo and Liang, Tuo and Yin, Yu and Wang, Huan},
      journal = {arXiv preprint arXiv:2505.20112},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2505.20112.pdf}
    }
    
  17. Certified Causal Defense with Generalizable Robustness.
    Yiran Qiao, Yu Yin, Chen Chen and Jing Ma.
    In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 19, 2025.

    @inproceedings{qiao2025certified,
      title = {Certified Causal Defense with Generalizable Robustness},
      author = {Qiao, Yiran and Yin, Yu and Chen, Chen and Ma, Jing},
      booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
      volume = {39},
      number = {19},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2408.15451}
    }
    
  18. Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning.
    Zhe Hu, Jing Li, Zhongzhu Pu, Hou Pong Chan and Yu Yin.
    In Advances in Neural Information Processing Systems, 2025.

    @article{hu2025praxis,
      title = {Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning},
      author = {Hu, Zhe and Li, Jing and Pu, Zhongzhu and Chan, Hou Pong and Yin, Yu},
      journal = {Advances in Neural Information Processing Systems},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2503.16965},
      code = {https://github.com/Derekkk/Praxis-VLM}
    }
    
  19. Causal3D: A Comprehensive Benchmark for Causal Learning from Visual Data.
    Disheng Liu, Yiran Qiao, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin and Jing Ma.
    In arXiv preprint arXiv:2503.04852, 2025.

    @article{liu2025causal3d,
      title = {Causal3D: A Comprehensive Benchmark for Causal Learning from Visual Data},
      author = {Liu, Disheng and Qiao, Yiran and Liu, Wuche and Lu, Yiren and Zhou, Yunlai and Liang, Tuo and Yin, Yu and Ma, Jing},
      journal = {arXiv preprint arXiv:2503.04852},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2503.04852.pdf},
      data = {https://huggingface.co/datasets/LLDDSS/Causal3D_Dataset}
    }
    
  20. When Words Outperform Vision: VLMs Can Self-Improve via Text-Only Training for Human-Centered Decision Making.
    Zhe Hu, Jing Li and Yu Yin.
    In arXiv e-prints, arXiv:2503.16965, 2025.

    @article{hu2025whenwords,
      title = {When Words Outperform Vision: VLMs Can Self-Improve via Text-Only Training for Human-Centered Decision Making},
      author = {Hu, Zhe and Li, Jing and Yin, Yu},
      journal = {arXiv e-prints, arXiv:2503.16965},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2503.16965.pdf}
    }
    
  21. Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting.
    Yiren Lu, Yunlai Zhou, Yiran Qiao, Chaoda Song, Tuo Liang, Jing Ma, Huan Wang and Yu Yin.
    In Advances in Neural Information Processing Systems, 2025.

    @article{lu2025segment,
      title = {Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting},
      author = {Lu, Yiren and Zhou, Yunlai and Qiao, Yiran and Song, Chaoda and Liang, Tuo and Ma, Jing and Wang, Huan and Yin, Yu},
      journal = {Advances in Neural Information Processing Systems},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2503.22204v2},
      website = {https://yiren-lu.com/project_pages/Segment-then-Splat/},
      code = {https://github.com/luyr/Segment-then-Splat}
    }
    
  22. BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting.
    Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang and Yu Yin.
    In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 16532–16542, 2025.

    @inproceedings{lu2025bard,
      title = {BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting},
      author = {Lu, Yiren and Zhou, Yunlai and Liu, Disheng and Liang, Tuo and Yin, Yu},
      booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
      pages = {16532--16542},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2503.15835},
      website = {https://yiren-lu.com/project_pages/BARD-GS/},
      code = {https://github.com/luyr/BARD-GS},
      data = {https://drive.google.com/drive/u/0/folders/1CRBQ_HR3yKhT3G9_ttTWA1PWXWL6DtsV}
    }
    
  23. Debate-to-Write: A Persona-Driven Multi-Agent Framework for Diverse Argument Generation.
    Zhe Hu, Hou Pong Chan, Jing Li and Yu Yin.
    In Proceedings of the 31st International Conference on Computational Linguistics, 2025.

    @inproceedings{hu2025debatetowrite,
      title = {Debate-to-Write: A Persona-Driven Multi-Agent Framework for Diverse Argument Generation},
      author = {Hu, Zhe and Chan, Hou Pong and Li, Jing and Yin, Yu},
      booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
      year = {2025},
      pdf = {https://arxiv.org/pdf/2406.19643},
      code = {https://github.com/Derekkk/LLM4ArgGen}
    }
    
  24. VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values.
    Zhe Hu, Yixiao Ren, Jing Li and Yu Yin.
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.

    @inproceedings{hu2024viva,
      title = {VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values},
      author = {Hu, Zhe and Ren, Yixiao and Li, Jing and Yin, Yu},
      booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
      year = {2024},
      pdf = {https://aclanthology.org/2024.emnlp-main.137.pdf},
      website = {https://derekhu.com/project_page/viva_website_emnlp24/},
      code = {https://github.com/Derekkk/VIVA_EMNLP24},
      data = {https://huggingface.co/datasets/zhehuderek/VIVA_Benchmark_EMNLP24}
    }
    
  25. View-Consistent Object Removal in Radiance Fields.
    Yiren Lu, Jing Ma and Yu Yin.
    In Proceedings of the 32nd ACM International Conference on Multimedia, pp. 3597–3606, 2024.

    @inproceedings{lu2024viewconsistent,
      title = {View-Consistent Object Removal in Radiance Fields},
      author = {Lu, Yiren and Ma, Jing and Yin, Yu},
      booktitle = {Proceedings of the 32nd ACM International Conference on Multimedia},
      pages = {3597--3606},
      year = {2024},
      pdf = {https://arxiv.org/pdf/2408.02100v1.pdf},
      website = {https://yiren-lu.com/project_pages/View-consistent_Object_Removal_in_Radiance_Fields/}
    }
    
  26. AMERICANO: Argument Generation with Discourse-Driven Decomposition and Multi-Agent Interaction.
    Zhe Hu, Hou Pong Chan and Yu Yin.
    In Proceedings of the 17th International Natural Language Generation Conference, 2024.

    @inproceedings{hu2024americano,
      title = {AMERICANO: Argument Generation with Discourse-Driven Decomposition and Multi-Agent Interaction},
      author = {Hu, Zhe and Chan, Hou Pong and Yin, Yu},
      booktitle = {Proceedings of the 17th International Natural Language Generation Conference},
      year = {2024},
      pdf = {https://aclanthology.org/2024.inlg-main.8/}
    }
    
  27. Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions.
    Zhe Hu, Tuo Liang, Jing Li, Yiren Lu, Yunlai Zhou, Yiran Qiao, Jing Ma and Yu Yin.
    In Advances in Neural Information Processing Systems, vol. 37, pp. 47166–47188, 2024.

    @article{hu2024cracking,
      title = {Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions},
      author = {Hu, Zhe and Liang, Tuo and Li, Jing and Lu, Yiren and Zhou, Yunlai and Qiao, Yiran and Ma, Jing and Yin, Yu},
      journal = {Advances in Neural Information Processing Systems},
      volume = {37},
      pages = {47166--47188},
      year = {2024},
      pdf = {https://openreview.net/pdf?id=bCMpdaQCNW},
      website = {https://vulab-ai.github.io/YESBUT_Homepage/},
      dataset = {https://huggingface.co/datasets/zhehuderek/YESBUT_Benchmark},
      code = {https://github.com/Derekkk/VIVA_EMNLP24}
    }
    
  28. NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation.
    Yu Yin, Kamran Ghasedi, HsiangTao Wu, Jiaolong Yang, Xin Tong and Yun Fu.
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8539–8548, 2023.

    @inproceedings{yin2023nerfinvertor,
      title = {NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation},
      author = {Yin, Yu and Ghasedi, Kamran and Wu, HsiangTao and Yang, Jiaolong and Tong, Xin and Fu, Yun},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages = {8539--8548},
      year = {2023},
      pdf = {https://openaccess.thecvf.com/content/CVPR2023/papers/Yin_NeRFInvertor_High_Fidelity_NeRF-GAN_Inversion_for_Single-Shot_Real_Image_Animation_CVPR_2023_paper.pdf},
      website = {https://yuyin1.github.io/NeRFInvertor_Homepage/},
      code = {https://github.com/YuYin1/NeRFInvertor}
    }
    
  29. Cautious Next Token Prediction.
    Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi and Yun Fu.
    In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, pp. 25685–25697, 2025.

    @inproceedings{wang-etal-2025-cautious,
      title = {Cautious Next Token Prediction},
      author = {Wang, Yizhou and Zhang, Lingzhi and Bai, Yue and Chiu, Mang Tik and Hu, Zhengmian and Zhang, Mingyuan and Dong, Qihua and Yin, Yu and Amirghodsi, Sohrab and Fu, Yun},
      editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
      booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
      month = jul,
      year = {2025},
      address = {Vienna, Austria},
      publisher = {Association for Computational Linguistics},
      url = {https://aclanthology.org/2025.findings-acl.1318/},
      doi = {10.18653/v1/2025.findings-acl.1318},
      pages = {25685--25697},
      isbn = {979-8-89176-256-5}
    }
    
    Next token prediction paradigm has been prevailing for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such approach leads to inferior performance in various NLP tasks when the model is not certain about testing questions. To this end, we propose a brand new training-free decoding strategy, dubbed as Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from the step independently and stop when encountering any punctuation. Then we select the trial with the lowest perplexity score viewed as the most probable and reliable trial path given the model’s capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human beings’ behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach outperforms existing standard decoding strategies consistently by a clear margin. Moreover, the integration of CNTP with self consistency can further improve over vanilla self consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding. Code is available at https://github.com/wyzjack/CNTP.
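
The decoding loop described in the abstract above can be illustrated with a small Python sketch. It is not the authors' implementation (the linked repository holds that). The toy model, the entropy threshold, the trial-count schedule in `num_trials`, and every function name here are assumptions made for illustration only. The sketch samples one token when the next-token entropy is low; otherwise it samples several independent continuations up to the next punctuation mark and keeps the one with the lowest perplexity.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Dict[str, float]                  # token -> probability
Model = Callable[[Sequence[str]], Dist]  # prefix -> next-token distribution
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def entropy(dist: Dist) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def sample_token(dist: Dist, rng: random.Random) -> str:
    """Draw one token according to its probability."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def num_trials(h: float, threshold: float, max_trials: int) -> int:
    """Hypothetical schedule: one sample when confident, more as entropy grows."""
    if h <= threshold:
        return 1
    return min(max_trials, 1 + math.ceil(h / threshold))


def sample_trial(model: Model, prefix: Sequence[str], rng: random.Random,
                 eos: str, max_len: int = 20) -> Tuple[List[str], float]:
    """Sample until punctuation (or eos); return the tokens and their perplexity."""
    tokens: List[str] = []
    nll = 0.0
    for _ in range(max_len):
        dist = model(list(prefix) + tokens)
        tok = sample_token(dist, rng)
        nll -= math.log(dist[tok])
        tokens.append(tok)
        if tok in PUNCTUATION or tok == eos:
            break
    return tokens, math.exp(nll / len(tokens))


def cntp_decode(model: Model, prompt: Sequence[str], rng: random.Random,
                threshold: float = 0.5, max_trials: int = 5,
                max_new_tokens: int = 30, eos: str = "<eos>") -> List[str]:
    """Entropy-gated decoding: branch into trials only at uncertain steps."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        dist = model(out)
        n = num_trials(entropy(dist), threshold, max_trials)
        if n == 1:
            new_tokens = [sample_token(dist, rng)]
        else:
            trials = [sample_trial(model, out, rng, eos) for _ in range(n)]
            new_tokens = min(trials, key=lambda t: t[1])[0]  # lowest perplexity
        out.extend(new_tokens)
        if eos in new_tokens:
            break
    return out


def toy_model(prefix: Sequence[str]) -> Dist:
    """Hand-written stand-in for an LLM; depends only on the last token."""
    last = prefix[-1] if prefix else "<s>"
    if last == "the":
        return {"cat": 0.5, "dog": 0.5}  # uncertain step, triggers trials
    if last in ("cat", "dog"):
        return {"sat": 0.9, "ran": 0.1}
    if last in ("sat", "ran"):
        return {".": 1.0}
    if last == ".":
        return {"<eos>": 1.0}
    return {"the": 1.0}


if __name__ == "__main__":
    print(cntp_decode(toy_model, ["<s>"], random.Random(0)))
```

With the toy model, only the step after "the" exceeds the threshold, so that is the only point where multiple trials are drawn; every other step falls back to ordinary single-token sampling.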