Junjie Hu | 胡俊杰


Assistant Professor
Biostatistics & Medical Informatics
Computer Science
Data Science Institute
University of Wisconsin-Madison
Office: 4735 MSC, 420 North Charter Street, Madison, WI
Office phone: +1-608-265-6118
Email: junjie.hu@wisc.edu
[Research Statement]

About

I am an assistant professor with appointments in the Department of Biostatistics and Medical Informatics, the Department of Computer Science, and the Data Science Institute at the University of Wisconsin-Madison. I obtained my Ph.D. from the School of Computer Science at Carnegie Mellon University, where I worked with Jaime Carbonell and Graham Neubig.

I have broad interests in natural language processing and machine learning. My research goal is to build robust intelligent systems that evolve with changes in their environment and interact with people speaking different languages. In particular, my research focuses on the algorithmic design and fundamental understanding of machine learning models in NLP that enable safe deployment in the wild. Most recently, I’m fascinated by understanding the behaviors of large language models, adapting them effectively to knowledge-intensive reasoning tasks, and aligning them safely with users from diverse backgrounds. Specific topics of interest include the following aspects of large language models:

  • Multilingual/multimodal representation learning from self-supervision
  • Unifying LLMs with knowledge graphs for structured reasoning
  • Evaluation and interpretation of black-box foundation models
  • Language models as agents for task execution

Prospective students (Updated on Nov 10, 2023): Thanks for your interest! I may not be able to reply to all inquiries due to the large volume of emails I receive. If you would still like to bring your papers to my attention by email, please add “[prospective student to Hulab]” to the email subject. I’ll post updated hiring information on my website. I am looking for ~2 excellent PhD students to join our lab in the fall of 2024. Please apply to the CS or BDS program and mention my name in your application and research statement. UW-Madison is an excellent place for research, and Madison is a wonderful city to live in. Please check out these videos (Why UW-Madison, Madison). I’m also happy to work with master’s or undergraduate students at UW-Madison. If you are interested, please send me an email.

Research Group

I am really fortunate to work with a group of excellent students at UW-Madison. Stay tuned for our latest work!

Graduate Students

Undergraduate Students
  • Agam Goyal (BS in CS)

Alumni

Recent Preprints

2024

  1. arXiv
    Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, and Shafiq Joty arXiv preprint arXiv:2403.02990 2024
  2. arXiv
    Prompting Large Vision-Language Models for Compositional Reasoning Timothy Ossowski, Ming Jiang, and Junjie Hu arXiv preprint arXiv:2401.11337 2024

2023

  1. arXiv
    CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model Yang Lei, Jiangtong Li, Ming Jiang, Junjie Hu, Dawei Cheng, Zhijun Ding, and Changjun Jiang arXiv preprint arXiv:2311.05812 2023
  2. arXiv
    Empowering LLM-based Machine Translation with Cultural Awareness Binwei Yao, Ming Jiang, Diyi Yang, and Junjie Hu arXiv preprint arXiv:2305.14328 2023

Publications


2024

  1. CogSci
    Evaluating LLM Agent Group Dynamics against Human Group Dynamics: A Case Study on Wisdom of Partisan Crowds Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, and Timothy T Rogers In The Annual Conference of the Cognitive Science Society (CogSci). 2024
  2. NAACL
    Simulating Opinion Dynamics with Networks of LLM-based Agents Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, and Timothy T Rogers In Findings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics 2024
  3. NAACL
    How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, and Junjie Hu In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics 2024
  4. CVPR
    Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, and Shuqiang Jiang In The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
  5. EACL
    Learning Label Hierarchy with Supervised Contrastive Learning Ruixue Lian, William A. Sethares, and Junjie Hu In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics 2024
  6. CSCW
    MetaWriter: Exploring the Potential and Perils of AI Writing Support in Scientific Peer Review Lu Sun, Stone Tao, Junjie Hu, and Steven Dow In Proceedings of The 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing 2024

2023

  1. ACL
    Single Sequence Prediction over Reasoning Graphs for Multi-hop QA Gowtham Ramesh, Makesh Narsimhan Sreedhar, and Junjie Hu In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics 2023
  2. ACL
    Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection Rheeya Uppaal, Junjie Hu, and Yixuan Li In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics 2023
  3. ACL
    Local Byte Fusion for Neural Machine Translation Makesh Narsimhan Sreedhar, Xiangpeng Wan, Yu Cheng, and Junjie Hu In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics 2023
  4. ACL
    Multimodal Prompt Retrieval for Generative Visual Question Answering Timothy Ossowski, and Junjie Hu In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL Findings) 2023
  5. WS
    Evolving Domain Adaptation of Pretrained Language Models for Text Classification Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T Rogers, and Junjie Hu In NeurIPS Workshop on Distribution Shifts, 37th Conference on Neural Information Processing Systems. 2023

2022

  1. EMNLP
    Beyond Counting Datasets: Investigating Multilingual Dataset Construction and Necessary Resources Xinyan Yu, Trina Chatterjee, Akari Asai, Junjie Hu, and Eunsol Choi In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings) 2022
  2. EMNLP
    Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, and Kangwook Lee In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings) 2022
  3. IEEE TPAMI
    Video Pivoting Unsupervised Multi-modal Neural Machine Translation Mingjie Li, Po-Yao Huang, Xiaojun Chang, Junjie Hu, Yi Yang, and Alex Hauptmann IEEE Transactions on Pattern Analysis and Machine Intelligence (To Appear) 2022
  4. ACL
    DEEP: DEnoising Entity Pre-training for Neural Machine Translation Junjie Hu, Hiroaki Hayashi, Kyunghyun Cho, and Graham Neubig In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics 2022
  5. ACL
    GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems Bosheng Ding, Junjie Hu, Lidong Bing, Sharifah Aljunied Mahani, Shafiq R. Joty, Luo Si, and Chunyan Miao In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics 2022

2021

  1. WMT
    Phrase-level Active Learning for Neural Machine Translation Junjie Hu, and Graham Neubig In The Sixth Conference on Machine Translation (WMT) 2021 [Abs] [Code]
  2. EMNLP
    AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages Machel Reid, Junjie Hu, Graham Neubig, and Yutaka Matsuo In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021 [Abs] [Code]
  3. EMNLP
    XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation Sebastian Ruder, Noah Constant, Jan Botha, Aditya Siddhant, Orhan Firat, Jinlan Fu, Pengfei Liu, Junjie Hu, Dan Garrette, Graham Neubig, and Melvin Johnson In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021 [Abs] [Code]
  4. NAACL
    Explicit Alignment Objectives for Multilingual Bidirectional Encoders Junjie Hu, Melvin Johnson, Orhan Firat, Aditya Siddhant, and Graham Neubig In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics 2021 [Abs] [Code]
  5. NAACL
    Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models Po-Yao Huang, Mandela Patrick, Junjie Hu, Graham Neubig, Florian Metze, and Alexander Hauptmann In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics 2021 [Abs] [Code]

2020

  1. ICML
    XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson In International Conference on Machine Learning (ICML) 2020 [Abs] [Code]
  2. ICML
    On Learning Language-Invariant Representations for Universal Machine Translation Han Zhao, Junjie Hu, and Andrej Risteski In International Conference on Machine Learning (ICML) 2020 [Abs]
  3. ACL
    Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting Po-Yao Huang, Junjie Hu, Xiaojun Chang, and Alexander Hauptmann In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020 [Abs]
  4. Workshop
    TICO-19: the Translation Initiative for COvid-19 Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federmann, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, and Sylwia Tur In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020 [Abs]
  5. AAAI
    What Makes A Good Story? Designing Composite Rewards for Visual Storytelling Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, and Graham Neubig In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI) 2020 [Code]

2019

  1. ACL
    Domain Adaptation of Neural Machine Translation by Lexicon Induction Junjie Hu, Mengzhou Xia, Graham Neubig, and Jaime Carbonell In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019 [Abs] [Code]
  2. CIKM
    A Hybrid Retrieval-Generation Neural Conversation Model Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, and Jingjing Liu In Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019 [Code]
  3. EMNLP
    REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, and Jianfeng Gao In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2019 [Abs]
  4. EMNLP
    Handling Syntactic Divergence in Low-resource Machine Translation Chunting Zhou, Xuezhe Ma, Junjie Hu, and Graham Neubig In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2019 [Abs]
  5. EMNLP
    Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings Zi-Yi Dou, Junjie Hu, Antonios Anastasopoulos, and Graham Neubig In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2019 [Abs]
  6. WNGT
    Domain Differential Adaptation for Neural Machine Translation Zi-Yi Dou, Xinyi Wang, Junjie Hu, and Graham Neubig In Proceedings of the 3rd Workshop on Neural Generation and Translation 2019 [Abs]
  7. NAACL
    compare-mt: A Tool for Holistic Comparison of Language Generation Systems Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, and Xinyi Wang In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations) 2019 [Abs] [Code] [Best Demo Nomination]

2018

  1. EMNLP
    Rapid Adaptation of Neural Machine Translation to New Languages Graham Neubig, and Junjie Hu In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018 [Abs] [Code]
  2. ACL
    Automatic Estimation of Simultaneous Interpreter Performance Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, and Graham Neubig In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018 [Abs]
  3. WMT
    Contextual Encoding for Translation Quality Estimation Junjie Hu, Wei-Cheng Chang, Yuexin Wu, and Graham Neubig In Proceedings of the Third Conference on Machine Translation: Shared Task Papers 2018 [Abs] [Code]

2017

  1. EMNLP
    Structural Embedding of Syntactic Trees for Machine Comprehension Rui Liu, Junjie Hu, Wei Wei, Zi Yang, and Eric Nyberg In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2017 [Abs]
  2. ACL
    Semi-Supervised QA with Generative Domain-Adaptive Nets Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, and William Cohen In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017 [Abs]
  3. AAAI
    Answer-Aware Attention on Grounded Question Answering in Images Junjie Hu, Desai Fan, Shuxin Yao, and Jean Oh In AAAI 2017 Fall Symposium on Natural Communication for Human-Robot Collaboration 2017
  4. IEEE TNNLS
    Online Nonlinear AUC Maximization for Imbalanced Data Sets Junjie Hu, Haiqin Yang, Michael R. Lyu, Irwin King, and Anthony Man-Cho So IEEE Transactions on Neural Networks and Learning Systems 2017 [Abs]

2016

  1. HCOMP
    Learning Lexical Entries for Robotic Commands via Paraphrasing Junjie Hu, Jean Oh, and Anatole Gershman In AAAI Conference on Human Computation 2016 [Abs]
  2. ICLR
    Words or Characters? Fine-grained Gating for Reading Comprehension Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, and Ruslan Salakhutdinov. In International Conference on Learning Representations 2016 [Abs]

2015

  1. IEEE Cybern.
    Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems Wing W. Y. Ng, Junjie Hu, Daniel S. Yeung, Shaohua Yin, and Fabio Roli IEEE Transactions on Cybernetics 2015 [Abs]
  2. AAAI
    Kernelized Online Imbalanced Learning with Fixed Budgets Junjie Hu, Haiqin Yang, Irwin King, Michael Lyu, and Anthony Man-Cho So In Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI) 2015 [Abs]
  3. SOSE
    AR-Tracker: Track the Dynamics of Mobile Apps via User Review Mining Cuiyun Gao, Hui Xu, Junjie Hu, and Yangfan Zhou In 2015 IEEE Symposium on Service-Oriented System Engineering 2015 [Abs]

Talks

  • Invited Talk at University of Cambridge, LTL Seminar, June 09, 2022.

  • Invited Talk at Linguistics Fridays Seminar at UW-Madison, April 01, 2022.

  • Invited Talk at Microsoft Azure Cognitive Services Research, January 20, 2022.

  • Invited Talk at Bay Area NLP Seminar, November 18, 2021.

  • Invited Talk at ICTR Seminar at UW-Madison, October 26, 2021.

  • Invited Talk at Microsoft Research Summit, October 21, 2021.

  • Invited Talk at CIBM Seminar at UW-Madison, October 19, 2021.

  • Invited Talk at IFDS Ideas Forum at UW-Madison, October 11, 2021.

  • XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization, Junjie Hu, LTI Summer Seminar Series at Carnegie Mellon University, Pittsburgh, July 2, 2020.

  • Pre-training of Multilingual Encoder for Crosslingual Transfer, Junjie Hu, Google Translate Team, Mountain View, August 20, 2019.

  • Cross-Lingual and Cross-Domain Transfer for Neural Machine Translation, Junjie Hu, AI Seminar at Carnegie Mellon University, Pittsburgh, April 30, 2019.

  • Transfer Learning for Multilingual Neural Machine Translation, Junjie Hu, SMART-Select Workshop on Multilingual Models and Unsupervised NMT supported by DG Connect of the European Commission, Luxembourg, June 20, 2019; Facebook AI Research Lab, Paris, June 21, 2019.

  • Rethinking Visual Storytelling: What Makes A Good Story? Junjie Hu, Microsoft 365 AI Research, Redmond, August 23, 2018.

  • Machine Reading Comprehension via Structural Tree Embeddings, Junjie Hu, Seminar at Chinese University of Hong Kong, March 5, 2018.

  • LORELEI: Understanding Low Resource Languages, Pat Littell, Junjie Hu, Shruti Rijhwani, and Ruochen Xu. LTI Colloquium at Carnegie Mellon University, Pittsburgh, September 8, 2017.

  • Natural Communication for Human-Robot Collaboration, Junjie Hu, Symposium on Natural Communication for Human-Robot Collaboration, November 9, 2017.

Selected Awards and Scholarships

  • CMU Graduate Student Assembly Dissertation Writing Group Grant, 2020

  • CMU Graduate Student Assembly Conference Travel Grant, 2020

  • NAACL 2019 Best Demonstration Paper Nomination, 2019

  • Graduate Research Scholarship, Carnegie Mellon University, 2015-2021

  • Postgraduate Scholarship, The Chinese University of Hong Kong, 2013-2015

  • Certificate of Merit for Teaching Assistantship, Department of CSE, Chinese University of Hong Kong, 2013-2014

  • IBM Outstanding Student Scholarship (1 of 77 winners in China), 2012-2013

  • Outstanding Undergraduate Awards by China Computer Federation (99 winners), 2012-2013

  • National Scholarship, the Ministry of Education, 2010-2011, 2011-2012