Junjie Hu | 胡俊杰
Assistant Professor
Biostatistics & Medical Informatics
Computer Science
Data Science Institute
University of Wisconsin-Madison
Office: 4735 MSC, 420 North Charter Street, Madison, WI
Office phone: +1-608-265-6118
Email: junjie.hu@wisc.edu
[Research Statement]

About
I am an assistant professor with appointments in the Department of Biostatistics and Medical Informatics, the Department of Computer Science, and the Data Science Institute at the University of Wisconsin-Madison. I obtained my Ph.D. from the School of Computer Science at Carnegie Mellon University, where I worked with Jaime Carbonell and Graham Neubig. I have a broad interest in natural language processing and machine learning. In particular, I work on multilingual NLP, transfer learning, multimodal learning, and their applications to support human-machine communication. My research goal is to build robust intelligent systems that evolve with changes in the environment and interact with people speaking different languages.
Prospective students: Thanks for your interest! I am always looking for excellent Ph.D. students to join our lab. Please apply to the CS or BDS program, and mention my name in your application and research statement. UW-Madison is an excellent place for research, and Madison is a wonderful city to live in. Please check out these videos (Why UW-Madison, Madison). I’m also happy to work with master's or undergraduate students at UW-Madison. If you are interested, please send me an email.
Research Group
I am really fortunate to work with a group of excellent students at UW-Madison. Stay tuned for our latest work!
Graduate Students
Publications
2023
- [ACL] Single Sequence Prediction over Reasoning Graphs for Multi-hop QA. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- [ACL] Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- [ACL] Local Byte Fusion for Neural Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
- [ACL] Multimodal Prompt Retrieval for Generative Visual Question Answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2023
2022
- [EMNLP] Beyond Counting Datasets: Investigating Multilingual Dataset Construction and Necessary Resources. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), 2022
- [EMNLP] Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), 2022
- [IEEE TPAMI] Video Pivoting Unsupervised Multi-modal Neural Machine Translation. IEEE Transactions on Pattern Analysis and Machine Intelligence (To Appear), 2022
- [ACL] DEEP: DEnoising Entity Pre-training for Neural Machine Translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
- [ACL] GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022
2021
- [WMT] Phrase-level Active Learning for Neural Machine Translation. In The Sixth Conference on Machine Translation (WMT), 2021 [Abs] [Code]
- [EMNLP] AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 [Abs] [Code]
- [EMNLP] XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 [Abs] [Code]
- [NAACL] Explicit Alignment Objectives for Multilingual Bidirectional Encoders. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, 2021 [Abs] [Code]
- [NAACL] Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, 2021 [Abs] [Code]
2020
- [ICML] XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation. In International Conference on Machine Learning (ICML), 2020 [Abs] [Code]
- [ICML] On Learning Language-Invariant Representations for Universal Machine Translation. In International Conference on Machine Learning (ICML), 2020 [Abs]
- [ACL] Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020 [Abs]
- [Workshop] TICO-19: the Translation Initiative for COvid-19. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP, 2020 [Abs]
- [AAAI] What Makes A Good Story? Designing Composite Rewards for Visual Storytelling. In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020 [Code]
2019
- [ACL] Domain Adaptation of Neural Machine Translation by Lexicon Induction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019 [Abs] [Code]
- [CIKM] A hybrid retrieval-generation neural conversation model. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019 [Code]
- [EMNLP] REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019 [Abs]
- [EMNLP] Handling Syntactic Divergence in Low-resource Machine Translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019 [Abs]
- [EMNLP] Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019 [Abs]
- [WNGT] Domain Differential Adaptation for Neural Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, 2019 [Abs]
- [NAACL] compare-mt: A Tool for Holistic Comparison of Language Generation Systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), 2019 [Abs] [Code] [Best Demo Nomination]
2018
- [EMNLP] Rapid Adaptation of Neural Machine Translation to New Languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018 [Abs] [Code]
- [ACL] Automatic Estimation of Simultaneous Interpreter Performance. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2018 [Abs]
- [WMT] Contextual Encoding for Translation Quality Estimation. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018 [Abs] [Code]
2017
- [EMNLP] Structural Embedding of Syntactic Trees for Machine Comprehension. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017 [Abs]
- [ACL] Semi-Supervised QA with Generative Domain-Adaptive Nets. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017 [Abs]
- [AAAI] Answer-aware attention on grounded question answering in images. In AAAI 2017 Fall Symposium on Natural Communication for Human-Robot Collaboration, 2017
- [IEEE TNNLS] Online nonlinear AUC maximization for imbalanced data sets. IEEE Transactions on Neural Networks and Learning Systems, 2017 [Abs]
2016
- [HCOMP] Learning Lexical Entries for Robotic Commands via Paraphrasing. In AAAI Conference on Human Computation, 2016 [Abs]
- [ICLR] Words or Characters? Fine-grained Gating for Reading Comprehension. In International Conference on Learning Representations, 2016 [Abs]
2015
- [IEEE Cybern.] Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems. IEEE Transactions on Cybernetics, 2015 [Abs]
- [AAAI] Kernelized Online Imbalanced Learning with Fixed Budgets. In Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015 [Abs]
- [SOSE] Ar-tracker: Track the dynamics of mobile apps via user review mining. In 2015 IEEE Symposium on Service-Oriented System Engineering, 2015 [Abs]
Preprints
2022
- arXiv
2019
- arXiv
2017
- [arXiv] Principled hybrids of generative and discriminative domain adaptation. arXiv preprint arXiv:1705.09011, 2017
Teaching
CS 769 Advanced Natural Language Processing (Fall 2023)
Talks
Invited Talk at University of Cambridge, LTL Seminar, June 09, 2022.
Invited Talk at Linguistics Fridays Seminar at UW-Madison, April 01, 2022.
Invited Talk at Microsoft Azure Cognitive Services Research, January 20, 2022.
Invited Talk at Bay Area NLP Seminar, November 18, 2021.
Invited Talk at ICTR Seminar at UW-Madison, October 26, 2021.
Invited Talk at Microsoft Research Summit, October 21, 2021.
Invited Talk at CIBM Seminar at UW-Madison, October 19, 2021.
Invited Talk at IFDS Ideas Forum at UW-Madison, October 11, 2021.
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization, Junjie Hu, LTI Summer Seminar Series at Carnegie Mellon University, Pittsburgh, July 2, 2020.
Pre-training of Multilingual Encoder for Crosslingual Transfer, Junjie Hu, Google Translate Team, Mountain View, August 20, 2019.
Cross-Lingual and Cross Domain Transfer for Neural Machine Translation, Junjie Hu, AI Seminar at Carnegie Mellon University, Pittsburgh, April 30, 2019.
Transfer Learning for Multilingual Neural Machine Translation, Junjie Hu, SMART-Select Workshop on Multilingual Models and Unsupervised NMT supported by DG Connect of the European Commission, Luxembourg, June 20, 2019; Facebook AI Research Lab, Paris, June 21, 2019.
Rethinking Visual Storytelling: What Makes A Good Story? Junjie Hu, Microsoft 365 AI Research, Redmond, August 23, 2018.
Machine Reading Comprehension via Structural Tree Embeddings, Junjie Hu, Seminar at Chinese University of Hong Kong, March 5, 2018.
Lorelei: Understanding Low Resource Languages, Pat Littell, Junjie Hu, Shruti Rijhwani, and Ruochen Xu. LTI Colloquium at Carnegie Mellon University, Pittsburgh, September 8, 2017.
Natural Communication for Human-Robot Collaboration, Junjie Hu, Symposium on Natural Communication for Human-Robot Collaboration, November 9, 2017.
Selected Awards and Scholarships
CMU Graduate Student Assembly Dissertation Writing Group Grant, 2020
CMU Graduate Student Assembly Conference Travel Grant, 2020
NAACL 2019 Best Demonstration Paper Nomination, 2019
Graduate Research Scholarship, Carnegie Mellon University, 2015-2021
Postgraduate Scholarship, The Chinese University of Hong Kong, 2013-2015
Certificate of Merit for Teaching Assistantship, Department of CSE, Chinese University of Hong Kong, 2013-2014
IBM Outstanding Student Scholarship (1 of 77 winners in China), 2012-2013
Outstanding Undergraduate Awards by China Computer Federation (99 winners), 2012-2013
National Scholarship, the Ministry of Education, 2010-2011, 2011-2012