Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. By allowing each element to skip non-activated sub-modules, SMA reduces the computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to conduct local attention only on the activated inputs, SeqBoat achieves linear inference complexity with a theoretically infinite attention span, and provides a substantially better quality-efficiency trade-off than chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat achieves new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.
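A minimal PyTorch sketch of the sparse-activation idea is below. It is not the released SeqBoat implementation: the `SparseModuleActivation` class, its sigmoid threshold gate and the stand-in MLP sub-module are illustrative assumptions; in the actual model the routed sub-module is a GAU and the gating is learned from SSM state representations.

```python
import torch
import torch.nn as nn

class SparseModuleActivation(nn.Module):
    """Toy illustration: route only 'activated' positions through a sub-module."""
    def __init__(self, dim, threshold=0.5):
        super().__init__()
        self.gate = nn.Linear(dim, 1)        # activation score per position
        self.sub_module = nn.Sequential(     # stand-in for a GAU / attention block
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.threshold = threshold

    def forward(self, states):               # states: (seq_len, dim), e.g. SSM outputs
        probs = torch.sigmoid(self.gate(states)).squeeze(-1)   # (seq_len,)
        active = probs > self.threshold                         # boolean activation mask
        out = states.clone()
        if active.any():
            # Only the activated positions pay the cost of the sub-module;
            # multiplying by the gate probability keeps the routing differentiable.
            routed = self.sub_module(states[active]) * probs[active].unsqueeze(-1)
            out[active] = states[active] + routed
        return out, active

x = torch.randn(16, 32)
layer = SparseModuleActivation(dim=32)
y, mask = layer(x)
print(y.shape, int(mask.sum()), "of", mask.numel(), "positions activated")
```

Non-activated positions simply pass through unchanged, which is where the compute and memory savings come from at both training and inference time.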
ACL DialDoc
C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, and ChengXiang Zhai
Proceedings of the ACL 2023 Workshop on Document-grounded Dialogue and Conversational Question Answering, Jul 2023
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative improvement of 60.5% in Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
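To make the scoring idea concrete, here is a rough sketch of a turn-level conditional PMI score. The `lm_log_prob` stub and the exact conditioning format are placeholder assumptions, not the paper's implementation; in practice the log-probabilities would come from a pre-trained language model.

```python
def lm_log_prob(text: str, context: str) -> float:
    """Placeholder for a language model scorer: log p(text | context).
    Swap in a real LM log-likelihood for actual use; this toy stand-in only
    rewards word overlap with the context so the sketch runs end to end."""
    overlap = len(set(text.lower().split()) & set(context.lower().split()))
    return float(overlap - len(text.split()))   # illustration only, not a probability

def c_pmi(response: str, user_turn: str, dim_prompt: str) -> float:
    """Conditional PMI of the system response and the user turn, given an
    evaluation-dimension prompt:
        C-PMI = log p(response | user_turn, dim) - log p(response | dim)
    A large value means the response depends strongly on what the user said."""
    return (lm_log_prob(response, dim_prompt + " " + user_turn)
            - lm_log_prob(response, dim_prompt))

dim = "Rate how relevant the reply is."
print(c_pmi("Sure, the museum opens at nine.", "When does the museum open?", dim))
```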
2022
EMNLP Oral
Language Model Pre-Training with Sparse Latent Typing
Liliang Ren*, Zixuan Zhang*, Han Wang, Clare Voss, ChengXiang Zhai, and Heng Ji
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and do not seek to learn latent-level interpretable representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Furthermore, a language model pre-trained with this objective also significantly improves Information Extraction-related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
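A toy sketch of the sparse typing idea follows, assuming a top-k saliency gate and nearest-type-embedding assignment; both are simplifications of the paper's objective, and the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLatentTyper(nn.Module):
    """Toy sketch: pick a sparse set of token 'keywords' and assign each one
    of K latent types via similarity to learned type embeddings."""
    def __init__(self, dim, num_types=8, keep_ratio=0.2):
        super().__init__()
        self.score = nn.Linear(dim, 1)                  # keyword saliency per token
        self.type_emb = nn.Parameter(torch.randn(num_types, dim))
        self.keep_ratio = keep_ratio

    def forward(self, token_reprs):                     # (seq_len, dim) from the encoder
        seq_len = token_reprs.size(0)
        k = max(1, int(seq_len * self.keep_ratio))      # sparsity budget
        saliency = self.score(token_reprs).squeeze(-1)
        keep = saliency.topk(k).indices                 # positions of extracted keywords
        selected = token_reprs[keep]
        # Soft assignment over latent types; argmax gives the interpretable type id.
        type_probs = F.softmax(selected @ self.type_emb.t(), dim=-1)
        return keep, type_probs.argmax(-1), type_probs

reprs = torch.randn(12, 64)                             # e.g. one encoded sentence
keep, types, _ = SparseLatentTyper(64)(reprs)
print("keyword positions:", keep.tolist(), "latent types:", types.tolist())
```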
2021
ACL
HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction
Liliang Ren, Chenkai Sun, Heng Ji, and Julia Hockenmaier
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021
Text-to-Graph extraction aims to automatically extract information graphs, consisting of mentions and types, from natural language texts. Existing approaches, such as table filling and pairwise scoring, have shown impressive performance on various information extraction tasks, but they are difficult to scale to datasets with longer input texts because of their second-order space/time complexity with respect to the input length. In this work, we propose a Hybrid Span Generator (HySPA) that invertibly maps the information graph to an alternating sequence of nodes and edge types, and directly generates such sequences via a hybrid span decoder that decodes both the spans and the types recurrently in linear time and space. Extensive experiments on the ACE05 dataset show that our approach also significantly outperforms the state of the art on the joint entity and relation extraction task.
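The invertible alternating-sequence view can be illustrated with a small round-trip example. The encoding below is a simplification (a toy graph format that repeats node mentions per edge), not the paper's exact representation, but it shows how a graph can be flattened into a [node, edge type, node, ...] sequence that a decoder can emit left to right and that can be mapped back to the graph without loss.

```python
# Toy graph: typed nodes plus (head index, relation type, tail index) edges.
graph = {
    "nodes": [("Smith", "PER"), ("Acme Corp", "ORG")],
    "edges": [(0, "WORKS_FOR", 1)],
}

def linearize(graph):
    """Flatten the graph into an alternating node / edge-type sequence."""
    seq = []
    for head, rel, tail in graph["edges"]:
        seq += [graph["nodes"][head], rel, graph["nodes"][tail]]
    return seq

def delinearize(seq):
    """Invert the mapping: rebuild nodes and edges from the flat sequence."""
    nodes, edges = [], []
    for i in range(0, len(seq), 3):
        head, rel, tail = seq[i], seq[i + 1], seq[i + 2]
        for n in (head, tail):
            if n not in nodes:
                nodes.append(n)
        edges.append((nodes.index(head), rel, nodes.index(tail)))
    return {"nodes": nodes, "edges": edges}

seq = linearize(graph)
print(seq)
print(delinearize(seq) == graph)   # True: the mapping round-trips
```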
2020
preprint
A Simple Fix for Convolutional Neural Network via Coordinate Embedding
2019
EMNLP
Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation
Liliang Ren, Jianmo Ni, and Julian McAuley
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov 2019
Existing approaches to dialogue state tracking rely on pre-defined ontologies consisting of a set of all possible slot types and values. Though such approaches exhibit promising performance on single-domain benchmarks, they suffer from computational complexity that increases proportionally to the number of pre-defined slots that need tracking. This issue becomes more severe for multi-domain dialogues, which include larger numbers of slots. In this paper, we investigate how to approach DST with a generation framework that does not require a pre-defined ontology list. Given each turn of user utterance and system response, we directly generate a sequence of belief states with a hierarchical encoder-decoder structure. In this way, the computational complexity of our model remains constant regardless of the number of pre-defined slots. Experiments on both multi-domain and single-domain dialogue state tracking datasets show that our model not only scales easily with the increasing number of pre-defined domains and slots but also reaches state-of-the-art performance.
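The constant-complexity claim follows from treating the belief state as a generated sequence rather than a set of per-slot classifications. A toy round-trip between (domain, slot, value) triples and a flat token sequence, under an assumed `;` / `<eos>` delimiter scheme (the paper's actual vocabulary and decoder differ), illustrates the idea.

```python
# Toy sketch of the generation view of DST: the belief state itself is emitted
# as a short token sequence, so decoding cost does not grow with ontology size.
SEP, EOS = ";", "<eos>"

def belief_to_sequence(belief):
    """belief: list of (domain, slot, value) triples -> flat token sequence."""
    tokens = []
    for domain, slot, value in belief:
        tokens += [domain, slot, value, SEP]
    return tokens[:-1] + [EOS] if tokens else [EOS]

def sequence_to_belief(tokens):
    """Parse the generated sequence back into (domain, slot, value) triples."""
    triples, current = [], []
    for tok in tokens:
        if tok in (SEP, EOS):
            if len(current) == 3:
                triples.append(tuple(current))
            current = []
        else:
            current.append(tok)
    return triples

belief = [("restaurant", "food", "italian"), ("restaurant", "area", "centre")]
seq = belief_to_sequence(belief)
print(seq)
print(sequence_to_belief(seq) == belief)   # True
```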
preprint
RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping
Dialogue state tracking (DST), when formulated as a supervised learning problem, relies on labelled data. Since dialogue state annotation usually requires labelling all turns of a single dialogue and utilizing context information, it is very expensive to annotate all available unlabelled data. In this paper, a novel cost-sensitive active learning framework is proposed based on a set of new dialogue-level query strategies. This is the first attempt to apply active learning to dialogue state tracking. Experiments on DSTC2 show that active learning with mixed data query strategies can effectively achieve the same DST performance as traditional training approaches with significantly less data annotation.
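As a rough sketch of what a dialogue-level query strategy might look like: the function names, the least-confidence criterion and the particular random/informative split below are illustrative assumptions, not the paper's exact strategies.

```python
import random

def least_confidence_query(unlabelled_dialogues, model_confidence, budget):
    """Toy dialogue-level query strategy: ask annotators to label the dialogues
    the current tracker is least confident about, up to a fixed budget."""
    ranked = sorted(unlabelled_dialogues, key=model_confidence)
    return ranked[:budget]

def mixed_query(unlabelled_dialogues, model_confidence, budget, explore_frac=0.3):
    """'Mixed' strategy: mostly least-confidence picks, plus a random slice so
    selection is not biased toward one region of the data."""
    n_random = int(budget * explore_frac)
    informative = least_confidence_query(
        unlabelled_dialogues, model_confidence, budget - n_random)
    rest = [d for d in unlabelled_dialogues if d not in informative]
    return informative + random.sample(rest, min(n_random, len(rest)))

dialogues = [f"dialogue_{i}" for i in range(20)]
fake_confidence = lambda d: hash(d) % 100 / 100.0   # stand-in for tracker confidence
print(mixed_query(dialogues, fake_confidence, budget=5))
```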
EMNLP Oral
Towards Universal Dialogue State Tracking
Liliang Ren, Kaige Xie, Lu Chen, and Kai Yu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Oct 2018
The dialogue state tracker is the core component of a spoken dialogue system. It estimates the user's possible goals at every dialogue turn. However, most current approaches are difficult to scale to large dialogue domains, as they have one or more of the following limitations: (a) some models do not work when the slot values in the ontology change dynamically; (b) the number of model parameters is proportional to the number of slots; (c) some models extract features based on hand-crafted lexicons. To tackle these challenges, we propose StateNet, a universal dialogue state tracker. It is independent of the number of values, shares parameters across all slots, and uses pre-trained word vectors instead of explicit semantic dictionaries. Our experiments on two datasets show that our approach not only overcomes these limitations but also significantly outperforms state-of-the-art approaches.
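A minimal sketch of the value-independence idea, assuming a shared projection network and cosine similarity against the word vectors of candidate values; this is a simplification of StateNet's architecture, and the class and dimension names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueIndependentScorer(nn.Module):
    """Toy sketch of the StateNet idea: one shared network maps (turn, slot)
    to a fixed-size vector that is compared against the word vectors of the
    candidate values, so new values can be scored without new parameters."""
    def __init__(self, utt_dim, word_dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(utt_dim + word_dim, word_dim), nn.Tanh())

    def forward(self, turn_repr, slot_vec, value_vecs):
        # turn_repr: (utt_dim,), slot_vec: (word_dim,), value_vecs: (n_values, word_dim)
        query = self.proj(torch.cat([turn_repr, slot_vec]))
        scores = F.cosine_similarity(query.unsqueeze(0), value_vecs, dim=-1)
        return scores.softmax(-1)          # distribution over the candidate values

scorer = ValueIndependentScorer(utt_dim=128, word_dim=50)
turn = torch.randn(128)                    # encoded user/system turn
slot = torch.randn(50)                     # word vector of the slot name, e.g. "food"
values = torch.randn(4, 50)                # word vectors of candidate values
print(scorer(turn, slot, values))
```

Because the same parameters score every slot and every value set, the tracker keeps working when the ontology grows or its values change.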