Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. By allowing each element to skip non-activated sub-modules, SMA reduces the computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to conduct local attention only on the activated inputs, SeqBoat achieves linear inference complexity with a theoretically infinite attention span, and provides a substantially better quality-efficiency trade-off than chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat achieves new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.
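A minimal PyTorch sketch of the sparse-activation idea is below. It is not the released SeqBoat implementation: the `SparseModuleActivation` class, its sigmoid threshold gate and the stand-in MLP sub-module are illustrative assumptions; in the actual model the routed sub-module is a GAU and the gating is learned from SSM state representations.

```python
import torch
import torch.nn as nn

class SparseModuleActivation(nn.Module):
    """Toy illustration: route only 'activated' positions through a sub-module."""
    def __init__(self, dim, threshold=0.5):
        super().__init__()
        self.gate = nn.Linear(dim, 1)        # activation score per position
        self.sub_module = nn.Sequential(     # stand-in for a GAU / attention block
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.threshold = threshold

    def forward(self, states):               # states: (seq_len, dim), e.g. SSM outputs
        probs = torch.sigmoid(self.gate(states)).squeeze(-1)   # (seq_len,)
        active = probs > self.threshold                         # boolean activation mask
        out = states.clone()
        if active.any():
            # Only the activated positions pay the cost of the sub-module;
            # multiplying by the gate probability keeps the routing differentiable.
            routed = self.sub_module(states[active]) * probs[active].unsqueeze(-1)
            out[active] = states[active] + routed
        return out, active

x = torch.randn(16, 32)
layer = SparseModuleActivation(dim=32)
y, mask = layer(x)
print(y.shape, int(mask.sum()), "of", mask.numel(), "positions activated")
```

Non-activated positions simply pass through unchanged, which is where the compute and memory savings come from at both training and inference time.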
ACL DialDoc
C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, and ChengXiang Zhai
Proceedings of the ACL 2023 Workshop on Document-grounded Dialogue and Conversational Question Answering, Jul 2023
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative improvement of 60.5% in Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
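To make the scoring idea concrete, here is a rough sketch of a turn-level conditional PMI score. The `lm_log_prob` stub and the exact conditioning format are placeholder assumptions, not the paper's implementation; in practice the log-probabilities would come from a pre-trained language model.

```python
def lm_log_prob(text: str, context: str) -> float:
    """Placeholder for a language model scorer: log p(text | context).
    Swap in a real LM log-likelihood for actual use; this toy stand-in only
    rewards word overlap with the context so the sketch runs end to end."""
    overlap = len(set(text.lower().split()) & set(context.lower().split()))
    return float(overlap - len(text.split()))   # illustration only, not a probability

def c_pmi(response: str, user_turn: str, dim_prompt: str) -> float:
    """Conditional PMI of the system response and the user turn, given an
    evaluation-dimension prompt:
        C-PMI = log p(response | user_turn, dim) - log p(response | dim)
    A large value means the response depends strongly on what the user said."""
    return (lm_log_prob(response, dim_prompt + " " + user_turn)
            - lm_log_prob(response, dim_prompt))

dim = "Rate how relevant the reply is."
print(c_pmi("Sure, the museum opens at nine.", "When does the museum open?", dim))
```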
2022
EMNLP Oral
Language Model Pre-Training with Sparse Latent Typing
Liliang Ren*, Zixuan Zhang*, Han Wang, Clare Voss, ChengXiang Zhai, and Heng Ji
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and do not seek to learn latent-level interpretable representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Furthermore, a language model pre-trained with this objective also significantly improves Information Extraction-related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
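A toy sketch of the sparse typing idea follows, assuming a top-k saliency gate and nearest-type-embedding assignment; both are simplifications of the paper's objective, and the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLatentTyper(nn.Module):
    """Toy sketch: pick a sparse set of token 'keywords' and assign each one
    of K latent types via similarity to learned type embeddings."""
    def __init__(self, dim, num_types=8, keep_ratio=0.2):
        super().__init__()
        self.score = nn.Linear(dim, 1)                  # keyword saliency per token
        self.type_emb = nn.Parameter(torch.randn(num_types, dim))
        self.keep_ratio = keep_ratio

    def forward(self, token_reprs):                     # (seq_len, dim) from the encoder
        seq_len = token_reprs.size(0)
        k = max(1, int(seq_len * self.keep_ratio))      # sparsity budget
        saliency = self.score(token_reprs).squeeze(-1)
        keep = saliency.topk(k).indices                 # positions of extracted keywords
        selected = token_reprs[keep]
        # Soft assignment over latent types; argmax gives the interpretable type id.
        type_probs = F.softmax(selected @ self.type_emb.t(), dim=-1)
        return keep, type_probs.argmax(-1), type_probs

reprs = torch.randn(12, 64)                             # e.g. one encoded sentence
keep, types, _ = SparseLatentTyper(64)(reprs)
print("keyword positions:", keep.tolist(), "latent types:", types.tolist())
```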
2021
ACL
HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction
Liliang Ren, Chenkai Sun, Heng Ji, and Julia Hockenmaier
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021
Text-to-Graph extraction aims to automatically extract information graphs, consisting of mentions and types, from natural language texts. Existing approaches, such as table filling and pairwise scoring, have shown impressive performance on various information extraction tasks, but they are difficult to scale to datasets with longer input texts because of their second-order space/time complexity with respect to the input length. In this work, we propose a Hybrid Span Generator (HySPA) that invertibly maps the information graph to an alternating sequence of nodes and edge types, and directly generates such sequences via a hybrid span decoder that decodes both the spans and the types recurrently in linear time and space. Extensive experiments on the ACE05 dataset show that our approach also significantly outperforms the state of the art on the joint entity and relation extraction task.
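The invertible alternating-sequence view can be illustrated with a small round-trip example. The encoding below is a simplification (a toy graph format that repeats node mentions per edge), not the paper's exact representation, but it shows how a graph can be flattened into a [node, edge type, node, ...] sequence that a decoder can emit left to right and that can be mapped back to the graph without loss.

```python
# Toy graph: typed nodes plus (head index, relation type, tail index) edges.
graph = {
    "nodes": [("Smith", "PER"), ("Acme Corp", "ORG")],
    "edges": [(0, "WORKS_FOR", 1)],
}

def linearize(graph):
    """Flatten the graph into an alternating node / edge-type sequence."""
    seq = []
    for head, rel, tail in graph["edges"]:
        seq += [graph["nodes"][head], rel, graph["nodes"][tail]]
    return seq

def delinearize(seq):
    """Invert the mapping: rebuild nodes and edges from the flat sequence."""
    nodes, edges = [], []
    for i in range(0, len(seq), 3):
        head, rel, tail = seq[i], seq[i + 1], seq[i + 2]
        for n in (head, tail):
            if n not in nodes:
                nodes.append(n)
        edges.append((nodes.index(head), rel, nodes.index(tail)))
    return {"nodes": nodes, "edges": edges}

seq = linearize(graph)
print(seq)
print(delinearize(seq) == graph)   # True: the mapping round-trips
```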
2020
preprint
A Simple Fix for Convolutional Neural Network via Coordinate Embedding
2019
EMNLP
Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation
Liliang Ren, Jianmo Ni, and Julian McAuley
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov 2019
Existing approaches to dialogue state tracking rely on pre-defined ontologies consisting of a set of all possible slot types and values. Though such approaches exhibit promising performance on single-domain benchmarks, they suffer from computational complexity that increases proportionally to the number of pre-defined slots that need tracking. This issue becomes more severe for multi-domain dialogues, which include larger numbers of slots. In this paper, we investigate how to approach DST with a generation framework that does not require a pre-defined ontology list. Given each turn of user utterance and system response, we directly generate a sequence of belief states with a hierarchical encoder-decoder structure. In this way, the computational complexity of our model remains constant regardless of the number of pre-defined slots. Experiments on both multi-domain and single-domain dialogue state tracking datasets show that our model not only scales easily with the increasing number of pre-defined domains and slots but also reaches state-of-the-art performance.
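The constant-complexity claim follows from treating the belief state as a generated sequence rather than a set of per-slot classifications. A toy round-trip between (domain, slot, value) triples and a flat token sequence, under an assumed `;` / `<eos>` delimiter scheme (the paper's actual vocabulary and decoder differ), illustrates the idea.

```python
# Toy sketch of the generation view of DST: the belief state itself is emitted
# as a short token sequence, so decoding cost does not grow with ontology size.
SEP, EOS = ";", "<eos>"

def belief_to_sequence(belief):
    """belief: list of (domain, slot, value) triples -> flat token sequence."""
    tokens = []
    for domain, slot, value in belief:
        tokens += [domain, slot, value, SEP]
    return tokens[:-1] + [EOS] if tokens else [EOS]

def sequence_to_belief(tokens):
    """Parse the generated sequence back into (domain, slot, value) triples."""
    triples, current = [], []
    for tok in tokens:
        if tok in (SEP, EOS):
            if len(current) == 3:
                triples.append(tuple(current))
            current = []
        else:
            current.append(tok)
    return triples

belief = [("restaurant", "food", "italian"), ("restaurant", "area", "centre")]
seq = belief_to_sequence(belief)
print(seq)
print(sequence_to_belief(seq) == belief)   # True
```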
preprint
RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping
Dialogue state tracking (DST), when formulated as a supervised learning problem, relies on labelled data. Since dialogue state annotation usually requires labelling all turns of a single dialogue and utilizing context information, it is very expensive to annotate all available unlabelled data. In this paper, a novel cost-sensitive active learning framework is proposed based on a set of new dialogue-level query strategies. This is the first attempt to apply active learning to dialogue state tracking. Experiments on DSTC2 show that active learning with mixed data query strategies can effectively achieve the same DST performance as traditional training approaches with significantly less data annotation.
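As a rough sketch of what a dialogue-level query strategy might look like: the function names, the least-confidence criterion and the particular random/informative split below are illustrative assumptions, not the paper's exact strategies.

```python
import random

def least_confidence_query(unlabelled_dialogues, model_confidence, budget):
    """Toy dialogue-level query strategy: ask annotators to label the dialogues
    the current tracker is least confident about, up to a fixed budget."""
    ranked = sorted(unlabelled_dialogues, key=model_confidence)
    return ranked[:budget]

def mixed_query(unlabelled_dialogues, model_confidence, budget, explore_frac=0.3):
    """'Mixed' strategy: mostly least-confidence picks, plus a random slice so
    selection is not biased toward one region of the data."""
    n_random = int(budget * explore_frac)
    informative = least_confidence_query(
        unlabelled_dialogues, model_confidence, budget - n_random)
    rest = [d for d in unlabelled_dialogues if d not in informative]
    return informative + random.sample(rest, min(n_random, len(rest)))

dialogues = [f"dialogue_{i}" for i in range(20)]
fake_confidence = lambda d: hash(d) % 100 / 100.0   # stand-in for tracker confidence
print(mixed_query(dialogues, fake_confidence, budget=5))
```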
EMNLP Oral
Towards Universal Dialogue State Tracking
Liliang Ren, Kaige Xie, Lu Chen, and Kai Yu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Oct 2018
The dialogue state tracker is the core component of a spoken dialogue system. It estimates the user's possible goals at every dialogue turn. However, most current approaches are difficult to scale to large dialogue domains, as they have one or more of the following limitations: (a) some models do not work when the slot values in the ontology change dynamically; (b) the number of model parameters is proportional to the number of slots; (c) some models extract features based on hand-crafted lexicons. To tackle these challenges, we propose StateNet, a universal dialogue state tracker. It is independent of the number of values, shares parameters across all slots, and uses pre-trained word vectors instead of explicit semantic dictionaries. Our experiments on two datasets show that our approach not only overcomes these limitations but also significantly outperforms state-of-the-art approaches.
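A minimal sketch of the value-independence idea, assuming a shared projection network and cosine similarity against the word vectors of candidate values; this is a simplification of StateNet's architecture, and the class and dimension names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueIndependentScorer(nn.Module):
    """Toy sketch of the StateNet idea: one shared network maps (turn, slot)
    to a fixed-size vector that is compared against the word vectors of the
    candidate values, so new values can be scored without new parameters."""
    def __init__(self, utt_dim, word_dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(utt_dim + word_dim, word_dim), nn.Tanh())

    def forward(self, turn_repr, slot_vec, value_vecs):
        # turn_repr: (utt_dim,), slot_vec: (word_dim,), value_vecs: (n_values, word_dim)
        query = self.proj(torch.cat([turn_repr, slot_vec]))
        scores = F.cosine_similarity(query.unsqueeze(0), value_vecs, dim=-1)
        return scores.softmax(-1)          # distribution over the candidate values

scorer = ValueIndependentScorer(utt_dim=128, word_dim=50)
turn = torch.randn(128)                    # encoded user/system turn
slot = torch.randn(50)                     # word vector of the slot name, e.g. "food"
values = torch.randn(4, 50)                # word vectors of candidate values
print(scorer(turn, slot, values))
```

Because the same parameters score every slot and every value set, the tracker keeps working when the ontology grows or its values change.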