Figure 2 - available via license: Creative Commons Attribution 3.0 Unported
Source publication
Spoken language understanding is an important part of human-machine dialogue systems, and intent detection is one of its key sub-tasks. The accuracy of intent detection directly affects the performance of semantic slot filling and supports subsequent research on dialogue systems. Cons...
Context in source publication
Context 1
... the dialogue system: only when the user's topic domain is clearly identified can the user's specific needs be correctly analyzed; otherwise, the subsequent intent detection will go wrong. Figure 2 is an example diagram of how the three tasks of spoken language understanding are applied. When a user enters a query, the system first needs to determine whether the input text belongs to the "train" or "flight" topic domain; because intent categories are finer-grained than topic domains, the system then determines the user's intent, namely booking a ticket, refunding a ticket, or querying times, according to the specific semantic information of the query. ...
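To make the cascade concrete, here is a minimal, self-contained sketch (not code from the cited paper) of the domain → intent → slot pipeline illustrated in Figure 2; the keyword rules are hypothetical stand-ins for trained classifiers.

```python
# Minimal sketch of the domain -> intent -> slot cascade shown in Figure 2.
# The keyword rules below are hypothetical stand-ins for trained classifiers.
from dataclasses import dataclass

@dataclass
class SLUResult:
    domain: str   # coarse topic, e.g. "train" or "flight"
    intent: str   # finer-grained, e.g. book_ticket / refund_ticket / query_time
    slots: dict   # semantic slots extracted from the query

def detect_domain(query: str) -> str:
    """Hypothetical topic-domain classifier."""
    return "train" if "train" in query.lower() else "flight"

def detect_intent(query: str, domain: str) -> str:
    """Hypothetical intent classifier, conditioned on the detected domain."""
    q = query.lower()
    if "refund" in q:
        return f"{domain}.refund_ticket"
    if "when" in q or "time" in q:
        return f"{domain}.query_time"
    return f"{domain}.book_ticket"

def fill_slots(query: str) -> dict:
    """Hypothetical slot filler; a real system would use sequence labeling."""
    return {"raw_query": query}

def understand(query: str) -> SLUResult:
    domain = detect_domain(query)          # 1) topic domain first
    intent = detect_intent(query, domain)  # 2) then the finer-grained intent
    return SLUResult(domain, intent, fill_slots(query))  # 3) slot filling

print(understand("Book a train ticket to Beijing tomorrow"))
```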
Similar publications
Scientific literature summarization aims to summarize related papers into a survey. It can help researchers, especially newcomers, quickly grasp the current state of their field from a massive body of literature. Since document structure is very important for scientific literature summarization, and it is readily observed that the...
Citations
... Despite the significant progress in unimodal intent detection, there are still many problems and challenges in the real world, which include the following: users have a wide variety of expressions, and intentions may be expressed in different manners, making it difficult for the model to capture all possible forms of expressions; unimodal models may perform well on specific types of data, but their generalizability may be limited when encountering new or unseen data; unimodal models usually need to understand the user's context, but unimodal data may not provide enough contextual information, resulting in reduced accuracy of recognition [17]; the generalizability of data may be limited [18]; and models may struggle to effectively handle complex and variable data within modalities and enhance their generalizability [19]. In addition, with the increasing popularity of multimodal data, unimodal intent detection is gradually transitioning to multimodal intent detection. ...
... Subsequently, we utilize the transformer [42] as our final fusion module, which has demonstrated high effectiveness in performing modality fusion [43][44][45]. We first integrate all the features from the previous module into a corresponding (19) ... [Fig. 7: MSAF input feature and output feature distribution] ...
Multimodal intent detection integrates various types of information to identify user intent, which is crucial for developing dialog systems that function effectively in complex, real-world environments. Current methods show potential for improving the exploration of connections between patterns and extracting key semantic features from nontextual data. Many researchers opt to fuse data at a single level. In this paper, the gradual fusion intent detection framework (GFIDF), which consists of two main modules, is proposed. The first module, the conical multilayer convolutional attention (CMCA) module, uses a conical multilayer convolutional architecture. This architecture allows the module to capture both local and global contextual information, refining feature representations. The CMCA module is designed to eliminate noise and enhance feature quality by leveraging adaptive convolutional operations. These operations produce a clearer characterization of multimodal data that facilitates alignment and fusion in subsequent processing stages. The second module, the multimodal split and recombination attention (MSRA) module, matches and integrates augmented features from the CMCA module with textual information. This module segments multimodal features into distinct blocks to focus attention on individual segments. By utilizing a block-level attention mechanism, the MSRA module captures interdependencies between modalities, aiding in the understanding of user intent. Four performance metrics are employed for evaluation: accuracy (ACC), F1 score, precision (P), and recall (R). Compared with the baseline model, all the metrics show improvements ranging from 1% to 3%. Experiments validate the CMCA module’s noise reduction effects when processing video and audio modalities. Additionally, the results demonstrate the effectiveness of the MSRA module in fusing the three modal features.
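As a rough illustration of transformer-based modality fusion of the kind this abstract describes (not the authors' GFIDF/CMCA/MSRA code), the sketch below projects text, audio, and video features to a shared width and lets self-attention mix them; all dimensions are assumptions.

```python
# Generic transformer-based modality fusion (illustrative only): project each
# modality to a shared width, concatenate along the sequence axis, and let
# self-attention mix the modalities before classification.
import torch
import torch.nn as nn

class SimpleFusion(nn.Module):
    def __init__(self, d_text=768, d_audio=128, d_video=256, d_model=256, n_classes=20):
        super().__init__()
        self.proj_t = nn.Linear(d_text, d_model)
        self.proj_a = nn.Linear(d_audio, d_model)
        self.proj_v = nn.Linear(d_video, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(d_model, n_classes)

    def forward(self, text, audio, video):
        # each input: (batch, modality_seq_len, feature_dim)
        tokens = torch.cat(
            [self.proj_t(text), self.proj_a(audio), self.proj_v(video)], dim=1
        )
        fused = self.fusion(tokens)         # self-attention across all modalities
        return self.cls(fused.mean(dim=1))  # pooled representation -> intent logits

model = SimpleFusion()
logits = model(torch.randn(2, 10, 768), torch.randn(2, 30, 128), torch.randn(2, 8, 256))
print(logits.shape)  # torch.Size([2, 20])
```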
... Intent classification techniques are essential for accurately identifying user intents across various fields, from Chatbots to customer service applications. Common methods in intent classification include supervised learning approaches, ranging from traditional Machine Learning approaches such as SVMs to deep learning-based methods such as CNNs [7] and RNNs, where labeled training data is used to classify intents into predefined categories [8] [9]. However, recent advances include transformer-based models like BERT [10] and GPT, which provide contextual embeddings that improve classification accuracy, especially in multi-intent and domain-specific applications [9]. ...
... Common methods in intent classification include supervised learning approaches, ranging from traditional Machine Learning approaches such as SVMs to deep learning-based methods such as CNNs [7] and RNNs, where labeled training data is used to classify intents into predefined categories [8] [9]. However, recent advances include transformer-based models like BERT [10] and GPT, which provide contextual embeddings that improve classification accuracy, especially in multi-intent and domain-specific applications [9]. Intent classification is increasingly used to meet specialized needs, leveraging advanced techniques to handle unique challenges in domains such as Customer Service and Finance. ...
... In customer service, for instance, intent classifiers analyze a broad range of customer queries, while in finance, they often handle sensitive tasks such as identifying transactionrelated intents. Few-shot and zero-shot learning methods are particularly used in these contexts [11], as they enable models to distinguish between existing intents more accurately, as well as to adapt to new intents with minimal labeled data, ensuring relevance even in fast-changing environments [9]. ...
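For reference, a minimal sketch of the transformer-based route mentioned in these excerpts: fine-tuning a pretrained BERT encoder as an intent classifier with the Hugging Face transformers library. The intent labels and the single gradient step shown are illustrative assumptions.

```python
# Fine-tuning a pretrained BERT encoder for intent classification.
# The intent labels and the one-step "training" shown are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["book_flight", "cancel_booking", "check_balance"]  # hypothetical intents
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)

batch = tok(["I want to fly to Paris", "Please cancel my reservation"],
            padding=True, truncation=True, return_tensors="pt")
targets = torch.tensor([0, 1])

out = model(**batch, labels=targets)  # contextual embeddings -> intent logits + loss
out.loss.backward()                   # an optimizer step would follow in training
print([labels[i] for i in out.logits.argmax(dim=-1)])
```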
... However, the issues that NLP encounters in intent detection, aside from comprehending colloquial language, dealing with ambiguous claims, and distinguishing between similar intents [Al-Garadi et al., 2021; Chang et al., 2020; Collins et al., 2021; Mageto, 2021; Mayrhofer et al., 2019], also include a lack of data sources, irregularity of user expression, implicit intent detection, and multiple intent detection [Liu, Li, Lin, 2019]. ...
... Technologies such as conversational agents, chatbots, the IoT, and virtual assistants need to perform intent recognition well, and research is moving quickly to improve this capability. Word embeddings and DL are among the recent techniques employed in the NLU process, achieving very promising results and adapting better to intelligent interfacing with humans, not only in text but also in voice and, soon, in videos and images (Liu et al. 2019a; Weld et al. 2021). ...
... Some difficulties in intent detection are listed in Liu et al. (2019a): lack of data sources, irregularity of user expression, implicit intent detection, and multiple intent detection. The methods for intent detection can be traditional, such as rule-based template semantic recognition or classification algorithms based on statistical features. ...
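As a concrete example of the traditional, statistical-feature route mentioned above, here is a minimal sketch using TF-IDF features and a linear SVM; the toy utterances and intent labels are assumptions.

```python
# Traditional statistical-feature baseline: TF-IDF features + linear SVM.
# The toy utterances and intent labels are assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["book a flight to rome", "refund my train ticket",
         "when does the next train leave", "buy a plane ticket"]
intents = ["book", "refund", "query_time", "book"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, intents)
print(clf.predict(["what time is the train to paris"]))
```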
... The current state-of-the-art methods include text representation via embedding, CNNs, RNNs, long short-term memory (LSTM) networks, gated recurrent unit (GRU), attention mechanism, and capsule networks. These DL models greatly improve detection performance (Liu et al. 2019a). ...
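As a companion sketch of the deep learning methods listed above (here a BiLSTM with attention pooling, not code from the cited survey), assuming a toy vocabulary and layer sizes:

```python
# One of the listed deep learning architectures: an embedding layer feeding a
# bidirectional LSTM with simple attention pooling on top. Vocabulary size and
# layer dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_intents=7):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_intents)

    def forward(self, token_ids):
        states, _ = self.lstm(self.emb(token_ids))         # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)  # attention over time steps
        context = (weights * states).sum(dim=1)            # weighted sum of states
        return self.out(context)                           # intent logits

model = BiLSTMAttentionClassifier()
print(model(torch.randint(1, 10000, (4, 12))).shape)  # torch.Size([4, 7])
```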
It is increasingly common to use chatbots as an interface to services. One of the main components of a chatbot is the Natural Language Understanding (NLU) model, which is responsible for interpreting the text and extracting the intent and entities present in it. It is possible to focus on only one of these NLU tasks, such as intent classification. To train an NLU intent classification model, a considerable amount of annotated data is generally necessary, where each sentence in the dataset receives a label indicating an intent. Manually labeling data is arduous and can be impracticable, depending on the data volume. Thus, an unsupervised machine learning technique, such as data clustering, can be applied to find and label patterns in the data. For this task, it is essential to have an effective vector embedding representation of texts that captures the semantic information and helps the machine understand the context, intent, and other nuances of the entire text. This paper extensively evaluates different text embedding models for clustering and labeling. We also apply some operations to improve the dataset's quality, such as removing sentences and establishing various strategies for distance thresholds (cosine similarity) to the clusters' centroids. Then, we trained intent classification models with two different architectures, one built with the Rasa framework and the other with a neural network (NN), using the attendance texts from the Coronavirus Platform Service of Ceará, Brazil. We also manually annotated a dataset to be used as validation data. We conducted a study on semiautomatic labeling, implemented through clustering and visual inspection, which introduced some labeling errors into the intent classification models. However, it would be unfeasible to annotate the entire dataset manually. Nevertheless, the trained models still achieved competitive accuracy.
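A minimal sketch of the semiautomatic labeling idea described in this abstract, assuming a generic sentence-embedding model and an arbitrary similarity threshold (neither taken from the paper): embed, cluster, and keep only sentences close enough to their cluster centroid.

```python
# Embed sentences, cluster them, and keep for semiautomatic labeling only the
# sentences whose cosine similarity to their cluster centroid exceeds a
# threshold. The embedding model name and the 0.6 threshold are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["where can I get tested", "vaccine side effects",
             "test centre opening hours", "is the vaccine safe"]

emb = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)

threshold = 0.6
for sent, vec, cid in zip(sentences, emb, km.labels_):
    sim = cosine_similarity([vec], [km.cluster_centers_[cid]])[0, 0]
    keep = sim >= threshold  # discard low-confidence members before labeling
    print(f"cluster {cid}  sim={sim:.2f}  keep={keep}  {sent}")
```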
... As can be seen, employing a pseudo-labeling approach makes the chatbot model more adept at understanding user queries, even when users phrase their requests in unexpected or less common ways (Liu et al., 2019). The enhanced intent recognition capability leads to a more interactive user experience, increasing customer satisfaction and engagement on the platform. ...
... The second, novel approach, utilizes deep learning models based on neural networks and transformers, and offers significant improvements. The self-attention transformer model can capture various semantic features of sentences, making it particularly useful for detecting multiple intents within a single input (Liu et al., 2019;Manik et al., 2021). The transformer model DistilBERT is a version of the BERT model that has been simplified (or "distilled") to run more efficiently while still maintaining good performance (Hinton, 2015;Sanh, 2019). ...
Purpose of the study: This study presents an approach for improving the performance of natural language processing (NLP) models in pseudo-labeling tasks, with a particular focus on enhancing chatbot model intent recognition for business use cases. Methodology: The employed case study approach explores the pseudo-labeling technique and demonstrates a practical and efficient way to iteratively expand the original set of labeled data for the purpose of refining model training to achieve superior intent recognition accuracy in chatbots. Main Findings: The approach results in notable increases in macro-averaged F1 score and overall accuracy, particularly by iteratively re-training the model with progressively larger datasets. While enhancing the model's ability to generalize through difficult cases was effective, the study found that incorporating a full range of examples, including easy ones, yielded the best results. This comprehensive approach made the model better suited for real-world applications. Applications of the study: As chatbots are increasingly deployed in various sectors, including business, customer service, healthcare, and education, it becomes crucial for research to examine their long-term impact, scalability, and adaptability to ensure their effectiveness and sustainability in diverse contexts. Therefore, building more accurate chatbots, capable of understanding a wide range of user intents, is particularly valuable in real-world applications where chatbots need to respond to diverse, often complex and unpredictable user queries. Novelty/Originality of the study: Unlike traditional approaches, this study introduces a novel strategy of filling low-density regions in the dataset with pseudo-labels, allowing the model to better separate classes and handle semantically similar but varied messages. These advancements contribute to a more effective and scalable chatbot solution across diverse industries.
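The general pseudo-labeling loop behind this study can be sketched as follows; the TF-IDF plus logistic regression backbone, the 0.8 confidence threshold, and the toy data are simplifying assumptions (the study itself uses a DistilBERT-based model).

```python
# Iterative pseudo-labeling: train on the labeled pool, predict on unlabeled
# text, promote high-confidence predictions to pseudo-labels, and retrain.
# Backbone, threshold, and data are simplifying assumptions, not the study's.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_x = ["open a new account", "my card was stolen", "reset my password"]
labeled_y = ["account", "card", "password"]
unlabeled = ["I lost my credit card", "how do I change my password"]

for _ in range(3):  # a few self-training rounds
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(labeled_x, labeled_y)
    probs = clf.predict_proba(unlabeled)
    confident = np.max(probs, axis=1) >= 0.8          # keep confident predictions
    preds = clf.classes_[np.argmax(probs, axis=1)]
    labeled_x += [t for t, c in zip(unlabeled, confident) if c]
    labeled_y += [p for p, c in zip(preds, confident) if c]
    unlabeled = [t for t, c in zip(unlabeled, confident) if not c]

print(len(labeled_x), "examples in the labeled pool after pseudo-labeling")
```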
... In this paper, we focus on the SNIPS dataset, trying to avoid possible side effects due to imbalance or a high number of intents. Intent detection is usually approached as a supervised classification task that associates the entire input sentence with a label (or intent) of a finite set of classes [28]. Given the ability of RNNs to capture temporal dependencies, RNNs have been widely used to solve intent detection problems [34]. ...
Intent detection is a text classification task whose aim is to recognize and label the semantics behind a user's query. It plays a critical role in various business applications. The output of the intent detection module strongly conditions the behavior of the whole system. This sequence analysis task is mainly tackled using deep learning techniques. Despite the widespread use of these techniques, the internal mechanisms used by networks to solve the problem are poorly understood. Recent lines of work have analyzed the computational mechanisms learned by RNNs from a dynamical systems perspective. In this work, we investigate how different RNN architectures solve the SNIPS intent detection problem. Sentences injected into trained networks can be interpreted as trajectories traversing a hidden state space. This space is constrained to a low-dimensional manifold whose dimensionality is related to the embedding and hidden layer sizes. To generate predictions, the RNN steers the trajectories towards concrete regions, spatially aligned with the directions of the output layer matrix rows. Underlying the system dynamics, an unexpected fixed point topology has been identified, with a limited number of attractors. Our results provide new insights into the inner workings of networks that solve the intent detection task.
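A small sketch of the kind of analysis this abstract describes, with a toy GRU standing in for the paper's trained SNIPS networks: collect the hidden states visited while reading sentences and project the trajectories onto a low-dimensional space with PCA.

```python
# Collect the hidden states a recurrent network visits while reading sentences
# and project the trajectories onto a low-dimensional space with PCA. The toy
# GRU and random token ids below stand in for the paper's trained models.
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

emb = nn.Embedding(500, 32)
gru = nn.GRU(32, 64, batch_first=True)

tokens = torch.randint(0, 500, (8, 15))   # 8 toy "sentences" of 15 tokens each
with torch.no_grad():
    states, _ = gru(emb(tokens))          # hidden-state trajectory per sentence

flat = states.reshape(-1, 64).numpy()     # all visited hidden states
coords = PCA(n_components=2).fit_transform(flat)
print(coords.shape)                       # (8 * 15, 2): points in a low-D projection
```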
... All the works listed in this section are secondary studies on topics close to the present one: [5], [33], [51], and [17], in addition to being addressed in this MSL (systematic literature mapping). Although [33] has the scope closest to this MSL among the related works, its focus is more on algorithms and the comparison of experimental results and less on mapping the area, the types of tools, datasets, and business domains. ...
... Liu et al. [33] present the difficulty of the intent detection process in human-computer dialogue systems, reviewing traditional (single-)intent detection methods in comparison with deep learning methods. The study also considers how to apply this type of model to the multi-intent detection task, aiming to promote research on deep-neural-network-based methods of this kind. ...
... SpeechIC [7] is a medical classification model that extracts intent from the text and audio domains. Liu et al. [8] design a model that uses both textual and acoustic features for medical intent detection; it learns embeddings from the textual and acoustic features through a convolutional neural network (CNN) and then directly fuses the features of the two domains. However, the success of these methods is due to the powerful feature extraction capabilities of CNNs, and they do not specifically analyze intent features related to medical-symptom speech. ...
Intent is defined for understanding spoken language in existing works. Both the textual features and the acoustic features involved in medical speech contain intent, which is important for symptomatic diagnosis. In this paper, we propose a medical speech classification model named DRSC that automatically learns to disentangle intent and content representations from textual-acoustic data for classification. The intent representations of the text domain and the Mel-spectrogram domain are extracted via intent encoders, and then the reconstructed text feature and the Mel-spectrogram feature are obtained through two exchanges. After combining the intent from the two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification. Experimental results show that our model obtains an average accuracy rate of 95% in detecting 25 different medical symptoms.
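A schematic sketch of the disentangle-and-exchange idea in this abstract (not the authors' DRSC implementation): separate intent and content encoders per modality, reconstruction after swapping intent codes, and a joint intent representation for classification. All dimensions are assumptions.

```python
# Disentangle-and-exchange sketch: intent and content encoders per modality,
# reconstruction after swapping intent codes between modalities, and a joint
# intent representation fed to a classifier. Dimensions are assumptions.
import torch
import torch.nn as nn

D_TEXT, D_MEL, D_LAT, N_CLASSES = 300, 128, 64, 25

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))

intent_enc_t, content_enc_t = mlp(D_TEXT, D_LAT), mlp(D_TEXT, D_LAT)
intent_enc_m, content_enc_m = mlp(D_MEL, D_LAT), mlp(D_MEL, D_LAT)
dec_t = mlp(2 * D_LAT, D_TEXT)   # rebuilds text features from (content, intent)
dec_m = mlp(2 * D_LAT, D_MEL)    # rebuilds Mel features from (content, intent)
classifier = nn.Linear(2 * D_LAT, N_CLASSES)

text, mel = torch.randn(4, D_TEXT), torch.randn(4, D_MEL)
it, ct = intent_enc_t(text), content_enc_t(text)
im, cm = intent_enc_m(mel), content_enc_m(mel)

# "Exchange": reconstruct each modality from its own content plus the other
# modality's intent code, encouraging intent to live in a shared space.
rec_text = dec_t(torch.cat([ct, im], dim=-1))
rec_mel = dec_m(torch.cat([cm, it], dim=-1))
recon_loss = nn.functional.mse_loss(rec_text, text) + nn.functional.mse_loss(rec_mel, mel)

joint_intent = torch.cat([it, im], dim=-1)  # combined two-domain intent code
logits = classifier(joint_intent)           # symptom classification
print(logits.shape, float(recon_loss))
```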
... A comparison of different intent detection schemes is presented in the work by Liu et al. [7]. They cover various deep learning techniques that aim to emulate heuristic-based techniques for multi-intent detection. ...
Natural Language Processing (NLP) is one of the Artificial Intelligence applications that enables computers to process and understand human language. These models are used to analyze large volumes of text and also support tasks such as text summarization, language translation, context modeling, and sentiment analysis. Natural Language Understanding (NLU), a subset of NLP, turns natural language into structured data; NLU accomplishes intent classification and entity extraction. The paper focuses on a pipeline to maximize the coverage of a conversational AI (chatbot) by extracting the maximum number of meaningful intents from a data corpus. A conversational AI can best answer queries about the dataset if it is trained on the maximum number of intents that can be gathered from it, which is what we focus on obtaining in this paper. The more intents we gather from the dataset, the more of the dataset we cover in training the conversational AI. The pipeline is modularized into three broad categories: gathering the intents from the corpus, finding misspellings and synonyms of the intents, and finally deciding the order in which intents are picked for training any classifier ML model. Several heuristic and machine-learning approaches have been considered for optimum results. Misspellings and synonyms are extracted through text-vector, neural network-based algorithms. The system then concludes with a suggested priority list of intents that should be fed to a classification model. In the end, an example of three intents from the corpus is picked, and their order is suggested for the optimum functioning of the pipeline. This paper attempts to pick intents in descending order of their coverage in the corpus in the most optimal way possible.
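A small sketch of the coverage-ordering stage described in this abstract; fuzzy string matching (difflib) stands in here for the paper's embedding-based misspelling and synonym detection, and the corpus and intent keywords are toy assumptions.

```python
# Estimate how much of the corpus each candidate intent covers and suggest a
# training order in descending coverage. Fuzzy string matching stands in for
# the embedding-based misspelling/synonym detection; data are toy assumptions.
from collections import Counter
from difflib import get_close_matches

corpus = ["book a flight", "bok a flight to rome", "refund my ticket",
          "flight booking help", "cancel my ticket", "ticket refund status"]
candidate_intents = {"book_flight": ["book", "booking"],
                     "refund_ticket": ["refund"],
                     "cancel_ticket": ["cancel"]}

coverage = Counter()
for doc in corpus:
    words = doc.split()
    for intent, keywords in candidate_intents.items():
        # fuzzy matching also catches misspellings such as "bok" for "book"
        if any(get_close_matches(k, words, cutoff=0.8) for k in keywords):
            coverage[intent] += 1

priority = [intent for intent, _ in coverage.most_common()]
print(priority)  # intents ordered by descending corpus coverage
```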
... Utilizing the ATIS benchmark dataset yielded a 20.84% ASR error rate reduction and a 37.48% SLU robustness increase. Liu et al. (2019) present human action and sign language recognition (HASLR), a novel system designed for visually impaired individuals. HASLR combines human activity recognition (HAR) and sign language recognition (SLR) interfaces, promoting improved communication and comprehension. ...
... HASLR combines human activity recognition (HAR) and sign language recognition (SLR) interfaces, promoting improved communication and comprehension. This amalgamation enhances the recognition and interpretation of human actions and sign language gestures, fostering more efficient communication between visually impaired individuals and their counterparts (Liu et al., 2019). ...
Purpose
Assistive technology has been developed to assist visually impaired individuals in their social interactions. Specifically designed to enhance communication skills, facilitate social engagement and improve overall quality of life, conversational assistive technologies include speech recognition APIs, text-to-speech APIs and various communication tools that enable real-time interaction. Using natural language processing (NLP) and machine learning algorithms, the technology analyzes spoken language and provides appropriate responses, offering an immersive experience through voice commands, audio feedback and vibration alerts.
Design/methodology/approach
These technologies have demonstrated their ability to promote self-confidence and self-reliance in visually impaired individuals during social interactions. Moreover, they promise to improve social competence and foster better relationships. In short, assistive technology in conversation stands as a promising tool that empowers visually impaired individuals, elevating the quality of their social engagement.
Findings
The main benefit of assistive communication technology is that it will help visually impaired people overcome communication barriers in social contexts. This technology helps them communicate effectively with acquaintances, family, co-workers and even strangers in public places. By enabling smoother and more natural communication, it works to reduce feelings of isolation and increase overall quality of life.
Originality/value
Research findings include successful activity recognition, aligning with activities on which the VGG-16 model was trained, such as hugging, shaking hands, talking, walking, waving and more. The originality of this study lies in its approach to addressing the challenges faced by visually impaired individuals in their social interactions through modern technology. The research adds to the body of knowledge in the area of assistive technologies, which contribute to the empowerment and social inclusion of visually impaired individuals.
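As a rough illustration of the activity-recognition component mentioned here, a minimal sketch with a VGG-16 backbone whose final layer is replaced by a head for the listed activities; weight loading, fine-tuning, and real frame preprocessing are omitted, and only the class list is taken from the excerpt above.

```python
# VGG-16 backbone with its final layer replaced by a head for a small set of
# social-interaction activities. Weight loading, fine-tuning, and real frame
# preprocessing are omitted; the class list follows the excerpt above.
import torch
import torch.nn as nn
from torchvision.models import vgg16

activities = ["hugging", "shaking_hands", "talking", "walking", "waving"]

model = vgg16()                                  # pretrained weights omitted here
model.classifier[6] = nn.Linear(4096, len(activities))
model.eval()

frame = torch.randn(1, 3, 224, 224)              # one (dummy) preprocessed frame
with torch.no_grad():
    probs = torch.softmax(model(frame), dim=1)
print(activities[int(probs.argmax())], float(probs.max()))
```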