
Hierarchical LSTM network for text classification



Text classification is an important and practical problem: we need computers to classify and discover the information in text. Recognizing the offensive words in a text without human intervention, for example, requires such a classifier. In this article we compare recurrent neural networks, convolutional neural networks and hierarchical attention networks, with detailed information about each. We present a HAN model built with the Theano framework, which shows more accurate validation results on large datasets. For text classification on large datasets, we use hierarchical attention networks to get a better result.
SN Applied Sciences (2019) 1:1124
Research Article
Hierarchical LSTM network fortext classication
KeivanBorna1 · RezaGhanbari1
© Springer Nature Switzerland AG 2019
Text classication has always been an important and practical issue so that we need to use the computer to classify and
discover the information in the text. If we want to recognize the oending words in a text without human intervention,
we should use this. In this article we will compare recurrent neural networks, convolutional neural networks and hierar-
chical attention networks with detailed information about each of which. We will represent a HAN model using Theano
framework, which indicates more accurate validation for large datasets. For text classication problem in large datasets,
we will use hierarchical attention networks to get a better result.
Keywords Computer science · Machine learning · Text classification · Hierarchical attention network
1 Introduction
Text classication is an important and practical issue that
can be used in many cases, like spam detection, smart
automatic customer reply, sentiment analysis. These are
commonly known as the most important topics in natural
language processing (NLP) and natural language genera-
tion (NLG). The main goal in text classication is to assign
text to one or more categories. Suppose in a profanity
check problem we have to nd the oensive words in
document. Nowadays, machine learning is the outstand-
ing way to create such classiers. These classiers are upon
classication rules. So with the help of labeled documents
we can create classiers. There are a lot of traditional meth-
ods for text classication, such as n-grams with a linear
model. Recent researches are using supervised and unsu-
pervised machine learning methods, such as convolutional
neural network (CNN) [1], recurrent neural network (RNN)
or hierarchical neural network (HAN). In this article we
benchmark these three methods with creating a general
text classier using these three methods on GloVe d-300
dataset. Our primary contribution is benchmark these
methods and building a Hierarchical LSTM network, which
the input tensor is 3D rather than 2D to demonstrate doc-
uments as a hierarchical model and retrieve categories.
The key dierence to previous works is that our algorithm
uses tokens that are taken from context (not just ltering
sequences of tokens). In order to check the performance of
our model, we looked at three datasets, to compare CNN,
RNN and HAN. Our model uses hierarchical LSTM network.
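The difference between a 2D and a 3D input tensor can be illustrated with a minimal sketch. This is not the authors' implementation: the naive sentence splitter, the regular-expression tokenizer and the limits MAX_SENTS and MAX_WORDS are all assumptions made for illustration. The sketch only shows how a batch of documents becomes an array of shape (documents, sentences, words) instead of the flat (documents, words) array a non-hierarchical model would receive.

```python
import re
import numpy as np

MAX_SENTS = 4   # hypothetical cap on sentences per document
MAX_WORDS = 6   # hypothetical cap on words per sentence


def build_vocab(docs):
    """Map each lowercase token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for doc in docs:
        for token in re.findall(r"\w+", doc.lower()):
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def to_hierarchical_tensor(docs, vocab):
    """Encode documents as an int array of shape (docs, MAX_SENTS, MAX_WORDS)."""
    data = np.zeros((len(docs), MAX_SENTS, MAX_WORDS), dtype=np.int32)
    for i, doc in enumerate(docs):
        # Naive sentence split on '.', '!' or '?' (an assumption for illustration).
        sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
        for j, sent in enumerate(sentences[:MAX_SENTS]):
            tokens = re.findall(r"\w+", sent.lower())
            for k, token in enumerate(tokens[:MAX_WORDS]):
                data[i, j, k] = vocab[token]
    return data


docs = [
    "The service was great. I will come back!",
    "Terrible food. Slow staff. Never again.",
]
vocab = build_vocab(docs)
x = to_hierarchical_tensor(docs, vocab)
print(x.shape)  # (2, 4, 6)
```

In a hierarchical model of this kind, a word-level encoder would typically run along the last axis to produce one vector per sentence, and a sentence-level encoder would then run over those vectors to produce one vector per document.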
2 Convolutional neural networks
Convolutional neural networks are groups of neurons with learnable weights and biases. With a score function, for example in a classification problem from raw text to categories, the network receives inputs and computes a differentiable score. Compared with a common 3-layer neural network, a convolutional neural network arranges its neurons in 3 dimensions (x, y, z) in a Euclidean space. The job of every layer in a CNN is to convert a 3-dimensional input into a 3-dimensional output set of neurons. The input layer depends on the problem: here the input layer value is a 2D document (rows, columns), and the other layers hold characteristic values for the input properties. We can find that every CNN is a
Received: 21 May 2019 / Accepted: 26 August 2019 / Published online: 30 August 2019
* Keivan Borna; Reza Ghanbari | 1 Department of Computer Science, Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran.
... In addition, LSTM uses a structure called "cell state" to store and transmit information. The cell state can be regarded as the "memory" of LSTM, as it can retain previous information and pass it on to subsequent time steps. During the training phase, LSTM updates the parameters in the model using the backpropagation algorithm to minimize the error between predicted values and actual values. During the prediction phase, LSTM takes the current input and the state of the previous time step as inputs, and outputs the predicted value for the current time step [46][47][48]. The architecture of the LSTM network is depicted in Figure 10. ...
The utilization of solid waste for filling mining presents substantial economic and environmental advantages, making it the primary focus of current filling mining technology development. To enhance the mechanical properties of superfine tailings cemented paste backfill (SCPB), this study conducted response surface methodology experiments to investigate the impact of various factors on the strength of SCPB, including the composite cementitious material, consisting of cement and slag powder, and the tailings’ grain size. Additionally, various microanalysis techniques were used to investigate the microstructure of SCPB and the development mechanisms of its hydration products. Furthermore, machine learning was utilized to predict the strength of SCPB under multi-factor effects. The findings reveal that the combined effect of slag powder dosage and slurry mass fraction has the most significant influence on strength, while the coupling effect of slurry mass fraction and underflow productivity has the lowest impact on strength. Moreover, SCPB with 20% slag powder has the highest amount of hydration products and the most complete structure. When compared to other commonly used prediction models, the long-short term memory neural network (LSTM) constructed in this study had the highest prediction accuracy for SCPB strength under multi-factor conditions, with root mean square error (RMSE), correlation coefficient (R), and variance account for (VAF) reaching 0.1396, 0.9131, and 81.8747, respectively. By optimizing the LSTM using the sparrow search algorithm (SSA), the RMSE, R, and VAF improved by 88.6%, 9.4%, and 21.9%, respectively. The research results can provide guidance for the efficient filling of superfine tailings.
... The rise of artificial intelligence has greatly promoted the sustainable development of robots. In the field of retail customer service, intelligent customer service robots constructed using AI technologies can significantly reduce enterprises' labor costs [1][2][3]. Natural Language Processing (NLP) is also the fastest growing and most widely used field in AI. The processing of natural language uses linguistics, computer, mathematics and other sciences to understand, transform, produce and other operations to carry out information exchanges between humans and computers [4][5][6]. ...
Considering the low accuracy of current short text classification (TC) methods and the difficulties they have with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized by introducing a BERT pre-training model. When processing language tasks, the TC accuracy is improved by removing a word from the text and using the information from previous words and the next words to predict. Then, a convolutional attention mechanism (CAM) model is proposed using a convolutional neural network (CNN) to capture feature interactions in the time dimension and using multiple convolutional kernels to obtain more comprehensive feature information. CAM can improve TC accuracy. Finally, by optimizing and merging bidirectional encoder representation from the transformers (BERT) pre-training model and CAM model, a corresponding BERT-CAM classification model for S-TC is proposed. Through simulation experiments, the proposed S-TC method and the other three methods are compared and analyzed using three datasets. The results show that the accuracy, precision, recall, F1 value, Ma_F and Mi_F are the largest, reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56, respectively. The algorithm's performance is better than that of the other three comparison algorithms.
... As the CNN network [16] might lose the position of a word and the order information of the work-order text during convolution and pooling operations, it cannot capture the global information of the work-order text well. Hence, we employ a Bi-LSTM network [17] to capture long-distance relationships in the contextual information of the representation of the word. We encode event extraction as a sequence classification problem and employ a CRF (Conditional Random Field) [18] classifier to assign tags in the BIO scheme. ...
Government hotline plays a significant role in meeting the demands of the people and resolving social conflicts in China. In this paper, we propose an automatic work‐order assignment method based on event extraction and external knowledge to address the problem of low efficiency with manual assignment for Chinese government hotline. Our proposed assignment method is composed of four parts: (1) Semantic encoding layer, which extracts semantic information from the work‐order text and obtains semantic representation vectors with contextual feature information. (2) Event extraction layer which extracts the local features and global features from the semantic representation vectors with the help of the CRF network to enhance event extraction effect. (3) External knowledge embedding layer, which integrates ‘rights and responsibilities lists’ with the historical information of the work‐order to assist assignment. (4) Assignment layer which completes work‐order assignment by combining two output vectors from event extraction layer and external knowledge embedding layer. Experimental results show our proposed method can achieve better assignment performance compared with several baseline methods.
... The LSTM model based on the attention mechanism refers to introducing the attention mechanism [19] into the LSTM model [20]. It is different from the traditional LSTM network because it introduces attention into output of the hidden layer, that is, the weight, which can be expressed as: ...
... The root mean square error (RMSE), correlation coefficient (R), variance account for (VAF) and running time of model (T/s) are used as evaluation metrics [66,67]. The equations of the first three metrics are as follows. ...
Cemented paste backfill (CPB) is widely used in mine production practice around the world. The strength of CPB is the core of research and is affected by factors such as slurry concentration and cement content. In this paper, research on the UCS is conducted by means of a combination of laboratory experiments and machine learning. BPNN, RBFNN, GRNN and LSTM are trained and used for UCS prediction based on 180 sets of experimental UCS data. The simulation results show that LSTM is the neural network with the optimal prediction performance (the total rank is 11). Trial-and-error, PSO, GWO and SSA are used to optimize the learning rate and the hidden layer nodes for LSTM. The comparison results show that GWO-LSTM is the optimal model, which can effectively express the non-linear relationship between underflow productivity, slurry concentration, cement content and UCS in experiments (R = 0.9915, RMSE = 0.0204, VAF = 98.2847 and T = 16.37 s). The correction coefficient (k) is defined to adjust the error between predicted UCS in laboratory (UCSM) and predicted UCS in actual engineering (UCSA) based on extensive engineering and experimental experience. Using GWO-LSTM combined with k, the strength of the filling body is successfully predicted for 153 different filled stopes with different stowing gradients at different curing times. This study provides both effective guidance and a new intelligent method for the support of safe mining.
Recently, the fine‐grained geolocalization of User‐Generated Short Text (UGST) has been increasingly attracting much attention. Accurate geolocation can benefit many applications, especially for the location‐based services. However, since the majority of UGSTs are short, noisy and not geotagged, existing methods still suffer from two issues. One is the heavy reliance on the manual features not fully exploiting the semantic information. Another is the free writing style of social media resulting in extremely few useful geo‐indicative information. To address these issues, we propose a novel Fine‐grained Geolocalization method for UGSTs with Preprocessing, location‐entity consistency Replacing, Filtering, Convolutional neural network (FG‐PRFC), which only relies on UGST itself. Compared to existing methods, FG‐PRFC has four unique characteristics: (1) We present a UGST‐oriented preprocessing method to obtain more semantic information. (2) To tackle the abbreviation issue, we develop a replacing method to allow geo‐indicative words behaving in the same way. (3) Following the idea of TFIDF, we weight the words in UGST and then develop a location‐free UGST filtering method. (4) We employ convolutional neural network to model the relationship between words and locations. Extensive experiments on three ground‐truth datasets demonstrate that our method has a significant improvement compared to state‐of‐art methods. © 2022 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
This research proposes Long Short-Term Memory Network models with different stacked LSTM layers for farmer queries classification. The 23 different classes of 105,724 farmer queries were used to train the proposed LSTM models. Word to vector (Word2Vec), Global Vectors for Word Representation (GloVe), and FastText embedding techniques were compared on the query classification. The Word2vec embedding technique produced a better result than the GloVe and FastText embedding techniques in the farmer query classification. After an extensive simulation, the DLSTM network with three stacked LSTM layers achieved a testing accuracy of 90.35%. Classification performance of the proposed DLSTM network with three stacked LSTM layers in farmer queries was superior to Convolutional Neural Network (CNN), LSTM and other DLSTM models.
Long short-term memory (LSTM) models based on specialized deep neural network-based architecture have emerged as an important model for forecasting time-series. However, the literature does not provide clear guidelines for design choices, which affect forecasting performance. Such choices include the need for pre-processing techniques such as deseasonalization, ordering of the input data, network size, batch size, and forecasting horizon. We detail this in the context of short-term forecasting of global horizontal irradiance, an accepted proxy for solar energy. In particular, short-term forecasting is critical because cloud conditions change at a sub-hourly scale, with large impacts on incident solar radiation. We conduct an empirical investigation based on data from three solar stations from two climatic zones of India over two seasons. From an application perspective, it may be noted that despite the thrust given to solar energy generation in India, the literature contains few instances of robust studies across climatic zones and seasons. The model thus obtained subsequently outperformed three recent benchmark methods based on random forest, recurrent neural network, and LSTM, respectively, in terms of forecasting accuracy. Our findings underscore the importance of considering the temporal order of the data, the lack of any discernible benefit from data pre-processing, and the effect of making the LSTM model stateful. It is also found that the number of nodes in an LSTM network, as well as batch size, is influenced by the variability of the input data.
Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction.
Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes CNN to extract a sequence of higher-level phrase representations, which are fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases as well as global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that the C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks.
Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we use the multi-task learning framework to jointly learn across multiple related tasks. Based on recurrent neural network, we propose three different mechanisms of sharing information to model text with task-specific and shared layers. The entire network is trained jointly on all these tasks. Experiments on four benchmark text classification tasks show that our proposed models can improve the performance of a task with the help of other related tasks.
Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent networks models. In this paper, we explore an important step toward this generation task: training an LSTM (Long-short term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization.
We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We first show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static word vectors. The CNN models discussed herein improve upon the state-of-the-art on 4 out of 7 tasks, which include sentiment analysis and question classification.
Deep unordered composition rivals syntactic methods for text classification
  • M Iyyer
  • V Manjunatha
  • J Boyd-Graber
  • H Daumé