Article

Abstract

Clinical named entity recognition (CNER) is a fundamental and crucial task for clinical and translational research. In recent years, deep learning methods have achieved significant success in CNER tasks. However, these methods depend greatly on recurrent neural networks (RNNs), which maintain a vector of hidden activations propagated through time and therefore take a long time to train. In this paper, we propose a residual dilated convolutional neural network with a conditional random field (RD-CNN-CRF) for Chinese CNER, which replaces recurrence with convolutions that can be computed in parallel and thus shortens the training period dramatically. To be more specific, Chinese characters and dictionary features are first projected into dense vector representations and then fed into the residual dilated convolutional neural network to capture contextual features. Finally, a conditional random field is employed to capture dependencies between neighboring tags and obtain the optimal tag sequence for the entire sequence. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based methods in terms of both recognition performance and training time.

Index Terms: Electronic health records, clinical named entity recognition, residual dilated convolutional neural network, conditional random field.
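To make the architecture concrete, below is a minimal PyTorch sketch of the encoder idea the abstract describes: character embeddings passed through a stack of dilated 1-D convolutions with residual connections. Channel sizes, dilation rates, and names are illustrative assumptions rather than the paper's exact configuration, and the CRF decoding layer is left out.

```python
import torch
import torch.nn as nn

class ResidualDilatedBlock(nn.Module):
    """One residual block of dilated 1-D convolution over a character sequence."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation      # keep sequence length unchanged
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                    # x: (batch, channels, seq_len)
        return self.relu(self.conv(x)) + x   # residual connection

# Stacking blocks with growing dilation widens the receptive field exponentially,
# so long-range context is captured without recurrence: every position is
# computed in parallel, which is the source of the claimed training speedup.
encoder = nn.Sequential(*[ResidualDilatedBlock(128, dilation=d) for d in (1, 2, 4)])
chars = torch.randn(8, 128, 60)              # (batch, embedding_dim, sentence_length)
features = encoder(chars)                    # contextual features, fed to the CRF layer
```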

... This illustrates that DL research in the context of myocardial infarction primarily involves the use of machine learning techniques to extract ECG features for predicting and diagnosing risks and prognosis. CiteSpace 6.3.R1 was employed to create a keyword clustering diagram in which the thematic label for each cluster was selected using the log-likelihood ratio (LLR) method, ensuring that each cluster's essence is accurately captured and presented (21). A total of 21 primary clusters were extracted, with the system selecting the top 18 clusters, labeled #0-#17 (Figure 5B). ...
... In this clustering, the modularity score (Q) is 0.7686 and the average silhouette score (S) is 0.8959. With an average silhouette score greater than 0.3 and a modularity score exceeding 0.5, the clustering is deemed credible and effective (21). The top 10 clusters, in descending order, are: #0 feature extraction, #1 classification, #2 percutaneous coronary intervention, #3 air pollution, #4 deep learning, #5 cardiac MRI, #6 heart failure, #7 machine learning, #8 diagnostic model, and #9 cardiovascular disease (Figure 5B). ...
... With the emergence of deep learning technology, the field of myocardial infarction research has seen a major breakthrough. Deep learning, especially the convolutional neural network (CNN) and the recurrent neural network (RNN), can automatically learn and extract data features (21,24), greatly reducing the dependence on feature engineering. This technique can capture complex patterns and relationships from raw medical data, in turn providing more accurate and comprehensive disease analysis. ...
Article
Full-text available
Purpose This study analyzed the research trends in machine learning (ML) pertaining to myocardial infarction (MI) from 2008 to 2024, aiming to identify emerging trends and hotspots in the field, providing insights into the future directions of research and development in ML for MI. Additionally, it compared the contributions of various countries, authors, and agencies to the field of ML research focused on MI. Method A total of 1,036 publications were collected from the Web of Science Core Collection database. CiteSpace 6.3.R1, Bibliometrix, and VOSviewer were utilized to analyze bibliometric characteristics, determining the number of publications, countries, institutions, authors, keywords, and cited authors, documents, and journals in popular scientific fields. CiteSpace was used for temporal trend analysis, Bibliometrix for quantitative country and institutional analysis, and VOSviewer for visualization of collaboration networks. Results Since the emergence of research literature on medical imaging and machine learning (ML) in 2008, interest in this field has grown rapidly, particularly since the pivotal moment in 2016. The ML and MI domains, represented by China and the United States, have experienced swift development in research after 2015, albeit with the United States significantly outperforming China in research quality (as evidenced by the higher impact factors of journals and citation counts of publications from the United States). Institutional collaborations have formed, notably between Harvard Medical School in the United States and Capital Medical University in China, highlighting the need for enhanced cooperation among domestic and international institutions. In the realm of MI and ML research, cooperative teams led by figures such as Dey, Damini, and Berman, Daniel S. in the United States have emerged, indicating that Chinese scholars should strengthen their collaborations and focus on both qualitative and quantitative development. The overall direction of MI and ML research trends toward Medicine, Medical Sciences, Molecular Biology, and Genetics. In particular, publications in “Circulation” and “Computers in Biology and Medicine” from the United States hold prominent positions in this study. Conclusion This paper presents a comprehensive exploration of the research hotspots, trends, and future directions in the field of MI and ML over the past two decades. The analysis reveals that deep learning is an emerging research direction in MI, with neural networks playing a crucial role in early diagnosis, risk assessment, and rehabilitation therapy.
... In recent years, artificial intelligence techniques have become a hot research topic in the medical field. Chinese medical Named Entity Recognition [1] is the task of extracting key information, such as symptoms, diseases, and anatomy, from texts in the Chinese medical domain. It is usually framed as a sequence labeling problem, where the input is the text of a clinical record and the output is a label sequence over that text. ...
... In the experiments on CCKS2017, the batch size is set to 16, the learning rate to 1e-4, and the number of epochs to 30; in the experiments on CCKS2019, the batch size is set to 8 and the learning rate to 3e-4. After the experiments, the local interval length list for CCKS2017 takes the value [1, 3, 5], and that for CCKS2019 takes the value [1, 5, 7]. ...
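Restated as a configuration mapping for readability (the key names are hypothetical; the values are exactly those reported in the excerpt above):

```python
# Reported training settings for the two benchmark datasets.
TRAIN_CONFIG = {
    "CCKS2017": {"batch_size": 16, "lr": 1e-4, "epochs": 30,
                 "local_interval_lengths": [1, 3, 5]},
    "CCKS2019": {"batch_size": 8, "lr": 3e-4,
                 "local_interval_lengths": [1, 5, 7]},
}
```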
Article
Full-text available
Chinese medical Named Entity Recognition (NER) is a task of Natural Language Processing (NLP) that aims to extract key information from Chinese medical texts. Recently, the Transformer has become the mainstream approach in NLP because of its powerful global feature extraction capability. However, entities usually appear as subsequences in NER, so local features are not negligible, and the uncertainty of Chinese word segmentation increases the difficulty of the task. In this paper, we propose a network structure that combines global feature extraction with multi-local feature extraction to enhance the performance of Chinese medical NER. On top of the global features extracted by the Transformer, we propose to use a Bi-LSTM with a context integration mechanism to extract multi-local features and enhance the local semantic information of sequences; it integrates contextual information from the future and the past into each cell and generates different weights through a gate mechanism to strengthen the representational ability of each cell and thus the semantic information of local sequences. In addition, a feature fusion method based on an attention mechanism is proposed, which allows the decoder to focus on the information that matters most for predicting the current character. During global feature extraction, the flat-lattice structure is introduced to generate all potential Chinese word segmentation results, and a span-based relative position encoding is generated to capture sequence characteristics. Finally, a CRF with conditional constraints is used as the decoder of the model. Experimental results on two benchmark datasets show the effectiveness of our model; the method significantly outperforms state-of-the-art methods on the medical NER task, achieving F1 values of 93.64% on CCKS2017 and 85.01% on CCKS2019.
... On the other hand, geoscience vocabulary has more complicated word characteristics and distribution rules than general vocabulary. Therefore, Chinese word segmentation in geoscience is a key problem to be resolved for information mining from geoscience text (Qiu et al. 2019). ...
... In this method, the sentences to be segmented are matched against dictionaries prepared in advance according to the rules. If a dictionary word is successfully matched, the corresponding consecutive characters in the sentence are separated out as one word (Qiu et al. 2019). Depending on the matching direction when scanning the string, segmentation can use the forward maximum matching algorithm (Lei et al. 2014), the backward maximum matching algorithm (Zhang et al. 2006), the bidirectional scanning algorithm (Gai et al. 2014) or the N-shortest-path algorithm (Ke et al. 2019). ...
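As an illustration of dictionary-based matching, here is a minimal Python sketch of the forward maximum matching algorithm mentioned above (the sample dictionary and window size are assumptions):

```python
def forward_max_match(sentence: str, dictionary: set, max_word_len: int = 6):
    """Segment a sentence by greedily matching the longest dictionary word."""
    words, i = [], 0
    while i < len(sentence):
        match = None
        # Try the longest window first, shrinking until a dictionary hit.
        for j in range(min(len(sentence), i + max_word_len), i, -1):
            if sentence[i:j] in dictionary:
                match = sentence[i:j]
                break
        if match is None:        # no dictionary hit: emit a single character
            match = sentence[i]
        words.append(match)
        i += len(match)
    return words

# Toy example (English stand-ins for Chinese character strings):
print(forward_max_match("rockslidezone", {"rock", "slide", "zone"}))
# -> ['rock', 'slide', 'zone']
```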
... Zhao et al. (2018) designed a dictionary structure that improved the recognition rate of unknown words through a dictionary loading function, and obtained the best segmentation result using the bidirectional matching algorithm, thus also improving ambiguous word recognition. However, these methods rely heavily on dictionaries and are unable to deal with new word detection very well; moreover, Chinese is a relatively complex and diverse language with a huge vocabulary, and its usage is too flexible and changeable, resulting in high cost for constructing domain dictionaries (Qiu et al. 2019). ...
Article
Full-text available
For geoscience text, rich domain corpora have become the basis for improving model performance in word segmentation. However, the lack of domain-specific corpora with annotated labels has become a major obstacle to professional information mining in the geosciences. In this paper, we propose a corpus augmentation method based on the Levenshtein distance. Using this technique, a geoscience dictionary of 20,137 words was constructed by crawling the keywords of papers published in the China National Knowledge Infrastructure (CNKI). The dictionary was further used as the main source of synonyms to enrich the geoscience corpus according to the Levenshtein distance between words. Finally, a Chinese word segmentation model combining BERT, a bidirectional gated recurrent neural network (Bi-GRU), and conditional random fields (CRF) was implemented. A geoscience corpus composed of complex, long, domain-specific vocabulary was selected to test the proposed word segmentation framework. CNN-LSTM, Bi-LSTM-CRF, and Bi-GRU-CRF models were all selected to evaluate the effect of the Levenshtein data augmentation technique. Experimental results show that the proposed method achieves a significant performance improvement of more than 10%. It has great potential for natural language processing tasks like named entity recognition and relation extraction.
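For reference, the Levenshtein distance driving this augmentation is the classic dynamic program below; the near-synonym lookup is a hypothetical sketch of how a distance threshold could select substitution candidates from the dictionary:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed with one rolling row."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # delete a[i-1]
                        dp[j - 1] + 1,                  # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))  # substitute
            prev = cur
    return dp[-1]

def near_synonyms(word: str, dictionary, max_dist: int = 1):
    """Dictionary words within a small edit distance, usable as substitutes."""
    return [w for w in dictionary if w != word and levenshtein(word, w) <= max_dist]
```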
... However, in Chinese applications many existing algorithms struggle to achieve satisfactory results, leading to low recognition accuracy. This is primarily due to the absence of natural word segmentation in Chinese. ...
... While this method yielded high precision, it lacked generalization, often missing multiple entities and incurring high costs. With the development of machine learning, models like Conditional Random Fields (CRFs) [1] were widely applied, enhancing both training speed and model portability. Qiu et al. demonstrated improved recognition accuracy using CRF models. ...
Article
Full-text available
Chinese Entity Recognition (CER) aims to extract key information entities from Chinese text data, supporting subsequent natural language processing tasks such as relation extraction, knowledge graph construction, and intelligent question answering. However, CER faces several challenges, including limited training corpora, unclear entity boundaries, and complex entity structures, resulting in low accuracy and calling for further improvements. To address issues such as high annotation costs and ambiguous entity boundaries, this paper proposes the SEMFF-CER model, a CER model based on semantic enhancement and multi-feature fusion. The model employs character feature extraction algorithms, SoftLexicon semantic enhancement for vocabulary feature extraction, and deep semantic feature extraction from pre-trained models. These features are integrated into the entity recognition process via gating mechanisms, effectively leveraging diverse features to enrich contextual semantics and improve recognition accuracy. Additionally, the model incorporates several optimization strategies: an adaptive loss function to balance negative samples and improve the F1 score, data augmentation to enhance model robustness, and dropout and the Adamax optimization algorithm to refine training. The SEMFF-CER model is characterized by low dependence on training corpora, fast computation, and strong scalability. Experiments conducted on four Chinese benchmark entity recognition datasets validate the proposed model, demonstrating superior performance over existing models with the highest F1 score.
... NER aims to extract valuable information from the text. While NER has achieved considerable success in English, Chinese Named Entity Recognition (CHNER) [5] poses greater complexity and difficulties. The complexity of CHNER is largely attributed to the abundance of homophones and the absence of clear boundaries in the language. ...
... Additionally, we compared our model with the latest models, as shown in Tables 6 and 7. The comparison models we selected include ELMo-lattice-LSTM-CRF [28], ACNN [23], RD-CNN-CRF [5], MKRGCN [29], MUSA-BiLSTM-CRF [24], AT-LatticeLSTM-CRF [21], FT-BERT-BiLSTM-CRF [30], ELMo-ET-CRF [31], RGT-CRF [32]. ...
Article
Full-text available
Chinese Clinical Named Entity Recognition (CNER) is a crucial step in extracting medical information and is of great significance in promoting medical informatization. However, CNER poses challenges due to the specificity of clinical terminology, the complexity of Chinese text semantics, and the uncertainty of Chinese entity boundaries. To address these issues, we propose an improved CNER model based on multi-feature fusion and multi-scale local context enhancement. The model fuses multi-feature representations of pinyin, radicals, Part of Speech (POS), and word boundaries with BERT deep contextual representations to enhance the semantic representation of the text for more effective entity recognition. Furthermore, to address the model's tendency to focus only on global features, we incorporate Convolutional Neural Networks (CNNs) with various kernel sizes to capture multi-scale local features of the text and enhance the model's comprehension of it. Finally, we integrate the obtained global and local features and employ a multi-head attention mechanism (MHA) to sharpen the model's focus on characters associated with medical entities, thereby boosting its performance. We obtained F1 scores of 92.74% and 87.80% on the two CNER benchmark datasets, CCKS2017 and CCKS2019, respectively. The results demonstrate that our model outperforms the latest CNER models, showcasing its outstanding overall performance. The proposed CNER model thus has important application value in constructing clinical medical knowledge graphs and intelligent Q&A systems.
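The multi-scale local feature idea in this abstract can be sketched as parallel 1-D convolutions with different kernel widths over the BERT output; all sizes and names below are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

class MultiScaleLocalContext(nn.Module):
    """Parallel convolutions with several kernel sizes, fused by projection."""
    def __init__(self, dim: int = 768, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
        )
        self.proj = nn.Linear(dim * len(kernel_sizes), dim)

    def forward(self, h):                 # h: (batch, seq_len, dim), e.g. BERT output
        x = h.transpose(1, 2)             # Conv1d wants (batch, dim, seq_len)
        scales = [conv(x).transpose(1, 2) for conv in self.convs]
        return self.proj(torch.cat(scales, dim=-1))   # fused local features

h = torch.randn(4, 50, 768)                # dummy character representations
local = MultiScaleLocalContext()(h)        # same shape as h; merged with global
                                           # features before the attention layer
```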
... DL comprises many algorithms, among which Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used most often [1]. CNN is a popular approach for computer vision challenges such as image classification, face recognition [2] and voice identification, whereas RNN deals with sequence challenges [3]. A CNN extracts characteristics from videos and images into a compressed representation, which makes it a natural solution for image classification problems. ...
... A residual dilated CNN with conditional random field (RD-CNN-CRF) has been proposed to recognize Chinese clinical named entities [3]. Dictionary features were used to help identify rare entities, and character-based sequence labeling was adopted to avoid the noise introduced by Chinese word segmentation. ...
Article
Full-text available
Deep learning is the subfield of machine learning that performs data interpretation and integrates several layers of features to produce prediction outcomes. It has shown significant performance in a wide range of sectors, specifically in the realms of image classification, object identification and segmentation. Deep learning algorithms have significantly enhanced the effectiveness of fine-grained classification tasks, which aim to distinguish among sub-classes. In this review, a detailed analysis of the various deep learning models, a comparative analysis of their frameworks, and model descriptions are presented. Convolutional Neural Networks have become the standard method for object recognition, computer vision, image classification, and other applications. However, as input data become more intricate, traditional convolutional neural networks are no longer capable of delivering adequate results. Accordingly, the goal of this review article is to bring several deep learning models and their methodologies back to prominence and to present their findings on a wide range of popular databases.
... To address these issues, some scholars have proposed methods based on neural networks. The most representative are the CNN-CRF model [17,18] and the BiLSTM-CRF model [12,19], which use deep neural networks to automatically extract features at the character, word, and sentence levels, lessening the subjectivity in feature selection and consequently enhancing the accuracy of recognition results [19]. Nonetheless, this technique demands high-quality annotated data in the medical field to guarantee the model's recognition performance. ...
Article
Full-text available
To address the problem of poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method, MF-MNER, using a fusion technique combining BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, a conditional random field (CRF) is used to decode and output multi-task entity recognition. Experiments are performed on the CCKS2019 dataset, with micro avg Precision, macro avg Recall, and weighted avg Precision reaching 0.880, 0.887, and 0.883, and micro avg, macro avg, and weighted avg F1-scores reaching 0.875, 0.876, and 0.876, respectively. Compared with existing models, our method outperforms the existing literature on three evaluation metrics (micro average, macro average, weighted average) under the same dataset conditions. In the case of the weighted average, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than those of the existing BERT-BiLSTM-CRF model, respectively. Experiments are also performed on an actual clinical dataset with our MF-MNER: the Precision, Recall, and F1-score are 0.638, 0.825, and 0.719 under the micro-avg evaluation mechanism; 0.685, 0.800, and 0.733 under the macro-avg evaluation mechanism; and 0.647, 0.825, and 0.722 under the weighted-avg evaluation mechanism. The above results show that our method, MF-MNER, can integrate the advantages of the BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation and achieving excellent recall, which has practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER. Graphical Abstract: Illustration of the proposed MF-MNER. The method mainly includes four steps: (1) medical electronic medical records are cleaned, encoded, and segmented; (2) semantic representations are obtained by dynamic fusion with the bidirectional autoregressive transformer (BART) model; (3) sequence information is captured by a bidirectional long short-term memory (Bi-LSTM) network; (4) multi-task entity recognition is decoded and output by the conditional random field (CRF).
... Most named entity identification approaches leverage convolutional neural networks (CNNs) and recurrent neural networks (RNNs), 15,34 with RNNs excelling in sequence-based modeling and bidirectional long short-term memory models (BiLSTM) providing effective access to contextual information. The BiLSTM+CRF model 1,19,33,38,45 has become a prevalent infrastructure for NER using deep learning techniques. ...
Article
Full-text available
Improving the recognition ability of clinical named entity recognition (CNER) on a limited number of Chinese electronic medical records provides meaningful support for advanced clinical knowledge extraction. In this paper, using the CCKS2019 Chinese electronic medical record corpus as the experimental data source, a fusion model enhanced by a knowledge graph (KG) is proposed and applied to specific Chinese CNER tasks. This study consists of three main parts: single-model construction and comparison experiments, KG enhancement experiments, and model fusion experiments. The results show that the model achieves good CNER performance: the accuracy, recall, and F1 value are 83.825%, 84.705%, and 84.263%, respectively, the global optimum, which proves the effectiveness of the model. This provides good support for further research on medical information.
... The several articles above studied the grammar, structure, and other characteristics of Chinese text and integrated these features into deep learning models, with good overall recognition results. Deeper research on text grammar, structure and other features, converting them into computer-processable form so that they better suit deep learning models, and further improving the recognition rate of electronic medical record named entities, is now a direction in this field. Qiu et al. [25] proposed a residual dilated convolutional neural network with conditional random fields (RD-CNN-CRF), which improved on the training speed of recurrent neural network models; both training speed and F1 value were better than those of the BiLSTM-CRF model. Tang et al. [26] proposed a deep learning method (LM-Att-BiGRU-CRF) that integrates a language model and an attention mechanism, and achieved good results by using language model features and a multi-head attention mechanism, with an F1 value of 91.34%. ...
... Relationship extraction refers to extracting specific relationships among entities from unstructured textual data. The methods include template-based relationship extraction, supervised learning-based relationship extraction, and weakly supervised learning-based relationship extraction [27]. The relationship extraction here is entity relationship extraction in a limited domain. ...
Article
Full-text available
The construction industry is characterized by long production cycles, high worker mobility, many kinds of outdoor operations and complex construction processes, leading to frequent safety accidents. To explore the patterns in construction accidents, this paper applies knowledge graph technology from the field of artificial intelligence to analyze them. First, the conceptual architecture of the domain knowledge graph is defined. Second, key knowledge elements are extracted from construction accident data, and the knowledge graph of construction accidents is built using the Neo4j graph database. Further, a construction accident analysis process based on the knowledge graph is proposed, and intelligent analyses such as queries, statistical analysis and correlation path analysis of accident information are conducted. The results show that, based on knowledge graph technology, construction accidents can be visualized in graphics or tables, and accident information in the form of knowledge can be saved and queried quickly. The study can provide knowledge support for accident prevention and improve the efficiency of accident analysis. Besides, it can provide innovative ideas as well as decision support for safety management.
... Deep learning methods in NER are becoming increasingly sophisticated, which is largely attributable to the huge number of trainable parameters in neural networks; this, however, increases both the difficulty of model training and the hardware requirements. Qiu et al. [17] integrate a CNN with a CRF to ensure accurate entity recognition while greatly reducing model training time. Wang et al. [18] proposed feeding token or text embeddings into a hierarchical pyramid model for bidirectional interaction, with positive results on nested named entity recognition problems. ...
Article
Full-text available
Meteorological reports are one of the most important means of recording the weather conditions of a place over a period of time, and the sheer number of them creates a huge demand for text processing and information extraction. Valuable data and information are still buried deep in the mountain of meteorological reports, and there is an urgent need for automated information extraction techniques that help people integrate data from multiple reports and perform analyses for a more comprehensive understanding of a specific meteorological topic or domain. Named entity recognition (NER) can extract useful entity information from meteorological reports. After analyzing the characteristics of nested entities in meteorological reports, this paper proposes to introduce Multi-Conditional Random Fields (Multi-CRF), in which each CRF layer outputs the recognition results for one entity type, helping to solve the problem of identifying nested entities. The experimental results show that our model achieves state-of-the-art results. The final recognition results provide effective data support for automatic text verification in the meteorological domain and have important practical value for constructing knowledge graphs of meteorological reports.
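A rough sketch of the layered-CRF idea: a shared encoder feeds one emission layer and one CRF per entity type, so overlapping (nested) spans of different types can be decoded independently. This assumes the pytorch-crf package; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # pip install pytorch-crf (an assumed dependency)

class MultiCRFTagger(nn.Module):
    """One BIO-tag CRF head per entity type over a shared BiLSTM encoder."""
    def __init__(self, vocab_size, entity_types, emb=128, hidden=256, num_tags=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden // 2, bidirectional=True, batch_first=True)
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, num_tags) for t in entity_types})
        self.crfs = nn.ModuleDict({t: CRF(num_tags, batch_first=True) for t in entity_types})

    def decode(self, token_ids):
        feats, _ = self.encoder(self.embed(token_ids))
        # Each layer yields its own BIO sequence, so spans of different types
        # may overlap -- this is what recovers nested entities.
        return {t: self.crfs[t].decode(self.heads[t](feats)) for t in self.heads}

tagger = MultiCRFTagger(vocab_size=5000, entity_types=["station", "phenomenon"])
tags = tagger.decode(torch.randint(0, 5000, (2, 20)))   # dict: type -> tag sequences
```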
... IntelliMetric is a review system based on text content and shallow linguistic features; it is an automatic essay scoring system based on artificial intelligence [33]. The system integrates artificial intelligence, natural language processing and statistical techniques, and can evaluate and score the content quality, organizational structure, writing style and writing habits of the essays under review [34]. ...
Article
Full-text available
As a subjective activity relying on expert experience, automatic evaluation of writing quality remains a technical challenge. It requires both effective semantic understanding and structural analysis of the writing. To deal with this challenge, this paper combines the speed advantage of granular computing with the ability of deep neural networks to approximate nonlinear mappings. On this basis, a granular computing-based deep neural network approach for automatic evaluation of writing quality is developed. Specifically, granular computing is used as the front-end processor of the deep neural network to reduce the information density of its input. The deep neural network then serves as the backbone structure for extracting semantic features from the writing. This combination of the two modules improves processing speed in large-scale textual analysis scenarios while preserving evaluation performance. Simulation experiments were conducted to test the proposed technical framework, and the results show that the proposal achieves both high accuracy and a reasonable running speed.
... From the experimental results in Tables 5, 6 and 7, the effectiveness of NER based on different pretrained BERT models differs across the three EMR datasets. As shown in Tables 8, 9 and 10, the RD + CNN + CRF model uses a residual dilated convolutional neural network with dictionary features and conditional random fields [31]. Compared with the basic BiLSTM + CRF model, all three evaluation indicators improved, and the three F1 scores increased by 2.06%, 1.16% and 1.87%, respectively. ...
Article
Full-text available
Background Named entity recognition (NER) of electronic medical records is an important task in clinical medical research. Although deep learning combined with pretraining models performs well in recognizing entities in clinical texts, because Chinese electronic medical records have a special text structure and vocabulary distribution, general pretraining models cannot effectively incorporate entities and medical domain knowledge into representation learning; separate deep network models lack the ability to fully extract rich features in complex texts, which negatively affects the named entity recognition of electronic medical records. Methods To better represent electronic medical record text, we extract the text’s local features and multilevel sequence interaction information to improve the effectiveness of electronic medical record named entity recognition. This paper proposes a hybrid neural network model based on medical MC-BERT, namely, the MC-BERT + BiLSTM + CNN + MHA + CRF model. First, MC-BERT is used as the word embedding model of the text to obtain the word vector, and then BiLSTM and CNN obtain the feature information of the forward and backward directions of the word vector and the local context to obtain the corresponding feature vector. After merging the two feature vectors, they are sent to multihead self-attention (MHA) to obtain multilevel semantic features, and finally, CRF is used to decode the features and predict the label sequence. Results The experiments show that the F1 values of our proposed hybrid neural network model based on MC-BERT reach 94.22%, 86.47%, and 92.28% on the CCKS-2017, CCKS-2019 and cEHRNER datasets, respectively. Compared with the general-domain BERT-based BiLSTM + CRF, our F1 values increased by 0.89%, 1.65% and 2.63%. Finally, we analyzed the effect of an unbalanced number of entities in the electronic medical records on the results of the NER experiment.
... A BiLSTM-CRF model with a self-attention mechanism was proposed to integrate part-of-speech labeling information and capture the semantic features of input sequences for Chinese clinical NER (Wu et al., 2019). A residual dilated CNN (Convolutional Neural Network) with CRF was also presented to improve Chinese clinical NER in terms of computational performance and training time (Qiu et al., 2019). A BERT-BiLSTM-CRF model was proposed to use BERT embeddings for character representation and to train the BiLSTM-CRF model to recognize complex named entities (Lee et al., 2022). ...
Conference Paper
Full-text available
This paper describes the ROCLING-2022 shared task for Chinese healthcare named entity recognition, including task description, data preparation, performance metrics, and evaluation results. Among ten registered teams, seven participating teams submitted a total of 20 runs. This shared task reveals present NLP techniques for dealing with Chinese named entity recognition in the healthcare domain. All data sets with gold standards and evaluation scripts used in this shared task are publicly available for future research.
... Therefore, the focus of this study is a method that can be computed in parallel while retaining LSTM's ability to capture contextual information effectively. For this reason, researchers have turned to CNNs for CNER, proposing RD-CNN-CRF [28] and ID-CNN [29]. However, CNNs need to stack more convolutional layers to obtain contextual information, leading to many network hyperparameters and still requiring long computation times. ...
Article
Full-text available
Objective: Named entity recognition (NER) is a key and fundamental part of many medical and clinical tasks, including the establishment of medical knowledge graphs, decision-making support, and question answering systems. When extracting entities from electronic health records (EHRs), NER models mostly apply long short-term memory (LSTM) and achieve surprising performance in clinical NER. However, these LSTM-based models often require increasing the depth of the network to capture long-distance dependencies, so those that achieve high accuracy generally require long training times and extensive training data, which has obstructed their adoption in clinical scenarios with limited training time. Method: Inspired by the Transformer, we combine it with a Soft Term Position Lattice to form a soft lattice structure Transformer, which models long-distance dependencies similarly to LSTM. Our model consists of four components: the WordPiece module, the BERT module, the soft lattice structure Transformer module, and the CRF module. Result: Our experiments demonstrate that this approach increases F1 by 1-5% in the CCKS NER task compared with other LSTM-CRF-based models while consuming less training time. Additional evaluations showed that the lattice structure Transformer performs well in recognizing long medical terms, abbreviations, and numbers: the proposed model achieves a 91.6% F-measure on long medical terms and a 90.36% F-measure on abbreviations and numbers. Conclusions: By using the soft lattice structure Transformer, the proposed method captures Chinese word and lattice information, making our model suitable for Chinese clinical medical records. Transformers with multilayer soft lattice Chinese word construction can capture potential interactions between Chinese characters and words.
... A BiLSTM-CRF model with a self-attention mechanism was proposed to integrate part-of-speech labeling information and capture the semantic features of input sequences for Chinese clinical NER (Wu et al., 2019). A residual dilated CNN (Convolutional Neural Network) with CRF was also presented to improve Chinese clinical NER in terms of computational performance and training time (Qiu et al., 2019). An ME-MGNN (Multiple Embeddings enhanced Multi-Graph Neural Network) model was proposed to derive character representations based on multiple embeddings at different granularities, from the radical and character to the word level. ...
Conference Paper
Full-text available
This study describes the model design of the NCUEE-NLP system for the Chinese track of the SemEval-2022 MultiCoNER task. We use BERT embeddings for character representation and train the BiLSTM-CRF model to recognize complex named entities. A total of 21 teams participated in this track, with each team allowed a maximum of six submissions. Our best submission, with a macro-averaged F1-score of 0.7418, ranked seventh among the 21 teams.
... Research on building knowledge graphs has become very popular in recent years: researchers perform entity recognition and entity-relationship recognition by constructing novel computational architectures (Uzuner et al., 2010; Weng et al., 2017; Zhao et al., 2017; Cheng et al., 2019; Qiu et al., 2019; Wu et al., 2021). Related research on building knowledge graphs from medical data is continually emerging. ...
Article
Full-text available
As a typical knowledge-intensive industry, the medical field uses knowledge graph technology to construct causal inference calculations, such as "symptom-disease", "laboratory examination/imaging examination-disease", and "disease-treatment method". The continuous expansion of large electronic clinical record collections provides an opportunity to learn medical knowledge by machine learning. In this process, how to extract entities with a medical logic structure and how to make entity extraction more consistent with the logic of the text content in electronic clinical records are two key issues in building a high-quality medical knowledge graph. In this work, we describe a method for extracting medical entities from real Chinese electronic clinical records. We define a computational architecture named MLEE to extract object-level entities with "object-attribute" dependencies. We conducted experiments based on randomly selected electronic clinical records of 1,000 patients from Shengjing Hospital of China Medical University to verify the effectiveness of the method.
... Emotion is a temporally evolving behavior that unfolds over a period of time, so it is necessary to consider the forward and backward dependencies of emotional information [14]. Traditional dynamic models such as the hidden Markov model (HMM) and the conditional random field (CRF) have achieved better recognition performance than static models thanks to their inherent ability to model temporal context [15,16]. However, the temporal context these models consider is relatively short, so the gains are limited [17]. ...
Article
Full-text available
The increasing number of internet users and the mass of digital music call for efficient music retrieval and a satisfying retrieval experience. The objective is to enable users to share an emotional interaction with the emotional representation of the music itself while experiencing it. First, starting from the principles of multimodal technology, Internet of Things sensors are used to recognize the emotional representation of music. Second, a deep learning-based Naive Bayes classifier is used to classify the emotions of musical works, as well as the emotions users feel from the music. Finally, the emotional representation of music and the emotional experience of users are analyzed, and the accuracy of music emotion classification is studied across different classification methods. The results show that under Naive Bayes classification, the highest recognition rate is 0.53 for the emotionally exciting category, 0.32 for healing, 0.16 for relaxation, 0.30 for romance, 0.35 for nostalgia, 0.28 for loneliness, and 0.21 for quiet. Among the user's emotions, loneliness has the highest recognition rate at 0.63, followed by nostalgia, excitement and romance at 0.55, 0.45 and 0.31, respectively. According to the analysis of user experience and listening habits, music representations such as loneliness are in line with user experience. Naive Bayes has the highest accuracy in music emotion classification on multi-source data, at 86.64%. This has important reference significance for multimodal music emotion analysis and for analyzing emotional resonance between music and listeners.
... Entity recognition and relation extraction algorithms based on deep learning are hot topics of current research. For entity recognition, Qiu et al. (2019) proposed a model combining a residual dilated convolutional neural network and a CRF, in which the RD-CNN captures context features and the CRF captures the correlation of adjacent tags. Kong et al. (2021) combined an attention mechanism with a CNN; different convolutional kernels are integrated to capture context information, and the attention mechanism is applied to enhance the capture ability. ...
Article
Full-text available
This article presents a triple extraction technique for power transformer fault information based on a residual dilated gated convolution and a self-attention mechanism. An optimized word input sequence is designed to improve the effectiveness of triple extraction. The residual dilated gated convolution captures medium- and long-distance information in the text. The self-attention mechanism learns the internal information and captures the internal structure of input sequences. An improved binary tagging method with position information marks the start and the end of an entity, which improves extraction accuracy. An object entity is obtained through a specific relationship r for a given subject. The nearest start-end pair matching principle and probability estimation are applied to acquire the optimal set of triples. Testing results show that the F1 score of the presented method is 91.98%, and the triple extraction accuracy is much better than that of the BERT and Bi-LSTM-CRF methods.
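The nearest start-end pair matching principle can be illustrated in a few lines of Python; the probability inputs and threshold are assumptions for the sketch:

```python
def match_spans(start_probs, end_probs, threshold=0.5):
    """Pair every predicted start position with the nearest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        later_ends = [e for e in ends if e >= s]
        if later_ends:
            spans.append((s, min(later_ends)))   # nearest-end principle
    return spans

# Per-token start/end probabilities from a binary tagger (dummy values):
print(match_spans([0.1, 0.9, 0.2, 0.8, 0.1], [0.0, 0.1, 0.95, 0.1, 0.9]))
# -> [(1, 2), (3, 4)]: two spans, each start paired with its nearest end
```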
... Early research mainly focused on traditional rule-based methods [4][5]. In recent years, with the development and maturation of artificial intelligence technology, many AI techniques have been applied to named entity and relation recognition, including the maximum entropy Markov model (MEMM) [6][7], the conditional random field (CRF) [8][9][10], the convolutional neural network (CNN) [11][12][13], the recurrent neural network (RNN) [14][15][16] and its refinement, the long short-term memory (LSTM) model [17][18][19], as well as the LSTM-CRF model [20][21], which combines LSTM with a conditional random field. ...
Preprint
Full-text available
During the process of mechanical design, a large amount of knowledge and information needs to be obtained by consulting and referencing past cases. This type of knowledge is called ostensive knowledge in philosophy and is a typical form of relational tacit knowledge. On the one hand, ostensive knowledge cannot be expressed through words, numbers, scientific formulas, or coding procedures. On the other hand, such knowledge is parasitic on cases and cannot exist independently of them. This paper introduces ostensive knowledge to the mechanical design field for the first time and points out that a learning model for ostensive knowledge must have a memory of cases and the ability to perform case-based reasoning. On this basis, the paper focuses on shaft parts and proposes an ostensive knowledge learning model for them based on the conditional random field model. Finally, an experiment using actual engineering cases analyzes and verifies the model, showing that it has good memory and reasoning capabilities for shaft part cases.
... CCKS2017_2/) to perform pretraining of the proposed HDL module. The test evaluation shows that the accuracy of the pretrained HDL module is up to 92%, which is comparable with other algorithms [39,40]. Then, the pretrained HDL is transferred using labeled CCNER data from CHPS. ...
Article
Full-text available
Objective: The high incidence of respiratory diseases has dramatically increased the medical burden under the COVID-19 pandemic in 2020. It is of considerable significance to utilize a new generation of information technology to improve the level of artificial intelligence in respiratory disease diagnosis. Methods: Based on the semi-structured data of Chinese Electronic Medical Records (CEMRs) from the China Hospital Pharmacovigilance System, this paper proposes a bi-level artificial intelligence model for the risk classification of acute respiratory diseases. The first level is a dedicated "BiLSTM + Dilated Convolution + 3D Attention + CRF" deep learning model for Chinese Clinical Named Entity Recognition (CCNER), used to extract valuable information from the unstructured data in the CEMRs. Incorporating transfer learning and semi-supervised learning techniques into the proposed deep learning model achieves higher accuracy and efficiency in the CCNER task than the popular "BERT + BiLSTM + CRF" approach. Combining the extracted entity data with other structured data in the CEMRs, the second level is a customized XGBoost model for the risk classification of acute respiratory diseases. Results: The empirical study shows that the proposed model provides practical technical support for improving diagnostic accuracy. Conclusion: Our study provides a proof of concept for implementing a hybrid artificial intelligence-based system as a tool to aid clinicians in handling CEMR data and enhancing diagnostic evaluation under diagnostic uncertainty.
... Furthermore, the entity type coverage was relatively small, and neither the effect of corpus size nor the effect of heterogeneous data on the experimental results was examined in the experimental section. At present, entity recognition based on pretrained models and attention mechanisms is the mainstream in generic entity recognition [15][16][17][18][19][20], which gives important insight into the direction of entity recognition technology development in the military field. ...
Article
Full-text available
With respect to the fuzzy boundaries of heterogeneous military entities, this paper improves the annotation mechanism for entities with fuzzy boundaries based on related research. We apply a BERT-BiLSTM-CRF model fusing deep learning and machine learning to recognize military entities and thereby construct a smart military knowledge base, which, together with the military Internet of Things (MIoT), enables many military AI applications. To verify the performance of the model, we designed multiple types of experiments. Experimental results show that the recognition performance of the model keeps improving as the corpus grows in the multi-data-source scenario, with the F-score increasing from 73.56% to 84.53%. Cross-corpus cross-validation shows that the more entity types covered in the training corpus and the richer their representations, the stronger the generalization ability of the trained model; the model trained with the novel random-type corpus reaches a recall of 74.33% and an F-score of 76.98%. Multi-model comparison experiments show that the BERT-BiLSTM-CRF model applied in this paper performs well for military entity recognition. Longitudinal comparisons show that its F-score is 18.72%, 11.24%, 9.24%, and 5.07% higher than that of the CRF, LSTM-CRF, BiLSTM-CRF, and BERT-CRF models, respectively. Cross-sectional comparisons show that its F-score is 6.63%, 7.95%, 3.72%, and 1.81% higher than that of the Lattice-LSTM-CRF, CNN-BiLSTM-CRF, BERT-BiGRU-CRF, and BERT-IDCNN-CRF models, respectively.
... In addition to the above, and unlike English with its explicit word boundaries, Chinese named entity recognition can be character-based [23] or word-based [24]. The character-based approach reduces the influence of unfamiliar words, but individual characters carry insufficient semantic information and are generally enriched with pinyin, radicals, and similar features [25]. ...
Article
Full-text available
Named entities are the main carriers of relevant medical knowledge in Electronic Medical Records (EMR). Owing to the specificity of the Chinese language structure, clinical electronic medical records suffer from problems such as word segmentation ambiguity and polysemy, so a Clinical Named Entity Recognition (CNER) model based on multi-head self-attention combined with a BiLSTM neural network and Conditional Random Fields is proposed. First, the pre-trained language model organically combines character vectors and word vectors for the text sequences of the original dataset. The sequences are then fed into the parallel structure of the multi-head self-attention module and the BiLSTM neural network module, and the outputs of the two modules are spliced to obtain multi-level information such as contextual information and feature association weights. Finally, entity annotation is performed by the CRF. The results of multiple comparison experiments show that the proposed model structure is reasonable and robust and can effectively improve Chinese CNER. The model extracts multi-level and more comprehensive text features and compensates for the loss of long-distance dependencies, with better applicability and recognition performance.
... Clinical NER has become an important research field in medical information extraction and healthcare data mining [11,12]. The development of clinical NER has basically undergone a transformation from rules to deep learning technology, mainly including the following methods. ...
Article
Full-text available
Background: Named entity recognition (NER) on Chinese electronic medical/healthcare records has attracted significant attention, as it enables applications that understand these records. Most previous methods have been purely data-driven, requiring high-quality, large-scale labeled medical data. However, labeled data is expensive to obtain, and these data-driven methods struggle to handle rare and unseen entities. Methods: To tackle these problems, this study presents a novel multi-task deep neural network model for Chinese NER in the medical domain. We incorporate dictionary features into the neural network, and a general secondary named entity segmentation task is used as an auxiliary task to improve the performance of the primary named entity recognition task. Results: To evaluate the proposed method, we compare it with other currently popular methods on three benchmark datasets. Two of the datasets are publicly available, and the third was constructed by us. Experimental results show that the proposed model achieves a 91.07% average F-measure on the two public datasets and an 87.05% F-measure on the private dataset. Conclusions: The comparison of different models demonstrates the effectiveness of our model, which outperforms traditional statistical models.
... At present, Chinese medical entity recognition [9] still faces significant challenges for the following main reasons. First, considering the specificity of medical texts, there is no uniform set of nomenclature for clinical medical language [10]. ...
Article
Full-text available
The medical information carried in electronic medical records has high clinical research value, and medical named entity recognition is the key to extracting valuable information from large-scale medical texts. At present, most studies on Chinese medical named entity recognition are based on either a character vector model or a word vector model. Owing to the complexity and specificity of Chinese text, such methods may fail to achieve good performance. In this study, we propose a Chinese medical named entity recognition method that fuses character and word vectors. The method represents Chinese texts as character vectors and word vectors separately and fuses their features in the model. The proposed model can effectively avoid the problems of missing character-vector information and inaccurate word-vector partitioning. On the CCKS 2019 dataset for named entity recognition on Chinese electronic medical records, the proposed model achieves good performance and can effectively improve the accuracy of Chinese medical named entity recognition compared with other baseline models.
... At the same time, they introduce three types of external information as inputs to the model. Qiu et al. [35] use Chinese characters and dictionary features as input and then feed them into a residual dilated convolutional neural network. The authors of [36] propose a method that combines a language model and multi-head attention: first, the sentence vectors are fed into a BiGRU and the pretrained model. ...
Article
Full-text available
Clinical named entity recognition (CNER) identifies entities from unstructured medical records and classifies them into predefined categories. It is of great significance for follow-up clinical studies. Most existing CNER methods fail to give enough thought to Chinese radical-level characteristics and the specifics of the Chinese clinical field. This paper proposes the Ra-RC model, which combines radical features with a deep learning structure to address this problem. RoBERTa, a robustly optimized BERT variant, is utilized to learn medical features thoroughly. Simultaneously, we use a bidirectional long short-term memory (BiLSTM) network to extract radical-level information, capturing the internal relevance of characteristics, and concatenate the resulting feature vectors with those generated by RoBERTa. In addition, the relationship between labels is considered: a conditional random field (CRF) is applied to obtain the optimal tag sequence. Experimental results demonstrate that the proposed Ra-RC model achieves F1 scores of 93.26% and 82.87% on the CCKS2017 and CCKS2019 datasets, respectively.
... Parameter estimation for the CRF was performed via maximum likelihood. These ideas were further advanced by Qiu et al. [171], who used CRFs for clinical entity recognition in Chinese. Speech tagging from voice recordings was performed using a CRF devised by Khan and collaborators [172]. ...
Article
Full-text available
A random field is the representation of the joint probability distribution for a set of random variables. Markov fields, in particular, have a long-standing tradition as the theoretical foundation of many applications in statistical physics and probability. For strictly positive probability densities, a Markov random field is also a Gibbs field, i.e., a random field supplemented with a measure that implies the existence of a regular conditional distribution. Markov random fields have been used in statistical physics dating back as far as the Ehrenfests. However, their measure-theoretical foundations were developed much later by Dobrushin, Lanford and Ruelle, as well as by Hammersley and Clifford. Aside from their enormous theoretical relevance, due to their generality and simplicity Markov random fields have been used in a broad range of applications in equilibrium and non-equilibrium statistical physics, in non-linear dynamics and ergodic theory, and also in computational molecular biology, ecology, structural biology, computer vision, control theory, complex networks and data science, to name but a few. Often these applications have been inspired by the original statistical physics approaches. Here, we briefly present a modern introduction to the theory of random fields and then explore and discuss some recent applications of random fields in physics, biology and data science. Our aim is to highlight the relevance of this powerful theoretical aspect of statistical physics and its relation to the broad success of its many interdisciplinary applications.
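The key identity behind the Markov-Gibbs equivalence mentioned here (for strictly positive densities) is the Hammersley-Clifford factorization over the cliques of the neighborhood graph, which can be written as:

```latex
p(x) \;=\; \frac{1}{Z}\,\exp\!\Big(-\sum_{C \in \mathcal{C}} V_C(x_C)\Big),
\qquad
Z \;=\; \sum_{x} \exp\!\Big(-\sum_{C \in \mathcal{C}} V_C(x_C)\Big)
```

where \(\mathcal{C}\) is the set of cliques, \(V_C\) are the clique potentials, and \(Z\) is the partition function (normalizing constant).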
Patent
Full-text available
This patent presents an advanced approach to enhancing disease Named Entity Recognition (NER) in clinical and biomedical texts using a fine-tuned BERT model with transfer learning. Disease NER plays a pivotal role in extracting valuable insights from medical literature, yet challenges such as identifying rare diseases and addressing inconsistencies in tagging persist. Current NER models often struggle with novel and emerging disease entities within rapidly expanding biomedical datasets, emphasizing the need for more robust solutions. The proposed transformer-based BERT model demonstrates significant improvements, especially in low-resource settings, compared to existing methods. Evaluated using benchmark datasets, including the NCBI Disease Corpus and BioCreative V Chemical Disease Relation (BC5CDR) dataset, the model achieves F-1 scores of 0.864 and 0.917, respectively. By leveraging BERT’s contextual representation, this model addresses critical issues such as the variability in medical language and the ambiguity of disease entities, creating a more reliable and accurate framework for clinical NER systems. This research not only advances Natural Language Processing (NLP) applications in healthcare but also enhances the efficiency of information extraction from clinical literature. By improving the precision of disease recognition, this model has the potential to streamline data-driven decision-making in clinical practice and biomedical research, ultimately supporting national healthcare goals. The model’s ability to accurately process complex clinical texts contributes to improving patient outcomes, accelerating biomedical research, and enhancing the utility of healthcare data, aligning with U.S. efforts to drive innovation in medical technology and improve the quality of healthcare delivery.
Article
Full-text available
Named entity recognition (NER) is one of the preprocessing stages in natural language processing (NLP); it detects and classifies entities in a corpus. NER results are used in various NLP applications, including sentiment analysis, text summarization, chatbots, machine translation, and question answering. Several previous reviews discussed NER partially, for instance, NER reviews in specific domains, NER classification, and NER deep learning. This paper provides a comprehensive and systematic review of NER studies published from 2011 to 2020. The main contribution of this review is to present a comprehensive systematic literature review of NER covering preprocessing techniques, datasets, application domains, feature extraction techniques, approaches, methods, and evaluation techniques. The review concludes that the deep learning approach, and the bidirectional long short-term memory with conditional random field (Bi-LSTM-CRF) method in particular, attracts the most interest among NER researchers, while medicine and health are the most popular domains. These developments have also led to an increasing number of public datasets in the medical and health fields. At the end of this review, we recommend some opportunities and challenges for NER research going forward.
Chapter
The field of electrical power encompasses a vast array of diverse information modalities, with textual data standing as a pivotal constituent of this domain. In this study, we harness an extensive corpus of textual data drawn from the electrical power systems domain, comprising regulations, reports, and other pertinent materials. Leveraging this corpus, we construct an Electrical Power Systems Corpus and proceed to annotate entities within this text, thereby introducing a novel Named Entity Recognition (NER) dataset tailored specifically for the electrical power domain. We employ an end-to-end deep learning model, the BERT-BiLSTM-CRF model, for named entity recognition on our custom electrical power domain dataset. This NER model integrates the BERT pre-trained model into the traditional BiLSTM-CRF model, enhancing its ability to capture contextual and semantic information within the text. Results demonstrate that the proposed model outperforms both the BiLSTM-CRF model and the BERT-softmax model in NER tasks across the electrical power domain and various other domains. This study contributes to the advancement of NER applications in the electrical power domain and holds significance for furthering the construction of knowledge graphs and databases related to electrical power systems.
Article
Cardiopulmonary and cardiovascular diseases are fatal factors that threaten human health and cause many deaths worldwide each year, so it is essential to screen for cardiopulmonary disease more accurately and efficiently. Auscultation is a non-invasive method for physicians' perception of disease. The Heart Sounds (HS) and Lung Sounds (LS) recorded by an electronic stethoscope carry acoustic information that is helpful in the diagnosis of pulmonary conditions. Still, mutual interference between HS and LS, present in both the time and frequency domains, reduces diagnostic efficiency. This paper proposes a blind source separation (BSS) strategy that first classifies Heart-Lung-Sound (HLS) according to its LS features and then separates it into HS and LS. Sparse Non-negative Matrix Factorization (SNMF) is employed to extract the LS features in HLS; a network built on a Dilated Convolutional Neural Network (DCNN) is then proposed to classify HLS into five types by the magnitude features of LS. Finally, a Multi-Channel UNet (MCUNet) separation model is applied to each category of HLS. This paper is the first to propose the HLS classification method SNMF-DCNN and to apply UNet to the cardiopulmonary sound separation domain. Compared with other state-of-the-art methods, the proposed framework offers higher separation quality and robustness.
Chapter
The aim of NER is to extract entities with actual meaning from massive unstructured text (Zhang et al. in Procedia Comput Sci 183:212–220, 2021 [1]). In the clinical and medical domain, clinical NER recognizes and classifies medical terms in unstructured medical text records, including symptoms, examinations, diseases, drugs, treatments, operations, and body parts. As a combination of structured and unstructured texts, the rapidly growing biomedical literature contains a significant amount of useful biomedical information.
Chapter
Named entity recognition (NER) is a commonly followed standard approach in natural language processing, for example, properly recognizing the names of people, sites and establishments in a sentence, or domain-specific nouns, specific objects, etc. With it, the natural language processing domain can more effectively tackle complex tasks such as question answering, information extraction and machine translation. Previously proposed deep learning models are quite complicated and require long execution times. In this research work, a clinical entity recognition technique is proposed that is based on three phases: pre-processing, boundary detection and classification. A U-Net approach is applied for the classification phase of named entity recognition. The proposed model is implemented in Python, and results are analyzed in terms of accuracy, precision and recall. The proposed model shows approximately 5% improvement in entity recognition results.
Keywords: Clinical entity recognition, Boundary detection, CNN, UNet, NLP, Deep learning
Article
Full-text available
The outbreak of COVID-19 (coronavirus disease 2019) has generated a large amount of spatiotemporal data. Using a knowledge graph can help to analyze the transmission relationships between cases and locate the transmission paths of the pandemic, but researchers have paid little attention to the spatial relationships between geographical entities related to the pandemic. Therefore, we propose a method for constructing a pandemic situation knowledge graph of COVID-19 that considers spatial relationships. First, we created an ontology design of the pandemic data in which spatial relationships are considered. We then constructed a non-spatial relationship extraction model based on BERT and a spatial relationship extraction model based on spatial analysis theory. Second, taking the pandemic and geographic data of Guangzhou as an example, we modeled a pandemic corpus. We extracted entities and relationships based on this model, and we constructed a pandemic situation knowledge graph that considers spatial relationships. Finally, we verified the feasibility of using this method as a visual exploratory tool for analyzing the spatial characteristics, pandemic development, case sources, and case relationships of pandemic-related areas.
Article
Medical named entity recognition (MNER) is a fundamental component of understanding the unstructured medical texts in electronic health records, and it has received widespread attention in both academia and industry. However, previous approaches to MNER do not make full use of hierarchical semantics, from morphology to syntactic relationships such as word dependency. Furthermore, extracting entities from Chinese medical texts is a more complex task because such texts often contain, for example, homophones or pictophonetic characters. In this paper, we propose a multi-level semantic fusion network for Chinese medical named entity recognition, which fuses semantic information at the morphology, character, word and syntactic levels. We take radicals as morphological semantics, pinyin and a character dictionary as character-level semantics, and a word dictionary as word-level semantics; these semantic features are fused by a BiLSTM to obtain the contextualized representation. We then use a graph neural network to model word dependency as syntactic semantics to enhance the contextualized representation. The experimental results show the effectiveness of the proposed model on two public datasets and its robustness in real-world scenarios.
Chapter
The automatic generation of electronic medical record (EMR) data aims to create EMRs from raw medical text (e.g., doctor-patient interrogation dialog text) without human effort. A critical problem is how to accurately locate the medical entities mentioned in the doctor-patient interrogation text, as well as identify the state of each clinical entity (e.g., whether a patient genuinely suffers from the mentioned disease). Such precisely extracted medical entities and their states can help clinicians trace the whole interrogation process for medical decision-making. In this work, we annotate and release an online clinical dialog NER dataset that contains 72 types of clinical items and 3 types of states. Existing conventional named entity recognition (NER) methods only take a candidate entity's surrounding context into consideration. However, identifying the state of a clinical entity mentioned in a doctor-patient dialog turn requires information across the whole dialog rather than only the current turn. To bridge the gap, we further propose CLINER, a CLinical Interrogation NER model, which exploits both fine-grained and coarse-grained information for each dialog turn to facilitate the extraction of entities and their corresponding states. Extensive experiments on the medical dialog information extraction (MIE) task and the clinical interrogation named entity recognition task show that our approach yields significant performance improvements (3.72 on NER F1 and 6.12 on MIE F1) over the state of the art on both tasks.
Keywords: Clinical named entity recognition, Information extraction, Coarse-grained and fine-grained context, Historical pattern memory, BERT
Article
Clinical named entity recognition (CNER) is a fundamental step for many clinical Natural Language Processing (NLP) systems, which aims to recognize and classify clinical entities such as diseases, symptoms, exams, body parts and treatments in clinical free texts. In recent years, with the development of deep learning technology, deep neural networks (DNNs) have been widely used in Chinese clinical named entity recognition and many other clinical NLP tasks. However, these state-of-the-art models fail to make full use of the global information and multi-level semantic features in clinical texts. We design an improved character-level representation approach which integrates the character embedding and the character-label embedding to enhance the specificity and diversity of feature representations. Then, a multi-head self-attention based Bi-directional Long Short-Term Memory Conditional Random Field (MUSA-BiLSTM-CRF) model is proposed. By introducing multi-head self-attention and combining a medical dictionary, the model can more effectively capture the weight relationships between characters and multi-level semantic feature information, which is expected to greatly improve the performance of Chinese clinical named entity recognition. We evaluate our model on two CCKS challenge (CCKS2017 Task 2 and CCKS2018 Task 1) benchmark datasets, and the experimental results show that our proposed model achieves the best performance compared with state-of-the-art DNN-based methods.
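As a rough sketch of how a multi-head self-attention layer can be stacked on BiLSTM outputs in a model of this kind (the dimensions, head count, and residual combination below are illustrative assumptions, not the MUSA-BiLSTM-CRF configuration):

```python
# Sketch: multi-head self-attention over BiLSTM states; sizes are illustrative.
import torch
import torch.nn as nn

batch, seq_len, hidden = 2, 50, 256
bilstm = nn.LSTM(input_size=128, hidden_size=hidden // 2,
                 bidirectional=True, batch_first=True)
attn = nn.MultiheadAttention(embed_dim=hidden, num_heads=8, batch_first=True)

chars = torch.randn(batch, seq_len, 128)          # character embeddings
ctx, _ = bilstm(chars)                            # (batch, seq_len, hidden)
# Self-attention: queries, keys and values are all the BiLSTM states.
weighted, _ = attn(ctx, ctx, ctx)
features = ctx + weighted                         # residual combination (assumed)
```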
Article
Objective
External knowledge, such as lexicons of words in Chinese and domain knowledge graphs (KG) of concepts, has recently been adopted to improve the performance of machine learning methods for named entity recognition (NER), as it can provide additional information beyond context. However, most existing studies only consider knowledge from one source (i.e., either lexicon or knowledge graph) in different ways, and consider lexicon words or KG concepts independently of their boundaries. In this paper, we focus on leveraging multi-source knowledge in a unified manner, where lexicon words or KG concepts are well combined with their boundaries, for Chinese Clinical NER (CNER).
Material and Methods
We propose a novel method based on relational graph convolutional network (RGCN), called MKRGCN, to utilize multi-source knowledge in a unified manner for CNER. For any sentence, a relational graph based on the words or concepts in each knowledge source is constructed, where lexicon words or KG concepts appearing in the sentence are linked to the containing tokens with the boundary information of the lexicon words or KG concepts. RGCN is used to model all relational graphs constructed from multi-source knowledge, and the representations of tokens from multi-source knowledge are integrated into the context representations of tokens via an attention mechanism. Based on the knowledge-enhanced representations of tokens, we deploy a conditional random field (CRF) layer for named entity label prediction. In this study, a lexicon of words and a medical knowledge graph are used as knowledge sources for Chinese CNER.
Results
Our proposed method achieves the best performance on CCKS2017 and CCKS2018 in Chinese with F1-scores of 91.88% and 89.91%, respectively, significantly outperforming existing methods. Extended experiments on NCBI-Disease and BC2GM in English also prove the effectiveness of our method when only one knowledge source is considered via RGCN.
Conclusion
The MKRGCN model can effectively integrate knowledge from an external lexicon and knowledge graph for Chinese CNER and has the potential to be applied to English NER.
Conference Paper
Full-text available
Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research. In recent years, deep learning methods have achieved significant success in CNER tasks. However, these methods depend greatly on Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations that are propagated through time, making model training time-consuming. In this paper, we propose a Residual Dilated Convolutional Neural Network with Conditional Random Field (RD-CNN-CRF) to address this issue. Specifically, Chinese characters and dictionary features are first projected into dense vector representations, then fed into the residual dilated convolutional neural network to capture contextual features. Finally, a conditional random field is employed to capture dependencies between neighboring tags. Computational results on the CCKS-2017 Task 2 benchmark dataset show that our proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based methods both in terms of computational performance and training time.
Conference Paper
Full-text available
Article
Full-text available
Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. Availability and implementation: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/ . Contact: habibima@informatik.hu-berlin.de.
Conference Paper
Full-text available
Clinical named entity recognition (CNER), which identifies the boundaries and types of medical entities, is a fundamental and crucial task in clinical natural language processing. Recent years have witnessed considerable progress in deep learning based algorithms, such as RNNs, CNNs and their integrated variants, which have shown their effectiveness in CNER. In this work, we propose a deep learning model for CNER that adopts a bidirectional RNN-CRF architecture using concatenated n-gram character representations to capture rich context information. Second, we incorporate word segmentation results, part-of-speech (POS) tagging and medical vocabulary as features into our model. Further, the final output is obtained by comparing the separate models with the overall model. The proposed framework has been evaluated on the CCKS2017 Task 2 dataset, achieving an F1-score of 90.10 for CNER.
Article
Full-text available
Drug Named Entity Recognition (DNER) for biomedical literature is a fundamental facilitator of information extraction. For this reason, the DDIExtraction2011 (DDI2011) and DDIExtraction2013 (DDI2013) challenges introduced a task aimed at the recognition of drug names. State-of-the-art DNER approaches heavily rely on hand-engineered features and domain-specific knowledge, which are difficult to collect and define. Therefore, we offer an approach that automatically explores word- and character-level features: a recurrent neural network using bidirectional long short-term memory (LSTM) with Conditional Random Field decoding (LSTM-CRF). Two kinds of word representations are used in this work: word embeddings, which are trained from a large amount of text, and character-based representations, which can capture orthographic features of words. Experimental results on the DDI2011 and DDI2013 datasets show the effectiveness of the proposed LSTM-CRF method. Our method outperforms the best system in the DDI2013 challenge.
Article
Full-text available
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production; we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
Conference Paper
Full-text available
State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction problems such as semantic segmentation are structurally different from image classification. In this work, we develop a new convolutional network module that is specifically designed for dense prediction. The presented module uses dilated convolutions to systematically aggregate multi-scale contextual information without losing resolution. The architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage. We show that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems. In addition, we examine the adaptation of image classification networks to dense prediction and show that simplifying the adapted network can increase accuracy.
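This is the same building block the RD-CNN-CRF paper adapts from vision to text. A minimal 1-D sketch of an exponentially dilated stack follows (channel sizes and depth are illustrative): with kernel size 3 and dilations 1, 2, 4, the receptive field already spans 15 positions after three layers, while the sequence length is preserved.

```python
# Sketch of an exponentially dilated 1-D convolution stack; channel sizes
# are illustrative. Padding equal to the dilation keeps the length fixed.
import torch
import torch.nn as nn

layers = []
for dilation in (1, 2, 4):
    layers += [nn.Conv1d(64, 64, kernel_size=3, dilation=dilation,
                         padding=dilation),
               nn.ReLU()]
net = nn.Sequential(*layers)

x = torch.randn(8, 64, 100)                       # (batch, channels, seq_len)
print(net(x).shape)                               # torch.Size([8, 64, 100])
```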
Article
Full-text available
The evaluation of classifiers' performance plays a critical role in the construction and selection of classification models. Although many performance metrics have been proposed in the machine learning community, no general guidelines are available among practitioners regarding which metric should be selected for evaluating a classifier's performance. In this paper, we attempt to provide practitioners with a strategy for selecting performance metrics for classifier evaluation. First, the authors investigate seven widely used performance metrics, namely classification accuracy, F-measure, kappa statistic, root mean square error, mean absolute error, the area under the receiver operating curve, and the area under the precision-recall curve. Second, the authors resort to using Pearson linear correlation and Spearman rank correlation to analyze the potential relationships among these seven metrics. Experimental results show that these commonly used metrics can be divided into three groups, and all metrics within a given group are highly correlated but less correlated with metrics from different groups.
Article
Full-text available
Rapid growth in electronic health record (EHR) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in narrative documents. Therefore, Natural Language Processing (NLP) technologies, e.g., named entity recognition, which identifies the boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using a minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experimental results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from a large unlabeled corpus remarkably improved the DNN with randomized embeddings, denoting the usefulness of unsupervised feature learning.
Article
Full-text available
Bio-entity extraction is a pivotal component of information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities, such as Part of Speech (POS) expansion, stemming, and the exploitation of contextual cues, to further improve performance. The experimental results show that the proposed technique achieves the best, or at least equivalent, F-measure performance among the compared techniques built from GENIA, MESH, UMLS, and combinations of these three resources. The results imply that the performance of dictionary-based extraction techniques is largely influenced by the information resources used to build the dictionary. In addition, the edit distance algorithm shows steady precision with three different dictionaries, whereas the context-only technique achieves high-end recall with three different dictionaries.
Article
Full-text available
The bidirectional maximum matching (BMM) algorithm combines the forward maximum matching and backward maximum matching algorithms; it is one of the more commonly used word segmentation methods, but it is inefficient and cannot resolve ambiguity on its own. Therefore, an improved method is proposed that combines an improved dictionary structure with dynamically changing the maximum matching word length to improve the efficiency of word segmentation. To obtain correct segmentation results, several rules are also proposed. Compared with traditional segmentation methods, bidirectional maximum matching word segmentation with rules achieves higher speed and precision.
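A compact sketch of the plain bidirectional maximum matching procedure the improved method starts from, using a toy dictionary; the tie-breaking rule (prefer the segmentation with fewer words, otherwise fall back to the backward result) is one common convention, not necessarily the exact rule set proposed in the paper.

```python
# Sketch of bidirectional maximum matching (BMM) word segmentation.
def max_match(text, dictionary, max_len, reverse=False):
    """Greedy maximum matching; scans right-to-left when reverse=True."""
    result = []
    while text:
        for size in range(min(max_len, len(text)), 0, -1):
            piece = text[-size:] if reverse else text[:size]
            if size == 1 or piece in dictionary:
                break
        result.append(piece)
        text = text[:-size] if reverse else text[size:]
    return result[::-1] if reverse else result

def bmm(text, dictionary, max_len=4):
    fwd = max_match(text, dictionary, max_len)
    bwd = max_match(text, dictionary, max_len, reverse=True)
    # Common heuristic: prefer the result with fewer words; tie -> backward.
    return fwd if len(fwd) < len(bwd) else bwd

vocab = {"研究", "生命", "研究生", "命"}
# Forward gives [研究生, 命]; backward gives [研究, 生命]; BMM picks backward.
print(bmm("研究生命", vocab))
```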
Conference Paper
Full-text available
The correct selection of performance metrics is one of the key issues in evaluating a classifier's performance. Although many performance metrics have been proposed and used in the machine learning community, there are no common conclusions among practitioners regarding which metric to choose for evaluating a classifier's performance. In this paper, we attempt to investigate the potential relationships among some commonly used performance metrics. Based on their definitions, we first classify the seven most widely used performance metrics into three groups, namely threshold metrics, rank metrics, and probability metrics. Then, we focus on using Pearson linear correlation and Spearman rank correlation to investigate the relationships among these metrics. Experimental results show the reasonableness of classifying the seven commonly used metrics into three groups. This can be useful for helping practitioners enhance their understanding of the different relationships and groupings among the performance metrics.
Article
Full-text available
In this paper, we propose a novel neural network model called RNN Encoder--Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder--Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
Article
Full-text available
Article
Full-text available
Named entity recognition (NER) is one of the fundamental tasks in natural language processing. In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been carried out on clinical notes written in Chinese. The goal of this study was to systematically investigate features and machine learning algorithms for NER in Chinese clinical text. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital in China. For each note, four types of entity (clinical problems, procedures, laboratory tests, and medications) were annotated according to a predefined guideline. Two-thirds of the 400 notes were used to train the NER systems and one-third for testing. We investigated the effects of different types of feature, including bag-of-characters, word segmentation, part-of-speech, and section information, and different machine learning algorithms, including conditional random fields (CRF), support vector machines (SVM), maximum entropy (ME), and structural SVM (SSVM), on the Chinese clinical NER task. All classifiers were trained on the training dataset and evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported. Our evaluation on the independent test set showed that most types of feature were beneficial to Chinese NER systems, although the improvements were limited. The system achieved the highest performance by combining word segmentation and section information, indicating that these two types of feature complement each other. When the same types of optimized feature were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM achieved the highest performance of the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries, respectively.
Article
Full-text available
We describe a machine learning system for the recognition of names in biomedical texts. The system makes extensive use of local and syntactic features within the text, as well as external resources including the web and gazetteers. It achieves an F-score of 70% on the Coling 2004 NLPBA/BioNLP shared task of identifying five biomedical named entities in the GENIA corpus.
Conference Paper
Full-text available
This paper proposes a Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a named entity (NE) recognition (NER) system is built to recognize and classify names, times and numerical quantities. Through the HMM, our system is able to apply and integrate four types of internal and external evidence: 1) simple deterministic internal features of words, such as capitalization and digitalization; 2) internal semantic features of important triggers; 3) internal gazetteer features; 4) external macro context features. In this way, the NER problem can be resolved effectively. Evaluation of our system on the MUC-6 and MUC-7 English NE tasks achieves F-measures of 96.6% and 94.1%, respectively. This shows that the performance is significantly better than that reported by any other machine-learning system. Moreover, the performance is even consistently better than that of systems based on handcrafted rules.
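For readers unfamiliar with how such an HMM tagger selects its output, the decoding step is typically Viterbi dynamic programming; a generic sketch follows (the probability tables are placeholders, not the paper's trained model):

```python
# Generic Viterbi decoding for an HMM tagger; tables below are toy values.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """obs: observation indices; returns the most likely state sequence."""
    n_states = len(start_p)
    T = len(obs)
    score = np.zeros((T, n_states))               # best log-prob ending in state
    back = np.zeros((T, n_states), dtype=int)     # backpointers
    score[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            cand = score[t - 1] + np.log(trans_p[:, s])
            back[t, s] = np.argmax(cand)
            score[t, s] = cand[back[t, s]] + np.log(emit_p[s, obs[t]])
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state example (states: O, ENT) with a 3-symbol vocabulary.
start = np.array([0.8, 0.2])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
print(viterbi([0, 2, 2], start, trans, emit))     # -> [0, 1, 1]
```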
Article
Full-text available
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans; and accuracy for concept mapping, negation, and status attributes of 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
Article
Full-text available
Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms. The natural-language processor provides three phases of processing, all of which are driven by different knowledge sources. The first phase performs the parsing. It identifies the structure of the text through use of a grammar that defines semantic patterns and a target form. The second phase, regularization, standardizes the terms in the initial target structure via a compositional mapping of multi-word phrases. The third phase, encoding, maps the terms to a controlled vocabulary. Radiology is the test domain for the processor and the target structure is a formal model for representing clinical information in that domain. The impression sections of 230 radiology reports were encoded by the processor. Results of an automated query of the resultant database for the occurrences of four diseases were compared with the analysis of a panel of three physicians to determine recall and precision. Without training specific to the four diseases, recall and precision of the system (combined effect of the processor and query generator) were 70% and 87%. Training of the query component increased recall to 85% without changing precision.
Article
Full-text available
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
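For reference, the gating mechanism described above is usually written as follows; this is the standard modern formulation (including the forget gate, which was added shortly after the original 1997 paper):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f),\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
```

where the additive cell-state update c_t is the constant error carousel, and the multiplicative gates i_t, f_t, o_t control access to it.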
Article
Full-text available
EDGAR (Extraction of Drugs, Genes and Relations) is a natural language processing system that extracts information about drugs and genes relevant to cancer from the biomedical literature. This automatically extracted information has remarkable potential to facilitate computational analysis in the molecular biology of cancer, and the technology is straightforwardly generalizable to many areas of biomedicine. This paper reports on the mechanisms for automatically generating such assertions and on a simple application, conceptual clustering of documents. The system uses a stochastic part of speech tagger, generates an underspecified syntactic parse and then uses semantic and pragmatic information to construct its assertions. The system builds on two important existing resources: the MEDLINE database of biomedical citations and abstracts and the Unified Medical Language System, which provides syntactic and semantic information about the terms found in biomedical abstracts.
Article
Full-text available
The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease. The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard. The accuracy of HITEx was 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction, when cases labeled "Insufficient Data" by the gold standard were excluded. We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.
Article
Neural networks (NNs) have become the state of the art in many machine learning applications, such as image and sound processing (LeCun et al., 2015) and natural language processing (Young et al., 2017; Linggard et al., 2012). However, the success of NNs remains dependent on the availability of large labelled datasets, as in the case of electronic health records (EHRs). With scarce data, NNs are unlikely to be able to extract this hidden information with practical accuracy. In this study, we develop an approach that solves these problems for named entity recognition, obtaining a 94.6 F1 score on the I2B2 2009 Medical Extraction Challenge (Uzuner et al., 2010), 4.3 points above the architecture that won the competition. To achieve this, we bootstrap our NN models through transfer learning by pretraining word embeddings on a secondary task performed on a large pool of unannotated EHRs and using the output embeddings as the foundation of a range of NN architectures. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on extracting relationships between medical terms using attention-based seq2seq models bootstrapped in the same manner.
Article
Clinical named entity recognition aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research. In recent years, deep neural networks have achieved significant success in named entity recognition and many other natural language processing tasks. Most of these algorithms are trained end to end and can automatically learn features from large-scale labeled datasets. However, these data-driven methods typically lack the capability of processing rare or unseen entities. Previous statistical methods and feature engineering practice have demonstrated that human knowledge can provide valuable information for handling rare and unseen cases. In this paper, we propose a new model which combines data-driven deep learning approaches and knowledge-driven dictionary approaches. Specifically, we incorporate dictionaries into deep neural networks. In addition, two different architectures that extend the bi-directional long short-term memory neural network and five different feature representation schemes are also proposed to handle the task. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed method achieves highly competitive performance compared with state-of-the-art deep learning methods.
Conference Paper
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
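A toy usage sketch of the skip-gram variant described above, assuming the gensim (>= 4.0) implementation; the two-sentence corpus and hyperparameters are purely illustrative:

```python
# Toy skip-gram training run with gensim; corpus and settings are illustrative.
from gensim.models import Word2Vec

sentences = [["patient", "denies", "chest", "pain"],
             ["patient", "reports", "chest", "tightness"]]
model = Word2Vec(sentences, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=50)    # sg=1 selects skip-gram
print(model.wv.most_similar("chest", topn=2))
```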
Article
Most existing methods for the biomedical entity recognition task rely on explicit feature engineering, where many features are either specific to a particular task or depend on the output of other existing NLP tools. Neural architectures have shown across various domains that the effort for explicit feature design can be reduced. In this work we propose a unified framework using a bi-directional long short-term memory network (BLSTM) for named entity recognition (NER) tasks in the biomedical and clinical domains. Three important characteristics of the framework are as follows: (1) the model learns contextual as well as morphological features using two different BLSTMs in a hierarchy; (2) the model uses a first-order linear conditional random field (CRF) in its output layer, in cascade with the BLSTM, to infer the label or tag sequence; (3) the model does not use any domain-specific features or dictionary, i.e., in other words, the same set of features is used in the three NER tasks, namely disease name recognition (Disease NER), drug name recognition (Drug NER) and clinical entity recognition (Clinical NER). We compare the performance of the proposed model with existing state-of-the-art models on the standard benchmark datasets of the three tasks. We show empirically that the proposed framework outperforms all existing models. Furthermore, our analyses of the CRF layer and of word embeddings obtained using character-based embeddings show their importance.
Article
Biomedical named entity recognition (BNER), which extracts important named entities such as genes and proteins, is a challenging task in automated systems that mine knowledge in biomedical texts. The previous state-of-the-art systems required large amounts of task-specific knowledge in the form of feature engineering, lexicons and data pre-processing to achieve high performance. In this paper, we introduce a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional long short-term memory (LSTM) and conditional random field (CRF) eliminating the need for most feature engineering tasks. We evaluate our system on two datasets: JNLPBA corpus and the BioCreAtIvE II Gene Mention (GM) corpus. We obtained state-of-the-art performance by outperforming the previous systems. To the best of our knowledge, we are the first to investigate the combination of deep neural networks, CRF, word embeddings and character-level representation in recognising biomedical named entities.
Article
Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These improvements lead to improvements over the current state-of-the-art on several well-studied tasks.
Conference Paper
State-of-the-art systems for Chinese Named Entity Recognition (CNER) require large amounts of hand-crafted features and domain-specific knowledge to achieve high performance. In this paper, we apply a bidirectional LSTM-CRF neural network that utilizes both character-level and radical-level representations. We are the first to use a character-based BLSTM-CRF neural architecture for CNER. By contrasting the results of different variants of LSTM blocks, we find the most suitable LSTM block for CNER. We are also the first to investigate Chinese radical-level representations in the BLSTM-CRF architecture, and we obtain better performance without carefully designed features. We evaluate our system on the third SIGHAN Bakeoff MSRA data set for the simplified CNER task and achieve state-of-the-art performance of 90.95% F1.
Article
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest-generation Inception-v3 network. This raises the question of whether there is any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual networks and one Inception-v4 network, we achieve 3.08 percent top-5 error on the test set of the ImageNet classification (CLS) challenge.
Article
Objective:
Named entity recognition (NER) is one of the fundamental tasks in natural language processing (NLP). In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been done on clinical notes written in Chinese. The goal of this study is to develop corpora, methods, and systems for NER in Chinese clinical text.
Materials and methods:
To study entities in Chinese clinical text, we started with building annotated clinical corpora in Chinese. We developed an NER annotation guideline in Chinese by extending the one used in the 2010 i2b2 NLP challenge. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital (PUMCH) in China. For each note, four types of entities including clinical problems, procedures, labs, and medications were annotated according to the developed guideline. In addition, an annotation tool was developed to assist two MD students to annotate Chinese clinical documents. A comparison of entity distribution between Chinese and English clinical notes (646 English and 400 Chinese discharge summaries) was performed using the annotated corpora, to identify the important features for NER. In the NER study, two-thirds of the 400 notes were used for training the NER systems and one-third were used for testing. We investigated the effects of different types of features including bag-of-characters, word segmentation, part-of-speech, and section information, with different machine learning (ML) algorithms including Conditional Random Fields (CRF), Support Vector Machines (SVM), Maximum Entropy (ME), and Structural Support Vector Machines (SSVM) on the Chinese clinical NER task. All classifiers were trained on the training dataset, evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported.
Results:
Our evaluation on the independent test set showed that most types of features were beneficial to Chinese NER systems, although the improvements were limited. By combining word segmentation and section information, the system achieved the highest performance, indicating that these two types of features are complementary to each other. When the same types of optimized features were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM reached the highest performance among the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries respectively.
Conclusions:
In this study, we created large annotated datasets of Chinese admission notes and discharge summaries and then systematically evaluated different types of features (e.g., syntactic, semantic, and segmentation information) and four ML algorithms including CRF, SVM, SSVM, and ME for clinical NER in Chinese. To the best of our knowledge, this is one of the earliest comprehensive efforts in Chinese clinical NER research, and we believe it will provide valuable insights to NLP research in Chinese clinical text. Our results suggest that both word segmentation and section information improve NER in Chinese clinical text, and that SSVM, a recent sequential labelling algorithm, outperformed CRF and other classification algorithms. Our best system achieved F-measures of 90.01% and 93.52% on Chinese discharge summaries and admission notes, respectively, indicating a promising start for Chinese NLP research.
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8x deeper than VGG nets but still having lower complexity). An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
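The reformulation is compact enough to state directly: a residual building block computes

```latex
\mathbf{y} \;=\; \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
```

where \mathcal{F} is the residual mapping learned by a few stacked layers and the identity shortcut x is added back elementwise, so the layers only need to learn the deviation from identity. This is the same residual connection that the RD-CNN-CRF paper wraps around its dilated convolution blocks.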
Article
State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification are structurally different. In this work, we develop a new convolutional network module that is specifically designed for dense prediction. The presented module uses dilated convolutions to systematically aggregate multi-scale contextual information without losing resolution. The architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage. We show that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems. In addition, we examine the adaptation of image classification networks to dense prediction and show that simplifying the adapted network can increase accuracy.
Article
In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM-CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. We show that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to its bidirectional LSTM component. It can also use sentence-level tag information thanks to its CRF layer. The BI-LSTM-CRF model can produce state-of-the-art (or close to it) accuracy on POS, chunking and NER data sets. In addition, it is robust and has less dependence on word embeddings compared to previous observations.
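A minimal sketch of such a BI-LSTM-CRF tagger, assuming the third-party pytorch-crf package for the CRF layer; the vocabulary, tag set, and layer sizes are illustrative assumptions, not the paper's configuration:

```python
# Minimal BI-LSTM-CRF sketch; requires the pytorch-crf package (torchcrf).
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                            batch_first=True)
        self.fc = nn.Linear(hidden, num_tags)     # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags):
        emissions = self.fc(self.lstm(self.emb(tokens))[0])
        return -self.crf(emissions, tags)         # negative log-likelihood

    def decode(self, tokens):
        emissions = self.fc(self.lstm(self.emb(tokens))[0])
        return self.crf.decode(emissions)         # best tag sequence (Viterbi)
```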
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
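The per-mini-batch normalization it describes is, for each activation x_i in a mini-batch B of size m:

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,
\qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2,
\qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},
\qquad
y_i = \gamma\,\hat{x}_i + \beta,
```

where \gamma and \beta are learned scale and shift parameters that restore the layer's representational capacity.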
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has low memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
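The update rule the abstract refers to maintains exponentially decaying averages of the first and second moments of the gradient g_t, with bias correction for the zero initialization:

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, &
\hat{v}_t &= \frac{v_t}{1-\beta_2^t},\\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, &&
\end{aligned}
```

where \alpha is the step size and \beta_1, \beta_2, \epsilon are the usual hyper-parameters.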
Conference Paper
We describe a machine learning system for the recognition of names in biomedical texts. The system makes extensive use of local and syntactic features within the text, as well as external resources including the web and gazetteers. It achieves an F-score of 70% on the Coling 2004 NLPBA/BioNLP shared task of identifying five biomedical named entities in the GENIA corpus.
Article
This paper studies features for Chinese Named Entity Recognition (CNER) based on Conditional Random Fields (CRFs). These features, which include common attributes, feature templates varying in window size, and sequence label sets, are very important for CNER. Taking advantage of these features or their combinations can greatly improve the performance of CNER. The paper aims to provide a reference for selecting features for CNER through a series of experiments. The experimental results show that appropriate features or their combinations, such as single Chinese characters, part-of-speech (POS), prefixes and suffixes, can raise the F-measure score of CRF-based CNER. Meanwhile, the results indicate that selecting suitable feature templates and sequence label sets can not only improve the performance of CNER, but also shorten the model-training process and reduce system resource consumption.
Article
The ability to accurately represent sentences is central to language understanding. We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. The network uses Dynamic k-Max Pooling, a global pooling operation over linear sequences. The network handles input sentences of varying length and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations. The network does not rely on a parse tree and is easily applicable to any language. We test the DCNN in four experiments: small scale binary and multi-class sentiment prediction, six-way question classification and Twitter sentiment prediction by distant supervision. The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.
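The pooling operation at the heart of the DCNN keeps the k largest activations of each feature map in their original order (in the dynamic variant, k is a function of sentence length and network depth); a small sketch:

```python
# k-max pooling: keep the k largest activations per channel, preserving order.
import torch

def kmax_pooling(x, k):
    """x: (batch, channels, seq_len) -> (batch, channels, k)."""
    idx = x.topk(k, dim=2).indices.sort(dim=2).values   # positions of top-k
    return x.gather(2, idx)

x = torch.randn(4, 8, 30)
print(kmax_pooling(x, 5).shape)                   # torch.Size([4, 8, 5])
```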
Article
Nowadays, biomedical research is developing rapidly, and a large amount of biomedical knowledge exists in the form of unstructured text documents in various files. Named Entity Recognition (NER) from biomedical text is one of the basic tasks of biomedical text mining, whose purpose is to recognize names of specified types from collections of biomedical text. NER results are usually the processing objects of further text mining, and NER from biological text is the foundation of bioinformatics research. At present, the best F-measure of biomedical named entity recognition systems has reached more than 80%, which is lower than that of general NER systems, which can reach about 90%. Here we use a support vector machine (SVM), an effective and efficient tool for analyzing data and recognizing patterns, to recognize biomedical named entities. We obtain our data set from the GENIA corpus, which is a collection of Medline abstracts. In the experiment, we finally obtain a precision rate of 84.24% and a recall rate of 80.76%.
Conference Paper
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
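This is the same CRF formulation used as the output layer of the RD-CNN-CRF model above. For a linear-chain CRF, the conditional probability of a tag sequence y given an input sequence x is

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\left(\sum_{t=1}^{T}\sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t)\right),
\qquad
Z(x) = \sum_{y'} \exp\!\left(\sum_{t=1}^{T}\sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t)\right),
```

where the f_k are feature functions over adjacent tags and the input, the \lambda_k are learned weights, and Z(x) normalizes over all candidate tag sequences, which is what avoids the per-state normalization behind the label bias of MEMMs.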
Conference Paper
Identifying proper names, such as gene names, DNAs, or proteins, is useful to help researchers mine textual information. Learning to extract proper names in natural language text is a named entity recognition (NER) task. Previous studies focus on combining abundant hand-made rules and trigger words to enhance system performance. However, these methods require domain experts to build up the rules and word sets, which relies on a great deal of human effort. In this paper, we present a robust named entity recognition system based on support vector machines (SVM). By integrating a rich feature set with the proposed mask method, the system performance is satisfactory on the MUC-7 and biology named entity recognition tasks, outperforming famous machine learning-based methods such as the hidden Markov model (HMM) and the maximum entropy model (MEM). We compare our method to previous systems that were evaluated on the same data sets. The experiments show that when training with the MUC-7 data set, our system achieves 86.4 in F(β=1) rate and 81.57 on the biology corpus. Besides, our named entity system is able to handle real-time processing applications; the turnaround time on a 63K-word document set is less than 30 seconds.
Conference Paper
This paper presents a feature induction method for CRFs. Founded on the principle of constructing only those feature conjunctions that significantly increase loglikelihood, the approach builds on that of Della Pietra et al (1997), but is altered to work with conditional rather than joint probabilities, and with a mean-field approximation and other additional modifications that improve efficiency specifically for a sequence model. In comparison with traditional approaches, automated feature...
Article
To solve the mystery of the life phenomenon, we must clarify when genes are expressed and how their products interact with each other. But since the amount of continuously updated knowledge on these interactions is massive and is only available in the form of published articles, an intelligent information extraction (IE) system is needed. To extract this information directly from articles, the system must first identify the material names. However, medical and biological documents often include proper nouns newly coined by the authors, and conventional methods based on domain-specific dictionaries cannot detect such unknown words or coinages. In this study, we propose a new method of extracting material names, PROPER, using surface clues on character strings. It extracts material names in sentences with 94.70% precision and 98.84% recall, regardless of whether a name is already known or newly defined.