Lung-Hao Lee

Verified
Lung-Hao verified their affiliation via an institutional email.
  • Ph.D.
  • Professor (Associate) at National Yang Ming Chiao Tung University

About

82
Publications
29,379
Reads
1,315
Citations
Current institution
National Yang Ming Chiao Tung University
Current position
  • Professor (Associate)
Additional affiliations
February 2024 - present
National Yang Ming Chiao Tung University
Position
  • Professor (Associate)
August 2018 - July 2022
National Central University
Position
  • Professor (Assistant)
April 2018 - July 2018
National Taiwan University
Position
  • PostDoc Position
Education
September 2009 - March 2015
National Taiwan University
Field of study
  • Computer Science and Information Engineering
September 2003 - July 2005
Yuan Ze University
Field of study
  • Information Management
September 1998 - July 2003
National Taipei University
Field of study
  • Statistics

Publications (82)
Conference Paper
Full-text available
An increasing amount of research has recently focused on representing affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space. Compared to the categorical approach that represents affective states as several classes (e.g., positive and negative), the dimensional approach can provide more fine-...
Article
Named Entity Recognition (NER) is a natural language processing task for recognizing named entities in a given sentence. Chinese NER is difficult due to the lack of delimited spaces and conventional features for determining named entity boundaries and categories. This study proposes the ME-MGNN (Multiple Embeddings enhanced Multi-Graph Neural Netwo...
Article
Full-text available
An increasing amount of research has recently focused on dimensional sentiment analysis that represents affective states as continuous numerical values on multiple dimensions, such as valence-arousal (VA) space. Compared to the categorical approach that represents affective states as distinct classes (e.g., positive and negative), the dimensional a...
Article
Entity linking is the task of assigning a unique identity to named entities mentioned in a text, a sort of word sense disambiguation that focuses on automatically determining a pre-defined sense for a target entity to be disambiguated. This study proposes the DGE (Dual Gloss Encoders) model for Chinese entity linking in the biomedical domain. We se...
Article
Full-text available
Background Support vector machines (SVMs) based on brain-wise functional connectivity (FC) have been widely adopted for single-subject prediction of patients with schizophrenia, but most of them had small sample size. This study aimed to evaluate the performance of SVMs based on a large single-site dataset and investigate the effects of demographic...
Article
The main challenge of aspect-level sentiment classification (ASC) is associating target aspect terms with relevant contextual words. Existing methods improve ASC performance by incorporating syntactic dependencies through a graph convolution layer on top of BERT. However, these approaches often assign a fixed weight to edges with the same dependenc...
Article
Full-text available
Several motor imagery classification methods have been developed and achieve higher accuracy. Machine learning (ML) based algorithms utilizing manually designed features often encounter robustness issues, leading to diminished accuracy. While deep learning (DL) based algorithms exhibit promising accuracy, their extensive computational requirements...
Article
Full-text available
BERT (Bidirectional Encoder Representations from Transformers) uses an encoder architecture with an attention mechanism to construct a transformer-based neural network. In this study, we develop a Chinese word-level BERT to learn contextual language representations and propose a transformer fusion framework for Chinese sentiment intensity predictio...
Article
Full-text available
Many deep-learning-based seizure detection algorithms have achieved good classification, which usually outperformed traditional machine-learning-based algorithms. However, the hand-engineered features increase the computational complexity and potentially have an ineffectiveness problem for the category. Therefore, this paper proposes a novel end-to...
Article
This study explores URL click-through behaviour to predict the category of users’ online information accesses and applies the results to progressively filter objectionable accesses during web surfing. Each clicked URL is represented by the embedding technique and fed into the Bidirectional Long Short-Term Memory neural network cascaded with a Condi...
Article
Full-text available
This study proposes a novel bidirectional-switched-capacitor-based interleaved converter. In view of the shortcomings of the two well-known unidirectional-switched-capacitor-based interleaved converters, this study improves such converters through combining the novel structure of a switched capacitor circuit. The first effort was to overcome the dr...
Article
Full-text available
In recent years, many studies have proposed epilepsy detection algorithms, but most of them require high computing resources and a large amount of memory, which are difficult to implement in wearable devices. This paper proposes an epilepsy detection algorithm that uses a small number of features to reduce the memory requirements of the algorithm....
Article
Full-text available
Phrase-level sentiment intensity prediction is difficult due to the inclusion of linguistic modifiers (e.g., negators, degree adverbs, and modals) potentially resulting in an intensity shift or polarity reversal for the modified words. This study develops a graph-based Chinese parser based on the deep biaffine attention model to obtain dependency s...
Conference Paper
In this paper, we describe the process of building a benchmark data set for Chinese multi-label grammatical error detection tasks, comparing the performance of 10 representative neural network models. Experimental results reveal that no matter which deep learning model is used, the performance is still limited which confirms the difficulty of the m...
Conference Paper
Full-text available
This paper describes the ROCLING-2022 shared task for Chinese healthcare named entity recognition, including task description, data preparation, performance metrics, and evaluation results. Among ten registered teams, seven participating teams submitted a total of 20 runs. This shared task reveals present NLP techniques for dealing with Chinese nam...
Conference Paper
Full-text available
This paper describes a proposed system design for Style Change Detection (SCD) tasks for PAN at CLEF 2022. We propose a unified architecture of ensemble neural networks to solve three SCD-2022 edition tasks. We fine-tune the BERT, RoBERTa and ALBERT transformers and their connecting classifiers to measure the similarity of two given paragraphs or s...
Conference Paper
Full-text available
This study describes the model design of the NCUEE-NLP system for the Chinese track of the SemEval-2022 MultiCoNER task. We use the BERT embedding for character representation and train the BiLSTM-CRF model to recognize complex named entities. A total of 21 teams participated in this track, with each team allowed a maximum of six submissions. Our b...
Article
Full-text available
A huge and growing number of scientific papers are authored by non-native English speakers, driving increased demand for effective computer-based writing tools to help writers compose scientific articles. The Automated Evaluation of Scientific Writing (AESW) shared task promotes the use of natural language processing tools to improve the quality...
Article
Full-text available
Steady-state visual evoked potential (SSVEP) has been used to implement brain-computer interface (BCI) due to its advantages of high information transfer rate (ITR) and high accuracy. In recent years, owing to the developments of head-mounted device (HMD), the HMD has become a popular device to implement SSVEP-based BCI. However, an HMD with fixe...
Conference Paper
We explore transformer-based neural networks for Chinese grammatical error detection. The TOCFL learner corpus is used to measure the model capability of indicating whether a sentence contains errors or not. Experimental results show that ELECTRA transformers which take into account both transformer architecture and adversarial learning technique c...
Conference Paper
Full-text available
This study describes the model design of the NCUEE-NLP system for the MEDIQA challenge at the BioNLP 2021 workshop. We use the PEGASUS transformers and fine-tune the downstream summarization task using our collected and processed datasets. A total of 22 teams participated in the consumer health question summarization task of MEDIQA 2021. Each parti...
Conference Paper
Full-text available
This study describes our proposed model design for SMM4H 2021 shared tasks. We fine-tune the language model of RoBERTa transformers and their connecting classifier to complete the classification tasks of tweets for adverse pregnancy outcomes (Task 4) and potential COVID-19 cases (Task 5). The evaluation metric is F1-score of the positive class for...
Conference Paper
Full-text available
This study describes our proposed model design for the SMM4H 2020 Task 1. We fine-tune ELECTRA transformers using our trained SVM filter for data augmentation, along with decision trees to detect medication mentions in tweets. Our best F1-score of 0.7578 exceeded the mean score 0.6646 of all 15 submitting teams.
Conference Paper
Full-text available
In this paper, we proposed a Multi-Channel Convolutional Neural Network with Bidirectional Long Short-Term Memory (MC-CNN-BiLSTM) model for Chinese grammatical error detection. The TOCFL learner corpus is adopted to measure the system capability of indicating whether a sentence contains errors or not. Our model performs better than a previous CNN-L...
Article
Remote sensing of life detection or a non-contact monitor of vital signals is an important application for Ultra-wideband (UWB) radar, such as health monitoring of a vehicle driver. Using the UWB radar to detect physiological signals of a dynamic human, three kinds of movement features (body motion, breathing, and heartbeat) must be considered generall...
Conference Paper
Full-text available
In this paper, we describe the construction details of a confused character set for Chinese spell checking. The SIGHAN 2013-2015 bakeoff datasets are adopted to measure the performance of correct character suggestions. Our confusion set significantly outperforms the existing confusion set in candidate selection for automatic spelling checkers.
Article
In this paper, we describe the construction details of a confused character set for Chinese spell checking. The SIGHAN 2013-2015 bakeoff datasets are adopted to measure the performance of correct character suggestions. Our confusion set significantly outperforms the existing confusion set in candidate selection for automatic spelling checkers.
Conference Paper
Full-text available
This study describes the model design of the NCUEE system for the MEDIQA challenge at the ACL-BioNLP 2019 workshop. We use the BERT (Bidirectional Encoder Representations from Transformers) as the word embedding method to integrate the BiLSTM (Bidirectional Long Short-Term Memory) network with an attention mechanism for medical text inferences. A t...
Chapter
Chinese as a foreign language (CFL) learners may, in their language production, generate inappropriate linguistic usages, including character-level confusions (or commonly known as spelling errors) and word-/sentence-/discourse-level grammatical errors. Chinese spelling errors frequently arise from confusions among multiple-character words that are...
Conference Paper
Full-text available
This study describes the construction of a TOCFL learner corpus and its usage for Chinese grammatical error diagnosis. We collected essays from the Test Of Chinese as a Foreign Language (TOCFL) and annotated grammatical errors using hierarchical tagging sets. Two kinds of error classifications were used simultaneously to tag grammatical errors. The...
Chapter
A service-oriented architecture called HANS is proposed to facilitate Chinese natural language processing. This unified framework seamlessly integrates fundamental NLP tasks including word segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, and semantic role labeling to enhance Chinese language processing functional...
Conference Paper
Full-text available
In this paper, we proposed a Convolution Neural Network with Long Short-Term Memory (CNN-LSTM) model for Chinese grammatical error detection. The TOCFL learner corpus is adopted to measure the system performance of indicating whether a sentence contains errors or not. Our model performs better than other neural network based methods in terms of acc...
Conference Paper
Full-text available
This paper presents the IJCNLP 2017 shared task on Dimensional Sentiment Analysis for Chinese Phrases (DSAP) which seeks to identify a real-value sentiment score of Chinese single words and multi-word phrases in both the valence and arousal dimensions. Valence represents the degree of pleasant and unpleasant (or positive and negative) feelings, and...
Conference Paper
Full-text available
This study describes the design of the NTNU system for the ScienceIE task at the SemEval 2017 workshop. We use self-defined feature templates and multiple conditional random fields with extracted features to identify keyphrases along with categorized labels and their relations from scientific publications. A total of 16 teams participated in evalua...
Conference Paper
Full-text available
This paper presents the NLP-TEA 2016 shared task for Chinese grammatical error diagnosis which seeks to identify grammatical error types and their range of occurrence within sentences written by learners of Chinese as a foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 15 teams re...
Article
Text clustering is a powerful information retrieval technique to detect topics from document corpora, so as to provide information browsing, analysis, and organization. On the other hand, the Instant Response System (IRS) has been widely used in recent years to enhance student engagement in class and thus improve their learning effectiveness. Howev...
Conference Paper
This study describes the construction of the TOCFL (Test Of Chinese as a Foreign Language) learner corpus, including the collection and grammatical error annotation of 2,837 essays written by Chinese language learners originating from a total of 46 different mother-tongue languages. We propose hierarchical tagging sets to manually annotate grammati...
Conference Paper
This paper presents the IALP 2016 shared task on Dimensional Sentiment Analysis for Chinese Words (DSAW) which seeks to identify a real-value sentiment score of Chinese words in both the valence and arousal dimensions. Valence represents the degree of pleasant and unpleasant (or positive and negative) feelings, and arousal represents the degree of...
Conference Paper
Full-text available
This study describes the design of the NTNU-YZU system for the automated evaluation of scientific writing shared task. We employ a convolutional neural network with the Word2Vec/GloVe embedding representation to predict whether a sentence needs language editing. For the Boolean prediction track, our best F-score of 0.6108 ranked second among the te...
Article
Near-synonyms are fundamental and useful knowledge resources for computer-assisted language learning (CALL) applications. For example, in online language learning systems, learners may have a need to express a similar meaning using different words. However, it is usually difficult to choose suitable near-synonyms to fit a given context because the...
Conference Paper
Full-text available
In this paper, we describe the development of a retrieval system that is designed for analyzing the interlanguage. We adopt the annotated TOCFL learner corpus as the target to explore the language acquisition of learners of Chinese as a foreign language. An illustrative scenario is presented to demonstrate the functionalities of implemente...
Article
This special issue contains four articles based on and expanded from systems presented at the SIGHAN-7 Chinese Spelling Check Bakeoff. We provide an overview of the approaches and designs for Chinese spelling checkers presented in these articles. We conclude this introductory article with a summary of possible future directions.
Conference Paper
Full-text available
This paper introduces the NLP-TEA 2015 shared task for Chinese grammatical error diagnosis. We describe the task, data preparation, performance metrics, and evaluation results. The hope is that such an evaluation campaign may produce more advanced Chinese grammatical error diagnosis techniques. All data sets with gold standards and evaluation tools...
Conference Paper
Full-text available
This paper introduces the SIGHAN 2015 Bake-off for Chinese Spelling Check, including task description, data preparation, performance metrics, and evaluation results. The competition reveals current state-of-the-art NLP techniques in dealing with Chinese spelling checking. All data sets with gold standards and evaluation tool used in this bake-off a...
Article
Full-text available
This introductory paper describes the research trends of Chinese as a second/foreign language along with related studies. We also give an overview of the research papers included in this special issue. Finally, we summarize the findings and offer suggestions.
Article
This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails in terms of category sequences in click-through data are employed to mine users' web browsing behaviors. Co...
Conference Paper
Full-text available
In this paper, we describe the development of the tagging editor for learner corpora annotation and computer-aided error analysis. We collect essays written by learners of Chinese as a foreign language for grammatical error annotation and correction. Our tagging editor is effective and enables the annotated corpus to be used in a shared task in ICC...
Conference Paper
Full-text available
We organize a shared task on grammatical error diagnosis for learning Chinese as a Foreign Language (CFL) in the ICCE-2014 workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA). In this paper, we describe all aspects of this shared task, including task description, data preparation, evaluation metrics, and testing...
Article
In this paper, we describe the development of the tagging editor for learner corpora annotation and computer-aided error analysis. We collect essays written by learners of Chinese as a foreign language for grammatical error annotation and correction. Our tagging editor is effective and enables the annotated corpus to be used in a shared task in ICC...
Conference Paper
This study explores the existing blacklists to discover suspected URLs that refer to on-the-fly phishing threats in real time. We propose a PhishTrack framework that includes redirection tracking and form tracking components to update the phishing blacklists. It actively finds phishing URLs as early as possible. Experimental results show that our p...
Conference Paper
Full-text available
This paper introduces a Chinese Spelling Check campaign organized for the SIGHAN 2014 bake-off, including task description, data preparation, performance metrics, and evaluation results based on essays written by learners of Chinese as a foreign language. The hope is that such evaluations can produce more advanced Chinese spelling check techniques.
Conference Paper
Full-text available
This study develops a sentence judgment system using both rule-based and n-gram statistical methods to detect grammatical errors in Chinese sentences. The rule-based method provides 142 rules developed by linguistic experts to identify potential rule violations in input sentences. The n-gram statistical method relies on the n-gram scores of both co...
Conference Paper
Full-text available
This study presents the Chinese Open Relation Extraction (CORE) system that is able to extract entity-relation triples from Chinese free texts based on a series of NLP techniques, i.e., word segmentation, POS tagging, syntactic parsing, and extraction rules. We employ the proposed CORE techniques to extract more than 13 million entity-relations fo...
Conference Paper
This study explores the users' web browsing behaviors that confront phishing situations for context-aware phishing detection. We extract discriminative features of each clicked URL, i.e., domain name, bag-of-words, generic Top-Level Domains, IP address, and port number, to develop a linear chain CRF model for users' behavioral prediction. Large-sca...
Conference Paper
Full-text available
In this paper, we handcraft a set of linguistic rules with syntactic information to detect errors that occur in Chinese sentences written by SLL. Experimental results lead to conclusions similar to those of the well-known ALEK system used by ETS for English learning. Our developed Chinese sentence error detection system will be helpful for Chinese self-learner...
Conference Paper
This paper explores users’ browsing intents to predict the category of a user’s next access during web surfing, and applies the results to objectionable content filtering. A user’s access trail represented as a sequence of URLs reveals the contextual information of web browsing behaviors. We extract behavioral features of each clicked URL, i.e., ho...
Conference Paper
Full-text available
This paper presents an overview of the Chinese Spelling Check task at SIGHAN Bake-off 2013. We describe all aspects of the task for Chinese spelling check, consisting of task description, data preparation, performance metrics, and evaluation results. This bake-off contains two subtasks, i.e., error detection and error correction. We evaluate the syst...
Conference Paper
Full-text available
This paper presents a context-aware phishing threat detection model from users’ behavioral perspectives. The context of users’ information accesses is investigated to explore the users’ browsing behaviors that confront phishing situations. Large-scale experiments show that our approach achieves an accuracy of 0.9973 and an F1 score of 0.9311 for pr...
Conference Paper
Full-text available
This paper presents the overview of traditional Chinese parsing task at SIGHAN Bake-offs 2012. On behalf of task organizers, we describe all aspects of the task for traditional Chinese parsing, i.e., task description, data preparation, performance metrics, and evaluation results. We summarize the performance results of all participant teams in th...
Conference Paper
This paper studies the feasibility of an early warning system that prevents users from the dangerous situations they may fall into during web surfing. Our approach adopts behavioral Hidden Markov Models to explore collective intelligence embedded in users' browsing behaviors for context-aware category prediction, and applies the results to web secu...
Conference Paper
Full-text available
This paper proposes a method to construct an evaluation dataset from microblogs for the development of recommendation systems. We extract the relationships among three main entities in a recommendation event, i.e., who recommends what to whom. User-to-user friend relationships and user-to-resource interesting relationships in social media and resou...
Article
This article presents a search-intent-based method to generate pornographic blacklists for collaborative cyberporn filtering. A novel porn-detection framework that can find newly appearing pornographic web pages by mining search query logs is proposed. First, suspected queries are identified along with their clicked URLs by an automatically constru...
Conference Paper
This paper presents an intent conformity model to collaboratively generate blacklists for cyberporn filtering. A novel porn detection framework via searches-and-clicks is proposed to explore collective intelligence embedded in query logs. Firstly, the clicked pages are represented in terms of the weighted queries to reflect the degrees related to p...
Conference Paper
This paper presents a user intent method to generate blacklists for collaborative cyberporn filtering. A novel porn detection framework that finds new pornographic web pages by mining user search behaviors is proposed. It employs users' clicks in search query logs to select the suspected web pages without extra human efforts to label data for train...
Conference Paper
Full-text available
We bootstrapped Chinese WordNet with semantic domain labels of WordNet Domains for constructing a language resource called Chinese WordNet Domains. The bootstrapping methods work from three aspects: 1) Princeton WordNet alignment, 2) lexical semantic relations and 3) domain taxonomy mapping. Experimental results of our proposed bootstrapping based...
Conference Paper
Full-text available
Lexical Markup Framework (LMF, ISO-24613) is the ISO standard which provides a common standardized framework for the construction of natural language processing lexicons. LMF facilitates data exchange among computational linguistic resources, and also promises a convenient uniformity for future application. This study describes the design and imple...
Conference Paper
Full-text available
This study proposes an approach to extract domain-specific words, and to distinguish the word senses with the aim of extending current WordNet architecture for domain applications. The domain-specific lexicon is compiled with a Wordnet-LMF format in compliance with ISO 24613 for the internationally collaborative KYOTO project. The findings and resul...
Conference Paper
Full-text available
This paper proposes a method to automatically classify texts from different varieties of the same language. We show that similarity measure is a robust tool for studying comparable corpora of language variations. We take LDC's Chinese Gigaword Corpus composed of three varieties of Chinese from Mainland China, Singapore, and Taiwan, as the comparabl...
Article
This study presented an inverse chi-square based web content classification system that works along with an incremental update mechanism for incremental generation of pornographic blacklist. The proposed system, as indicated from the experimental results, can classify bilingual (English and Chinese) web pages at an average precision rate of 97.11%;...
Conference Paper
This study proposed early decision heuristics for objectionable content classification using an inverse chi-square classifier. The experimental results indicated that only examining the title plus 10% of a Web page's content can cost-effectively achieve an average precision of 93%. More importantly, the F1 measure achieved its best w...
Conference Paper
Full-text available
We propose a set of heuristics for improving annotation quality of very large corpora efficiently. The Xinhua News portion of the Chinese Gigaword Corpus was tagged independently with both the Peking University ICL tagset and the Academia Sinica CKIP tagset. The corpus-based POS tags mapping will serve as the basis of the possible contrast in gramm...
