Kuang-hua Chen

National Taiwan University | NTU · Department of Library & Information Science

Ph.D.

About

83
Publications
12,498
Reads
684
Citations
Additional affiliations
August 1996 - present
National Taiwan University
Position
  • Professor (Full)

Publications (83)
Chapter
Full-text available
This paper describes research activities for exploring techniques of cross-language information retrieval (CLIR) during the NACSIS Test Collection for Information Retrieval/NII Testbeds and Community for Information access Research (NTCIR)-1 to NTCIR-6 evaluation cycles, which mainly focused on Chinese, Japanese, and Korean (CJK) languages. First,...
Article
Uncovering research topics, manifesting the relationships, and revealing the structure in a discipline are major and important research issues in library and information science (LIS). To understand the evolution of research subfields in LIS during two periods, 2009 to 2013 and 2014 to 2018, this study proposes and applies a novel method, word bibl...
Conference Paper
This study investigates how to measure subject relationships based on bibliographic coupling strength. Since the 1960s, researchers have used citation analysis methods to discover the relationships between different works and authors. However, how to apply citation‐based methods to measuring the relationships between various subjects remains unknown. We pr...
Article
How to differentiate citations remains an important research question in citation analysis. Recently, some researchers have analyzed full‐text academic articles with natural language processing tools and techniques to identify the characteristics of in‐text citations. In this study, we analyze 4,255 articles published from 2007 to 2016 to ex...
Chapter
Full-text available
This chapter focuses on the development of digital humanities (DH) in Taiwan. A bibliographic methodology was adopted where the scholarly publications in DH were collected and their bibliographic information retrieved and analyzed. Both co-authorship and article similarity networks were generated so social network analysis can be used to characteri...
Conference Paper
Full-text available
How to differentiate citations remains an important research question in citation analysis. Recently, some researchers have analyzed full-text academic articles with natural language processing tools and techniques to identify the characteristics of in-text citations. In this study, we analyze 4,255 articles published from 2007 to 2016 to ex...
Article
Full-text available
As digital humanities continues to expand and become more inclusive, little is known about the extent to which its knowledge is integrated. A bibliometric analysis of published literature in digital humanities was conducted to examine the degree of its intellectual cohesion over time (1989–2014). Co-authorship, article co-citation, and bibliographi...
Article
Full-text available
In this study, we used citation data from four databases (THCI, ACI, WOS and Scopus) and one social media platform, Mendeley, to examine the composite traces of humanities and social sciences scholars’ research outputs. Using the researchers of the Institute for Advanced Studies in Humanities and Social Sciences at National Taiwan University as subjects...
Article
Institutional repositories (IRs) have been widely deployed in many universities and research institutions. Researchers all over the world share their research outputs in an open-access way. Not only does this speed up the dissemination of scholarly information, but it can also decrease the cost of acquiring commercial databases. This article investig...
Article
This study analyzed the reference and source papers of the Proceedings of the 2009-2012 International Conference of Digital Archives and Digital Humanities (DADH), which was held annually in Taiwan. In total, 1,104 references and 59 sources were investigated based on descriptive analysis and subject analysis of library practices on cataloguing. Preliminary result...
Conference Paper
This paper presents an iterative approach to extracting Chinese terms. Unlike the traditional approach to extracting Chinese terms, which requires the assistance of a dictionary, the proposed approach exploits the Support Vector Machine classifier which learns the extraction rules from the occurrences of a single popular term in the corpus. Additio...
Conference Paper
Translation is the activity of transforming linguistic information into another language. The translation product, a written text in a target language (TL) that represents the result of a translation process, has been described by comparison with the respective source-language (SL) text. The relation between the SL text and the TL text is a kind of the...
Article
Research output and impact metrics derived from commercial citation databases such as Web of Science and Scopus have become commonly used indicators of predominantly English language scholarly performance. Yet it has been pointed out that existing metrics are largely inadequate to reflect scholars' overall peer-mediated performance, especially in t...
Article
Full-text available
With the growth of the Internet, Digital Libraries and Museums (DL/M) have become an important research issue. The role of metadata as a concrete foundation for DL/M research and systems has been recognized by different research fields. Due to the nature of the Internet, the paper not only describes the research and development of metadata fo...
Article
Full-text available
This study investigates literature research in Taiwan from the viewpoint of an outside researcher. Informetric methods are used to analyze the citations of Chinese Literature and Foreign Literature research papers published in Taiwan from 1996 to 2006. The citation types, citation disciplines, citation languages and citing ha...
Article
Purpose – The purpose of this paper is to present the practical and unique approach to construct an institutional repository (IR) at the National Taiwan University (NTU). Design/methodology/approach – In general, IR systems are used to preserve the research outputs of academic organizations. The preserved contents as a whole will demonstrate the ac...
Article
Full-text available
This paper presents an information retrieval system for the NTCIR-7 information retrieval for question answering task. This system is composed of three parts: (1) query processing, (2) retrieval model, and (3) re-rank module. Query processing filters stop-words and selects query terms to generate a required term set for further retrieval. Three retrie...
Article
This study investigated the influence of structured queries on retrieval performance in terms of textual, linguistic, and fielded characteristics. The search runs of the NTCIR-6 CLIR Task were used as the target data for this study. The results showed that, among the textual characteristics, only subject demonstrated significant effects on retrieval...
Article
Full-text available
Purpose – There is an active effort by major libraries in Taiwan to offer integrated searching as part of their information services. The purpose of this paper is to report a low-cost and high-flexibility system which can carry out integrated searching with respect to resource management. Design/methodology/approach – The paper first...
Article
Recently, many organizations have conducted projects on ranking world universities from different perspectives. These ranking activities have made an impact and caused controversy. This study neither favors nor opposes using bibliometric indicators to evaluate universities’ performance. We regard these ranking activities as...
Article
Full-text available
The Internet has become one of the important channels for retrieving academic resources in recent years. The roles and service models of academic libraries have changed accordingly. Most academic libraries provide subject directories (subject gateways) for users to browse highly selected academic resources. On the contrary, few a...
Article
With the development of the Internet, digital libraries/museums have received worldwide attention, and many developed countries are conducting extensive research on digital libraries/museums. In Taiwan, many institutions have digitized their collections. In addition, major research projects such as Digital Museum Project, Digital Archive Program, Digital L...
Article
Full-text available
This paper gives an overview of the NTCIR-5 Cross-Lingual Question Answering Task (CLQA1), an evaluation campaign for Cross-Lingual Question Answering technology. This evaluation was carried out in June 2005. In CLQA1, we aimed to promote research on cross-lingual Question Answering technology mainly for East Asian languages. As the first attemp...
Article
Full-text available
The Taiwan Humanities Citation Index (THCI) is Taiwan's effort to construct a search, research, and evaluation tool for research in the arts and humanities. This article describes the design, framework, features, and policies and rules of the THCI. Citation analysis has been regarded as a systematic way to investigate research developments and tren...
Conference Paper
Full-text available
This paper reports experimental results of cross-language information retrieval (CLIR) from German to French. The authors focus on CLIR in cases where available language resources are very limited. Thus transitive translation of queries using English as a pivot language was used to search French document collections for German queries without any d...
Article
Full-text available
This study investigates the indexed papers dated from 1996 to 2002, included in the Taiwan Humanities Citation Index (THCI). The goal is to explore disciplinary interflow of Library & Information Science (LIS) studies in Taiwan. The results show that the researchers of LIS mostly cooperate with researchers and scholars in the fields of social scien...
Article
Full-text available
We report the outline of Text Summarization Challenge 2 (TSC2 hereafter), a sequel text summarization evaluation conducted as one of the tasks at the NTCIR Workshop 3. First, we describe briefly the previous evaluation, Text Summarization Challenge (TSC1) ...
Article
Full-text available
This paper reviews research efforts in the NTCIR-4 CLIR task, which is a project involving large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has four sub-tasks, multi-lingual IR (MLIR), bilingual IR (BLIR), pivot bilingual IR (PLIR) and single language IR (SLIR), i...
Article
Full-text available
This report is an overview of Cross-Language Information Retrieval Task (CLIR) at the third NTCIR Workshop. There are 3 tracks in CLIR: Single Language IR (SLIR), Bilingual CLIR (BLIR), and Multilingual CLIR (MLIR). The scope, schedule, test collections, search results, relevance judgment, scoring results, and the preliminary analyses are described...
Article
Full-text available
Prepositional Phrase is the key issue in structural ambiguity. Recently, researches in corpora provide the lexical cue of prepositions with other words and the information could be used to partly resolve ambiguity resulted from prepositional phrases. Two possible attachments are considered in the literature: either noun attachme...
Article
Full-text available
To align bilingual texts becomes a crucial issue recently. Rather than using length-based or translation-based criterion, a part-of-speech-based criterion is proposed. We postulate that source texts and target texts should share the same concepts, ideas, entities, and events. Simulated annealing approach is used to implement this alignment algorith...
Article
Full-text available
Metadata plays a crucial role in a digital library/museum environment. However, the development of metadata is not an easy task. Its formulation starts with analyzing the attributes of collections as well as understanding the user information needs and information seeking behavior. The issue of interoperability also needs to be considered in terms...
Article
Full-text available
To acquire noun phrases from running texts is useful for many applications, such as word grouping, terminology indexing, etc. The reported literatures adopt pure probabilistic approach, or pure rule-based noun phrases grammar to tackle this problem. In this paper, we apply a probabilistic chunker to deciding the implicit boundaries of constituents...
Article
Full-text available
This paper proposes a corpus-based language model for topic identification. We analyze the association of noun-noun and noun-verb pairs in the LOB Corpus. The word association norms are based on three factors: 1) word importance, 2) pair co-occurrence, and 3) distance. They are trained on the paragraph and sentence levels for noun-noun and noun-verb p...
Article
This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for cons...
Article
Full-text available
Library and information science (LIS) education in Japan was established in the early 20th century but was destroyed during World War II. Thanks to help provided by the United States, LIS education revived. However, this help had a strong influence, especially on the ideas of public librarianship in the Library Law of Japan. At present, 8 universi...
Article
Full-text available
This article describes a task for construction of a subject classification framework under the background of analyses of the researches of Library and Information Science (LIS). The proposed framework covers the possible research issues including those influenced by the information technology and network techniques. We first investigate the existin...
Article
Full-text available
This article reports the results of Chinese Text Retrieval (CHTR) tasks in NTCIR Workshop 2 and the future plan of NTCIR workshop. CHTR tasks fall into two categories: Chinese-Chinese IR (CHIR) and English-Chinese IR (ECIR). The definitions, schedules, test collection (CIRB010), search results, evaluation, and initial analyses of search results of...
Article
Full-text available
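Pages: 51-77. This study aims to build the Taiwan Humanities Citation Index (THCI) database, which covers 314 journals. In the current phase of the project, citation data entry begins with journals published in 1998 (ROC year 87). Cited documents are taken mainly from the notes or bibliographies of the source documents. To build the database in a consistent way, we established separate policies and rules for different types of documents. The THCI system consists of two parts: a front-end input system and a back-end database system. The back-end database is where the data are actually stored; based on the characteristics of the data, all data are divided into five tables, and the relations among the tables are defined according to the principles of relational database management systems. The front-end interface is the input system used by staff when creating records, and it connects to the back-end database through the internal network of the Humanities Research Center. THCI can also be searched by the public through a WWW browser; its basic search functions include: title keyword...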
Article
Full-text available
The number of electronic documents on the Internet is growing very quickly. How to effectively identify the subjects of documents has become an important issue. In the past, research focused on the behavior of nouns in documents. Although subjects are composed of nouns, the constituents that determine which nouns are subjects are not only nouns. Based on the...
Article
Full-text available
The IR society has made efforts in free-term indexing for a long time. By contrast, few efforts are made in controlled-vocabulary indexing. A new model for controlled-vocabulary indexing is proposed in this paper. This proposed model, TF×OSDF×CSIDF, distinguishes subject-specific words from common words and domain-specific words in documents. 60,40...
Article
Full-text available
Digital Libraries and Museums (DL/M) have become one of the important research issues of Library and Information Science as well as other related fields. This paper describes the basic concepts of DL/M and briefly introduces the development of Taiwan Digital Museum Project. Based on the features of various collections, we discuss how to maintain, to...
Article
Automatic summarization and information extraction are two important Internet services. MUC and SUMMAC play their appropriate roles in the next generation Internet. This paper focuses on the automatic summarization and proposes two different models to extract sentences for summary generation under two tasks initiated by SUMMAC-1. For categorization...
Article
Full-text available
Accompanying fast development of the Internet, the concept of digital libraries is widely accepted and discussed by researchers. The experiences of physical libraries say that it is necessary to apply some means like bibliographic control to fulfill high-quality Internet services. Metadata is a key approach based on this line. From the computer sci...
Article
Full-text available
Similarity analysis is a substantial issue in both corpus-based researches and language usages. This paper focuses on the semantic usages of adjectives, and analyzes the similarities among adjectives. The adjective and the semantic tag of the head noun that it modifies in a noun phrase form a co-occurrence. A two-stage algorithm is applied to clust...
Article
Full-text available
With the rapid growth of the Internet, many countries are taking a serious look at this new information carrier. A few years ago, the National Science Foundation (NSF) of the USA initiated research projects on Digital Libraries (DL). Other countries around the world have also initiated many DL research projects recently. There are some DL projects undergoing now in T...
Article
Full-text available
The development of Internet makes the researches on information retrieval more changeable. Actually, the so-called "information retrieval" is "text retrieval." It is necessary for users to find out the needed information from the retrieved texts. A higher-level task is information extraction, which extracts the information based on pre-defined temp...
Article
Full-text available
An important step to understand text is to build the discourse structure through cohesion and coherence. However, to build the discourse structure in turn depends on the full understanding of texts, so that many efforts on this line are not automatic and not successful. A corpus-based model based on 1) repetition of words, 2) importance of words, a...
Article
Full-text available
It is difficult for pure statistics-based machine translation systems to process long sentences. In addition, the domain dependent problem is a key issue under such a framework. Pure rule-based machine translation systems have many human costs in formulating rules and introduce inconsistencies when the number of rules increases. Integration of thes...
Article
Full-text available
A pure statistics-based machine translation system is usually incapable of processing long sentences and is usually domain dependent. A pure rule-based machine translation system involves many costs in formulating rules. In addition, it is easy to introduce inconsistencies in a rule-based system, when the number of rules increases. Integrating both...
Article
Full-text available
This paper proposes a probabilistic partial parser, which we call chunker. The chunker partitions the input sentence into segments. This idea is motivated by the fact that when we read a sentence, we read it chunk by chunk. We train the chunker from Susanne Corpus, which is a modified but shrunk version of Brown Corpus, underlying bi-gram language...
Article
Rather than using a length-based or translation-based criterion to align bilingual texts, this paper proposes a part-of-speech-based (POS-based) criterion. The postulation is that bilingual texts should share the same concepts, ideas, entities, and events. In addition, these are usually represented by some critical POSes. Thus, the numbers of criti...
Article
Full-text available
A text partition model is proposed to determine the boundaries of discourse structures. It is based on association of noun-noun relations and noun-verb relations defined on discourse level and sentence level, respectively. Three factors are considered: 1) repetition of words, 2) importance of words, and 3) collocational semantics. A window is moved...
Conference Paper
Full-text available
Aligning bilingual texts has recently become a crucial issue. Rather than using a length-based or translation-based criterion, a part-of-speech-based criterion is proposed. We postulate that source texts and target texts should share the same concepts, ideas, entities, and events. A simulated annealing approach is used to implement this alignment algorith...
Article
Full-text available
This paper is a report of Chinese Text Retrieval (CHTR) tasks in NTCIR Workshop 2. CHTR tasks fall into two categories: Chinese-Chinese IR (CHIR) and English-Chinese IR (ECIR). The definitions, schedules, test collection (CIRB010), search results, evaluation, and initial analyses of search results of CHIR and ECIR are discussed in this paper.
Article
Full-text available
The purpose of this paper is to overview research efforts at the NTCIR-6 CLIR task, which is a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has three sub-tasks, multi-lingual IR (MLIR), bilingual IR (BLIR), and single language IR (SLIR), in which ma...
Article
Full-text available
The prepositional phrase is the key issue in structural ambiguity. Recently, corpus research has provided lexical cues of association between prepositions and other words, and this information can be used to partly resolve the ambiguity resulting from prepositional phrases. Two possible attachments are considered in the existing approaches: either noun...
Article
Full-text available
This paper first discusses the coverage of knowledge discovery. Second, the related techniques of knowledge discovery for unstructured data and structured data are described, respectively. Last but not least, this paper identifies a few possible applications of knowledge discovery for government information. The variant relationships and...
Article
Full-text available
The meaning of Institutional Repository (IR) is to preserve the research outputs of research institutes. The preserved contents as a whole will demonstrate the achievements and influences of research institutes. Many investigations pointed out that an open-access IR system can decrease the cost in dissemination of scholarly information and increase...
Article
Full-text available
This article investigates the consistency of subject cataloging for Taiwan academic journal articles of Library and Information Science. We utilize a subject framework to analyze 956 articles of Bulletin of The Library Association of China from No. 1 to No. 65. After preliminary analyses, we find that the average consistency of main categories is 8...
Article
Full-text available
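The purpose of information retrieval research is to meet people's information needs, and the field has steadily removed one information barrier after another. With the spread of computer networks, the Internet has rapidly reached every corner of the world. As the general public comes to experience the idea of the "global village" firsthand, the language barrier has become concrete and harsh: users find it hard to retrieve documents written in a different language. This paper explains the language barrier in information retrieval, discusses possible solutions that current language technology offers for handling it, and compares the system performance of existing cross-language information retrieval systems with that of traditional information retrieval systems. (Abstract) The purpose of research in information retrieval is to fulfill information needs. IR has eradicated many information barriers since...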
Article
Full-text available
Web archiving is an emerging concept whose main purpose is to preserve websites with cultural or historical significance. This paper discusses the development and implementation of the National Taiwan University Web Archiving System (NTUWAS), which was developed by the National Taiwan University Library. In order to help readers unfamiliar with web...
Article
Full-text available
A controlled vocabulary is a consistent set of words: the terms or classification groups that have been created in order to make indexing consistent. A natural-language, uncontrolled vocabulary uses the words directly from the text written by the authors or the words from the indexer's mind.
Article
Full-text available
Subcategorization frames are useful for many applications. Due to many ambiguities, to extract them is not straightforward. In this paper, a probabilistic chunker is used to determine the plausible phrase boundaries and a finite state mechanism, SUBCAT-TRACTOR, is proposed to extract 23 subcategorization frames. In order to get rid of the problems...
Article
Full-text available
This paper describes an overview of the NTCIR-6 Cross-Lingual Question Answering (CLQA) Task, an evaluation campaign for Cross-Lingual Question Answering technology. In NTCIR-5, the first CLQA task targeting Chinese, English, and Japanese languages was carried out. Following the success of NTCIR-5 CLQA, NTCIR-6 hosted the second campaign on the CLQ...
Article
Full-text available
This paper presents an overview of the IR4QA (Information Retrieval for Question Answering) Task of the NTCIR-7 ACLIA (Advanced Cross-lingual Information Access) Task Cluster. IR4QA evaluates traditional ranked retrieval of documents using well-studied metrics such as Average Precision, but the retrieval task is embedded in the context of cross...
