
Chu-Ren Huang
The Hong Kong Polytechnic University | PolyU · Department of Chinese and Bilingual Studies
PhD, Cornell, 1987
About
689 Publications
204,547 Reads
5,810 Citations
Introduction
I am intrigued by what we can learn from language,
How language reflects our interaction with the living environment through cognition,
How language links, clusters, and excludes different people,
Not just by what is said, how and why, but also what is expressed or not said,
As well as what is heard and understood, and how it affects the listener,
And, in sum, how language big data reflects collective human behaviors in time and space.
Language is the footprint humankind leaves behind on its journey through time and space.
Additional affiliations
August 2008 - present
July 2008 - present
September 2003 - June 2009
Education
August 1982 - August 1986
August 1976 - June 1980
Publications (689)
Interactions among the environment, humans and language underlie many of the most pressing challenges we face today. This study investigates the use of different verbs to encode various weather events in Sinitic languages, a language family spoken over a wide range of climates and with 3000 years of continuous textual documentation. We propose to s...
This paper adopts models from epidemiology to account for the development and decline of neologisms based on internet usage. The research design focuses on the issue of whether a host-driven epidemic model is well-suited to explain human behavior regarding neologisms. We extracted the search frequency data from Google Trends that covers the ninety...
This paper proposes a textual analytics approach to the discovery of trends and variations in social development. Specifically, we have designed a linguistic index that measures the marked usage of gendered modifiers in the Chinese language; this predicts the degree of occupational gender segregation by identifying the unbalanced distribution of ma...
This paper investigates the emergence of COVID-19 neologisms. It focuses on the strategies used to coin emerging neologisms, the relationship between the strategies and the usage preferences, as well as the correlation between internet usage data and epidemiological data. The internet usage data were collected from December 2019 to June 2020 from t...
Sentiment analysis helps give artificial intelligence systems the ability to understand human attitudes in texts. In this area, text sentiment is usually signaled by a few indicative words that convey affective meanings and arouse readers’ collective emotions. However, most existing sentiment analysis models have predominantly featured...
This study investigates the general public’s concerns about COVID-19 vaccination through their comments on social media (YouTube), using NLP techniques and time series analysis. A set of keywords is traced in order to better understand the changes in public opinion and responses at different stages of the pandemic, as well as the influences of fake news....
Understanding the nature of meaning and its extensions (with metaphor as one typical kind) has been one core issue in figurative language study since Aristotle’s time. This research takes a computational cognitive perspective to model metaphor based on the assumption that meaning is perceptual, embodied, and encyclopedic. We model word meaning repr...
Linguistic synesthesia links two concepts from two distinct sensory domains and creates conceptual conflicts at the level of embodied cognition. Previous studies focused on constraints on the directionality of synesthetic mapping as a way to establish the conceptual hierarchy among the five senses (i.e., vision, hearing, taste, smell, and touch). T...
Nouns in human languages mostly profile concrete and abstract entities. But how much eventive information can be found in nouns? Will such eventive information found in sensory nouns have anything to do with the cognitive representation of the basic human senses? Importantly, is there any ontological and/or cognitive motivation that can account for...
In this study, we examine the variations in the alternative pattern of light verb construction between Taiwan and Mainland Mandarin, using a statistical approach based on large-scale comparable corpora. The results show that these two variants display significant differences in preference for introducing the theme of the taken complement of the light ver...
This study draws on corpus methodology to investigate people’s reactions to COVID-19 vaccination using the data of Macau netizens’ comments on a YouTube channel. Four main topics under discussion were identified based on the word lists. Meanwhile, people were concerned about the activity of vaccines and were also engaged in heated debates on both d...
The proliferation of COVID-19 fake news on social media poses a severe threat to the health information ecosystem. We show that affective computing can make significant contributions to combat this infodemic. Given that fake news is often presented with emotional appeals, we propose a new perspective on the role of emotion in the attitudes, percept...
The present paper explores the synchronic variations and diachronic changes in political discourses in Hong Kong (HK) and in the Mainland of the People’s Republic of China (PRC). The relationship between lengths of linguistic constructs and their immediate constituents (including sentences and clauses, and clauses and words) is fitted using the function y...
The Coronavirus Disease 2019 (COVID-19) pandemic has shifted the focus of research worldwide, and more than 10 000 new articles per month have concentrated on COVID-19–related topics. Considering this rapidly growing literature, the efficient and precise extraction of the main topics of COVID-19–relevant articles is of great importance. The manual...
This chapter explores the morphological poverty of Chinese from an empirical perspective. The nature of affixation in Chinese is still not well understood and remains one of the hotly debated topics in Chinese morphology. Based on the CKIP Morphological Database (incl. 4025 “affixes” in Chinese), this chapter covers the issue o...
The concepts discussed in this chapter are bound morpheme, free morpheme, root, affix, semi-affix, inflection, derivation and their application in the analysis of Chinese word-formation processes. The inflectional affixes include aspectual markers, the plural marker, potential infixes as well as those involved in reduplication. Two major approaches...
Sentence-final particles are normally assumed to occur in the CP domain, i.e., the domain of the complementizer phrase. Their exact syntactic position varies given the heterogeneity of these elements. The position of these particles usually depends on how they are categorized semantically, and also on how they conform to different syntactic princip...
This chapter shows that treating the Chinese classifier system as a lexicalized semantic system based on shared ontology predicts both the agreement patterns that motivate the structure-based accounts, and the semantic selection patterns that motivate the cognition-based accounts. In the chapter, different perspectives toward classifiers are introd...
In this chapter, we present a general picture of how topicalization and topicality can be defined in morphosyntactic terms. We start with an overview of the syntax and semantics of topics from the vantage point of generative syntax and formal logic. We then show that the notion of topic prominence can be defined by typological correlations through...
This chapter aims to provide a careful examination of Mandarin Chinese classifiers from a syntactic perspective. A comprehensive overview of the distribution of classifiers is provided along with their syntactic analyses. A central conclusion of this chapter, following much recent work, is that there are two distinct structural configurations that...
The linguistic study of Chinese, with its rich morphological, syntactic and prosodic/tonal structures, its complex writing system, and its diverse socio-historical background, is already a long-established and vast research area. With contributions from internationally renowned experts in the field, this Handbook provides a state-of-the-art survey...
Tone in Chinese languages is distinct in two aspects: (i) the complexity in the tonal make-up and (ii) widespread sandhi. The former is often attributed to underlying complexity in tonal inventories and the latter to triggers immediately adjacent to the sandhi site. Morphosyntax, though highly relevant, is often left unarticulated in the descriptio...
This chapter synthesizes the alternation patterns in the morphophonology of Chinese affixation, and analyzes rime change (变韵 biànyùn) mutation-like phenomena in terms of featural affixation. The main goal is to demonstrate that the paucity of affixation in Chinese languages/dialects does not render Chinese alternations less relevant to typological...
Despite the complexity and variation of physical signals, human perception of a speech sound uttered by different talkers or in diverse contexts is amazingly constant. Nonetheless, the neurocognitive mechanisms of this fundamental human perceptual ability are not well understood. Even less is known about the neural bases of phonetic constancy. We p...
This chapter revisits the character-based approach to Chinese grammar and the ongoing debate about how to define the concept of a word in Chinese. The authors provide a variety of evidence, including distributional generalizations in corpus and Chinese word-level and phrase-level rules, such as Mandarin alphabetic words, replaceable idioms, and abb...
Current studies of semantic word formation of Chinese compounds aim to work out collocational patterns and semantic patterns. In addition to investigating the surface semantic relations among the composing morphemes of compound words, researchers adopt the methods of semantic-syntactic analyses of sentences and study the predicative relation among...
The primary function of language is to convey what we mean for communication. Semantics, a subfield of linguistics, aims to understand how meanings are encoded and operate in different levels of linguistic forms (such as morphemes, words, phrases, sentences, and discourses). The cumulative evidence thus far has mainly been based on native speaker i...
Phonological awareness refers to a speaker’s knowledge of the phonological structure of the language. The study of phonological awareness has traditionally been associated with the study of reading. As reading Chinese involves an orthography that does not directly encode phonology, the issue of phonological awareness in Chinese speakers and learner...
The canonical word order of Modern Chinese is SVO, and yet Modern Chinese does not demonstrate the patterns of a typical VO language. This chapter reviews representative arguments that have contributed to our closer understanding of what pragmatic and semantic factors condition word order variation in Modern Chinese. Discourse analyses in relation...
Case theory is a theoretical tool in generative grammar to capture generalizations regarding categorial distribution, particularly the nominal category in relation to others. The notion of case can describe the close relation between grammatical categories, such as a verb/preposition and its object, or the subject of a sentence and the tense or...
Words pose a theoretical challenge in Chinese, but words pose a challenge in any language. Even though Chinese is written with monosyllabic, monomorphemic characters and no overt word boundaries, there is as much evidence here as there is in English or any other language for a level between the morpheme and the phrase, interfacing between the lexic...
The debate on modern Chinese being SVO or SOV is facing a dilemma: the word order is SVO in an unmarked declarative sentence in Chinese, while Chinese exhibits many features shared by SOV languages. To tackle this difficult situation, researchers should focus on language types, but not the relative orders of subject, verb, and object. Based on the...
The primary goal of this chapter is to present the state of the art on Chinese intonation research, with a focus on how tone and intonation interact. To this end, the general functions and forms of intonation observed in (Mandarin) Chinese are first introduced. This is followed by a detailed discussion of the multiplexing of the f0 channel for tone...
The question of what canonical word order Chinese has is not a controversial issue at the present time; most of the studies within the last 30 years assume that the canonical word order of Modern Chinese is SVO. However, during the 1970s and 80s there was a lively debate on Chinese word order concerning its historical development as well as its sta...
In this chapter, we address the issues related to sentence grammaticality and acceptability. We begin with a discussion of the relationship between the two notions, and point out that despite the differences in theoretical conceptualization, the two notions, grammaticality and acceptability, are often confluent and that grammaticality is usually me...
After critically reviewing the conflicting theories of word-formation, the integral model in Li 2005 is presented which is shown not only to make a minimal number of postulations but also to cover a wide range of cross-linguistic facts: morphological causativization in Bantu and Semitic, compounding in Chinese and English, the resultative construct...
There is a common view that English has word stress but Chinese does not. I examine perceived stress in disyllabic lexical entries and show two similarities between the languages: (i) when both syllables carry a designated tone, such as bamboo or Red Cross in English, or 北京 Beijing ‘Beijing’ in Chinese, main stress is unclear to native spea...
While Chinese is widely considered a topic-prominent language and 'topic' may be a useful notion for describing some of the unique grammatical features of Chinese, natural text/speech data call for a re-examination of its nature and the ways in which it is manifested and deployed in discourse. My multiple genre-based investigation shows that at a r...
This chapter reviews the descriptive patterns of tone sandhi in Chinese dialects along with the experimental investigations of what generalizations native speakers make regarding these patterns, how they process them in production and perception, and how children acquire these patterns. Theoretical issues that tone sandhi sheds light on, including...
Background: The COVID-19 pandemic has increasingly accelerated the publication pace of scientific literature. How to efficiently curate and index this large amount of biomedical literature under the current crisis is of great importance. Previous literature indexing is mainly performed by human experts using Medical Subject Headings (MeSH), which is...
This study seeks to clarify the nature of linguistic synesthesia using a lexical-conceptual account. Based on a lexical analysis of Mandarin synesthetic usages, we find that (1) linguistic synesthesia maps the metaphorical meaning between two domains; and (2) linguistic synesthetic mappings and conceptual metaphoric mappings have similar behaviors...
The verbs indicating the occurrence of frost in Chinese have undergone a diachronic change. Ancient Chinese chiefly uses non-volitional verbs with downward movement meanings, while Sinitic languages widely adopt 打 dǎ ‘to hit’, an action verb with high transitivity. This modern usage develops from the transitive verb 打 dǎ ‘to hit’ denoting frost dam...
The rampant spread of the COVID-19 infodemic has been almost simultaneous with the outbreak of the pandemic. Many concerted efforts are being made to mitigate its negative effect on information credibility and data legitimacy. Existing work mainly focuses on fact-checking algorithms or multi-class labeling models that are less aware of the intrinsic characteristics...
This study investigates the comments posted in two popular channels on YouTube (February to July 2020) that reveal Macau people’s concerns and feelings during the COVID-19 pandemic in terms of the themes elaborated and sentiments expressed. By themes, Macau people showed their concerns about the epidemic situation, economy, the problems it caused, and...
This paper reports the preliminary findings of a cross-disciplinary study on emotions, insomnia and mental health with joint efforts from linguistics, computer science, and medical researchers. We take a computational linguistic approach to analyze a corpus of over 400 posts crawled from online psychological consultation platforms in China that c...
This paper examines several reform-related Chinese near synonyms in the thousand-year-long history of China, comparing them in both diachronic and synchronic dimensions. Through enquiries into word frequency and usage in diachronic Chinese corpora and historical Chinese language databases, the results reveal the interplay between social realitie...
Most previous research has investigated how embodied cognition captures concrete notions (e.g. money), but the role sensory modalities play in more abstract concepts (e.g. time) lacks empirical research—in particular, how abstractness is grounded in perceptual experiences. In this paper, a sensorimotor strength rating study (also known as modality...
This volume documents the Proceedings of the first Workshop on Corporate Social Responsibility (CSR) using NLP methods, held on 25 June 2022 as part of the LREC 2022 conference (International Conference on Language Resources and Evaluation). This workshop is a first attempt at bridging data resources, language theories, and NLP technologies on...
The rampant spread of the COVID-19 infodemic has been almost simultaneous with the outbreak of the pandemic. Many concerted efforts are being made to mitigate its negative effect on information credibility and data legitimacy. Existing work mainly focuses on fact-checking algorithms or multi-class labeling models that are less aware of the intrinsic characteristics o...
Mandarin Alphabetical Words (MAWs), such as X-光 ‘X-ray’, are a unique category of the contemporary Chinese lexicon and a major topic in code-mixing research. However, in both lines of literature, an intriguing yet less explored linguistic issue is the classifier selection behaviors between the two, in particular the morpho-s...
Sentiment analysis is an important task in corpus linguistics and natural language processing. Based on statistical and machine-learning algorithms, texts’ subjective evaluations and emotional states can be detected, extracted, and classified. Sentiment analysis results are significant to the development of many different industries in the financia...
On the basis of Mey’s Pragmatic Act Theory, this paper investigates the cross-cultural and cross-language variations in the pragmemes to call for social distancing in public health campaigns to combat COVID-19. We compare the officially released posters calling for social distancing in English and Chinese in two neighboring cities with distinctive...
Sensorimotor information is vital to the conceptual representation of our knowledge system. This study collects perceptual and action ratings for 664 disyllabic nouns among 438 native speakers and creates the first and largest dataset of sensorimotor norms for nouns in Chinese. Using aggregated semantic covariates, including concreteness ratings fr...
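TCSOL Studies (《华文教学与研究》), Issue 1. Haplology in Chinese nominal compound constructions is constrained by three conditions: usage frequency, tone sandhi, and prosody. Differences in the relative strength of these three directly affect how likely optional haplology is to occur. Based on a cross-strait common-language corpus on the scale of hundreds of millions of words, this paper presents an empirical analysis of haplology in news headlines. The results show that the strength ranking of the constraints on haplology is "tone sandhi > prosody > usage frequency" in Mainland news headlines and "tone sandhi > prosody ≈ usage frequency" in Taiwan news headlines. Both rankings differ from that of ordinary language, but the strength of prosody shows no clear difference between ordinary language and news headlines on either side of the strait. This indicates that the stylistic tendency of news headlines to adhere more closely to prosodic rules does not significantly affect haplology. The study also finds that Mainland Mandarin tends to retain full forms in news headlines, whereas Taiwan Mandarin tends to use haplologized forms. This difference should result from the different weight the two varieties place on accuracy and concision in news headlines. Haplolo...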
Leech’s corpus-based comparison of English modal verbs from 1961 to 1992 showed the steep decline of all modal verbs together, which he ascribed to continuing changes towards a more equal and less authority-driven society. This study inspired many diachronic and synchronic studies, mostly on English modal verbs and largely assuming the correlation...
This work addresses some questions about language processing: what does it mean that natural language sentences are semantically complex? What semantic features can determine different degrees of difficulty for human comprehenders? Our goal is to introduce a framework for argument semantic complexity, in which the processing difficulty depends on t...
English research articles (RAs) are an essential genre in academia, so the attempts to employ NLP to assist the development of academic writing ability have received considerable attention in the last two decades. However, there has been no study employing feature engineering techniques to investigate the linguistic features of RAs of different aca...
Yan Fu’s 譯事三難 has rarely been directly challenged but is frequently compared with Tytler’s Principles of Translation. These two sets of principles match both in number and in the exact order of three parallel concepts. Given the canonical status of Tytler’s principles since its publication in 1790, it is hard to imagine that Yan was not influenced...
In this article we present the Database of Word-Level Statistics for Mandarin Chinese (DoWLS-MAN). The database addresses the lack of agreement in phonological syllable segmentation specific to Mandarin by offering phonological features for each lexical item according to 16 schematic representations of the syllable (8 with tone and 8 without tone)....
Thunder and frost are said in Sinitic languages to be controlled by higher powers, or to simply occur by themselves, or even to inflict severe damage on human society as agents. Such diverse linguistic behaviours and meanings pose challenges and add complexity to the ongoing debate on the unaccusativity of weather verbs. We present in this paper an in...
This chapter offers an overview of both the linguistic background and the state of the art of NLP research on varieties of Chinese as similar languages. In addition to briefly summarizing grammatical features of Mandarin Chinese, we also underline important contrasts between Chinese dialects and varieties of Mandarin Chinese. As Chinese dialects ar...
This article investigates the evolution of social distancing terms in Chinese and English in two geographically close yet culturally distinct metropolitan cities: Hong Kong and Guangzhou. This study of bilingual public health campaign posters during the COVID-19 pandemic focuses on how the evolution of neologisms and linguistic strategies in public...
Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticised for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic...
We present Scikit-talk, an open-source toolkit for processing collections of real-world conversational speech in Python. First of its kind, the toolkit equips those interested in studying or modeling conversations with an easy-to-use interface to build and explore large collections of transcriptions and annotations of talk-in-interaction. Designed...
Durative events by default are atelic. However, temporal targets are typically required for durative verbs with a rushing manner, such as ‘We are catching the 3:30 flight’ and ‘The farmer rushed to harvest before the storm’. Why and how does manner introduce delimiting temporal concepts to durative verbs? This puzzle is addressed by our current stu...
Informal short texts on the web are rich in emotions as they often reflect unfiltered immediate reactions to breaking news events. The emotion density, however, stands in contrast to its poverty of linguistic contexts and features for emotion classification. This paper tackles that challenge by proposing orthographic features based on orthographic...
Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models demand more training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data...
The verbs indicating the occurrence of frost in Chinese have undergone a diachronic change. Ancient Chinese chiefly uses non-volitional verbs with downward movement meanings, while Sinitic languages widely adopt 打 dǎ ‘to hit’, an action verb with high transitivity. This modern usage develops from the transitive verb 打 dǎ ‘to hit’ denoting frost dama...