Central nodes of the Chinese syntactic networks

Abstract

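Based on Chinese dependency treebanks of two genres, we selected three Chinese function words as objects of study according to word-frequency and distribution-rate statistics. For the three function-word nodes we measured degree, out-degree, in-degree, closeness, in-closeness, out-closeness and betweenness. We then removed each of the three nodes from the networks and compared the networks before and after removal in terms of number of vertices, average degree, average path length, diameter, number of isolated vertices, maximum domain and density. The results show that all three function words are central nodes of the networks, but their status differs, and so does their influence on the overall network structure. The study offers a new method for research on Chinese function words and a new perspective on node characteristics in complex networks.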
2011, vol. 56, no. 10: 735–740
www.scichina.com csb.scichina.com
Citation: Chen X Y, Liu H T. Central nodes of the Chinese syntactic networks (in Chinese). Chinese Sci Bull (Chinese Ver), 2011, 56: 735–740, doi: 10.1360/972010-2369
SCIENCE CHINA PRESS
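CHEN XinYing①②, LIU HaiTao①*
① School of International Studies, Zhejiang University, Hangzhou 310058, China;
② Institute of Applied Linguistics, Communication University of China, Beijing 100024, China
* Corresponding author, E-mail: lhtzju@gmail.com
Received 2010-12-16; accepted 2011-01-19
Supported by the National Social Science Foundation of China (09BYY024)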
Keywords
complex network
central node
language network
function word
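The language system is a complex network structure[1], so studying language with complex-network methods is a necessary undertaking[2]. A considerable body of work on linguistic complex networks exists in China and abroad[2~10]. Although the principles by which language networks are constructed vary, most studies concentrate on properties shared by all kinds of networks, such as the small-world and scale-free properties. This focus on commonality gives the impression that "all networks are alike"[11], and differences in language structure are submerged in the global properties of language networks. To uncover what is particular to language structure, we need to go further inside the network and examine its local structures and individual nodes.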
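The natural starting point for examining local network structure is the central node, because the existence of central nodes is an important reason why networks exhibit small-world and scale-free properties. For a language network, what are its central nodes? And what role do they play in the network?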
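More than 6800 languages are currently in use[12]. Languages differ in type, and there are several ways to construct a language network[2~10]. For linguists, however, a language network is a means of studying language, not an end[8]; their real aim is to use network methods to discover and explain linguistic structures and phenomena. Syntactic and semantic networks should therefore be built on linguistic theory. Because the distinguishing features of different languages are more apparent at the syntactic level than at the semantic level, constructing syntactic networks may be the better choice. For Chinese syntactic networks, function words are very likely the central nodes, and they are the object of this study. The reason is that Chinese is an isolating language: content words lack morphological changes that express grammatical meaning, so function words (together with word order) become the main means of expressing grammatical function and are especially important[13].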
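Unlike earlier studies that focused on the commonalities and global properties of language structure, this study looks inside the network structure and concentrates on its central nodes, attempting to uncover more of what is particular to the network by combining macroscopic and microscopic perspectives. To our knowledge, no existing study of language networks, in China or abroad, has addressed central nodes. From a linguistic point of view, the study offers a new method for research on Chinese function words. From the point of view of complex networks, the nodes of a language network are clearly defined and their properties are easy to describe and quantify, so studying the central nodes of Chinese syntactic networks can help explain, in theoretical terms, the characteristics and composition of complex network structure. This is relevant both to research on node properties in complex networks and to research on the dynamics of network structure.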
1 Resources and methods
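To reduce the influence of genre on the results, we used two kinds of material as resources: "实话实说" (hereafter SHSS) and "新闻联播" (hereafter XWLB). We first determined three specific function words as objects of study by comparing word frequencies and distribution rates. We then examined the network measures of these three function-word nodes and analyzed their status as central nodes in the network structure. Finally, we removed each of the three nodes from the network and compared the network measures before and after removal. The results show that all three function words are central nodes of the networks, but their status differs, and so does their influence on the overall network structure.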
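We first consider how frequently function words occur. Word-frequency statistics are generally regarded as the foundation of quantitative linguistics[14]. The frequency criterion has limitations, however: the accuracy of frequency statistics depends closely on the size of the language sample, so the accuracy of a frequency is relative[15]. Because each of the two corpora used here contains fewer than 20000 words, we restricted the candidates to function words ranked among the 50 most frequent words to keep the results reliable. Table 1 lists the function words among the 50 most frequent words in SHSS and XWLB.
In SHSS, five function words are among the 50 most frequent words: three particles and two prepositions. In XWLB there are nine: three particles, two conjunctions and four prepositions.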
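The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the token format (word, POS tag), the function name `top_function_words` and the toy corpus are assumptions, and the tag set u/p/c is simply the set of tags that appears in Table 1.

```python
from collections import Counter

# Assumed function-word tags, taken from the tags visible in Table 1.
FUNCTION_TAGS = {"u", "p", "c"}

def top_function_words(tokens, n=50):
    """Return (Rank1, Rank2, Freq, word, tag) rows for the function words
    found among the n most frequent (word, tag) pairs."""
    ranked = Counter(tokens).most_common(n)
    rows = []
    for overall_rank, ((word, tag), freq) in enumerate(ranked, start=1):
        if tag in FUNCTION_TAGS:
            # Rank1 counts function words only; Rank2 is the overall rank.
            rows.append((len(rows) + 1, overall_rank, freq, word, tag))
    return rows

# Toy corpus (hypothetical), just large enough to show both rank columns.
toy = [("的", "u")] * 5 + [("我", "r")] * 3 + [("在", "p")] * 2 + [("书", "n")]
table = top_function_words(toy, n=3)
print(table)
```

Counting (word, tag) pairs rather than bare word forms keeps a word such as 在 used as a preposition apart from its other uses, which is what the paper's "在 p" label suggests.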
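Table 1 Frequencies of function words a)

XWLB
Rank1   Rank2   Freq   Word
1       1       930    u
2       2       273    c
3       3       223    p
4       4       202    u
5       11      81     p
6       15      64     u
7       26      48     p
8       30      45     p
9       35      43     c

SHSS
Rank1   Rank2   Freq   Word
1       1       1051   u
2       6       429    u
3       21      124    p
4       43      73     u
5       48      66     p

a) Rank1 is the rank among function words; Rank2 is the rank among all words. u, c and p are part-of-speech tags (particle, conjunction and preposition).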
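Differences in period, region and sample size, and whether the material is written or spoken, all affect word frequency, so frequency cannot be the only criterion for selecting words. The number of texts, out of a given sample of texts, in which a word occurs is also a measure of its importance; this is the distribution-rate criterion[15]. According to our statistics, three function words rank among the 50 most frequent words in both treebanks: 的, 了 and 在 (as a preposition, tagged p). According to the Modern Chinese Frequency Dictionary (《现代汉语频率词典》), these three words rank first, second and sixth in word frequency and are the three most frequent function words[16]. We therefore took the three function-word nodes 的, 了 and 在/p (hereafter A, B and C) as our objects of study.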
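The distribution criterion used here amounts to keeping only the items that appear in the top-50 list of every corpus. A minimal sketch follows; the helper name `in_all_corpora` and the placeholder words w1 to w5 are hypothetical, and only the POS tags echo Table 1.

```python
def in_all_corpora(*top_lists):
    """Keep the items that occur in every top-N list,
    preserving the order of the first list."""
    common = set(top_lists[0]).intersection(*top_lists[1:])
    return [item for item in top_lists[0] if item in common]

# Placeholder (word, tag) pairs; the words themselves are not real data.
xwlb_top = [("w1", "u"), ("w2", "c"), ("w3", "p"), ("w4", "u")]
shss_top = [("w1", "u"), ("w4", "u"), ("w3", "p"), ("w5", "u")]
shared = in_all_corpora(xwlb_top, shss_top)
print(shared)
```

With more than two corpora the same call works unchanged, since the intersection is taken over all lists passed in.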
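To observe and analyze the network properties of these three nodes, we built two Chinese dependency syntactic networks from the SHSS and XWLB dependency treebanks and measured, for each of the three nodes, the degree (all degree), out-degree, in-degree, closeness (all closeness), in-closeness, out-closeness and betweenness. The method of network construction is described in ref. [8] and is not repeated here. To eliminate statistical bias and make comparison easier, we normalized each measure so that its maximum value is 1.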
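The degree of a node is the number of other nodes directly connected to it. A node's degree is thus a numerical measure of the size of its neighbourhood. A node with a high degree is said to be central. Because degree counts only the nodes directly connected to a node and ignores those connected indirectly, the centrality it measures can be called "local centrality".
The in-degree of a node is the total number of nodes that point directly to it; the out-degree is the total number of other nodes that it points directly to.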
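The degree measures and the max-equals-1 normalization can be sketched as follows. This is an illustrative sketch under stated assumptions: the arc list is a toy example, the function names are hypothetical, and "all degree" is taken as in-degree plus out-degree, which is how the degree rows of Table 2 add up (for example 504 + 460 = 964 for A in XWLB).

```python
from collections import defaultdict

def degree_table(edges):
    """edges: iterable of (source, target) arcs.
    Returns {node: (in_degree, out_degree, all_degree)}."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set()
    for src, dst in set(edges):          # count each distinct arc once
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
    return {v: (indeg[v], outdeg[v], indeg[v] + outdeg[v]) for v in nodes}

def normalize_by_max(values):
    """Scale a {node: value} mapping so that the largest value becomes 1."""
    top = max(values.values())
    return {v: (x / top if top else 0.0) for v, x in values.items()}

# Toy arcs (hypothetical): two nodes point to "hub", which points on to "c".
edges = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
deg = degree_table(edges)
norm_all = normalize_by_max({v: d[2] for v, d in deg.items()})
print(deg["hub"], norm_all["hub"], norm_all["c"])
```

The same normalization reproduces the published figures: B's degree of 133 divided by the XWLB maximum of 964 gives 0.13797.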
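If a node is at a short distance from many other nodes, it is said to be "close" to many other nodes in the network. The closeness of a node varies inversely with the sum of its distances to all other nodes (the distance sum): the larger the distance sum, the smaller the closeness.
In a directed network, closeness computed along incoming and outgoing directions differs, so the "in-closeness" and "out-closeness" of a directed network can be measured separately.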
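The three closeness variants can be illustrated with a breadth-first search. The sketch below is not Pajek's implementation: in particular, the way unreachable nodes are handled (inverse mean distance to the reachable nodes, scaled by the share of nodes reached) is an assumption chosen for the example, and the function names and toy arcs are hypothetical.

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest-path lengths (in arcs) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(edges, node, mode="all"):
    """mode: 'out' follows arcs, 'in' follows reversed arcs,
    'all' ignores arc direction."""
    nodes = {v for e in edges for v in e}
    adj = {}
    for src, dst in edges:
        if mode in ("out", "all"):
            adj.setdefault(src, set()).add(dst)
        if mode in ("in", "all"):
            adj.setdefault(dst, set()).add(src)
    dist = bfs_distances(adj, node)
    reached = len(dist) - 1
    if reached == 0:
        return 0.0
    total = sum(dist.values())
    # Assumed convention for partly reachable networks (see lead-in).
    return (reached / total) * (reached / (len(nodes) - 1))

edges = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
print(closeness(edges, "hub", "all"), closeness(edges, "a", "in"))
```

In this toy network node "a" has no incoming arc, so nothing can reach it and its in-closeness is 0, the same pattern that Table 2 reports for node B.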
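Betweenness measures the extent to which a node lies "between" the other nodes of the network: a node with a relatively low degree may play an important intermediary role and therefore be central to the network. A node's betweenness measures the extent to which the corresponding actor is a "broker" or "middleman" and can control others.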
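Betweenness on a directed network can be computed with Brandes' algorithm. The sketch below returns raw (unnormalized) scores on a toy arc list; it is an illustration rather than the procedure used in Pajek, whose scaling of the scores may differ, and the function name is hypothetical.

```python
from collections import deque

def betweenness(edges):
    """Directed betweenness: for each node, the sum over ordered pairs (s, t)
    of the fraction of shortest s->t paths that pass through the node."""
    nodes = {v for e in edges for v in e}
    adj = {v: set() for v in nodes}
    for src, dst in edges:
        adj[src].add(dst)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:                      # BFS from s
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):         # accumulate dependencies backwards
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

edges = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
score = betweenness(edges)
print(score)
```

A node with in-degree 0, such as "a" here, can never lie in the interior of a directed path, so its betweenness is necessarily 0.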
Table 2 gives the values computed with the network analysis software Pajek[17].

Table 2 Network parameters of nodes A, B and C

Network measure             A                   B                   C
                            XWLB      SHSS      XWLB      SHSS      XWLB      SHSS
Degree                      964       830       133       234       222       131
Normalized degree           1         1         0.13797   0.28193   0.23029   0.15783
In-degree                   504       405       0         0         88        61
Normalized in-degree        1         0.81984   0         0         0.17460   0.12348
Out-degree                  460       425       133       234       134       70
Normalized out-degree       1         1         0.28913   0.55059   0.29130   0.16471
Closeness                   0.50188   0.55770   0.35197   0.43941   0.40977   0.44158
Normalized closeness        1         1         0.70130   0.78790   0.81647   0.79178
In-closeness                0.37375   0.39885   0         0         0.26484   0.28302
Normalized in-closeness     0.91600   0.84731   0         0         0.64907   0.60124
Out-closeness               0.21871   0.27586   0.15682   0.22887   0.18254   0.21486
Normalized out-closeness    1         1         0.71705   0.82963   0.83465   0.77887
Betweenness                 0.32098   0.27229   0         0         0.02750   0.01365
Normalized betweenness      1         1         0         0         0.08569   0.05012
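To examine the role that central nodes (both global and local) play in the whole network, we removed A, B and C from the network one at a time and measured, for the original XWLB and SHSS networks and for the three node-removed networks, the number of vertices, average degree, average path length, diameter, number of isolated vertices, maximum domain and density, and observed the changes before and after removal (Table 3).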
The average degree is the mean degree of the nodes in a given network, that is, the sum of all node degrees divided by the number of nodes.
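The average path length is the mean shortest path between any two nodes of the network:
$$d = \frac{1}{\frac{1}{2}N(N-1)} \sum_{i>j} d_{ij}, \tag{1}$$
where $N$ is the number of nodes in the network and $d_{ij}$ is the distance between node $i$ and node $j$, which can be expressed as the number of edges on the shortest path between the two nodes.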
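A minimal sketch of the average path length follows. It assumes that arc direction is ignored and that only connected pairs are averaged, which matches the note under Table 3 that isolated vertices are left out; the function name and the toy network are hypothetical.

```python
from collections import deque

def average_path_length(nodes, edges):
    """Mean shortest-path length over all connected node pairs,
    treating every arc as an undirected edge."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = pairs = 0
    for s in nodes:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())   # each pair is visited from both ends,
        pairs += len(dist) - 1        # so the ratio equals the pairwise mean
    return total / pairs if pairs else 0.0

nodes = {"a", "b", "hub", "c", "d"}
edges = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
apl = average_path_length(nodes, edges)
print(apl)
```

For this five-node example the ten pairwise distances sum to 18, so the mean is 18 / 10 = 1.8, the value that Eq. (1) gives with N = 5.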
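The diameter is the greatest distance between any two nodes of the network. For example, in the SHSS dependency syntactic network the longest path is the one between node 101 and node 708; the distance between these two nodes is 9, the greatest distance found in the network.
The number of isolated vertices is the number of nodes with degree 0.
The domain of a node is the number of nodes it can reach through links. The maximum domain is the largest domain over all nodes.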
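The three measures just defined, and the effect of deleting a node on them, can be sketched as follows. The sketch assumes that links are followed regardless of direction and that the diameter is taken over reachable pairs only; the function names and the toy network are hypothetical.

```python
from collections import deque

def undirected_adj(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def remove_node(nodes, edges, victim):
    """Drop one node and every arc touching it; its former neighbours
    stay in the node set and may become isolated."""
    return nodes - {victim}, [(a, b) for a, b in edges if victim not in (a, b)]

def distances(adj, start):
    dist, queue = {start: 0}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def summary(nodes, edges):
    adj = undirected_adj(nodes, edges)
    all_dist = {v: distances(adj, v) for v in nodes}
    return {
        "isolated": sum(1 for v in nodes if not adj[v]),
        "max_domain": max(len(d) - 1 for d in all_dist.values()),
        "diameter": max(max(d.values()) for d in all_dist.values()),
    }

nodes = {"a", "b", "hub", "c", "d"}
edges = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
before = summary(nodes, edges)
after = summary(*remove_node(nodes, edges, "hub"))
print(before, after)
```

Removing the hub of the toy network leaves "a" and "b" isolated and shrinks the maximum domain from 4 to 1, a small-scale version of the comparison reported in Table 3.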
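Density describes how closely the nodes of a graph are connected:
$$\text{density} = \frac{2L}{n(n-1)}, \tag{2}$$
where $L$ is the number of edges actually present in the network and $n$ is the number of nodes.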
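Reading Eq. (2) as density = 2L / (n(n - 1)), and taking the average degree to be 2L / n, the density equals the average degree divided by n - 1. The short sketch below checks this against the complete XWLB network in Table 3; the function names are hypothetical, and the published inputs are rounded, so agreement is expected only to the printed precision.

```python
def density(num_edges, num_nodes):
    """Eq. (2), read as 2L / (n (n - 1))."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

def density_from_average_degree(avg_degree, num_nodes):
    # Since the average degree is 2L / n, Eq. (2) equals avg_degree / (n - 1).
    return avg_degree / (num_nodes - 1)

# Complete XWLB network in Table 3: 4011 nodes, average degree 6.15.
xwlb_full = density_from_average_degree(6.15, 4011)
print(round(xwlb_full, 5))

# A five-node toy network with four edges.
print(density(4, 5))
```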
2 Results and discussion
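Table 2, together with the average degrees in Table 3, shows that in both the XWLB and the SHSS network the normalized degree, out-degree, closeness, out-closeness and betweenness of A are all 1, that is, the highest values among all nodes. A's high degree, high in-degree and high out-degree show that it is a local central node. Its closeness of 1 means that its distance sum to the other nodes is the smallest. It also has the highest betweenness, which means that its global centrality is the highest and that it is the most central node of the whole network.
B has a high degree in both XWLB and SHSS, which makes it a local central node. Its betweenness is 0, so its global centrality is low: it is not a global central node of the network and exists only as a local central node. A notable feature of B is that its in-degree and in-closeness are 0 in both the XWLB and the SHSS network.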
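Table 3 Comparison of network data a)

Network   Version     Vertices   Average degree   Average path length   Diameter   Isolated vertices   Maximum domain   Density
XWLB      Complete    4011       6.15             3.58                  12         0                   4010             0.00153
XWLB      A removed   4010       5.67             3.93                  12         42                  3928             0.00141
XWLB      B removed   4010       6.09             4.56                  20         0                   4009             0.00076
XWLB      C removed   4010       6.04             4.59                  20         17                  3990             0.00075
SHSS      Complete    2601       8.56             3.05                  9          0                   2600             0.00327
SHSS      A removed   2600       7.92             3.25                  10         57                  2521             0.00303
SHSS      B removed   2600       8.38             3.95                  13         0                   2599             0.00161
SHSS      C removed   2600       8.46             3.96                  13         5                   2590             0.00163

a) After A or C is removed the network contains isolated vertices; the average path length and diameter reported are those of the non-isolated vertices.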
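C likewise has a high degree, a high in-degree and a high out-degree and is a local central node. Overall, it is closer than B to the global centre of the network.
Wei et al.[7] proposed that selecting Chinese characters according to character frequency is an important cause of the power-law degree distribution of the Chinese phrase network, and that the organization of the Chinese phrase network obeys the principle of least effort that is ubiquitous in nature. Liu[8] found that in Chinese syntactic networks the degree distribution of words also follows a power law and that Chinese syntactic structure likewise obeys the principle of least effort, but that word degree and word frequency do not correspond exactly: a high-frequency word does not necessarily have a high degree. Comparing Tables 1 and 2 shows that word frequency and a node's central status in the network do not correspond one to one either: a higher frequency does not imply a more central position in the network structure. What causes this deserves further study. On present data and understanding, it is probably related to a node's closeness and irreplaceability. Although B is only a local central node, its out-closeness is large, and its distances to neighbouring nodes are short. Moreover, after B is removed the density of the network decreases and the average path length and diameter increase, and all of these changes are marked. This suggests that B plays an irreplaceable role in shortening the distances between some nodes. This is also why, although B's global centrality is lower than C's, its frequency is still higher than C's, in line with the principle of least effort.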
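Table 3 shows that in both the XWLB and the SHSS network, removing A lowers the average degree, maximum domain and density and raises the average path length and the number of isolated vertices. The diameter of the XWLB network is unchanged after A is removed, whereas that of the SHSS network increases.
The average degree falls because A's degree is far greater than the average degree of the original network, so removing A naturally lowers the network's average degree.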
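The size of this drop can be checked directly from Tables 2 and 3. The sketch below assumes that every arc of the removed node contributed to two degrees (its own and a neighbour's) and that there are no self-loops; the function name is hypothetical. Because the published averages are rounded to two decimals, agreement is expected only at that precision.

```python
def average_degree_after_removal(avg_degree, num_nodes, removed_degree):
    """Average degree once a node of the given degree is deleted.
    Each of its arcs disappears from its own degree and from a neighbour's."""
    degree_sum = avg_degree * num_nodes
    return (degree_sum - 2 * removed_degree) / (num_nodes - 1)

# Complete XWLB network: 4011 nodes, average degree 6.15 (Table 3);
# node degrees of A, B and C in XWLB are taken from Table 2.
xwlb = {name: round(average_degree_after_removal(6.15, 4011, deg), 2)
        for name, deg in {"A": 964, "B": 133, "C": 222}.items()}
print(xwlb)
```

The three values agree with the XWLB rows of Table 3 (5.67, 6.09 and 6.04), which suggests that the reported change in average degree is accounted for by the removed node's own degree.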
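After A is removed, the nodes that could enter the syntactic network only through A become isolated. A has the highest betweenness and is the most important "middleman" in the network, so a considerable number of nodes are isolated once it is removed. The effect is more pronounced in the spoken-genre SHSS network, where more than 2% of the nodes are isolated by the loss of A.
Besides the nodes that become completely isolated, some nodes drift outside the network formed by the majority of nodes in groups of two or three. We therefore need the maximum domain. According to Table 3, the maximum domain of XWLB after A is removed is 3928; that is, the largest subnetwork of the A-removed network consists of 3929 connected nodes, and 82 nodes lie outside this large component and cannot link to the majority of nodes; 42 of them are completely isolated and cannot link to any other node. For SHSS the maximum domain after A is removed is 2521; 79 nodes lie outside the large component and cannot link to the majority of nodes, and 57 of them are isolated.
After A is removed, the average path length and the density of both networks change (the former rises and the latter falls). However, Eqs. (1) and (2) show that the number of nodes enters both parameters, so we cannot determine how much of the change is due to A itself.
The increase in the diameter of the SHSS network after A is removed shows that paths through A shorten the distances between some nodes. The diameter of the XWLB network does not change after A is removed, which we think may be due to the difference in genre.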
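In both the XWLB and the SHSS network, removing B lowers the average degree, maximum domain and density and raises the average path length and the diameter. The number of isolated vertices remains 0.
The average degree falls because B's degree is far greater than the average degree of the original network, so removing B naturally lowers the network's average degree.
B's betweenness is 0, so it is not a global central node of the network. After B is removed, no node becomes isolated and no small group of nodes drifts outside the main network: all nodes in the network remain connected.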
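Although we still cannot determine how much B itself affects the average path length and the density, we can compare it with A. The comparison shows that B's effect on both is greater than A's. The density in particular falls to only about half that of the original network after B is removed. Because density describes how closely the nodes of a network are connected, we believe that B is able to bind some nodes of the network more closely, and that a larger part of this ability is irreplaceable in B than in A. As a local and global central node, A must have this ability too; the smaller change in density after A is removed may be because A's ability can be partly taken over by other nodes. For example, A may bind nodes x, y and z more closely, but other nodes may do the same, so that removing A does not change how closely these nodes are connected.
The diameter increases after B is removed, which shows that paths through B shorten the distances between some nodes. Compared with A, the increase in diameter after B is removed is more pronounced. We think this too may be because a larger part of B's ability to shorten the distances between nodes is irreplaceable.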
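In both the XWLB and the SHSS network, removing C lowers the average degree, maximum domain and density and raises the average path length, the diameter and the number of isolated vertices.
The average degree falls because C's degree is far greater than the average degree of the original network, so removing C naturally lowers the network's average degree.
C has a fairly high betweenness, but it is strongly affected by genre: it is closer to the global centre in the XWLB network. Removing C isolates some nodes, and this effect is more pronounced in the quasi-written XWLB network. We therefore believe that C exerts stronger control over other nodes in the XWLB network.
The maximum domain of XWLB after C is removed is 3990; 20 nodes lie outside this subnetwork and cannot link to the majority of nodes, and 17 of them are completely isolated and cannot link to any other node. The maximum domain of SHSS after C is removed is 2590; 10 nodes lie outside this subnetwork and cannot link to the majority of nodes, and 5 of them are completely isolated.
The data also show that C's effect on the average path length and the density is comparable to B's and greater than A's. As with B, the density after C is removed is only about half that of the original network. We therefore believe that C too is able to bind some nodes of the network more closely, and that the irreplaceable part of this ability is comparable to B's and larger than A's.
The diameter increases after C is removed, which shows that paths through C shorten the distances between some nodes. The increase is more pronounced than after A is removed, and the resulting diameter is the same as after B is removed. We therefore believe that the irreplaceable part of C's ability to shorten the distances between nodes is comparable to B's and larger than A's.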
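Although A, B and C are all central nodes, their status differs greatly. A is the most central node of the whole syntactic network. B is clearly a local central node, with a betweenness of 0. C is a local central node whose global centrality lies between those of A and B. Removing each of the three nodes changes the network in a different way. The most striking feature is that removing B produces no isolated or stray nodes. We attribute this to B's in-degree, in-closeness and betweenness all being 0, with the in-degree as the root cause. An in-degree of 0 means that the node cannot govern other nodes and can only attach to them. If one of its neighbours could be connected to the other nodes only through it, B would need a non-zero in-degree, which contradicts B's actual properties; hence removing B produces no isolated or stray nodes. By the same reasoning, removing A or C produces isolated and stray nodes because their in-degrees are not 0. Whether the in-degree of a function-word node is 0 is determined by the valency of the word itself[18].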
3 Conclusion
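The results indicate that in dependency syntax a word's in-degree (its capacity to govern) matters more than its out-degree (its capacity to be governed) for maintaining the integrity of syntactic structure. This agrees with the view in traditional syntactic research that the head is an essential part of maintaining the integrity of a syntactic structure, and the network approach supplies macroscopic, quantifiable evidence for this view. The study also shows that Chinese syntactic structure is robust: even when the most central node is removed, the great majority of nodes remain connected. From the standpoint of complex networks, the study indicates that research on network structure should attend to the properties of the nodes themselves. Macroscopic network data are important, but behind the forces that build networks, and behind their formation, development and change, lie the particular properties of nodes. Nodes connect spontaneously according to their own properties and eventually form small-world, scale-free networks. The particular properties of nodes are what ultimately determine the structure of networks.
In addition, we found that the three function words are affected by genre to different degrees. Which function words are sensitive to genre? Can their network statistics serve as a parameter in genre studies? These questions merit further investigation.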
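Acknowledgements We thank all the annotators of the treebanks for their hard work, and Liu Wang (刘望) for programming support. This research was supported in part by the "211 Project" Phase III key discipline construction programme of Communication University of China.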
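References
1 Hudson R. Language Networks: The New Word Grammar. Oxford: Oxford University Press, 2007
2 Liu H T. Cluster analysis of linguistic complex networks (in Chinese). Chinese Sci Bull (Chinese Ver), 2010, 55: 2667–2674
3 Liu H T. Language networks: Metaphor, or tool? (in Chinese). Journal of Zhejiang University (Humanities and Social Sciences), 2010, doi: 10.3785/j.issn.1008-942X.2010.10.041
4 Ferrer i Cancho R, Solé R V, Köhler R. Patterns in syntactic dependency networks. Phys Rev E, 2004, 69: 051915
5 Yu S, Liu H, Xu C. Statistical properties of Chinese phonemic networks. Physica A, 2011, 390: 1370–1380
6 Li J, Zhou J. Chinese character structure analysis based on complex networks. Physica A, 2007, 380: 629–638
7 Wei L X, Li Y, Kang S Y, et al. Organizational structure and scale-free properties of the Chinese phrase network (in Chinese). Chinese Sci Bull (Chinese Ver), 2005, 50: 1575–1579
8 Liu H. The complexity of Chinese dependency syntactic networks. Physica A, 2008, 387: 3048–3058
9 Liu H T. Statistical properties of Chinese semantic networks (in Chinese). Chinese Sci Bull (Chinese Ver), 2009, 54: 2060–2064
10 Liu H, Hu F. What role does syntax play in a language network? Europhys Lett, 2008, 83: 18002
11 Liu H K, Zhang X L, Cao L, et al. Analysis of the route connection mechanism of the Chinese city aviation network (in Chinese). Science in China Series G: Physics, Mechanics and Astronomy, 2009, 39: 935–942
12 Grimes B F. Ethnologue: Languages of the World. 14th ed. Dallas, TX: SIL International, 2000
13 Huang B R, Liao X D. Modern Chinese (revised 3rd ed) (in Chinese). Beijing: Higher Education Press, 2002
14 Liu Y, Liang N Y. Basic engineering for Chinese processing: Modern Chinese word frequency statistics (in Chinese). Journal of Chinese Information Processing, 1986, 1: 17–25
15 Feng Z W. Fundamentals of Computational Linguistics (in Chinese). Beijing: The Commercial Press, 2001. 75–76
16 Institute of Language Teaching and Research, Beijing Language Institute. Modern Chinese Frequency Dictionary (in Chinese). Beijing: Beijing Language Institute Press, 1986
17 de Nooy W, Mrvar A, Batagelj V. Exploratory Social Network Analysis with Pajek. Cambridge: Cambridge University Press, 2005
18 Liu H T, Feng Z W. Probabilistic valency pattern theory for natural language processing (in Chinese). Linguistic Sciences, 2007, 3: 32–41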
Central nodes of the Chinese syntactic networks
CHEN XinYing1,2 & LIU HaiTao1
1School of International Studies, Zhejiang University, Hangzhou 310058, China;
2Institute of Applied Linguistics, Communication University of China, Beijing 100024, China
Based on two syntactic dependency treebanks built with two different styles of Chinese, a statistical study is conducted regarding
word-frequency and distributions. We extracted three grammatical words as the research objects and analyzed their network features,
including all degree, out-degree, in-degree, all closeness, in-closeness, out-closeness and betweenness. Then these three nodes were
removed from the networks. We recorded and compared the network features of the two original networks and the three networks from
which one node is respectively removed, including the number of vertices, average degree, average path length, diameter, the number
of isolated vertices, domain and density. The results show that all three function words are central nodes of the Chinese syntactic
networks but differ in status. Their influence on the overall structure is also quite different. The research provides not only a new
method for the study of Chinese grammatical words but also a new way of thinking about node characteristics in
complex networks.
complex network, central node, language network, grammatical word
doi: 10.1360/972010-2369