Processing sentence end reduction of a newsflash
Satoshi Ikeda, Kazuteru Ohashi, Kazuhide Yamamoto
Department of Electrical Engineering, Nagaoka University of Technology
E-mail: {ikeda,ohashi,ykaz}@nlp.nagaokaut.ac.jp
News shown on electronic bulletin boards consists of highly condensed expressions. Its
sentences have a distinctive ending: they close with a noun or a case particle. This paper
focuses on sentence-end expressions and attempts to summarize them by reducing them to
nouns or case particles. We summarize news sentences with a pattern-matching approach.
Our evaluation shows that the summarizer removes 2.50 characters on average and that the
summarization ratio of sentence ends is 52%. We also show that the correctness of the
reduction is 95%.
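
As an illustration of the pattern-matching approach, the following minimal Python sketch rewrites a sentence end so that it closes with a noun or a case particle. The three rules, the function name `reduce_sentence_end`, and the example sentences are hypothetical and were chosen only for illustration; they are not the rules used in this work.

```python
import re

# Hypothetical rules: (pattern anchored at the sentence end, replacement).
# They are illustrative assumptions, not the rule set of the paper.
RULES = [
    (re.compile(r"を発表した。?$"), "を発表"),  # verbal noun + "shita" -> bare noun
    (re.compile(r"に到着した。?$"), "に到着"),  # verbal noun + "shita" -> bare noun
    (re.compile(r"へ向かった。?$"), "へ"),      # verb dropped, case particle kept
]


def reduce_sentence_end(sentence: str) -> str:
    """Apply the first matching rule; return the sentence unchanged if none match."""
    for pattern, replacement in RULES:
        reduced, n_subs = pattern.subn(replacement, sentence)
        if n_subs:
            return reduced
    return sentence


if __name__ == "__main__":
    for s in ["新製品を発表した。", "首相は米国へ向かった。"]:
        r = reduce_sentence_end(s)
        # Report the reduced sentence and how many characters were removed.
        print(r, len(s) - len(r))
```

Anchoring every pattern at the end of the string keeps the rewrite local to the sentence end, which is the only part of the sentence this kind of summarizer is meant to touch.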
Table 1:
3365
21127
40374
Table 2: [%]
  23.7     55.92
  (5.00)   (39.90)
  28.66    15.91
  1.80     0.19
  0.20     0.22
  1.56     8.83
  (0.34)   (6.41)
  38.59    18.52
  5.42     0.40
Table 3:
     (a)      (b)      a/b
     1.059    3.126    0.335
     0.622    2.147    0.290
     0.342    2.522    0.136
     0.188    2.633    0.072
     1.356    3.605    0.376
     0.446    0.191    2.432
     5.914    50.882   0.116
  7  2.712    1.011    0.373
4
2000
1
232,038 73,512
4.1
Example 40): (5 characters) -> (2 characters)
Summarization ratio of example 40 = 2 / 5 = 0.40
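
The ratio in this example can be reproduced with a minimal sketch that assumes the summarization ratio is the character length of the reduced sentence end divided by the length of the original sentence end. The function name `summarization_ratio` and the placeholder strings are hypothetical; only the lengths 5 and 2 come from the example.

```python
def summarization_ratio(before: str, after: str) -> float:
    """Length of the reduced sentence end over the original length, in characters."""
    if not before:
        raise ValueError("the original sentence end must not be empty")
    return len(after) / len(before)


# Placeholder strings: only their lengths (5 and 2 characters) matter here.
ratio = summarization_ratio("xxxxx", "xx")
print(f"{ratio:.2f}")  # 0.40
```

Under this definition a lower value means a stronger reduction, so the overall figure of 52% corresponds to sentence ends shrinking to roughly half their original length.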
Table 4:
Section               3.1     3.2    3.3     3.4    3.5    3.6    3.8    3.9    Overall
Summarization ratio   0.60    0.33   0.49    0.45   0.62   0.66   0.41   0.36   0.52
Count                 16825   1313   37995   7510   199    7194   197    848    72727
Table 5:
Section       3.1    3.2    3.3    3.4    3.5    3.6    3.7    3.8    3.9    Overall
Evaluated     231    19     492    107    9      116    21     3      13     1000
Correct       205    18     481    106    8      113    17     3      12     952
Correctness   0.89   0.95   0.98   0.99   0.89   0.97   0.81   1.00   0.92   0.95
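
The correctness values in Table 5 can be re-derived from its two count rows. The sketch below assumes that correctness is the number of correct reductions divided by the number of evaluated reductions; the names `evaluated`, `correct`, and `correctness` are labels chosen here for illustration, while the counts are copied from the table.

```python
# Counts copied from Table 5, keyed by the section that defines each rule group.
evaluated = {"3.1": 231, "3.2": 19, "3.3": 492, "3.4": 107, "3.5": 9,
             "3.6": 116, "3.7": 21, "3.8": 3, "3.9": 13}
correct = {"3.1": 205, "3.2": 18, "3.3": 481, "3.4": 106, "3.5": 8,
           "3.6": 113, "3.7": 17, "3.8": 3, "3.9": 12}


def correctness(n_correct: int, n_evaluated: int) -> float:
    """Fraction of evaluated reductions judged correct (assumed definition)."""
    return n_correct / n_evaluated


per_rule = {sec: round(correctness(correct[sec], evaluated[sec]), 2)
            for sec in evaluated}
# The overall column (952 of 1000) is taken from the table as is, not summed.
overall = round(correctness(952, 1000), 2)

print(per_rule)
print(overall)  # 0.95
```

The per-section counts add up to more than 1000, so the overall column is read directly from the table instead of being computed as a sum.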
Table 6:
  1      2      3
  0.98   0.95   0.91
Table 7:
Count                  72727    100
Summarization ratio    0.52     0.51

Table 8:
Sentences                73512    100
Characters removed       183606   290
Removed per 1 sentence   2.50     2.90
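
The per-sentence figures in Table 8 follow from dividing the number of removed characters by the number of sentences. The short sketch below checks this arithmetic; the function name `average_reduction` is a hypothetical label, and the inputs are the counts from the table.

```python
def average_reduction(chars_removed: int, n_sentences: int) -> float:
    """Average number of characters removed per sentence."""
    return chars_removed / n_sentences


all_sentences = average_reduction(183606, 73512)  # first column of Table 8
sample = average_reduction(290, 100)              # second column of Table 8

print(f"{all_sentences:.2f} {sample:.2f}")  # 2.50 2.90
```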
(B)
16700134 (A)
16200009
(1) NIKKEI-goo, http://nikkeimail.goo.ne.jp/
(2) 2000.
(3) ChaSen, Ver.2.3.3, http://chasen.naist.jp/hiki/ChaSen/
(4) 2000.
[1] …, NL-133-7, pp.45-52, 1999.
[2] …, 10, pp.693-696, 2004.
[3] … Web …, NL-153-1, pp.1-8, 2003.
[4] … Web …, NL-159-27, pp.193-200, 2004.
[5] …, Vol.6, No.6, pp.65-81, 1999.
[6] …, NL-122-13, pp.83-89, 1997.