Conference Proceeding

Relevant document retrieval using a spoken document

Grad. Sch. of Eng., Tohoku Univ., Sendai, Japan
10/2009; DOI:10.1109/ISCIT.2009.5341051. In proceedings of: 9th International Symposium on Communications and Information Technology (ISCIT 2009)
Source: IEEE Xplore

ABSTRACT In this paper, we propose a method of retrieving documents from the World Wide Web using a spoken document as a "key." This method can be viewed as a speech version of ordinary relevant document retrieval, where a text document is used as the retrieval query. The retrieval is based on an automatic transcription of the spoken document produced by a speech recognizer. The difficulty of this task is that the automatic transcription contains many recognition errors, so we cannot trust keywords extracted from it using a conventional method such as tf·idf. To solve this problem, we developed three methods. The first measures the relevance of a keyword to the spoken document by using Web documents retrieved from a Web search engine with that keyword as the query. The second composes a query from the selected keywords so that words derived from misrecognitions are excluded and similar words are grouped together. The third measures the relevance of a downloaded Web document to the spoken document. The experimental results suggest that the proposed methods are promising for retrieving documents relevant to a spoken document.
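
To illustrate why conventional tf·idf keyword extraction is fragile on an automatic transcription, the following is a minimal sketch, not the authors' implementation. The function name `tfidf_keywords`, the smoothed idf formula, and the toy data are all assumptions made for illustration; the word "wreck" stands in for a hypothetical misrecognition.

```python
import math
from collections import Counter


def tfidf_keywords(transcript_tokens, background_docs, top_n=3):
    """Rank the words of a transcript by tf*idf against a background collection.

    transcript_tokens: list of words from the (possibly erroneous) transcription.
    background_docs:   list of sets of words, one set per background document.
    """
    tf = Counter(transcript_tokens)
    n_docs = len(background_docs)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of background documents containing the word.
        df = sum(1 for doc in background_docs if word in doc)
        # Smoothed idf (an assumed variant) so unseen words do not divide by zero.
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[word] = count * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical toy data: "wreck" is an invented misrecognition that never
# occurs in the background collection, so its idf is the highest possible.
transcript = ["speech", "wreck", "speech", "retrieval"]
background = [{"the", "speech"}, {"the", "retrieval"}, {"the", "web"}]
keywords = tfidf_keywords(transcript, background)
```

In this toy case the misrecognized word ranks second, ahead of the genuine keyword "retrieval", because a word that is rare in the background collection receives a high idf whether or not it was actually spoken. This is the kind of unreliable keyword that the three methods in the abstract are meant to filter out.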

Keywords

automatic transcription
 
conventional method
 
downloaded Web document
 
experimental results
 
keyword
 
"key"
 
misrecognitions
 
ordinary relevant document retrieval
 
proposed methods
 
recognition errors
 
retrieval
 
retrieving documents
 
retrieving relevant documents
 
selected keywords
 
speech recognizer
 
speech version
 
spoken document
 
text document
 
Web documents
 
Web search engine