-
ABSTRACT: In this paper, we describe a Japanese speech corpus collected for investigating the speech variability of a specific speaker over short and long time periods, and then report the variability of speech recognition performance over those periods. Even when speakers use a speaker-dependent speech recognition system, it is known that speech recognition performance varies depending on when the utterance was spoken. This is because speech quality varies by occasion even if the speaker and utterance remain constant. However, the relationships between intra-speaker speech variability and speech recognition performance are not clear. Hence, we have been collecting speech data to investigate these relationships since November 2002. In this paper, we introduce our speech corpus and report speech recognition experiments using our corpus. Experimental results show that the variability of recognition performance over different days is larger than the variability of recognition performance within a day.
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on; 06/2006 · 4.63 Impact Factor
-
ABSTRACT: Automatic meta-data annotation of image regions is essential for cross-media information retrieval between texts and images. In this paper, we propose a method for automatic meta-data annotation of image regions. We apply Gaussian mixture models to this problem and discuss the results. The annotation meta-data for each region are taken from the top 5 candidates ranked by log likelihood. This method can attach a number of language meta-data items to an image because it annotates each region separately. The experimental results show that the accuracy of automatic annotation within the top 5 candidates was about 70%.
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on; 12/2005
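The top-5 ranking step mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one diagonal-covariance Gaussian mixture per label, and the function names `gmm_log_likelihood` and `top_k_labels`, the toy `models` dictionary, and the two-dimensional feature vector are all hypothetical.

```python
import math

def gmm_log_likelihood(x, components):
    """Log likelihood of feature vector x under a diagonal-covariance GMM.

    components is a list of (weight, means, variances) tuples (assumed layout).
    """
    logs = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(log_p)
    # log-sum-exp over components for numerical stability
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def top_k_labels(x, models, k=5):
    """Return the k labels whose mixture gives x the highest log likelihood."""
    ranked = sorted(models, key=lambda label: gmm_log_likelihood(x, models[label]),
                    reverse=True)
    return ranked[:k]

# Hypothetical models: label -> list of mixture components.
models = {
    "sky":   [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (1.0, 1.0), (1.0, 1.0))],
    "grass": [(1.0, (5.0, 5.0), (1.0, 1.0))],
    "sea":   [(1.0, (10.0, 10.0), (1.0, 1.0))],
}
region_features = (0.2, -0.1)
candidates = top_k_labels(region_features, models, k=2)
```

With real data, `k=5` would reproduce the "top 5" candidate list per region; the mixture parameters would come from training, which this sketch does not cover.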
-
ABSTRACT: The authors' ultimate goal is to achieve indicative
summarization of retrieved text. The paper examines the effectiveness
of a method for automatically deriving Japanese compound keywords based
on dependency-relationship rules and restrictions on them. Then, an
importance measure for the derived keywords is discussed. The derivation
method can derive more significant Japanese compound keywords that do
not appear in the original text.
Systems, Man, and Cybernetics, 2001 IEEE International Conference on; 02/2001
-
ABSTRACT: The vector space model (VSM) is a conventional information
retrieval model, which represents a document collection by a
term-by-document matrix. Since term-by-document matrices are usually
high-dimensional and sparse, they are susceptible to noise and make it
difficult to capture the underlying semantic structure. Additionally,
the storage and processing of such matrices places great demands on
computing resources. Dimensionality reduction is a way to overcome these
problems. Principal component analysis (PCA) and singular value
decomposition (SVD) are popular techniques for dimensionality reduction
based on matrix decomposition; however, their decomposed matrices contain
both positive and negative values. In the work described here,
we use non-negative matrix factorization (NMF) for dimensionality
reduction of the vector space model. Since matrices decomposed by NMF
only contain non-negative values, the original data are represented by
only additive, not subtractive, combinations of the basis vectors. This
characteristic of parts-based representation is appealing because it
reflects the intuitive notion of combining parts to form a whole. Also,
because NMF computation is based on a simple iterative algorithm, it is
advantageous for applications involving large matrices. Using
the MEDLINE collection, we experimentally showed that NMF offers great
improvement over the vector space model.
Systems, Man, and Cybernetics, 2001 IEEE International Conference on; 02/2001
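The non-negative factorization described in the abstract above can be illustrated with a small sketch. The abstract only says that NMF uses "a simple iterative algorithm"; the code below assumes the widely used multiplicative update rules for the Frobenius-norm objective, which may differ from what the paper used. The toy term-by-document matrix `V` and all function names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (terms x docs) into W (terms x rank) and H (rank x docs)."""
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + eps for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_terms)]
    return w, h

# Hypothetical 4-term x 4-document count matrix with two topical blocks.
V = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 1, 3]]
W, H = nmf(V, rank=2)
```

Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative throughout, which is the additive, parts-based property the abstract emphasizes. Documents would then be compared in the reduced space given by the columns of `H`.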
-
ABSTRACT: Current natural language interface systems cannot understand erroneous sentences. To realize a better interface, automatic error recovery for erroneous sentences is one of the problems to be solved. Applying LR parsing strategies is a well-known approach; however, it takes a long time to parse a sentence. This paper presents a method that improves time efficiency while keeping the accuracy of the traditional method.
Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on; 11/1998
-
ABSTRACT: This paper describes an efficient multi-attribute pattern matching machine to locate all occurrences of any of a finite number of sequences of rule structures in a series of input structures. The proposed machine has the following distinctive features: it can match set representations containing multiple attributes; it also enables us to match separate components; and it can match a rule consisting of an exclusive set. In this paper, these features are described in detail. Moreover, the pattern matching algorithm is evaluated both theoretically and experimentally, supported by simulation results for a variety of document-processing rules, such as text proofreading, text reduction, and examining relations between sentences.
Intelligent Processing Systems, 1997. ICIPS '97. 1997 IEEE International Conference on; 11/1997
-
ABSTRACT: Among key search strategies, the Patricia trie yields the
shallowest trie by eliminating all nodes that have only one arc; these
nodes are called single descendant nodes. For this reason, this trie can
retrieve keys faster than other trie strategies. This trie,
however, must store information concerning the eliminated nodes, and
thus if this trie structure is implemented, the required storage is
large. This paper presents a retrieval algorithm using the compact
Patricia trie, which is represented by a bit stream.
Intelligent Processing Systems, 1997. ICIPS '97. 1997 IEEE International Conference on; 11/1997
-
ABSTRACT: Information retrieval is an important research field for many
applications. Among key search strategies, the trie is well known as a
fast access method that can retrieve keys in order. In particular, the
Patricia trie gives the shallowest trie by eliminating all single
descendant nodes; for this reason, the Patricia trie is often used as
an index for information retrieval systems. If trie structures are
implemented, however, the greater the number of registered keys, the
more storage is required. Jonge et al. (1987) proposed a method to
change the normal binary trie into a compact bit stream. This paper
presents a method for compressing the Patricia trie into a new bit
stream. The theoretical and experimental results show that this method
generates a bit stream 40 to 60 percent shorter than that of the
traditional method. This method thus enables more compact storage and
faster access than the traditional method.
Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation., 1997 IEEE International Conference on; 11/1997
-
ABSTRACT: This paper presents a strategy for building a morphological
machine dictionary of English efficiently to infer meanings of
derivatives from a simple word (semantic stem) by considering
morphological affixes and their semantic classifications. The basic
concept is to group the derivatives into one frame and to restrict the
derivatives, accessible to a knowledge base, to the semantic stem. This
approach enables us to simplify the structures of a morphological
dictionary and the representation of the knowledge base. An efficient
retrieval algorithm for these representations is presented using a modified
hash technique.
Systems, Man, and Cybernetics, 1996., IEEE International Conference on; 11/1996
-
ABSTRACT: This paper proposes an automatic selection method for key search
algorithms. The methodology has been implemented in a system called
KESE2. Key search algorithms are selected according to the user's
requirements through a conversation controlled by inferences performed
upon an evaluation table. The evaluation table holds values representing
the fitness of search algorithms and their characteristics, or
properties, for the applications. The selection algorithm presented
determines candidates of key search algorithms by reducing unsuitable
methods step by step. The questions to be asked to the user are driven
by inferences over the restricted set. The paper also proposes an
assisting facility that consists of both a supporting function and a
program synthesis function. Experimental results show that by using the
selection algorithm, the number of questions to be asked in order to
select the appropriate key search algorithm was less than half the
number of questions without inferences
Systems, Man, and Cybernetics, 1996., IEEE International Conference on; 11/1996
-
ABSTRACT: This paper presents a new method of LR parsing based on the
distinction of stack states and non-stack states. Non-stack states are
states which do not need to be pushed into the LR parsing stack and
stack states are states to be pushed into it. By using some of the
properties based on the stack-controlling LR parser defined, the parsing
speed and the size of parsing tables can be improved, and the
improvement subsumes the traditional method of eliminating unit
productions. Empirical observations on a variety of programming
languages verify the efficiency. An extension of the method to the
generalized LR parsers for natural language is also discussed
Systems, Man, and Cybernetics, 1996., IEEE International Conference on; 11/1996
-
ABSTRACT: A trie structure is frequently used for various applications, such
as natural language dictionaries, database systems and compilers.
However, the total number of states (and transitions between them) of a
trie becomes large so that space cost may not be acceptable for a huge
key set. In order to resolve this disadvantage, this paper presents a
new scheme, called “two-trie”, that enables us to perform
efficient retrievals, insertions and deletions for the key sets. The
essential idea is to construct two tries for both front and rear
compressions of keys which is similar to a DAWG (Directed Acyclic
Word-Graph). The approach differs from a DAWG in that the two-trie
approach presented can determine uniquely information corresponding to
keys while a DAWG cannot. For an efficient implementation of the
two-trie, two types of data structures are introduced. The theoretical
and experimental observations show that the method presented is more
practical than existing ones considering the use of dynamic key sets,
storing information of keys and compression of transitions
Systems, Man, and Cybernetics, 1996., IEEE International Conference on; 11/1996
-
ABSTRACT: Information retrieval is an important research field for many
applications. Among key search strategies, the binary trie is well known as
a fast access method that can retrieve keys in order. However, when
the binary trie is implemented, the greater the number of registered
keys, the more storage in secondary memory is required. In order to
solve this problem, Jonge et al. (1987) proposed the method to change
the binary trie into a compact bit stream (called the pre-order bit
stream). However, searching and updating a key takes a lot of time in
large key sets. This paper proposes an efficient binary digital search
algorithm by introducing a new hierarchical structure. The algorithms
for retrieval, insertion and deletion of keys using this new method are
introduced through examples. The theoretical and experimental results,
using 50,000 Japanese nouns and 50,000 English words, show that this
method provides faster access than the traditional method: retrieval is
18 to 20 times faster, insertion 11 to 13 times faster, and deletion 4 to 6
times faster.
Systems, Man, and Cybernetics, 1996., IEEE International Conference on; 11/1996
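The idea of storing a binary trie as a pre-order bit stream, mentioned in the abstract above, can be sketched as follows. The bit convention here is an assumption made for illustration and is not taken from the abstract: every internal node is given two children (missing ones become dummy leaves), `treemap` holds one bit per node in pre-order (0 = internal, 1 = leaf), and `leafmap` holds one bit per leaf (1 = real key, 0 = dummy). Keys are assumed to be fixed-length bit strings, and all names are hypothetical. The hierarchical structure proposed in the paper is not reproduced.

```python
def build(keys):
    """Build a binary trie as nested dicts keyed by '0' and '1'."""
    root = {}
    for key in keys:
        node = root
        for bit in key:
            node = node.setdefault(bit, {})
    return root

def encode(node, treemap, leafmap):
    """Append the pre-order encoding of the subtree rooted at node."""
    if not node:                      # real leaf: a key ends here
        treemap.append(1)
        leafmap.append(1)
        return
    treemap.append(0)                 # internal node
    for bit in "01":
        if bit in node:
            encode(node[bit], treemap, leafmap)
        else:                         # dummy leaf for the missing child
            treemap.append(1)
            leafmap.append(0)

def skip(treemap, pos):
    """Return the index just past the subtree that starts at pos."""
    pending = 1
    while pending:
        pending += 1 if treemap[pos] == 0 else -1
        pos += 1
    return pos

def contains(treemap, leafmap, key):
    """Search for key directly in the bit stream."""
    pos = 0
    for bit in key:
        if treemap[pos] == 1:         # hit a leaf before the key was consumed
            return False
        pos += 1                      # move to the left child
        if bit == "1":
            pos = skip(treemap, pos)  # jump over the left subtree
    if treemap[pos] != 1:
        return False
    leaf_index = sum(treemap[:pos])   # number of leaves before this one
    return leafmap[leaf_index] == 1

treemap, leafmap = [], []
encode(build(["00", "01", "11"]), treemap, leafmap)
# treemap is [0, 0, 1, 1, 0, 1, 1] and leafmap is [1, 1, 0, 1]
```

Note that `skip` and the leaf count scan the stream linearly, which illustrates why searching and updating a flat bit stream becomes slow for large key sets, the problem the abstract sets out to address.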
-
ABSTRACT: A row displacement method efficiently compresses a sparse matrix
into a one-dimensional array. The access time with this method is O(1),
but its application has been restricted to static matrices. In order to
extend the row displacement method to dynamic matrices,
algorithms for insertion and deletion are proposed, and their
efficiency is confirmed by theoretical and empirical observations.
Systems, Man, and Cybernetics, 1996., IEEE International Conference on; 11/1996
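For readers unfamiliar with row displacement, the static case described in the abstract above can be sketched as follows. This is a minimal first-fit illustration with hypothetical names (`row_displacement`, `lookup`, `owner`, `base`); it does not reproduce the insertion and deletion algorithms that the paper proposes, which the abstract does not describe.

```python
def row_displacement(matrix):
    """Pack the non-zero entries of a sparse matrix into one array.

    Each row r gets a displacement base[r] such that entry (r, c) lives at
    index base[r] + c. owner[i] records which row owns slot i, so lookups
    can tell a real entry from a slot used by another row.
    """
    n_cols = len(matrix[0])
    values, owner, base = [], [], []
    for r, row in enumerate(matrix):
        cols = [c for c, x in enumerate(row) if x != 0]
        d = 0
        # First fit: smallest displacement with no collision.
        while any(d + c < len(owner) and owner[d + c] is not None for c in cols):
            d += 1
        base.append(d)
        need = d + n_cols
        if need > len(owner):
            values.extend([0] * (need - len(values)))
            owner.extend([None] * (need - len(owner)))
        for c in cols:
            values[d + c] = row[c]
            owner[d + c] = r
    return values, owner, base

def lookup(values, owner, base, r, c):
    """O(1) access: one addition, one ownership check."""
    i = base[r] + c
    return values[i] if owner[i] == r else 0

# Hypothetical sparse matrix.
M = [[0, 5, 0, 0],
     [7, 0, 0, 0],
     [0, 0, 0, 9],
     [0, 0, 3, 0]]
values, owner, base = row_displacement(M)
```

The `owner` array costs extra space but is what makes the constant-time lookup safe. Handling insertions that collide with an occupied slot is the part the paper addresses and is left out here.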
-
ABSTRACT: A trie structure is frequently used for various applications, such
as natural language dictionaries, database systems and compilers.
However, the total number of states of a trie (and transitions between
them) becomes large, so that the space cost may not be acceptable for a
huge key set. In order to resolve this disadvantage, this paper presents
a new scheme, called a “two-trie”, that enables us to
perform efficient retrievals, insertions and deletions for the key sets.
The essential idea is to construct two tries for both front and rear
compressions of keys, which is similar to a DAWG (directed acyclic
word-graph). The approach differs from a DAWG in that the two-trie
approach presented can uniquely determine information corresponding to
keys while a DAWG cannot. For an efficient implementation of the
two-trie, two types of data structures are introduced. Theoretical and
experimental observations show that the method presented is more
practical than existing ones considering the use of dynamic key sets,
information storage of keys and compression of transitions
IEEE Transactions on Knowledge and Data Engineering 07/1996 · 1.66 Impact Factor
-
ABSTRACT: We collect speech data for investigating intra-speaker speech variability over short and long time periods. In general, to reduce the load on speakers, the speech data are recorded as one file from the start to the end of collection. Hence, this file contains noise, non-speech sections, and sections with mistakes. Consequently, we must segment the file into individual utterances and select the useful ones, a process that requires a lot of time and effort. In this paper, we propose an automatic utterance segmentation tool for dividing the collected speech data. The proposed tool is composed of four processes: voice activity detection, speech recognition, DP matching, and correction of speech sections. To evaluate the proposed tool, we conduct experiments using a female speaker's speech data from our corpus. Experimental results show that the proposed method can reduce filing time by 90% compared to manual filing. In this paper, we first introduced the large speech corpus, which contains speech data collected from a specific speaker over long and short time periods. We then explained the automatic utterance segmentation tool that we built for constructing the corpus and examined its validity. As a result, it was demonstrated that the automatic utterance segmentation tool performs well and that it simplifies building the speech corpus.
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on;
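The DP matching step named in the abstract above is not specified further, so the following is only a generic sketch of dynamic-programming sequence matching (edit distance). It assumes, hypothetically, that recognizer output is compared with the prompted script token by token; the function name `dp_match` and the example tokens are illustrative and do not come from the corpus.

```python
def dp_match(reference, hypothesis):
    """Minimum number of substitutions, insertions, and deletions needed
    to turn hypothesis into reference (classic edit-distance DP)."""
    n, m = len(reference), len(hypothesis)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i                       # delete all reference tokens
    for j in range(1, m + 1):
        cost[0][j] = j                       # insert all hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    return cost[n][m]

# Hypothetical prompt and recognizer output, as token lists.
prompt = ["kyou", "wa", "ii", "tenki", "desu"]
recognized = ["kyou", "wa", "tenki", "desu", "ne"]
distance = dp_match(prompt, recognized)
```

A low distance between a detected speech section and a prompt would suggest that the section is a usable reading of that prompt; how the tool actually uses the match is not stated in the abstract.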
-
ABSTRACT: In this paper, we describe a Japanese speech corpus collected for investigating the speech variability of a specific speaker over short and long time periods. Even when speakers use a speaker-dependent speech recognition system, it is known that speech recognition performance varies depending on when the utterance was spoken. This is because speech varies even when the same speaker utters the same sentence. However, the relationship between intra-speaker speech variability and speech recognition performance is not clear. We have not seen a corpus of Japanese speech data from a specific speaker over a long time period. Hence, since 2002, we have been collecting speech data to investigate the relationships between speech variability and speech recognition performance. In this paper, we introduce our speech corpus and conduct speech recognition experiments. Experimental results show that the variability of recognition performance over different days is larger than the variability of recognition performance within a day.
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on;
-
ABSTRACT: Even if a speaker uses a speaker-dependent speech recognition system, speech recognition performance varies. However, the relationships between intra-speaker speech variability and speech recognition performance are not clear. To investigate these relationships, we have been collecting speech data since November 2002. In this paper, we analyze the relationships between intra-speaker speech variability and phoneme accuracy using correlation analysis. The analysis showed a strong negative correlation between phoneme accuracy and speaking rate, with a correlation coefficient of -0.77. Moreover, phoneme accuracy is correlated with the temperature in the recording room and the humidity difference.
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on;
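The kind of correlation analysis described in the abstract above can be illustrated with a short sketch. It assumes the Pearson product-moment coefficient, which the abstract does not name explicitly. The numbers below are made up purely to exercise the function and are not data from the corpus.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Illustrative values only: a perfectly linear decreasing relation.
speaking_rate = [1.0, 2.0, 3.0, 4.0]
phoneme_accuracy = [8.0, 6.0, 4.0, 2.0]
r = pearson(speaking_rate, phoneme_accuracy)
```

A coefficient near -1 means accuracy falls as speaking rate rises; the reported -0.77 would indicate a strong but not perfect relationship of that kind.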
-
ABSTRACT: WWW image retrieval systems can retrieve images corresponding to a query keyword from the WWW; however, existing systems do not retrieve suitable images with high precision. In this paper, a new WWW image retrieval system using an image knowledge database is proposed. This system can show more suitable images by filtering the retrieval results of a conventional system. If the query keyword is not registered in the database, the user must select the suitable images from those retrieved by the conventional system, and the features of the selected images are then registered in the database as supervised data. If the query keyword is already registered, images more similar to the supervised data in the database are ranked at the top. The experimental results show that the average precision of this system is 11.6% better than that of the conventional system.
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on;