March 2012
Latent Semantic Indexing (LSI) is an information retrieval (IR) method that connects IR with numerical linear algebra by representing a dataset as a term-document matrix. Because of the tremendous size of modern databases, such matrices can be very large. The partial singular value decomposition (PSVD) is a matrix factorization that captures the salient features of a matrix while using much less storage. We examine two challenges posed by this PSVD data compression process in LSI. Traditional methods of computing the PSVD are very expensive; most of the processing time in LSI is spent calculating the PSVD of the term-document matrix. Thus, the first challenge is computing the PSVD efficiently, in terms of both computation and memory. The second challenge is efficiently updating the PSVD when the matrix is altered slightly. In a rapidly expanding environment, such as the Internet, the term-document matrix is altered often as new documents and terms are added. Updating the PSVD of this matrix is much more efficient than recalculating it after each change. We investigate the use of PSVD updating methods to address both of these challenges.
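To make the setup concrete, the sketch below shows a rank-k PSVD of a toy term-document matrix and a simple update when new document columns arrive. It is a minimal illustration, not the method studied in this work: it assumes NumPy/SciPy, uses `scipy.sparse.linalg.svds` for the truncated SVD, and uses a Zha-Simon-style update (project new columns onto the current left singular subspace, factor the small core matrix, truncate back to rank k) as one representative updating scheme.

```python
import numpy as np
from scipy.sparse.linalg import svds

def partial_svd(A, k):
    """Rank-k partial SVD of a (terms x documents) matrix A."""
    U, s, Vt = svds(A, k=k)          # svds returns singular values in ascending order
    order = np.argsort(s)[::-1]      # reorder to descending
    return U[:, order], s[order], Vt[order, :]

def update_svd_new_docs(U, s, Vt, D, k):
    """Given the rank-k PSVD U*diag(s)*Vt of A, return an approximate rank-k
    PSVD of [A, D], where D holds new document columns (Zha-Simon-style update).
    This avoids recomputing the PSVD of the enlarged matrix from scratch."""
    p = D.shape[1]
    # Split the new columns into components inside and outside span(U).
    UtD = U.T @ D                     # k x p projection onto current subspace
    R = D - U @ UtD                   # residual outside span(U)
    Q, Rr = np.linalg.qr(R)           # orthonormal basis for the residual
    # Small (k+p) x (k+p) core matrix whose SVD yields the update.
    K = np.block([[np.diag(s),        UtD],
                  [np.zeros((p, k)),  Rr]])
    F, theta, Gt = np.linalg.svd(K)
    # Rotate the enlarged bases and truncate back to rank k.
    U_new = np.hstack([U, Q]) @ F[:, :k]
    V_old = Vt.T
    n = V_old.shape[0]
    V_ext = np.block([[V_old,             np.zeros((n, p))],
                      [np.zeros((p, k)),  np.eye(p)]])
    V_new = V_ext @ Gt.T[:, :k]
    return U_new, theta[:k], V_new.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((500, 200))        # toy term-document matrix (terms x docs)
    k = 20
    U, s, Vt = partial_svd(A, k)

    D = rng.random((500, 10))         # ten new documents arrive
    U2, s2, Vt2 = update_svd_new_docs(U, s, Vt, D, k)

    # Compare against recomputing the PSVD of the enlarged matrix.
    Uf, sf, Vtf = partial_svd(np.hstack([A, D]), k)
    print("leading singular values (updated):   ", np.round(s2[:5], 3))
    print("leading singular values (recomputed):", np.round(sf[:5], 3))
```

The update works on a small (k+p) x (k+p) core matrix rather than the full term-document matrix, which is the source of the efficiency gain; its accuracy relative to recomputation depends on how well the rank-k approximation captures the original matrix.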