Publications (4)

  • Chapter: Identification of Critical Values in Latent Semantic Indexing
    April Kontostathis, William M. Pottenger, Brian D. Davison
    ABSTRACT: In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance, and can thus be removed (changed to zero). This allows us to convert the dense term by dimension and document by dimension matrices into sparse matrices by identifying and removing those entries. We empirically show that these entries are unimportant by presenting retrieval and runtime performance results, using seven collections, which show that removal of up to 70% of the values in the term by dimension matrix results in similar or improved retrieval performance (as compared to LSI). Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach additionally has the computational benefit of reducing memory requirements and query response time.
    09/2005: pages 333-346;
  • Article: Assessing The Impact of Sparsification on LSI Performance
    William M. Pottenger, Brian D. Davison
    ABSTRACT: We describe an approach to information retrieval using Latent Semantic Indexing (LSI) that directly manipulates the values in the Singular Value Decomposition (SVD) matrices. We convert the dense term by dimension matrix into a sparse matrix by removing a fixed percentage of the values. We present retrieval and runtime performance results, using seven collections, which show that using this technique to remove up to 70% of the values in the term by dimension matrix results in similar or improved retrieval performance (as compared to LSI), while reducing memory requirements and query response time. Removal of 90% of the values results in significantly reduced memory requirements and dramatic improvements in query response time. Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested.
    09/2004;
  • Article: Identification of Critical Values in Latent Semantic Indexing
    William M. Pottenger, Brian D. Davison
    ABSTRACT: This paper reports the results of a study to determine the most critical elements of the T_k and S_k D_k matrices, which are input to LSI. We are interested in the impact, both in terms of retrieval quality and query run time performance, of the removal (zeroing) of a large portion of the entries in these matrices.
    08/2004;
  • Article: Robust document image understanding technologies
    ABSTRACT: No existing document image understanding technology, whether experimental or commercially available, can guarantee high accuracy across the full range of documents of interest to industrial and government agency users. Ideally, users should be able to search, access, examine, and navigate among document images as effectively as they can among encoded data files, using familiar interfaces and tools as fully as possible. We are investigating novel algorithms and software tools at the frontiers of document image analysis, information retrieval, text mining, and visualization that will assist in the full integration of such documents into collections of textual document images as well as "born digital" documents. Our approaches emphasize versatility first: that is, methods which work reliably across the broadest possible range of documents.
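
The LSI abstracts above describe zeroing a fixed percentage of the entries in the term by dimension matrix produced by a truncated SVD. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the entries removed are those with the smallest absolute value, which the abstracts do not state; the function name `sparsify`, the toy matrix, and the choice of k are all hypothetical.

```python
import numpy as np


def sparsify(matrix, fraction):
    """Return a copy of `matrix` with `fraction` of its entries set to zero.

    Assumption: the entries zeroed are those with the smallest absolute
    value. The abstracts only say that a fixed percentage is removed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    result = np.array(matrix, dtype=float)  # contiguous copy; input untouched
    n_remove = int(round(fraction * result.size))
    if n_remove == 0:
        return result
    flat = result.ravel()  # a view onto the copy, so writes land in `result`
    # Indices of the n_remove smallest-magnitude entries (order not needed).
    smallest = np.argpartition(np.abs(flat), n_remove - 1)[:n_remove]
    flat[smallest] = 0.0
    return result


# Toy term-by-document matrix (random values, for illustration only).
rng = np.random.default_rng(0)
A = rng.random((50, 20))

# Truncated SVD: keep the first k dimensions.
k = 5
T, s, Dt = np.linalg.svd(A, full_matrices=False)
T_k = T[:, :k]                      # dense term by dimension matrix

T_k_sparse = sparsify(T_k, 0.70)    # zero 70% of its entries
```

After sparsification, `T_k_sparse` could be stored in a sparse format (for example with `scipy.sparse.csr_matrix`), which is where the memory and query-time savings mentioned in the abstracts would come from.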