Today's society generates and stores digital information in enormous amounts and at rapidly increasing rates. This trend affects all parts of modern society, including commerce and the economy, politics and government, health and medicine, science in general, media and entertainment, and the private sector. The stored information comprises text documents, images, audio files, videos, structured data from a variety of sources, and multimodal combinations of these, and it is available in a multitude of electronic formats and flavors. Consequently, there is a growing need for automated and interactive tools that support tasks such as searching, exploring, monitoring, sorting, and making sense of this information at different levels of abstraction and across different but steadily converging domains. This need grows at the same pace as the data itself and represents one of the biggest challenges for current computer science.
Visual analytics is a relatively young research field that tackles these tasks by exploiting human analytic power in synergetic combination with advanced computational techniques. It combines automated methods, visualization techniques, and approaches from human-computer interaction to equip analysts with more powerful tools, tailored to domains where large amounts of data must be analyzed. Visual analytics methods and concepts play a central role in this work. They are used to search and analyze texts, as well as multimodal documents containing a considerable amount of textual content. The presented approaches are primarily applied to a very special type of document from the intellectual property domain, namely patents. Retrieval and analysis tasks in the patent domain differ greatly from standard search and analysis tasks in their rigorous requirements, high costs, and the risks involved. New methods that are more effective, efficient, and reliable therefore need to be developed.
Accordingly, this thesis investigates how automatic methods and information visualization can be combined through advanced interaction techniques to improve upon the state of the art in patent literature retrieval. This integration is achieved and exemplified through several visual analytics prototypes, which aim to support real-world tasks and processes. The main contributions of this thesis enhance all stages of the patent literature analysis process. To improve patent search, they include techniques for interactive visual query building, which help analysts formulate complex information needs. They also include a technique that allows users to build their own precise search mechanisms in the form of binary classifiers. A third contribution consists of advanced approaches for making sense of a retrieved result set through visual analysis. These approaches form the basis for the insights users need to judge and improve previous query formulations. Next, the thesis discusses interaction methods that facilitate forward analysis and feedback loops, which are a critical part of visual analytics approaches. These methods are the key to integrating all stages of the patent analysis process in a seamless visual manner. A further contribution is the discussion of scalability issues in the context of the described visual analytics approaches. In particular, it addresses interaction scalability, the recording of analytic provenance, insight management, the visualization of analytic reporting, and collaborative approaches.
Although the described approaches are exemplified in the field of intellectual property analysis, the developments in search and analysis could be adapted to complex text document retrieval and analysis tasks in various domains. The general ideas for facilitating low-level feedback loops, enabling user-steered machine classification, and mitigating negative scalability effects through technical solutions can be directly transferred to other visual analytics scenarios.