Conference Paper

Visual trend analysis with digital libraries


Abstract

The early awareness of new technologies and upcoming trends is essential for making strategic decisions in enterprises and research. Trends may signal that technologies or related topics will be of great interest in the future or obsolete for future directions. The identification of such trends requires analytical skills that can be supported through trend mining and visual analytics. Since the earliest trends or signals commonly appear in science, the investigation of digital libraries in this context is indispensable. However, digital libraries do not provide sufficient information for analyzing trends: it is necessary to integrate data, extract information from the integrated data and provide effective interactive visual analysis tools. We introduce in this paper a model that covers all stages from data integration to interactive visualization for identifying trends and analyzing the market situation through our visual trend analysis environment. Our approach improves the visual analysis of trends by investigating the entire transformation process from raw and structured data to visual representations.


... Early technological trends often propagate first in research and scientific publications. Therefore, these data are the "real" information pool for early signals and trends [2]. Although the value of scientific publications for identifying early trends is obvious, real analyses, and in particular the identification of emerging trends from textual scientific publications, are rarely proposed. ...
... geographical or semantic) on data to inspect and analyze potential technological trends with regard to the following questions: (1) when have technologies or topics emerged and when were they established, (2) where are the key players and key locations, (3) who are the key players, (4) what are the core topics, (5) how will the technologies or topics probably evolve, and (6) which technologies or topics are relevant for a certain enterprise or application area? Addressing these questions requires the ability to get an overview of the core topics that are currently relevant, navigate through the different perspectives, analyze the results, and reason about the probable evolution of trends. ...
... At the top (1), users are able to search (including advanced search formulation capabilities), activate the assisted search, select a database or hide the visualization selection area. The assisted search is implemented according to our previous work [2] and enhances the user's query based on the resulting top five phrases of the top-ranked topic [2]. At the left (2), the facets of the underlying data are generated and visualized dynamically. ...
Conference Paper
The awareness of emerging technologies is essential for strategic decision making in enterprises. Emerging and declining technological trends can strengthen competitiveness and market positioning. The exploration, detection and identification of such trends can be essentially supported through information visualization, trend mining and in particular through the combination of both. Commonly, trends appear first in science and scientific documents. However, those documents do not provide sufficient information for analyzing and identifying emerging trends. It is necessary to enrich data, extract information from the integrated data, measure the gradient of trends over time and provide effective interactive visualizations. We introduce in this paper an approach for integrating, enriching, mining, analyzing, identifying and visualizing emerging trends from scientific documents. Our approach enhances the state of the art in visual trend analytics by investigating the entire analysis process and enabling humans to explore undetected, potentially emerging trends. (Best Paper Award at IV2019)
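The abstract above mentions measuring the gradient of trends over time. As an illustration (not the authors' documented method), one minimal reading of such a gradient is the least-squares slope of a topic's yearly publication counts, where a positive slope suggests an emerging topic:

```python
def trend_gradient(yearly_counts):
    """Least-squares slope of a topic's yearly document frequency.

    `yearly_counts` maps year -> number of documents mentioning the
    topic. A positive slope hints at an emerging topic, a negative
    one at a declining topic.
    """
    years = sorted(yearly_counts)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yearly_counts[y] for y in years) / n
    num = sum((y - mean_x) * (yearly_counts[y] - mean_y) for y in years)
    den = sum((y - mean_x) ** 2 for y in years)
    return num / den

# Example: a topic whose publication count grows each year
counts = {2015: 3, 2016: 7, 2017: 12, 2018: 20}
print(trend_gradient(counts) > 0)  # prints True
```

In practice one would compute this per extracted topic and rank topics by slope; the counts above are invented for illustration.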
... Their model contains query generation, data collection, data pre-processing, topic modeling, topic analysis and visualization, whereas only the query generation is assigned to be performed partially by humans [34]. The proposed process is similar to our early work [40] that investigates (without natural language processing) the same transformation process that was refined and enhanced with the emergence of trends [36]. Although one main outcome of their approach is visualization, a real analytical interaction design for visual representations is not introduced. ...
... The user should be enabled to gather an overall trend evolution and different perspectives (e.g., geographical or temporal). Thus, the focus lies on answering the questions [40] toward the analysis of potential technological trends: "(1) when have technologies or topics emerged and when were they established, (2) who are the key players, (3) where are the key players and key locations, (4) what are the core topics, (5) how will the technologies or topics probably evolve, and (6) which technologies or topics are relevant for a certain enterprise or application area?" [36,40] The questions were basically introduced by Marchionini [30] for exploratory search. However, we expanded the question space and adapted the questions to the specific characteristics of technology and innovation management, which includes early trend detection. ...
Article
Full-text available
The awareness of emerging trends is essential for strategic decision making because technological trends can affect a firm’s competitiveness and market position. The rise of artificial intelligence methods allows gathering new insights and may support these decision-making processes. However, it is essential to keep the human in the loop of these complex analytical tasks, which often lack an appropriate interaction design. Including special interactive designs for technology and innovation management is therefore essential for successfully analyzing emerging trends and using this information for strategic decision making. A combination of information visualization, trend mining and interaction design can support human users in exploring, detecting, and identifying such trends. This paper enhances and extends a previously published first approach for integrating, enriching, mining, analyzing, identifying, and visualizing emerging trends for technology and innovation management. We introduce a novel interaction design by investigating the main ideas from technology and innovation management and enable a more appropriate interaction approach for technology foresight and innovation detection.
... In a first step, a common baseline is defined through the identification of appropriate data in certain databases. If these data do not provide any further information, data enrichment methods as described in [27,29] are used to extract further information with web-mining methods through a unique identifier, such as the Digital Object Identifier (DOI) [27,29]. ...
... Furthermore, data from other sources are gathered and compared either through the identifier or through a combination of several parameters that leads to a unique identification; e.g., in scientific publications the title, year of publication and authors are used for enriching the database. This method can be applied to eliminate duplicates, too [27,29]. ...
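The enrichment and deduplication strategy described in the excerpts above can be sketched as follows. The preference for the DOI and the fallback composite key (title, year, authors) follow the text; the function names and the dictionary-based record layout are illustrative assumptions:

```python
def record_key(rec):
    """Prefer the DOI as a unique key; otherwise fall back to a
    composite of normalised title, year and author list."""
    if rec.get("doi"):
        return ("doi", rec["doi"].lower())
    return ("composite",
            rec["title"].strip().lower(),
            rec["year"],
            tuple(sorted(a.strip().lower() for a in rec["authors"])))

def merge_records(*sources):
    """Merge publication records from several sources, eliminating
    duplicates and filling in fields missing from earlier sources."""
    merged = {}
    for source in sources:
        for rec in source:
            key = record_key(rec)
            if key in merged:
                # enrich: copy fields the stored record is missing
                for field, value in rec.items():
                    merged[key].setdefault(field, value)
            else:
                merged[key] = dict(rec)
    return list(merged.values())
```

Note that a record carrying a DOI and the same record without one would not be matched by this sketch; a production pipeline would also compare the composite key across both cases.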
Conference Paper
Visual Analytics enables a deep analysis of complex and multivariate data by applying machine learning methods and interactive visualization. These complex analyses lead to insights and knowledge for a variety of analytics tasks and enable the decision-making process. Enabling decision-making processes is essential for managing and planning mobility and transportation, which are influenced by a variety of indicators such as new technological developments, ecological and economic changes, political decisions and in particular humans’ mobility behaviour. New technologies will lead to a different mobility behaviour with other constraints. These changes in mobility behaviour require analytical systems to forecast the required information and probable changes. These systems must consider different perspectives and employ multiple indicators. Visual Analytics enables such analytical tasks. We introduce in this paper the main indicators of Visual Analytics for mobility and transportation, explained through two exemplary case studies.
... The commonly used search mechanisms are primarily focused on providing users with easy access to information of their interest and deal with the access to information items and resources [20], but neither provide an overview of the content nor enable the exploration of emerging or disappearing technological changes [21]. This case study builds upon our previous work on visual trend analytics [21] and enhances the approach with dashboarding to allow the analysis of early signals and technologies in mobility and transportation. The main questions to be answered here are the following: (1) when have technologies in mobility emerged and when were they established? ...
... (2) where are the key players and key locations of those technologies, (3) who are the key players, (4) what are the core topics related to those technologies, (5) how will the technologies probably evolve, and (6) which technologies or topics are relevant for an enterprise? [21] The integrated dashboards should allow users to view and interact with the different aspects of the data gathered through unsupervised machine-learning methods. The underlying data in our case study come from the DBLP database and are enriched with further information and information-extraction methods. ...
Article
Full-text available
Mobility, logistics and transportation are emerging fields of research and application. Humans’ mobility behavior plays an increasing role in societal challenges. Besides the societal challenges, these areas are strongly related to technologies and innovations. Gathering information about emerging technologies therefore plays an increasing role for research in this area. Humans’ information processing can be strongly supported by Visual Analytics, which combines automatic modelling and interactive visualizations. The juxtaposed orchestration of interactive visualizations enables gathering more information in a shorter time. We propose in this paper an approach that goes beyond the established methods of dashboarding and enables visualizing different databases, data sets and subsets of data with juxtaposed visual interfaces. Our approach should be seen as an expandable method. Our main contributions are an in-depth analysis of visual task models and an approach for juxtaposing visual layouts as visual dashboards to enable solving complex tasks. We illustrate our main outcome through a case study that investigates the area of mobility and shows how complex analytical tasks can be performed easily by combining different visual interfaces.
... In their evaluation they found that users of such analytics systems are willing to adopt and leverage functionality in contrast to web searchers [11]. Nazemi et al. [19] proposed the following main questions for Visual Analytics systems for technology and innovation management: (1) when have technologies or topics emerged and when were they established? (2) "where are the key-players and key-locations, (3) who are the key-players, (4) what are the core-topics, (5) how will the technologies or topics evolve, and (6) which technologies or topics are relevant for an enterprise?" [19, p. 4]. ...
... The assisted search was applied according to Nazemi et al. [19] and extends the search functionality beyond traditional linguistic methods with a topic-based approach. The approach incorporates the information of the generated topics to enhance the search terms of the query. ...
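The topic-based query enhancement described in this excerpt might look roughly like the sketch below, assuming topics are available as weighted phrase lists from a topic model. Ranking the "top" topic by its weighted overlap with the query terms is an assumption for illustration, not the authors' documented ranking:

```python
def assisted_query(query, topics, n_phrases=5):
    """Expand a query with the top phrases of the best-matching topic.

    `topics` is a list of topics, each a list of (phrase, weight)
    pairs as a topic model might produce. The topic whose phrases
    overlap the query the most is treated as the top-ranked topic.
    """
    terms = set(query.lower().split())

    def overlap(topic):
        return sum(w for phrase, w in topic if phrase in terms)

    top = max(topics, key=overlap)
    ranked = sorted(top, key=lambda pw: pw[1], reverse=True)
    extra = [p for p, _ in ranked[:n_phrases] if p not in terms]
    return query + " " + " ".join(extra)

topics = [[("car", 0.5), ("engine", 0.3)],
          [("trend", 0.6), ("mining", 0.4), ("visual", 0.2)]]
print(assisted_query("trend analysis", topics))
# prints: trend analysis mining visual
```

The expanded query can then be sent to the underlying retrieval engine in place of the original one.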
... We integrated a set of different visualizations, in particular to respond to the questions according to Nazemi et al. [19]: (1) when have technologies or topics emerged and when were they established? (2) "where are the key-players and key-locations, (3) who are the key-players, (4) what are the core-topics, (5) how will the technologies or topics evolve, and (6) which technologies or topics are relevant for an enterprise?" [19, p. 4]. ...
Chapter
Visual Analytics, with its combination of automated techniques and interactive visualizations, provides vast analysis possibilities for technology and innovation management. Not only machine learning and data mining methods play an important role: due to its high interaction capabilities, Visual Analytics offers a more user-centered approach, where users are able to steer the entire analysis process and obtain the most valuable information. Existing Visual Analytics systems for trend analytics and technology and innovation management do not really make use of this unique feature and almost neglect the human in the analysis process. Outcomes from research in information search, information visualization and technology management can lead to more sophisticated Visual Analytics systems that involve the human in the entire analysis process. We propose in this paper a new interaction approach for Visual Analytics in technology and innovation management with a special focus on technological trend analytics.
... Visual trend analysis combines trend analysis with the benefits of visual analytics. Nazemi et al. [10,12] describe their approach to visual trend analysis as a process which starts with the extraction and identification of key topics and their relevance, over a specific period, from scientific publications. The extracted information can be displayed in visual analysis tools which support users in the process of identifying trends and making decisions. ...
... The first step when building such a strategy is an analysis of the current situation. The information gathered in the process defined by Nazemi et al. [10] can assist in this process, by providing valuable information about technologies and the most relevant authors in different research areas. As the information comes in huge amounts of nested datasets, the data must be transformed into data models which can be visualized and orchestrated. ...
Conference Paper
The rapid change due to digitalization challenges a variety of market players and forces them to find strategies to stay aware of developments in their markets, particularly those that impact their business. The main challenge is what a practical solution could look like and how technology can support market players in these trend-observation tasks. The paper therefore outlines a technological solution to observe specific authors, e.g. researchers who influence a certain market or engineers of competitors. In many branches both are groups well known to market players, and there is almost always the need for a technology that supports topical observation. This paper focuses on the concept of how a visual dashboard could enable market observation, how data must be processed for it, and its prototypical implementation, which enables a later evaluation. Furthermore, the definition of a principal technological analysis for innovation and technology management is created, which is also an important contribution to the scientific community that specifically considers the technology perspective and its corresponding requirements.
... Tracking the hot fronts of a discipline helps researchers better grasp scientific trends. The frontiers of a subject are determined from the burst terms extracted from literature titles and abstracts and the high-frequency words of the articles [8,9]. A burst-detection algorithm lets us detect the occurrence of burst words, so as to better apply time-series analysis and discover the frontiers of the subject. ...
... Tracking the hot fronts of a discipline helps researchers better grasp scientific trends, and a burst-detection algorithm can be used to detect the emergence of burst words and to make better use of time-series analysis and retrieval of the related literature [8,9]. In CiteSpace II, research frontiers are based on burst terminology extracted from the identifiers of titles, abstracts, keywords, and documentation. In this study, we selected the node type as keyword, detected a total of 2171 burst terms, and selected the display type "time zone" to generate research-frontier glossaries and a frontier view of 15 years of visual analytics. ...
... These data do not provide any information besides the authors' names, titles, years and, in most cases, a DOI. Using that DOI, the data are enriched through web-mining methods [11,12]. Further data from other sources are gathered and compared either through the DOI or through the combination of title and authors to eliminate duplicates. ...
... The above described case study just introduced a small number of analytical possibilities. A deeper insight can be found in [11,12,14]. ...
Chapter
Mobility, transportation and logistics are more and more influenced by a variety of indicators such as new technological developments, ecological and economic changes, political decisions and, in particular, humans’ mobility behavior. These indicators will lead to massive changes in our daily lives with regard to mobility, transportation and logistics. New technologies will lead to a different mobility behavior with new constraints. These changes in mobility behavior and logistics require analytical systems to forecast the required information and probable changes. These systems have to consider different perspectives and employ multiple indicators. Visual Analytics provides both the analytical approaches, by including machine learning, and the interactive visualizations to enable such analytical tasks. In this paper the main indicators for Visual Analytics in the domain of mobility, transportation and logistics are discussed, followed by exemplary case studies that illustrate the advantages of such systems. The examples are intended to demonstrate the benefits of Visual Analytics in mobility.
... For instance, the analysis of statistical government data or the search for and use of simulation models as well as simulators for foresight analysis is difficult, and support in that direction is beneficial for users [13]. This is similar to visual trend analysis [14], where a number of options exist to analyze massive textual data with the goal of early identification of upcoming trends in the form of innovations. ...
Article
Full-text available
A new approach for classifying users’ search intentions is described in this paper. The approach uses the parameters word frequency, query length and entity matching to classify a user's query as exploratory, targeted or analysis search. The approach focuses mainly on word-frequency analysis, where different sources of word-frequency data are considered, such as the Wortschatz frequency service of the University of Leipzig and the Microsoft Ngram service (now part of the Microsoft Cognitive Services). The model is evaluated with the help of a survey tool and a few machine learning techniques. The survey was conducted with more than one hundred users, and the evaluation of the model with the collected data yields satisfactory results. In big data applications the search intention analysis can be used to identify the purpose of a performed search and to provide an optimal initial set of visualizations that respects the intended task of the user working with the result data.
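A heuristic sketch of the three-way classification described in the abstract, using the three named parameters (word frequency, query length, entity matching). The thresholds and rules below are illustrative assumptions; the paper itself evaluates a model trained on survey data:

```python
def classify_intention(query, freq, entities):
    """Classify a query as 'targeted', 'exploratory' or 'analysis'.

    `freq` maps a word to its corpus frequency (e.g. as returned by
    the Wortschatz or Ngram services); `entities` is a set of known
    entity names. Thresholds are illustrative, not the paper's model.
    """
    words = query.lower().split()
    if any(w in entities for w in words):
        return "targeted"        # entity match: user knows the item
    avg_freq = sum(freq.get(w, 0) for w in words) / len(words)
    if len(words) <= 2 and avg_freq > 1000:
        return "exploratory"     # short query of common terms: browsing
    return "analysis"            # longer or rarer queries: analysis

print(classify_intention("visual analytics",
                         {"visual": 5000, "analytics": 3000}, set()))
# prints: exploratory
```

A trained classifier would replace the hard-coded thresholds with decision boundaries learned from labeled queries.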
... Visual Analytics systems are designed to be used in particular for analytical tasks and integrate data modelling in terms of data mining methods to enable handling huge amounts of data. Such systems have been integrated and used in a variety of application domains, such as medicine or business analytics [27]. Thomas and Cook defined Visual Analytics in their pioneering work [28] as "[…] the science of analytical reasoning facilitated by interactive visual interfaces" [28, p. 4]. ...
Chapter
Visual Analytics enables solving complex analytical tasks by combining automated data analytics methods and interactive visualizations. The complexity of the tasks, the huge amount of data and the complex visual representations may overstrain the users of such systems. Intelligent and adaptive visualization systems already show promising results in bridging the gap between humans and complex visualizations. We introduce in this paper a revised version of a layer-based visual adaptation model that considers human perception and cognition abilities. The model is then used to enhance the most popular Visual Analytics model and enable the development of intelligent Visual Analytics systems.
... In a next step we aim to include the search intention analysis feature into the visual trend analysis solution [30] to provide better support for users in the challenging domain of trend detection in technology and innovation management. ...
Article
Full-text available
Visual information search systems support different search approaches such as targeted, exploratory or analytical search. Those visual systems deal with the challenge of composing optimal initial result-visualization sets that match the search intention and respond to the search behavior of users. The diversity of these kinds of search tasks requires different sets of visual layouts and functionalities, e.g. to filter, drill down or even analyze concrete data properties. This paper describes a new approach to calculate the probability of the three mentioned search intentions, derived from users’ behavior. The implementation is realized as a web service, which is included in a visual environment that is designed to enable various search strategies based on heterogeneous data sources. Based on an entered search query, our search intention analysis web service calculates the most probable search task, and our visualization system initially shows the optimal set of visualizations to solve the task. The main contribution of this paper is a probability-based approach to derive users’ search intentions from their search behavior, enhanced by the application to a visual system.
... (open access) digital libraries and web data, such as news from enterprises or market magazines/blogs/portals. With these data and highly interactive visual analytical solutions for trend analysis [3][4][5], technology and innovation foresight can identify early market trends. However, to perform a sufficient analysis, professional knowledge about how to conduct an analysis is still required. ...
Chapter
In the domain of mobility and logistics, a variety of new technologies and business ideas are arising. Besides technologies that aim at ecological and economic transportation, such as electric engines, there are also fundamentally different approaches like central packaging stations or deliveries via drones. Yet there is a growing need for analytical systems that enable identifying new technologies, innovations, business models etc. and also give the opportunity to rate those with respect to business relevance. Adaptive systems commonly investigate only the user's behavior, while process-related support could help solve an analytical task more efficiently and effectively. In this article an approach is described that enables non-experts to perform visual trend analysis through advanced process support based on process mining. This allows us to calculate a process model based on events, which is the baseline for computing process-support features. These features and the process model make it possible to assist non-expert users in complex analytical tasks.
... The integrated data models allow us to provide several interactive visual layouts that enable information gathering from different perspectives [29]. An overview of emerging topics gathered from the entire database is illustrated in Figure 3-a. ...
Conference Paper
Scientific publications are an essential resource for detecting emerging trends and innovations at a very early stage, far earlier than patents may allow. Visual Analytics systems enable a deep analysis by applying commonly unsupervised machine learning methods to massive amounts of data. A main question from the Visual Analytics viewpoint in this context is: do abstracts of scientific publications provide an analysis capability similar to their corresponding full texts? This would allow extracting a mass of text documents much faster. We compare in this paper the topic extraction methods LSI and LDA on full-text articles and their corresponding abstracts to determine which method and which data are better suited for a Visual Analytics system for technology and corporate foresight. Based on an easily replicable natural language processing approach, we further investigate the impact of lemmatization for LDA and LSI. The comparison is performed qualitatively and quantitatively to capture both the human perception in visual systems and coherence values. A visual trend analytics system illustrates the outcomes based on an application scenario.
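The preprocessing that feeds LSI and LDA can be sketched as below. The suffix-stripping "lemmatizer" is a deliberately crude stand-in for the real NLP lemmatization the paper investigates, and the resulting term-document count matrix is the common input to both LDA and LSI (the latter via an SVD of this matrix):

```python
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}

def naive_lemma(token):
    """Crude stand-in for lemmatization (a real pipeline would use an
    NLP lemmatizer); strips common plural/verb suffixes."""
    for suffix in ("ies", "es", "s", "ing", "ed"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def term_doc_matrix(docs):
    """Bag-of-words counts per document over the lemmatized,
    stopword-filtered vocabulary."""
    vocab = sorted({naive_lemma(w) for d in docs
                    for w in d.lower().split() if w not in STOPWORDS})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for w in d.lower().split():
            if w not in STOPWORDS:
                row[index[naive_lemma(w)]] += 1
        rows.append(row)
    return vocab, rows
```

For LSI one would factorize this matrix with an SVD and read topics off the left singular vectors; for LDA one would hand the same counts to a probabilistic topic model.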
Chapter
To stay competitive in an environment of rapidly changing science, it is important to monitor the development of existing technologies and to discover new and promising technologies. Similarly, it is necessary for a firm to establish a technology development strategy through emerging-technology forecasts to gain a competitive edge while utilizing limited resources. Numerous methods of emerging technology trend analysis and forecast (TTAF) have been proposed; however, no study has reviewed the data mining methods of this research area in a systematic and structured procedure. Hence, this paper gives a review of TTAF data mining methods and their shortcomings by surveying the challenging problems, research questions and resolution approaches. Moreover, the study highlights the adopted data mining methods and the types of data sources. Specifically, 50 documents from SCOPUS over the ten-year timespan between 2010 and 2019 were systematically reviewed, with each step performed in accordance with the systematic mapping study methodology.
Conference Paper
A variety of new technologies and business ideas are arising in the domain of logistics and mobility. One can differentiate between fundamentally new approaches, e.g. central packaging stations or deliveries via drones, and minor technological advancements that aim at more ecological and economic transportation. The need is growing for analytical systems that enable identifying new technologies, innovations, business models etc. and also give the opportunity to rate those with respect to business relevance. Adaptive systems commonly investigate the user's behavior, considering the individual preferences of users but often neglecting the tasks and goals of the analysis. Process-related support could help solve an analytical task in a more efficient and effective way. We introduce in this paper an approach that enables non-professionals to perform visual trend analysis through advanced process assistance based on process mining and visual adaptation. This allows generating a process model based on events, which is the baseline for computing process-support features. These features, in the form of visual adaptations, together with the process model enable assisting non-experts in complex analytical tasks.
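Process mining of the kind described above typically starts from a directly-follows relation computed over an event log; the discovered process model is then built on top of these counts. A minimal sketch (the activity names are invented for illustration):

```python
from collections import Counter

def directly_follows(event_log):
    """Directly-follows counts from an event log, the basic building
    block of many process-discovery algorithms.

    `event_log` is a list of traces; each trace is the ordered list
    of activities one user performed (e.g. interactions with the
    visual trend analysis system).
    """
    edges = Counter()
    for trace in event_log:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

log = [["search", "filter", "drill-down"],
       ["search", "filter", "export"],
       ["search", "drill-down"]]
print(directly_follows(log)[("search", "filter")])  # prints 2
```

Frequent edges in this relation indicate typical analysis paths, which a process-support feature could suggest to a non-expert user as the likely next step.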
Chapter
Full-text available
Political and societal processes are strongly shaped by information, as recent events have shown. Despite enormous progress, this information cannot always be extracted from very large, heterogeneous and distributed data. "Big Data" thus poses an ever greater challenge in public administration as well. Both through the extensive collection of statistics and through documents such as reports and studies, the information-processing tasks in public authorities keep growing. In addition, taking citizens' opinions into account, especially at the municipal level, plays an increasingly important role. An evaluation without modern information technology is hardly possible anymore. However, for the relevant information to actually be extracted from these data, information visualization and Visual Analytics systems are needed that allow very detailed, yet simple and fast analyses for humans. This places very high demands on the visual systems, since they must simultaneously take the user and the user's abilities into account.
Chapter
Monitoring the development of existing technologies and discovering new and promising technologies help enterprises obtain competitiveness in an environment of rapidly changing science. In other words, it is important for a firm to pursue a technology development strategy based on forecasting emerging technologies to gain competitive advantages while utilizing finite resources. In addition to prospective changes, the speed and direction of technologies are considered when forecasting future technology. Analyzing technically relevant historical information can identify how technological developments are influenced by current and past changes in related technologies. An important aspect of identifying technological trends is the data. Among the large amounts of data generated every day, web news is an important resource for studying public awareness of emerging technologies. For this reason, this paper presents results of analyzing and forecasting emerging technology trends in web news by utilizing a burst detection algorithm, which remains impactful and significant in recent research. After applying the burst detection algorithm and clustering the burst terms manually, we are able to detect and infer some main technological development trends for the near future, which were later evaluated by domain experts. This shows that the proposed method is effective and prospective for analyzing and forecasting technology trends.
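Kleinberg-style burst detection is more involved, but the core idea of flagging years in which a term's frequency suddenly exceeds its recent baseline can be sketched with a simple threshold rule. This is an illustrative stand-in, not the algorithm used in the chapter, and the frequency data are invented:

```python
def detect_bursts(series, window=3, factor=2.0):
    """Flag years whose term frequency is at least `factor` times the
    mean of the preceding `window` years, a simple threshold stand-in
    for proper burst detection.

    `series` maps year -> term frequency in that year's documents.
    """
    years = sorted(series)
    bursts = []
    for i, year in enumerate(years):
        prev = years[max(0, i - window):i]
        if not prev:
            continue  # no baseline for the first year
        baseline = sum(series[y] for y in prev) / len(prev)
        if baseline > 0 and series[year] >= factor * baseline:
            bursts.append(year)
    return bursts

freq = {2014: 2, 2015: 3, 2016: 2, 2017: 10, 2018: 9}
print(detect_bursts(freq))  # prints [2017]
```

Terms whose burst years cluster recently would then be candidates for the emerging-technology trends the chapter infers from web news.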
Article
Purpose This paper aims to present an overview of the challenges encountered in integrating visual search interfaces into digital libraries and repositories. These challenges come in various forms, including information visualisation, the use of knowledge organisation systems and metadata quality. The main purpose of this study is the identification of criteria for the evaluation and integration of visual search interfaces, proposing guidelines and recommendations to improve information retrieval tasks with emphasis on the educational context. Design/methodology/approach The information included in this study was collected based on a systematic literature review approach. The main information sources were explored in several digital libraries, including Science Direct, Scopus, ACM and IEEE, and include journal articles, conference proceedings, books, European project reports and deliverables and PhD theses published in an electronic format. A total of 142 studies comprised the review. Findings There are several issues that authors did not fully discuss in the reviewed literature; more specifically, aspects associated with access to digital resources in digital libraries and repositories based on human-computer interaction, i.e. usability and learnability of user interfaces; design of a suitable navigation method of search based on simple knowledge organisation schemes; and the usefulness of visual search interfaces for locating relevant resources. Research limitations/implications The main steps for carrying out a systematic review are drawn from health care; this methodology is not commonly used in fields such as digital libraries and repositories. The authors aimed to apply the fundamentals of the systematic literature review methodology considering the context of this study. Additionally, there are several aspects of accessibility that were not considered in the study, such as accessibility to content for disabled people as defined by ISO/IEC 40500:2012.
Originality/value No other systematic literature reviews have been conducted in this field. The research presents an in-depth analysis of the criteria associated with searching and navigation methods based on the systematic literature review approach. The analysis is relevant for researchers in the field of digital library and repository creation in that it may direct them to considerations in designing and implementing visual search interfaces based on the use of information visualisation.
Chapter
Strategic foresight, corporate foresight, and technology management enable firms to detect discontinuous changes early and develop future courses for a more sophisticated market positioning. Enhancements in machine learning and artificial intelligence allow more automatic detection of early trends to create future courses and make strategic decisions. Visual Analytics combines methods of automated data analysis through machine learning with interactive visualizations. It enables a far better way to gather insights from a vast amount of data in order to make strategic decisions. While Visual Analytics offers various models and approaches to enable strategic decision-making, the analysis of trends is still a matter of research. The forecasting approaches and the involvement of humans in the visual trend analysis process require further investigation, which will lead to more sophisticated analytical methods. We introduce in this paper a novel model of Visual Analytics for decision-making, particularly for technology management, through early trends from scientific publications. We combine Corporate Foresight and Visual Analytics and propose a machine learning-based Technology Roadmapping built on our previous work. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Book
Full-text available
This book introduces a novel approach for intelligent visualizations that adapts the different visual variables and data processing to human’s behavior and given tasks. Thereby a number of new algorithms and methods are introduced to satisfy the human need of information and knowledge and enable a usable and attractive way of information acquisition. Each method and algorithm is illustrated in a replicable way to enable the reproduction of the entire “SemaVis” system or parts of it. The introduced evaluation is scientifically well-designed and performed with more than enough participants to validate the benefits of the methods. Beside the introduced new approaches and algorithms, readers may find a sophisticated literature review in Information Visualization and Visual Analytics, Semantics and information extraction, and intelligent and adaptive systems. This book is based on an awarded and distinguished doctoral thesis in computer science.
Thesis
Full-text available
Human access to the increasing amount of information and data plays an essential role for the professional level and also for everyday life. While information visualization has developed new and remarkable ways for visualizing data and enabling the exploration process, adaptive systems focus on users’ behavior to tailor information for supporting the information acquisition process. Recent research on adaptive visualization shows promising ways of synthesizing these two complementary approaches and make use of the surpluses of both disciplines. The emerged methods and systems aim to increase the performance, acceptance, and user experience of graphical data representations for a broad range of users. Although the evaluation results of the recently proposed systems are promising, some important aspects of information visualization are not considered in the adaptation process. The visual adaptation is commonly limited to change either visual parameters or replace visualizations entirely. Further, no existing approach adapts the visualization based on data and user characteristics. Other limitations of existing approaches include the fact that the visualizations require training by experts in the field. In this thesis, we introduce a novel model for adaptive visualization. In contrast to existing approaches, we have focused our investigation on the potentials of information visualization for adaptation. Our reference model for visual adaptation not only considers the entire transformation, from data to visual representation, but also enhances it to meet the requirements for visual adaptation. Our model adapts different visual layers that were identified based on various models and studies on human visual perception and information processing. In its adaptation process, our conceptual model considers the impact of both data and user on visualization adaptation. 
We investigate different approaches and models and their effects on system adaptation to gather implicit information about users and their behavior. These are then transformed and applied to affect the visual representation and to model human interaction behavior with visualizations and data, to achieve a more appropriate visual adaptation. Our enhanced user model further makes use of the semantic hierarchy to enable a domain-independent adaptation. To face the problem of a system that requires training by experts, we introduce the canonical user model that models the average usage behavior with the visualization environment. Our approach learns from the behavior of the average user to adapt the different visual layers and transformation steps. This approach is further enhanced with similarity and deviation analysis for individual users to determine similar behavior on an individual level and identify behavior that differs from the canonical model. Users with similar behavior get similar visualization and data recommendations, while behavioral anomalies lead to a lower level of adaptation. Our model includes a set of various visual layouts that can be used to compose a multi-visualization interface, a sort of "visualization cockpit". This model facilitates various visual layouts to provide different perspectives and enhance the ability to solve difficult and exploratory search challenges. Data from different data sources can be visualized and compared in a visual manner. These different visual perspectives on data can be chosen by users or selected automatically by the system. This thesis further introduces the implementation of our model, which includes additional approaches for an efficient adaptation of visualizations as proof of feasibility. We further conduct a comprehensive user study that aims to prove the benefits of our model and underscore limitations for future work.
The user study, with 53 participants overall and four conditions, focuses on our enhanced reference model to evaluate the adaptation effects of the different visual layers.
Article
Full-text available
News reports are an important source of information about society. Analyzing them makes it possible to understand society's current interests and to measure the social importance of many events. In this paper, we use the analysis of news as a means to explore society's interests. We present a text mining technique that uncovers trends, discovers associations and detects deviations from news notes. The method uses simple statistical representations of the news reports (frequencies and probability distributions of topics) and statistical measures (the average or median, the standard deviation, and the correlation coefficient) for analysis and discovery of useful information. We illustrate the method with some results obtained from preliminary experiments and discuss their main implications.
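The statistical measures named above (average, standard deviation, correlation coefficient) lend themselves to a compact sketch: correlate a topic's frequency series with time to score its trend, and flag periods that deviate strongly from the average (the data and the z-score rule below are invented for illustration, not taken from the paper):

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

def trend_and_deviations(freqs, z=2.0):
    """Score a topic's trend by correlating its frequency series with
    time, and flag periods deviating from the mean by more than
    z standard deviations."""
    t = list(range(len(freqs)))
    trend = pearson(t, freqs)
    mu, sigma = mean(freqs), stdev(freqs)
    outliers = [i for i, f in enumerate(freqs) if abs(f - mu) > z * sigma]
    return trend, outliers

# Invented monthly relative frequencies of a news topic
monthly = [0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.07]
trend, outliers = trend_and_deviations(monthly)
print(round(trend, 2), outliers)  # → 0.99 []
```

A steadily rising topic yields a correlation near 1 with no outliers; a single spiking period would instead show up in the outlier list.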
Article
Full-text available
This paper presents a system for visualization of large amounts of news stories. In the first phase, the news stories are preprocessed for the purpose of named-entity extraction. Next, a graph of relationships between the extracted named entities is created, where each named entity represents one vertex in the graph and two named entities are connected if they appear in the same document. The graph of entities is presented as a local neighborhood enriched with additional contextual information in the form of characteristic keywords and related named entities connected to the entity in focus. Efficient graph-browsing operations are implemented, enabling quick capture of the large amounts of information present in the original text.
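The entity-graph construction described here can be sketched in a few lines: entities become vertices and an edge is added whenever two entities share a document (a minimal sketch with invented entities, not the paper's system):

```python
from itertools import combinations
from collections import defaultdict

def entity_graph(docs):
    """Build a co-occurrence graph: each named entity is a vertex,
    and two entities are linked if they appear in the same document
    (the edge weight counts shared documents)."""
    edges = defaultdict(int)       # (entity_a, entity_b) -> weight
    neighbours = defaultdict(set)  # entity -> adjacent entities
    for entities in docs:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
            neighbours[a].add(b)
            neighbours[b].add(a)
    return edges, neighbours

# Toy documents already reduced to their extracted entities
docs = [{"ACME Corp", "J. Smith"},
        {"ACME Corp", "J. Smith", "Berlin"},
        {"Berlin", "J. Smith"}]
edges, nbrs = entity_graph(docs)
print(edges[("ACME Corp", "J. Smith")])  # → 2
print(sorted(nbrs["J. Smith"]))          # → ['ACME Corp', 'Berlin']
```

The `neighbours` map directly supports the local-neighborhood browsing the paper describes: rendering an entity in focus amounts to looking up its adjacency set.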
Conference Paper
Full-text available
Bibliographic databases are a prosperous field for data mining research and social network analysis. The representation and visualization of bibliographic databases as graphs and the application of data mining techniques can help us uncover interesting knowledge regarding how the publication records of authors evolve over time. In this paper we propose a novel methodology to model bibliographical databases as Power Graphs, and mine them in an unsupervised manner, in order to learn basic author types and their properties through clustering. The methodology takes into account the evolution of the co-authorship information, the volume of published papers over time, as well as the impact factors of the venues hosting the respective publications. As a proof of concept of the applicability and scalability of our approach, we present experimental results in the DBLP data. Keywords: Power Graph Analysis, Authors' Clustering, Graph Mining
Conference Paper
Full-text available
Do court cases differ from place to place? What kind of picture do we get by looking at a country's collection of law cases? We introduce parallel tag clouds: a new way to visualize differences amongst facets of very large metadata-rich text corpora. We have pointed parallel tag clouds at a collection of over 600,000 US Circuit Court decisions spanning a period of 50 years and have discovered regional as well as linguistic differences between courts. The visualization technique combines graphical elements from parallel coordinates and traditional tag clouds to provide rich overviews of a document collection while acting as an entry point for exploration of individual texts. We augment basic parallel tag clouds with a details-in-context display and an option to visualize changes over a second facet of the data, such as time. We also address text mining challenges such as selecting the best words to visualize, and how to do so in reasonable time periods to maintain interactivity.
Conference Paper
Full-text available
We describe a system that monitors social and mainstream media to determine shifts in what people are thinking about a product or company. We process over 100,000 news articles, blog posts, review sites, and tweets a day for mentions of items (e.g., products) of interest, extract phrases that are mentioned near them, and determine which of the phrases are of greatest possible interest to, for example, brand managers. Case studies show a good ability to rapidly pinpoint emerging subjects buried deep in large volumes of data and then highlight those that are rising or falling in significance as they relate to the firm's interests. The tool and algorithm improve the signal-to-noise ratio and pinpoint precisely the opportunities and risks that matter most to communications professionals and their organizations.
Conference Paper
Full-text available
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
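The paper estimates LDA with variational methods; as a rough illustration of the model itself (each document a mixture over topics, each topic a distribution over words), this sketch uses collapsed Gibbs sampling instead, with invented hyper-parameters and a toy corpus:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA: repeatedly resample each
    token's topic proportionally to (doc-topic count + alpha) *
    (topic-word count + beta) / (topic total + V * beta)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    z = [[rng.randrange(K) for _ in d] for d in docs]   # topic of each token
    ndk = [[0] * K for _ in docs]                       # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]          # topic-word counts
    nk = [0] * K                                        # tokens per topic
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]                           # remove current assignment
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + V * beta) for k in range(K)]
                t = rng.choices(range(K), weights=weights)[0]
                z[di][wi] = t                           # reassign
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return ndk, nkw

docs = [["cat", "dog", "cat"], ["dog", "wolf"], ["stock", "market", "stock"]]
doc_topics, topic_words = lda_gibbs(docs, K=2)
print(doc_topics)  # per-document topic mixtures, as counts
```

On such a tiny corpus the sampler typically separates the animal documents from the finance document; the per-document count rows are the explicit document representation the abstract refers to.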
Conference Paper
Full-text available
PaperLens is a novel visualization that reveals trends, connections, and activity throughout a conference community. It tightly couples views across papers, authors, and references. PaperLens was developed to visualize 8 years (1995-2002) of InfoVis conference proceedings and was then extended to visualize 23 years (1982-2004) of the CHI conference proceedings. This paper describes how we analyzed the data and designed PaperLens. We also describe a user study to focus our redesign efforts along with the design changes we made to address usability issues. We summarize lessons learned in the process of design and scaling up to the larger set of CHI conference papers.
Conference Paper
Full-text available
The ever-increasing growth of the Web as primary provider of news and opinion shaper makes it impossible for individuals to manually spot and analyze all information of particular importance for globally-acting large-scale corporations. Hence, automated means of analysis, identifying upcoming topics of specific relevance and monitoring the reputation of the brand at hand, as well as its competitors, are becoming indispensable. In this paper, we present our platform for analyzing Web data for such purposes, adopting different semantic perspectives and providing the market analyst with a flexible suite of instruments. We focus on two of these tools and outline their particular utility for research and exploration.
Conference Paper
Full-text available
Patent text is a rich source for discovering technological progress, useful for understanding trends and forecasting upcoming advances. With this importance in mind, several researchers have attempted textual-data mining from patent documents. However, previous mining methods are limited in terms of readability, domain expertise, and adaptability. In this paper, we first formulate the task of technological trend discovery and propose a method for discovering such a trend. We complement a probabilistic approach by adopting linguistic clues and propose an unsupervised procedure to discover technological trends. Based on the experiment, our method is promising not only in its accuracy, 77% in R-precision, but also in its functionality and novelty of discovering meaningful technological trends.
Book
Full-text available
This groundbreaking book defines the emerging field of information visualization and offers the first-ever collection of the classic papers of the discipline, with introductions and analytical discussions of each topic and paper. The authors' intention is to present papers that focus on the use of visualization to discover relationships, using interactive graphics to amplify thought. This book is intended for research professionals in academia and industry; new graduate students and professors who want to begin work in this burgeoning field; professionals involved in financial data analysis, statistics, and information design; scientific data managers; and professionals involved in medical, bioinformatics, and other areas. * Full-color reproduction throughout * Author power team - an exciting and timely collaboration between the field's pioneering, most-respected names * The only book on Information Visualization with the depth necessary for use as a text or as a reference for the information professional * Text includes the classic source papers as well as a collection of cutting edge work
Article
Full-text available
Tag clouds have proliferated over the web over the last decade. They provide a visual summary of a collection of texts by depicting tag frequency through font size. In use, tag clouds can evolve as the associated data source changes over time. Interesting discussions around tag clouds often include a series of tag clouds and consider how they evolve over time. However, since tag clouds do not explicitly represent trends or support comparisons, the cognitive demands placed on the person for perceiving trends in multiple tag clouds are high. In this paper, we introduce SparkClouds, which integrate sparklines into a tag cloud to convey trends between multiple tag clouds. We present results from a controlled study that compares SparkClouds with two traditional trend visualizations (multiple line graphs and stacked bar charts) as well as Parallel Tag Clouds. Results show that SparkClouds' ability to show trends compares favourably to the alternative visualizations.
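The sparkline idea behind SparkClouds can be mimicked even in plain text: map each tag's frequency series onto Unicode block characters so a rising or falling trend is visible right next to the tag (a minimal sketch with invented tags and frequencies, not the SparkClouds implementation):

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Render a frequency series as a Unicode sparkline, the kind
    of in-place trend cue SparkClouds attaches to each tag."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # constant series maps to the lowest bar
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))]
                   for v in values)

# Invented tag frequencies over six time slices
for tag, freqs in {"visualization": [3, 4, 6, 9, 12, 15],
                   "hypertext": [14, 11, 9, 6, 4, 3]}.items():
    print(f"{tag:<14}{sparkline(freqs)}")
```

The rising and falling shapes make the trend comparison immediate, which is exactly the cognitive load the per-cloud comparison of plain tag clouds fails to support.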
Article
Full-text available
From the automated text processing point of view, natural language is very redundant in the sense that many different words share a common or similar meaning. For a computer this can be hard to understand without some background knowledge. Latent Semantic Indexing (LSI) is a technique that helps in extracting some of this background knowledge from a corpus of text documents. This can also be viewed as extraction of hidden semantic concepts from text documents. On the other hand, visualization can be very helpful in data analysis, for instance, for finding main topics that appear in larger sets of documents. Extraction of main concepts from documents using techniques such as LSI can make the results of visualizations more useful. For example, given a set of descriptions of European research projects (6FP), one can find the main areas that these projects cover, including semantic web, e-learning, security, etc. In this paper we describe a method for visualization of a document corpus based on LSI, the system implementing it, and give results of using the system on several datasets.
Article
Full-text available
This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science – research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature – an evolving network of scientific publications cited by research front concepts. Kleinberg’s burst detection algorithm is adapted to identify emergent research front concepts. Freeman’s betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are: 1) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, 2) the value of a co-citation cluster is explicitly interpreted in terms of research front concepts and 3) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
Conference Paper
Full-text available
This paper presents a tool, called Text Map Explorer, which can be used to create and explore document maps (visual representations of document collections). This tool is capable of grouping (and separating) documents by their contents, revealing to the user relationships amongst them. This paper also presents a novel multi-dimensional projection technique for text that reduces the quadratic time complexity of our previous approach to O(N^(3/2)), keeping the same quality of maps. The technique creates a surface that reveals intrinsic patterns and supports various kinds of exploration of a text collection.
Article
Full-text available
The information age is characterized by a rapid growth in the amount of information available in electronic media. Traditional data handling methods are not adequate to cope with this information flood. Knowledge Discovery in Databases (KDD) is a new paradigm that focuses on computerized exploration of large amounts of data and on discovery of relevant and interesting patterns within them. While most work on KDD is concerned with structured databases, it is clear that this paradigm is required for handling the huge amount of information that is available only in unstructured textual form. To apply traditional KDD on texts it is necessary to impose some structure on the data that would be rich enough to allow for interesting KDD operations. On the other hand, we have to consider the severe limitations of current text processing technology and define rather simple structures that can be extracted from texts fairly automatically and in a reasonable cost. We propose using a text categoriza...
Article
Full-text available
Information retrieval (IR) has changed considerably in the last years with the expansion of the Web (World Wide Web) and the advent of modern and inexpensive graphical user interfaces and mass storage devices. As a result, traditional IR textbooks have become quite out-of-date which has led to the introduction of new IR books recently. Nevertheless, we believe that there is still great need of a book that approaches the field in a rigorous and complete way from a computer-science perspective (in opposition to a user-centered perspective). This book is an effort to partially fulfill this gap and should be useful for a first course on information retrieval as well as for a graduate course on the topic.
Book
While being an entirely new domain where researchers and practitioners alike push the frontier further westward, interest in systems and methods for automatically capturing competitive strategic intelligence from Web-based resources is increasing at high speed. Large companies like Microsoft and IBM have jumped on the bandwagon early on (see Ch. 3), seeing a growing market for such products. This new field does not embody an area of research in its own right, but rather draws from existing and well-established ones, like text and data mining, natural language processing, social network analysis, and information retrieval (see Ch. 2). However, the existence of the new research domain of "sentiment detection" (see Sec. 2.6.2) is undoubtedly due to the emergence of knowledge capturing systems. Several workshops have already emerged that focus solely on sentiment detection techniques, such as the "Workshop on Subjectivity and Sentiment in Text", co-located with the ACL conference on computational linguistics (COLING). By means of this monograph and its accompanying publications, we contribute to the yet small body of research on mining competitive strategic intelligence. Our contributions have thereby been made along three major topics: Reputation analysis and trend monitoring. Drawing heavily from search engine technology and text classification and clustering, the systems we presented aim at gauging the public opinion on topics known beforehand as well as topics and issues potentially unknown by the time the analysis is conducted. Automated news extraction and filtering. While our contributions to reputation analysis and trend monitoring are applicable end-to-end, this is not the case for our two approaches to automated news extraction. These can rather be seen as pre-processing units that sit on the backend side of analysis and reporting systems such as the ones presented in Sec. 4.2. Semantic analysis of named entities and synergy detection.
In contrast to the two previous topics, the one at hand relies on structured rather than unstructured (i.e., textual) information to generate actionable knowledge. Here, background information from the world's largest taxonomy, the DMOZ Open Directory Project, is exploited along with the hyperlink structure of articles on Wikipedia.
Article
We are building an interactive visual text analysis tool that aids users in analyzing large collections of text. Unlike existing work in visual text analytics, which focuses either on developing sophisticated text analytic techniques or inventing novel text visualization metaphors, ours tightly integrates state-of-the-art text analytics with interactive visualization to maximize the value of both. In this article, we present our work from two aspects. We first introduce an enhanced, LDA-based topic analysis technique that automatically derives a set of topics to summarize a collection of documents and their content evolution over time. To help users understand the complex summarization results produced by our topic analysis technique, we then present the design and development of a time-based visualization of the results. Furthermore, we provide users with a set of rich interaction tools that help them further interpret the visualized results in context and examine the text collection from multiple perspectives. As a result, our work offers three unique contributions. First, we present an enhanced topic modeling technique to provide users with a time-sensitive and more meaningful text summary. Second, we develop an effective visual metaphor to transform abstract and often complex text summarization results into a comprehensible visual representation. Third, we offer users flexible visual interaction tools as alternatives to compensate for the deficiencies of current text summarization techniques. We have applied our work to a number of text corpora and our evaluation shows promise, especially in support of complex text analyses.
Article
We present CitNetExplorer, a new software tool for analyzing and visualizing citation networks of scientific publications. CitNetExplorer can for instance be used to study the development of a research field, to delineate the literature on a research topic, and to support literature reviewing. We first introduce the main concepts that need to be understood when working with CitNetExplorer. We then demonstrate CitNetExplorer by using the tool to analyze the scientometric literature and the literature on community detection in networks. Finally, we discuss some technical details on the construction, visualization, and analysis of citation networks in CitNetExplorer.
Conference Paper
Adaptive visualizations aim to reduce the complexity of visual representations and convey information using interactive visualizations. Although research on adaptive visualizations has grown in recent years, existing approaches do not make use of the variety of adaptable visual variables. Further, existing approaches often presuppose experts who have to model the initial visualization design. In addition, current approaches incorporate either user behavior or data types; a combination of both has, to our knowledge, not been proposed. This paper introduces the instantiation of our previously proposed model that combines both: it involves different influencing factors and adapts various levels of visual peculiarity, in visual layout and visual presentation, in a multiple-visualization environment. Based on data type and users’ behavior, our system adapts a set of applicable visualization types. Moreover, retinal variables of each visualization type are adapted to meet individual or canonical requirements on both data types and users’ behavior. Our system does not require initial expert modeling.
Conference Paper
Web-based digital libraries have sped up the process that scholars use to find new, important research papers. Unfortunately, current digital libraries are limited by their inadequate webpage-based paradigm, and it is easy for even the most experienced scholar to get lost. A paper and its immediate references are shown on a webpage, but it is not obvious where that paper belongs in the larger context of a field of research. The goal of our research was to develop and test the effectiveness of a web-based application, PaperCube, that was designed to augment a scholar's interaction with a digital library and explore bibliographic metadata using a defined set of visualizations. These visualizations needed to provide different levels of visibility into a paper's citation network without losing focus of the currently viewed paper. PaperCube was validated through a user study which showed that it was very useful when it comes to augmenting digital library search by reducing the "cognitive load" put on a scholar and aiding the "discoverability" of new research material.
Conference Paper
For companies acting on a global scale, the necessity to monitor and analyze news channels and consumer-generated media on the Web, such as weblogs and newsgroups, is steadily increasing. In particular the identification of novel trends and upcoming issues, as well as their dynamic evolution over time, is of utmost importance to corporate communications and market analysts. Automated machine learning systems using clustering techniques have only partially succeeded in addressing these newly arising requirements, failing in their endeavor to properly assign short-term hype topics to long-term trends. We propose an approach which makes it possible to monitor news wires at different levels of temporal granularity, extracting key-phrases that reflect short-term topics as well as longer-term trends by means of statistical language modelling. Moreover, our approach allows for assigning windows of smaller scope to those of longer intervals.
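The contrast between a short recent window and a long-term background corpus can be approximated with a simple smoothed log-frequency-ratio score (a crude stand-in for the statistical language modelling the paper describes; all headlines below are invented):

```python
import math
from collections import Counter

def salient_terms(recent_docs, background_docs, top_n=3):
    """Score terms by the log-ratio of their relative frequency in a
    short recent window versus a long-term background corpus; high
    scores mark candidate short-term topics."""
    fg = Counter(w for d in recent_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    fg_total, bg_total = sum(fg.values()), sum(bg.values())

    def score(w):
        p_fg = fg[w] / fg_total
        p_bg = (bg[w] + 1) / (bg_total + len(bg))  # add-one smoothing
        return math.log(p_fg / p_bg)

    return sorted(fg, key=score, reverse=True)[:top_n]

# Invented headlines: a short recent window vs. a longer background
recent = ["battery recall shakes markets", "battery fire prompts recall"]
background = ["markets open higher", "quarterly earnings beat forecasts",
              "markets close mixed"]
print(salient_terms(recent, background))  # "battery" and "recall" rank highest
```

Terms frequent in both windows (like "markets") score low, while terms concentrated in the recent window surface as the short-term topic, which is the assignment of short-scope windows to longer intervals sketched in the abstract.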
Conference Paper
The InfoVis 2004 contest led to the development of several bibliography visualization systems. Even though each of these systems offers some unique views of the bibliography data, there is no single best system offering all the desired views. We have thus studied how to consolidate the desirable functionalities of these systems into a cohesive design. We have also designed a few novel visualization methods. This paper presents our findings and creation: BiblioViz, a bibliography visualization system that gives the maximum number of views of the data using a minimum number of visualization constructs in a unified fashion.
Conference Paper
Topic models, through their ability to automatically learn and assign topics to documents in a collection, have the potential to greatly improve how content is organized and searched in digital libraries. However, much remains to be done to assess the value of topic models in digital library applications. In this work, we present results from a user study, in which participants evaluated the similarity of books clustered using matched topics and Library of Congress Subject Headings (LCSH). Topics outperformed LCSH in 11 cases; LCSH outperformed topics in 4. These results suggest that topics are a viable alternative to LCSH.
Conference Paper
Scalable and effective analysis of large text corpora remains a challenging problem as our ability to collect textual data continues to increase at an exponential rate. To help users make sense of large text corpora, we present a novel visual analytics system, ParallelTopics, which integrates a state-of-the-art probabilistic topic model, Latent Dirichlet Allocation (LDA), with interactive visualization. To describe a corpus of documents, ParallelTopics first extracts a set of semantically meaningful topics using LDA. Unlike most traditional clustering techniques, in which a document is assigned to a specific cluster, the LDA model accounts for different topical aspects of each individual document. This permits effective full-text analysis of larger documents that may contain multiple topics. To highlight this property of the model, ParallelTopics utilizes the parallel coordinate metaphor to present the probabilistic distribution of a document across topics. Such a representation allows users to discover single-topic vs. multi-topic documents and the relative importance of each topic to a document of interest. In addition, since most text corpora are inherently temporal, ParallelTopics also depicts the topic evolution over time. We have applied ParallelTopics to exploring and analyzing several text corpora, including the scientific proposals awarded by the National Science Foundation and the publications in the VAST community over the years. To demonstrate the efficacy of ParallelTopics, we conducted several expert evaluations, the results of which are reported in this paper.
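The single-topic versus multi-topic distinction that ParallelTopics lets users discover can be approximated numerically by the entropy of a document's topic distribution: a concentrated distribution has low entropy, a spread-out one high entropy. A sketch under that assumption; the threshold and helper names are illustrative, not part of the system.

```python
import math

def topic_entropy(dist):
    """Shannon entropy (bits) of a document's topic distribution.
    Low entropy means a single-topic document, high entropy multi-topic."""
    return -sum(p * math.log(p, 2) for p in dist if p > 0)

def split_docs(doc_topic, threshold=1.0):
    """Partition document indices into single- and multi-topic groups
    by comparing each distribution's entropy to a chosen threshold."""
    single = [i for i, d in enumerate(doc_topic) if topic_entropy(d) <= threshold]
    multi = [i for i, d in enumerate(doc_topic) if topic_entropy(d) > threshold]
    return single, multi

# one concentrated document, one spread evenly over three topics
single, multi = split_docs([[1.0, 0.0, 0.0], [1/3, 1/3, 1/3]])
```

In the parallel-coordinates view the same distributions appear as one axis per topic, with each document drawn as a polyline across the axes.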
Conference Paper
The proliferation of digitally available textual data necessitates automatic tools for analyzing large textual collections. Thus, in analogy to data mining for structured databases, text mining is defined for textual collections. A central tool in text mining is the analysis of concept relationships, which discovers connections between different concepts, as reflected in the corpus. Most previous work on text mining in general, and concept relationships in particular, viewed the entire corpus as one monolithic entity. However, large corpora are often composed of documents with different characteristics. Most importantly, documents are often tagged with timestamps (e.g. news articles), and thus represent the state of the domain in different time periods. In this paper we introduce a new technique for analyzing and visualizing differences and similarities in the concept relationships, as they are reflected in different segments of the corpus. Focusing on the case of timestamped documents, we introduce Trend Graphs, which provide a graphical tool for analyzing and visualizing the dynamic changes in concept relationships over time. Trend Graphs thus provide a tool for tracking the evolution of the corpus over time, highlighting trends and discontinuities.
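One way to realize the per-segment comparison of concept relationships is to build a co-occurrence edge set for each time segment and diff them: edges present only in the later segment are emerging relationships, edges present only in the earlier one have vanished. A minimal sketch, with illustrative concept names; the actual Trend Graphs technique is richer than this set difference.

```python
from itertools import combinations

def cooccurrence_edges(docs):
    """Edge set between concepts that co-occur in at least one document."""
    return {frozenset(pair)
            for doc in docs
            for pair in combinations(sorted(set(doc)), 2)}

def edge_changes(docs_t1, docs_t2):
    """Diff the co-occurrence graphs of two corpus segments.
    Returns (emerging edges, vanished edges)."""
    e1, e2 = cooccurrence_edges(docs_t1), cooccurrence_edges(docs_t2)
    return e2 - e1, e1 - e2

emerging, vanished = edge_changes([["cloud", "grid"]], [["cloud", "iot"]])
```

Drawing each segment's graph side by side, with emerging and vanished edges highlighted, gives the kind of change-over-time picture the paper targets.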
Article
Digital libraries in their current form are bounded by their inefficient webpage-based user interface paradigm, and even the most knowledgeable researchers can get lost in the large amount of published material available. A paper and its immediate references are displayed on a single webpage. Unfortunately, it is not readily apparent where the paper belongs in the greater context of a research field. Our goal was to develop, test, and iteratively improve the effectiveness of a web-based application, PaperCube, designed to augment and enhance a researcher's interaction with digital libraries through interactive visualizations of bibliographic metadata. The visualizations set out to show different levels of detail and aspects of a paper's citation network. A primary concern was to ensure that, while switching visualizations, the viewed paper or author remained in focus at all times. PaperCube was validated through a user study, which showed that it helped users trying to explore a field of research. In particular, PaperCube was shown to reduce the "cognitive load" put on a researcher and aid the "discoverability" of new research material.
Article
Reviewing the literature of a certain research field is always important for academics. One could use Google-like information seeking tools, but often ends up with too many possibly related papers, as well as the papers in the associated citation network. During such a process, a user may easily get lost after following a few links for searching or cross-referencing. It is also difficult for the user to identify relevant or important papers from the resulting huge collection. Our work, called PaperVis, endeavors to provide a user-friendly interface that helps users quickly grasp the intrinsically complex citation-reference structures among a specific group of papers. We modify the existing Radial Space Filling (RSF) and Bullseye View techniques to arrange the involved papers as a node-link graph that better depicts the relationships among them while saving screen space at the same time. PaperVis applies visual cues to present node attributes and their transitions among interactions, and it categorizes papers into semantically meaningful hierarchies to facilitate ensuing literature exploration. We conduct experiments on the InfoVis 2004 Contest Dataset to demonstrate the effectiveness of PaperVis.
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
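The mechanics of the LDA model sketched above can be illustrated with a tiny collapsed Gibbs sampler, which resamples each token's topic in proportion to its document's topic counts and the topic's word counts. This is a didactic sketch, not the variational EM inference the paper presents; the corpus, hyperparameters, and iteration count are illustrative assumptions.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, vocab, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    Returns the per-topic word counts after sampling."""
    rng = random.Random(seed)
    V = len(vocab)
    z = []                                        # topic assignment per token
    ndk = [[0] * n_topics for _ in docs]          # document-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                           # tokens per topic
    for d, doc in enumerate(docs):                # random initialization
        zd = []
        for w in doc:
            t = rng.randrange(n_topics)
            zd.append(t); ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                       # remove token's assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # P(topic k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw

docs = [["apple", "banana", "apple"], ["python", "java", "python"],
        ["banana", "apple"], ["java", "python", "java"]]
topics = lda_gibbs(docs, n_topics=2, vocab={"apple", "banana", "python", "java"}, iters=50)
```

After sampling, the normalized rows of the topic-word counts approximate the topic distributions, and the document-topic counts give the per-document mixtures that downstream visualizations consume.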
Conference Paper
Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and revealing research trends in scientific literature. In this paper, we study a particular TTM task -- discovering and summarizing the evolutionary patterns of themes in a text stream. We define this new text mining problem and present general probabilistic methods for solving this problem through (1) discovering latent themes from text; (2) constructing an evolution graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods on two different domains (i.e., news articles and literature) shows that the proposed methods can discover interesting evolutionary theme patterns effectively.
Article
The ThemeRiver visualization depicts thematic variations over time within a large collection of documents. The thematic changes are shown in the context of a time-line and corresponding external events. The focus on temporal thematic change within a context framework allows a user to discern patterns that suggest relationships or trends. For example, the sudden change of thematic strength following an external event may indicate a causal relationship. Such patterns are not readily accessible in other visualizations of the data. We use a river metaphor to convey several key notions. The document collection's time-line, selected thematic content and thematic strength are indicated by the river's directed flow, composition and changing width, respectively. The directed flow from left to right is interpreted as movement through time and the horizontal distance between two points on the river defines a time interval. At any point in time, the vertical distance, or width, of the river indicates the collective strength of the selected themes. Colored "currents" flowing within the river represent individual themes. A current's vertical width narrows or broadens to indicate decreases or increases in the strength of the individual theme
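The river metaphor reduces to a stacked-band layout centered on a horizontal baseline: at each time step the themes are stacked so that the total river width equals the collective theme strength. A sketch of that geometry, assuming per-theme strength series as input; function and variable names are illustrative.

```python
def themeriver_layout(strengths):
    """strengths: rows = themes, columns = time steps.
    Returns, per time step, the (bottom, top) band for each theme,
    stacked symmetrically around y = 0 so total width = total strength."""
    n_steps = len(strengths[0])
    totals = [sum(row[t] for row in strengths) for t in range(n_steps)]
    layers = []
    for t in range(n_steps):
        y = -totals[t] / 2               # center the river on the baseline
        bounds = []
        for row in strengths:
            bounds.append((y, y + row[t]))
            y += row[t]
        layers.append(bounds)
    return layers

# two themes over two time steps; each inner list is one theme's strengths
layers = themeriver_layout([[1, 2], [3, 2]])
```

Interpolating the band boundaries between time steps and filling each band with its theme's color yields the "currents" the paper describes.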
Article
We describe a system we developed for identifying trends in text documents collected over a period of time. Trends can be used, for example, to discover that a company is shifting interests from one domain to another. Our system uses several data mining techniques in novel ways and demonstrates a method in which to visualize the trends. We also give experiences from applying this system to the IBM Patent Server, a database of U.S. patents.

Introduction: We address the problem of discovering trends in text databases. We are given a database D of documents. Each document consists of one or more text fields and a timestamp. The unit of text is a word and a phrase is a list of words. (We defer the discussion of more complex structures till the "Methodology" section.) Associated with each phrase is a history of the frequency of occurrence of the phrase, obtained by partitioning the documents based upon their timestamps. The frequency of occurrence in a particular time period is the number o...
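The phrase-frequency histories defined above can be sketched directly: partition documents by their timestamps, count phrase occurrences per period, and flag phrases whose history rises. The data and the strictly-monotone rise criterion are illustrative assumptions; the paper's system uses richer shape queries over histories.

```python
from collections import Counter

def phrase_histories(docs):
    """docs: list of (period, tokens) pairs.
    Returns {phrase: [frequency in each period, in sorted period order]}."""
    periods = sorted({p for p, _ in docs})
    hist = {}
    for idx, period in enumerate(periods):
        counts = Counter(w for p, toks in docs if p == period for w in toks)
        for w, c in counts.items():
            hist.setdefault(w, [0] * len(periods))[idx] = c
    return hist

def rising(hist):
    """Phrases whose frequency strictly increases across all periods."""
    return [w for w, f in hist.items() if all(a < b for a, b in zip(f, f[1:]))]

hist = phrase_histories([(2010, ["iot", "xml", "xml", "xml"]),
                         (2011, ["iot", "iot", "xml"]),
                         (2012, ["iot", "iot", "iot"])])
```

Plotting each flagged phrase's history then gives the trend visualization the system demonstrates.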
Die Mercator-Projektion
  • R Buchholz
  • W Krücken
R. Buchholz and W. Krücken. Die Mercator-Projektion: zu Ehren von Gerhard Mercator (1512-1594). Becker, 1994.
Citation Map: Visualizing Citation Data in the Web of Science
  • T Matthews
T. Matthews. Citation Map: Visualizing Citation Data in the Web of Science. Thomson Reuters, 2010.
Understanding research trends in conferences using PaperLens, CHI '05 Extended Abstracts on Human Factors in Computing Systems
  • Bongshin Lee
  • Mary Czerwinski
  • George Robertson
  • Benjamin B Bederson
Towards automated reputation and brand monitoring on the web. In Mining for Strategic Competitive Intelligence
  • C.-N Ziegler
  • M Skubacz
CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature
  • Chaomei Chen
Modern Information Retrieval
  • R Baeza-Yates
  • B Ribeiro-Neto
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, 2nd edition, 2010.
Knowledge Discovery in Textual Databases (KDT)
  • R Feldman
  • I Dagan
R. Feldman and I. Dagan. Knowledge Discovery in Textual Databases (KDT). In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 1995.
Visualization of news articles
  • M Grobelnik
  • D Mladenic
M. Grobelnik and D. Mladenic. Visualization of news articles. In SIKDD 2004 at multiconference IS 2004, 2004.
Discovering trends in text databases
  • B Lent
  • R Agrawal
  • R Srikant
B. Lent, R. Agrawal, and R. Srikant. Discovering trends in text databases. In Proceedings of KDD '97, 1997.