Article

Integrated visual analysis of patterns in time series and text data - Workflow and application to financial data analysis

Authors:
  • University of Oklahoma

Abstract

In this article, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations which are significantly connected in time to quantitative phenomena observed in stock data. To support such an analysis, we combine analysis steps for frequent patterns in quantitative and text-oriented data using an existing a priori method. First, based on heuristics, we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a priori method supports the discovery of such sequential temporal patterns. Then, various text features, such as the degree of sentence nesting, noun phrase complexity, and vocabulary richness, are extracted from the news items to obtain meta-patterns. Meta-patterns are defined by a specific combination of text features which significantly differs from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time, cluster, and sequence visualization and analysis functionality. We provide a case study and an evaluation on financial data in which we identify important future work. The workflow could be generalized to other application domains such as data analysis of smart grids, cyber-physical systems, or the security of critical infrastructure, where the data consist of a combination of quantitative and textual time series.
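As a rough illustration of the two analysis steps above, heuristic interval extraction followed by Apriori-style support counting over news text features, here is a minimal stdlib-only sketch; the window, threshold, feature names, and support level are illustrative assumptions, not values from the paper:

```python
from itertools import combinations

def interesting_intervals(series, window=3, threshold=0.05):
    """Heuristic interest-point detection: flag windows whose relative
    change exceeds a threshold (a stand-in for the paper's heuristics)."""
    hits = []
    for i in range(len(series) - window):
        change = abs(series[i + window] - series[i]) / abs(series[i])
        if change > threshold:
            hits.append((i, i + window))
    return hits

def frequent_feature_sets(news_feature_sets, min_support=2):
    """Apriori-style support counting over text-feature combinations of
    the news items attached to the intervals of interest."""
    counts = {}
    for features in news_feature_sets:
        for size in (1, 2):
            for combo in combinations(sorted(features), size):
                counts[combo] = counts.get(combo, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_support}

prices = [100, 101, 99, 106, 107, 108, 101, 100]
intervals = interesting_intervals(prices)
news = [{"neg_sentiment", "long_sentences"},
        {"neg_sentiment", "rich_vocabulary"},
        {"neg_sentiment", "long_sentences"}]
patterns = frequent_feature_sets(news)
```

A real pipeline would restrict the counted news items to those time-stamped inside the detected intervals; here the two steps are shown independently for brevity.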


... When different measurement items measure the same variable, higher reliability indicates that the observations do not change due to changes in form or time and are fairly stable [28]. In this paper, SPSS 17.0 was used to test the internal reliability of the variables, yielding the Cronbach's α values of the measured items, the corrected item-total correlations, and the item-removed Cronbach's α values for each variable, as shown in Table 3. ...
Article
Full-text available
In this paper, a Gaussian mixture distribution is applied to financial risk assessment and used as the risk-assessment tool of a visual platform for higher vocational financial education. Financial data are quantified by determining the cumulative expected loss amount to establish the financial investment risk assessment function. The Activiti open-source workflow engine was used to process complex financial data, and the K-line chart was configured as the platform's data visualization tool. Finally, the platform was used to analyze the Gaussian distribution and K-line data of stock X, verifying its practicality, and its effectiveness was confirmed in a teaching experiment with students of higher vocational college H as the sample. The results show that the platform's influence coefficient on course quality is 0.856 and on learning interest is 0.887, indicating that teaching with the visual platform engages students and strengthens their cognitive level. This paper thus offers a new reference direction for the visual digital reform of teaching finance majors in colleges and universities.
... Then, based on a financial time series forecasting model combining a fuzzy neural network and GARCH, a specific hybrid model is given for the data of the Shanghai Stock Exchange Index [13]. Yao uses an Elman neural network to forecast the stock composite index [14]. ...
Article
Full-text available
Financial time series forecasting is a difficult area of study. This paper introduces the development of time series analysis and fuzzy neural networks (FNN) and conducts a thorough study of local financial time series prediction. It then proposes and builds an FNN-based local forecasting model for financial time series. The pseudo-inverse of the matrix is updated using ridge regression in order to update the network parameters. The paper provides a corresponding incremental algorithm that updates the network parameters as training input data or fuzzy rules increase, avoiding the need for parameter retraining. MATLAB is employed for simulation and comparative analysis to validate the viability and reliability of this approach. According to the simulation results, the prediction accuracy is 96.31 percent, 9.84 percentage points better than that of the conventional NN algorithm. The model put forth in this paper performs better at predicting financial time series, further enhancing prediction performance and making up for shortcomings of earlier research, and it contributes to related work in the area of financial time series prediction.
... In addition to anomalous activity detection, financial data visualization is also used for other tasks. A common task is the analysis of stock data which contains time series of share prices of companies over a long time, including trend, pattern, performance, and predictive analysis [11,25]. Keim et al. [4] used value cells within bar charts to represent business metrics to assist analysts in identifying specific areas. ...
... Colors plus the additional labels allow the identification of the events. This representation can be modified to use different symbols for the events, or even glyphs, which allow visualizing additional information for an event [89] (Figure 15b). In the case of a larger alphabet, the visual representation does not scale well with regard to understandability [50]. ...
... Work on integrating temporal aspects has generally focused on approaches such as the SparkClouds [15] or Fish-Eye Clouds [23], where additional temporal information is integrated into the basic word-cloud layout. Alternatives have also used word-clouds as part of a more complex visual interface [24] or developed alternatives for comparing changes between two time-slices [6]. However, none of these address the requirement for adapting the actual word-cloud content over time. ...
Chapter
Full-text available
Word-clouds are a useful tool for providing overviews over texts, visualising relevant words. Multiple word-clouds can also be used to visualise changes over time in a text. This requires that the words in the individual word-clouds have stable positions, as otherwise it is very difficult to see what changed between two consecutive word-clouds. Existing approaches have used coordinated positioning algorithms, which do not allow for their use in an online, dynamic context. In this paper we present a fast word-cloud algorithm that uses word orthogonality to determine which words can share the same space in the word-clouds, combined with a simple but fast spiral-based layout algorithm. The evaluation shows that the algorithm achieves its goal of creating series of word-clouds fast enough to enable use in an online, dynamic context.
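The spiral-based placement idea can be sketched as a greedy layout: each word walks outward along a spiral until it finds a collision-free spot. This is a generic sketch with assumed box sizes, not the paper's actual algorithm (which additionally uses word orthogonality to let words share space):

```python
import math

def spiral_positions(step=0.5, turns=200):
    """Candidate positions on an Archimedean spiral around the centre."""
    for i in range(turns):
        t = step * i
        yield (t * math.cos(t), t * math.sin(t))

def overlaps(a, b):
    """Axis-aligned box intersection test; boxes are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return not (ax + aw <= bx or bx + bw <= ax or
                ay + ah <= by or by + bh <= ay)

def layout(words):
    """Greedy spiral placement: each (word, width, height) walks the
    spiral until its box collides with no already-placed box."""
    placed = []
    for word, w, h in words:
        for x, y in spiral_positions():
            box = (x - w / 2, y - h / 2, w, h)
            if not any(overlaps(box, p) for _, p in placed):
                placed.append((word, box))
                break
    return placed

result = layout([("market", 6, 2), ("news", 4, 2), ("risk", 4, 2)])
```

Stable positions across a series of word-clouds would additionally require seeding each word at its previous position rather than at the spiral origin.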
... A two-step approach to analyzing financial time series data is presented in [8]. Different combinations of parameters of the time series data were examined and analyzed for pattern identification. ...
Chapter
This chapter surveys visualization techniques for frequent itemsets, association rules, and sequential patterns. The human is crucial in the process of identifying interesting patterns; thus, mining such patterns and visualizing them is important for decision making. The complementary feedback loop through which a user may refine parameters by inspecting the current mining results is broadly described as visual analytics. This survey identifies visual designs for patterns of each category and systematically analyzes and compares their strengths and weaknesses. The comparison and overview help decision-makers select the appropriate technique for their tasks and systems while being aware of its limitations.
... 4 As opposed to fully automated black boxes that only require data. 5 For prior works on stock market visualization and related literature, see, e.g., [Chang et al, 2007], [Csallner et al, 2003], [Dao et al, 2008], [Deboeck, 1997a,b], [Dwyer & Eades, 2002], [Eklund et al, 2003], [Huang et al, 2009], [Ingle & Deshmukh, 2017], [Jungmeister & Turo, 1992], [Keim et al, 2006], [Korczak & Łuszczyk, 2011], [Lin et al, 2005], [Ma, 2009], [Marghescu, 2007], [Novikova & Kotenko, 2014], [NYT, 2011], [Parrish, 2000], [Rehan et al, 2013], [Roberts, 2004], [Schreck et al, 2007], [SEC, 2014], [Šimunić, 2003], [Vande Moere & Lau, 2007], [Wang & Han, 2015], [Wanner et al, 2016], [Wattenberg, 1999], [Ziegler et al, 2010]. 6 Apart from aggressive taxi drivers, who are still rather innocuous compared with drivers in some other big cities. ...
Article
We provide complete source code for a front-end GUI and its back-end counterpart for a stock market visualization tool. It is built based on the "functional visualization" concept we discuss, whereby functionality is not sacrificed for fancy graphics. The GUI, among other things, displays a color-coded signal (computed by the back-end code) based on how "out-of-whack" each stock is trading compared with its peers ("mean-reversion"), and the most sizable changes in the signal ("momentum"). The GUI also allows efficient filtering/tiering of stocks by various parameters (e.g., sector, exchange, signal, liquidity, market cap) and functional display of them. The tool can be run as a web-based or local application.
... However, the objective is to present relationships between "events", where an event is defined by the values of many different time series at that instant. In [10], the authors adopted a self-organizing map (which can be seen as an alternative to MDS) to project subsequences of stock price time series onto a 2D plane. Because they focused only on stock price time series, they used a subsequence selection (or interest point detection) algorithm designed specifically for the domain and unlikely to generalize. ...
Article
Consumer demand estimation is a key step in real-time drinking water system (DWS) modeling used for demand forecasting, optimal operations, and water quality management. Consumer nodes in a DWS are generally clustered to reduce the number of unknown demands to be estimated from a limited number of measurement locations. A clustering methodology using the self-organizing map (SOM) is presented, which groups consumer nodes based on sensitivity of measurements to perturbations in the consumer demands and through the use of exogenous consumer information representative of, for example, socioeconomic information. The SOM algorithm not only developed demand clusters, but also provided intuitive visualization of the high-dimensional sensitivity space, which can provide important visual clues about the clustering problem such as the maximum number of clusters that can reasonably be formed and sharpness of the clusters. When applied to an example network, the sensitivity-based SOM clusters improved the performance in representing the observed measurements and demand estimate uncertainty, but reduced the performance in representing the overall network hydraulics relative to the actual clusters. Incorporating exogenous information about the actual clusters demonstrated the potential for providing trade-offs between representing the limited observed hydraulic information and the overall network hydraulics. The results from the SOM algorithm clearly demonstrate a need for clustering approaches that incorporate network-specific information (e.g., measurement locations, sensitivity information, and exogenous data) to develop demand estimates that are capable of representing observed information while adequately capturing overall system dynamics.
Article
Media data has been the subject of large-scale analysis, with applications of text mining being used to provide overviews of media themes and information flows. Information extracted from media articles has also shown its contextual value when integrated with other data, such as criminal records and stock market pricing. In this work, we explore linking textual media data with curated secondary textual data sources through user-guided semantic lexical matching for identifying relationships and data links. In this manner, critical information can be identified and used to annotate media timelines in order to provide a more detailed overview of events that may be driving media topics and frames. These linked events are further analyzed through an application of causality modeling to model temporal drivers between the data series. Such causal links are then annotated through automatic entity extraction, which enables the analyst to explore persons, locations, and organizations that may be pertinent to the media topic of interest. To demonstrate the proposed framework, two media datasets and an armed conflict event dataset are explored.
Article
With the development of social media (e.g. Twitter, Flickr, Foursquare, Sina Weibo, etc.), a large number of people now use these services to post microblogs, messages and multimedia information. This everyday usage of social media results in big, open social media data. The data offer fruitful information and reflect the social behaviors of people. There is much visualization and visual analytics research on such data. We collect state-of-the-art research and put it into three main categories: social networks, spatio-temporal information and text analysis. We further summarize the visual analytics pipeline for social media, combining the above categories and supporting complex tasks. With these techniques, social media analytics can be applied to multiple disciplines. We summarize the applications and public tools to further investigate the challenges and trends.
Article
Full-text available
Market participants and businesses have made tremendous efforts to make the best decisions in a timely manner under varying economic and business circumstances. As such, decision-making processes based on financial data have been a popular topic in industry. However, analyzing financial data is a non-trivial task due to its large volume, diversity and complexity, and this has led to rapid research and development of visualizations and visual analytics systems for financial data exploration. Often, the development of such systems requires researchers to collaborate with financial domain experts to better extract the requirements and challenges of their tasks. Work to systematically study and gather the task requirements, and to acquire an overview of existing visualizations and visual analytics systems applied in financial domains with respect to real-world data sets, has not been completed. To this end, we perform a comprehensive survey of visualizations and visual analytics. In this work, we categorize financial systems in terms of data sources, applied automated techniques, visualization techniques, interaction, and evaluation methods. For the categorization and characterization, we utilize existing taxonomies of visualization and interaction. In addition, we present task requirements extracted from interviews with domain experts in order to help researchers design better systems with detailed goals.
Article
Patterns of words used in different text collections can characterize interesting properties of a corpus. However, these patterns are challenging to explore as they often involve complex relationships across many words and collections in a large space of words. In this paper, we propose a configurable colorfield design to aid this exploration. Our approach uses a dense colorfield overview to present large amounts of data in ways that make patterns perceptible. It allows flexible configuration of both data mappings and aggregations to expose different kinds of patterns, and provides interactions to help connect detailed patterns to the corpus overview. TextDNA, our prototype implementation, leverages the GPU to provide interactivity in the web browser even on large corpora. We present five case studies showing how the tool supports inquiry in corpora ranging in size from single document to millions of books. Our work shows how to make a configurable colorfield approach practical for a range of analytic tasks.
Article
Full-text available
A bandwidth-limiting neutron chopper prototype for spallation-neutron facilities is introduced. The content includes the structure of the chopper, design specifications, neutron absorber construction and motor phase control. The investigation shows that the strength of the boron carbide + resin composite covering an aluminum alloy disk is high enough when the chopper rotor runs at a speed of 3000 rpm. A general-purpose PM servo motor was equipped for the chopper. High-accuracy phase tracking performance was achieved for the chopper based on a dual-loop motor control structure including a velocity loop and a phase loop. Torque disturbance was well restrained by the velocity control loop at a high sampling rate with the help of a high-resolution encoder in the motor. With the stable inner loop, the phase loop could work at a low sampling frequency equal to the rotation speed. Thus the rotor was synchronous with an inner reference pulse within less than 10 s. Related experimental results about the chopper are provided.
Article
Full-text available
This paper describes early work trying to predict stock market indicators such as the Dow Jones, NASDAQ and S&P 500 by analyzing Twitter posts. We collected Twitter feeds for six months and obtained a randomized subsample of about one hundredth of the full volume of all tweets. We measured collective hope and fear on each day and analyzed the correlation between these indices and the stock market indicators. We found that the emotional tweet percentage correlated significantly negatively with the Dow Jones, NASDAQ and S&P 500, but displayed a significant positive correlation with the VIX. It therefore seems that just checking Twitter for emotional outbursts of any kind gives a predictor of how the stock market will be doing the next day.
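The core measurement in a study like this, correlating a daily emotion index with a market indicator, amounts to a Pearson correlation. A stdlib-only sketch with hypothetical numbers (not the study's data):

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical daily series: share of emotional tweets vs next-day
# index return (percent); constructed so the relation is negative.
emotional_pct = [0.02, 0.05, 0.03, 0.08, 0.01]
next_day_ret = [0.4, -0.6, 0.1, -0.9, 0.7]
r = pearson(emotional_pct, next_day_ret)
```

A negative `r` here corresponds to the paper's finding that more emotional tweet days precede lower index values; a proper analysis would also test significance, which this sketch omits.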
Conference Paper
Full-text available
Stock market prediction is an attractive research problem. News content is one of the most important factors that influence the market. Considering the impact of news when analyzing stock market behavior leads to more precise predictions and, as a result, more profitable trades. So far, various prototypes have been developed that consider the impact of news on stock market prediction. In this paper, the main components of such forecasting systems are introduced. In addition, different developed prototypes are introduced, and the ways in which their main components are implemented are compared. Based on the studied attempts, potential future research activities are suggested.
Conference Paper
Full-text available
Both sequential pattern mining and temporal pattern mining have become highly relevant data mining topics in this decade. In 2009, Wu and Chen proposed a representation for hybrid events and an HTPM mining method. However, their approach neither addresses nor analyzes the length of event time: one event representation may stand for the same event with extremely different time lengths, which may cause a loss of accuracy in the mining results. This paper addresses this difficulty and explores different models and solutions. First, it introduces the concept of the time grain and proposes new hybrid models, as well as pattern mining algorithms associated with the concept of an event length limit. Events in hybrid sequences are divided or distinguished according to a given threshold, to enable a detailed exploration of the more frequent hybrid sequences of events. Second, the paper uses the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) as test data to examine the proposed model and the feasibility and effectiveness of the algorithm.
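The time-grain idea, that the same event type with very different durations should not be mined as one symbol, can be sketched as a relabelling step before mining; the label names and threshold below are illustrative assumptions, not the paper's notation:

```python
def label_by_length(events, grain):
    """Relabel interval events by duration relative to a time-grain
    threshold, so a 1-day rally and a week-long rally become distinct
    symbols for the downstream pattern miner."""
    out = []
    for name, start, end in events:
        kind = "short" if (end - start) <= grain else "long"
        out.append((f"{name}:{kind}", start, end))
    return out

# A point-based event is an interval with start == end, hence "short".
events = [("rally", 0, 1), ("rally", 2, 9), ("dip", 10, 10)]
labelled = label_by_length(events, grain=2)
```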
Article
Full-text available
Our research examines a predictive machine learning approach for financial news articles analysis using several different textual representations: Bag of Words, Noun Phrases, and Named Entities. Through this approach, we investigated 9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500 stocks during a five week period. We applied our analysis to estimate a discrete stock price twenty minutes after a news article was released. Using a Support Vector Machine (SVM) derivative specially tailored for discrete numeric prediction and models containing different stock-specific variables, we show that the model containing both article terms and stock price at the time of article release had the best performance in closeness to the actual future stock price (MSE 0.04261), the same direction of price movement as the future price (57.1% directional accuracy) and the highest return using a simulated trading engine (2.06% return). We further investigated the different textual representations and found that a Proper Noun scheme performs better than the de facto standard of Bag of Words in all three metrics.
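To make the compared textual representations concrete, here is a toy sketch of Bag of Words versus a Proper Noun scheme; the capitalisation heuristic is a deliberate simplification (the paper's system would rely on proper NLP tooling, e.g. a POS tagger or named-entity recogniser):

```python
from collections import Counter

def bag_of_words(text):
    """The de facto standard Bag of Words representation: token counts."""
    return Counter(text.lower().split())

def proper_nouns(text):
    """Crude stand-in for a Proper Noun scheme: keep capitalised tokens
    that do not start a sentence."""
    tokens = text.split()
    return Counter(t for i, t in enumerate(tokens)
                   if t[0].isupper() and i > 0
                   and not tokens[i - 1].endswith("."))

headline = "Shares rose after Acme Corp beat estimates. Analysts cheered."
bow = bag_of_words(headline)
pn = proper_nouns(headline)
```

The Proper Noun representation keeps only the entity-bearing tokens ("Acme", "Corp") and drops function words, which is one intuition for why it can outperform plain Bag of Words on noisy news text.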
Article
Full-text available
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Article
Full-text available
We study the relation between the number of news announcements reported daily by Dow Jones & Company and aggregate measures of securities market activity, including trading volume and market returns. We find that the number of Dow Jones announcements and market activity are directly related and that the results are robust to the addition of factors previously found to influence financial markets, such as day-of-the-week dummy variables, news importance as proxied by large 'New York Times' headlines and major macroeconomic announcements, and non-information sources of market activity as measured by dividend capture and triple-witching trading. However, the observed relation between news and market activity is not particularly strong, and the patterns in news announcements do not explain the day-of-the-week seasonalities in market activity. Our analysis of the Dow Jones database confirms the difficulty of linking volume and volatility to observed measures of information. Copyright 1994 by American Finance Association.
Article
Full-text available
A multiple view system uses two or more distinct views to support the investigation of a single conceptual entity. Many such systems exist, ranging from computer-aided design (CAD) systems for chip design that display both the logical structure and the actual geometry of the integrated circuit, to overview-plus-detail systems that show both an overview for context and a zoomed-in view for detail. Designers of these systems must make a variety of design decisions, ranging from determining layout to constructing sophisticated coordination mechanisms. Surprisingly, little work has been done to characterize these systems or to express guidelines for their design. Based on a workshop discussion of multiple views, and based on our own design and implementation experience with these systems, we present eight guidelines for the design of multiple view systems. Keywords: multiple views, information visualization, design guidelines, usability heuristics, user interfaces.
Article
The study of movement data is an important task in a variety of domains such as transportation, biology, or finance. Often, the data objects are grouped (e.g. countries by continents). We distinguish three main categories of movement data analysis, based on the focus of the analysis: (a) movement characteristics of an individual in the context of its group, (b) the dynamics of a given group, and (c) the comparison of the behavior of multiple groups. Examination of group movement data can be effectively supported by data analysis and visualization. In this respect, approaches based on analysis of derived movement characteristics (called features in this article) can be useful. However, current approaches are limited as they do not cover a broad range of situations and typically require manual feature monitoring. We present an enhanced set of movement analysis features and add automatic analysis of the features for filtering the interesting parts in large movement data sets. Using this approach, users can easily detect new interesting characteristics such as outliers, trends, and task-dependent data patterns even in large sets of data points over long time horizons. We demonstrate the usefulness with two real-world data sets from the socioeconomic and the financial domains.
Conference Paper
Visual analysis of time series data is an important, yet challenging task with many application examples in fields such as financial or news stream data analysis. Many visual time series analysis approaches consider a global perspective on the time series. Fewer approaches consider visual analysis of local patterns in time series, and often rely on interactive specification of the local area of interest. We present initial results of an approach that is based on automatic detection of local interest points. We follow an overview-first approach to find useful parameters for the interest point detection, and details-on-demand to relate the found patterns. We present initial results and detail possible extensions of the approach.
Article
Can the choice of words and tone used by the authors of financial news articles correlate to measurable stock price movements? If so, can the magnitude of price movement be predicted using these same variables? We investigate these questions using the Arizona Financial Text (AZFinText) system, a financial news article prediction system, and pair it with a sentiment analysis tool. Through our analysis, we found that subjective news articles were easier to predict in price direction (59.0% versus 50.0% of chance alone) and using a simple trading engine, subjective articles garnered a 3.30% return. Looking further into the role of author tone in financial news articles, we found that articles with a negative sentiment were easiest to predict in price direction (50.9% versus 50.0% of chance alone) and a 3.04% trading return. Investigating negative sentiment further, we found that our system was able to predict price decreases in articles of a positive sentiment 53.5% of the time, and price increases in articles of a negative sentiment 52.4% of the time. We believe that perhaps this result can be attributable to market traders behaving in a contrarian manner, e.g., see good news, sell; see bad news, buy.
Article
Studying a large sample of announcements made by US companies I find that announcements containing bad news are longer and less focused on the originating company than good news. The pattern is pervasive and suggests companies attempt to "package" bad news and mitigate its negative impact. The paper also examines how news agencies react to company news. It appears that news agencies step in and cut through the packaging by reporting bad company news in a much more concise and focused way. There are measurable benefits to this kind of "unpackaging" activity - by processing company news and making it more transparent news agencies significantly contribute to the resolution of asymmetric information. This is the first paper to compare the language of company news and associated agency reports and analyze the implications for the information environment.
Article
I analyze company news from Reuters with the 'General Inquirer' and relate measures of positive sentiment, negative sentiment and disagreement to abnormal stock returns, stock and option trading volume, the volatility spread and the CDS spread. I test hypotheses derived from market microstructure models. Consistent with these models, sentiment and disagreement are strongly related to trading volume. Moreover, sentiment and disagreement might be used to predict stock returns, trading volume and volatility. Trading strategies based on positive and negative sentiment are profitable if the transaction costs are moderate, indicating that stock markets are not fully efficient.
Book
From the Publisher: SOMs (Self-Organizing Maps) have proven to be an effective methodology for analyzing problems in finance and economics--including applications such as market analysis, financial statement analysis, prediction of bankruptcies, interest rates, and stock indices. This book covers real-world financial applications of neural networks, using the SOM approach, as well as introducing SOM methodology, software tools, and tips for processing. 106 illus. in color.
Article
Similarities and differences between speech and writing have been the subject of innumerable studies, but until now there has been no attempt to provide a unified linguistic analysis of the whole range of spoken and written registers in English. In this widely acclaimed empirical study, Douglas Biber uses computational techniques to analyse the linguistic characteristics of twenty-three spoken and written genres, enabling identification of the basic, underlying dimensions of variation in English. In Variation Across Speech and Writing, six dimensions of variation are identified through a factor analysis, on the basis of linguistic co-occurrence patterns. The resulting model of variation provides for the description of the distinctive linguistic characteristics of any spoken or written text and demonstrates the ways in which the polarization of speech and writing has been misleading, and thus enables reconciliation of the contradictory conclusions reached in previous research.
Article
The self-organizing map (SOM) is an automatic data-analysis method. It is widely applied to clustering problems and data exploration in industry, finance, natural sciences, and linguistics. The most extensive applications, exemplified in this paper, can be found in the management of massive textual databases and in bioinformatics. The SOM is related to classical vector quantization (VQ), which is used extensively in digital signal processing and transmission. As in VQ, the SOM represents a distribution of input data items using a finite set of models. In the SOM, however, these models are automatically associated with the nodes of a regular (usually two-dimensional) grid in an orderly fashion, such that more similar models become associated with nodes that are adjacent in the grid, whereas less similar models are situated farther away from each other in the grid. This organization, a kind of similarity diagram of the models, makes it possible to obtain an insight into the topographic relationships of data, especially of high-dimensional data items. If the data items belong to certain predetermined classes, the models (and the nodes) can be calibrated according to these classes. An unknown input item is then classified according to the node whose model is most similar to it in some metric used in the construction of the SOM. A new finding introduced in this paper is that an input item can be represented even more accurately by a linear mixture of a few best-matching models. This becomes possible by a least-squares fitting procedure where the coefficients in the linear mixture of models are constrained to nonnegative values.
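A minimal SOM training loop, to make the model-fitting idea concrete; the grid size, learning-rate schedule, and neighbourhood schedule are illustrative choices, not Kohonen's recommended settings:

```python
import math
import random

def train_som(data, rows=2, cols=2, iters=500, seed=0):
    """Tiny SOM: a rows x cols grid of model vectors pulled toward the
    data, with a neighbourhood radius that shrinks over time."""
    rnd = random.Random(seed)
    nodes = {(r, c): [rnd.random(), rnd.random()]
             for r in range(rows) for c in range(cols)}
    for t in range(iters):
        x = rnd.choice(data)
        # Best-matching unit: the node whose model is closest to x.
        bmu = min(nodes, key=lambda n: sum((a - b) ** 2
                                           for a, b in zip(nodes[n], x)))
        lr = 0.5 * (1 - t / iters)
        radius = 1.5 * (1 - t / iters)
        for n, m in nodes.items():
            # Distance measured on the grid, not in data space: this is
            # what produces the SOM's topographic ordering.
            if math.dist(n, bmu) <= radius:
                for i in range(len(m)):
                    m[i] += lr * (x[i] - m[i])
    return nodes

# Two well-separated 2D clusters; after training, the grid's models
# settle inside the data's range and tend to spread over the clusters.
data = [(0.1, 0.1), (0.12, 0.08), (0.9, 0.9), (0.88, 0.92)]
som = train_som(data)
```

Classification then reduces to finding the best-matching unit for an unseen item, exactly as in the training loop.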
Article
This introduction to the R package beanplot is a (slightly) modified version of Kampstra (2008), published in the Journal of Statistical Software. Boxplots and variants thereof are frequently used to compare univariate data. Boxplots have the disadvantage that they are not easy to explain to non-mathematicians, and that some information is not visible. A beanplot is an alternative to the boxplot for visual comparison of univariate data between groups. In a beanplot, the individual observations are shown as small lines in a one-dimensional scatter plot. Next to that, the estimated density of the distributions is visible and the average is shown. It is easy to compare different groups of data in a beanplot and to see if a group contains enough observations to make the group interesting from a statistical point of view. Anomalies in the data, such as bimodal distributions and duplicate measurements, are easily spotted in a beanplot. For groups with two subgroups (e.g., male and female), there is a special asymmetric beanplot. For easy usage, an implementation was made in R.
Article
The self-organizing map (SOM) is an efficient tool for visualization of multidimensional numerical data. In this paper, an overview and categorization of both old and new methods for the visualization of SOM is presented. The purpose is to give an idea of what kind of information can be acquired from different presentations and how the SOM can best be utilized in exploratory data visualization. Most of the presented methods can also be applied in the more general case of first making a vector quantization (e.g. k-means) and then a vector projection (e.g. Sammon's mapping).
Article
Previous sequential pattern mining studies have dealt with either point-based event sequences or interval-based event sequences. In some applications, however, event sequences may contain both point-based and interval-based events. These sequences are called hybrid event sequences. Since the relationships among the two kinds of events are more varied, the patterns discovered from such sequences are more informative. In this study we introduce a hybrid temporal pattern mining problem and develop an algorithm to discover hybrid temporal patterns from hybrid event sequences. We carry out an experiment using both synthetic and real stock price data to compare our algorithm with the traditional algorithms designed exclusively for mining point-based patterns or interval-based patterns. The experimental results indicate that the efficiency of our algorithm is satisfactory. In addition, the experiment also shows that the predictive power of hybrid temporal patterns is higher than that of point-based or interval-based patterns.
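At the core of mining hybrid sequences is classifying the temporal relation between any two events, where a point-based event is treated as a zero-length interval. A minimal sketch of such a classifier follows; the relation names echo Allen's interval algebra, and the `Event` type is illustrative, not the paper's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    label: str
    start: float
    end: float  # start == end for a point-based event

def relation(a: Event, b: Event) -> str:
    """Classify the temporal relation of event a with respect to event b,
    treating point events as zero-length intervals."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start >= b.start and a.end <= b.end:
        return "during"
    return "overlaps"
```

For example, a point-based news event falling inside an interval-based price-rise pattern yields the relation `"during"`, which is exactly the kind of co-occurrence a hybrid miner counts.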
Conference Paper
Widespread interest in discovering the features and trends in time-series has generated a need for tools that support interactive exploration. This chapter introduces timeboxes: a powerful direct-manipulation metaphor for the specification of queries over time-series datasets. The TimeSearcher implementation of timeboxes supports the interactive formulation and modification of queries, thus, speeding the process of exploring time series data sets and guiding data mining. TimeSearcher uses dynamic queries, overviews, and other information visualization techniques that have proven useful in a variety of other domains to support the interactive examination of time series data. The incorporation of data mining algorithms into systems that support exploration and interactive knowledge discovery is the next step in making data mining more accessible to a wider range of users and problem domains.
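A timebox can be read as a rectangular constraint on a set of time series: keep exactly those series whose values stay inside a value range [lo, hi] throughout a time window [t0, t1]. A minimal sketch of such a query, assuming series sampled at integer time steps (the function name is illustrative):

```python
def timebox_filter(series, t0, t1, lo, hi):
    """Return the series that pass through the timebox, i.e. whose
    values lie within [lo, hi] at every time step in [t0, t1]."""
    return {name: ys for name, ys in series.items()
            if all(lo <= ys[t] <= hi for t in range(t0, t1 + 1))}
```

In an interactive tool, dragging or resizing the box simply re-runs this filter, which is what makes the direct-manipulation metaphor responsive.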
Conference Paper
Aspect-level news browsing provides readers with a classified view of news articles with different viewpoints. It facilitates active interactions with which readers easily discover and compare diverse existing biased views of a news event. As such, it effectively helps readers understand the event from a plurality of viewpoints and formulate their own, more balanced viewpoints free from specific biased views. Realizing aspect-level browsing raises important challenges, mainly due to the lack of semantic knowledge with which to abstract and classify the intended salient aspects of articles. We first demonstrate the feasibility of aspect-level news browsing through user studies. We then look deeply into the news article production process and develop framing cycle-aware clustering. The evaluation results show that the developed method performs classification more accurately than other methods.
Conference Paper
The stock market is an important and active part of today's financial markets. Addressing the question of how to model financial information from two sources, we focus in this study on improving the accuracy of computer-aided prediction by combining information hidden in market news and stock prices. Using the multi-kernel learning technique, we present a system that makes predictions for the Hong Kong stock market by incorporating these two information sources. Experiments were conducted, and the results show that in both cross-validation and independent testing our system achieves better directional accuracy than a baseline system based on a single information source, as well as than a system that integrates the information sources in a simple way.
Book
Time is an exceptional dimension that is common to many application domains such as medicine, engineering, business, or science. Due to the distinct characteristics of time, appropriate visual and analytical methods are required to explore and analyze them. This book starts with an introduction to visualization and historical examples of visual representations. At its core, the book presents and discusses a systematic view of the visualization of time-oriented data along three key questions: what is being visualized (data), why something is visualized (user tasks), and how it is presented (visual representation). To support visual exploration, interaction techniques and analytical methods are required that are discussed in separate chapters. A large part of this book is devoted to a structured survey of 101 different visualization techniques as a reference for scientists conducting related research as well as for practitioners seeking information on how their time-oriented data can best be visualized.
Article
Visual-interactive cluster analysis provides valuable tools for effectively analyzing large and complex data sets. Due to desirable properties and an inherent predisposition for visualization, the Kohonen Feature Map (or Self-Organizing Map, or SOM) algorithm is among the most popular and widely used visual clustering techniques. However, the unsupervised nature of the algorithm may be disadvantageous in certain applications. Depending on initialization and data characteristics, cluster maps (cluster layouts) may emerge that do not comply with user preferences, expectations, or the application context. Considering SOM-based analysis of trajectory data, we propose a comprehensive visual-interactive monitoring and control framework extending the basic SOM algorithm. The framework implements the general Visual Analytics idea to effectively combine automatic data analysis with human expert supervision. It provides simple, yet effective facilities for visually monitoring and interactively controlling the trajectory clustering process at arbitrary levels of detail. The approach allows the user to leverage existing domain knowledge and user preferences, arriving at improved cluster maps. We apply the framework on a trajectory clustering problem, demonstrating its potential in combining both unsupervised (machine) and supervised (human expert) processing, in producing appropriate cluster results.
Article
Tag clouds, a now-ubiquitous type of street-wise visualization, are presented. Tag clouds are an eclectic bunch, spanning a variety of data inputs and usage patterns that defy much of the orthodox wisdom about how visualizations have to work. A tag cloud usually has a particular purpose: to present a visual overview of a collection of text, showing the popularity of various tags through font size. These tag clouds function as aggregators of activity carried out by thousands of users, summarizing the action that happens beneath the surface of socially oriented Websites. They allow users to quickly examine a collection of sequential documents. The alphabetical ordering in tag clouds is an important organizing principle because it provides the only way to visually search for specific items in the display. These clouds act as individual and group mirrors, which are fun rather than serious and businesslike. They have become a tool of choice for analysis.
Article
We present a tool that is specifically designed to support a writer in revising a draft version of a document. In addition to showing which paragraphs and sentences are difficult to read and understand, we assist the reader in understanding why this is the case. This requires features that are expressive predictors of readability, and are also semantically understandable. In the first part of the paper, we, therefore, discuss a semiautomatic feature selection approach that is used to choose appropriate measures from a collection of 141 candidate readability features. In the second part, we present the visual analysis tool VisRA, which allows the user to analyze the feature values across the text and within single sentences. Users can choose between different visual representations accounting for differences in the size of the documents and the availability of information about the physical and logical layout of the documents. We put special emphasis on providing as much transparency as possible to ensure that the user can purposefully improve the readability of a sentence. Several case studies are presented that show the wide range of applicability of our tool. Furthermore, an in-depth evaluation assesses the quality of the measure and investigates how well users do in revising a text with the help of the tool.
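Shallow text features of this kind are cheap to compute. The toy sketch below derives three of them, including type-token ratio as a rough measure of vocabulary richness; the regexes and the small feature set are illustrative and do not reproduce the 141 candidate features discussed in the paper.

```python
import re

def text_features(text):
    """Compute a few shallow readability features: average sentence
    length (in words), average word length (in characters), and
    type-token ratio as a rough proxy for vocabulary richness."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(map(len, words)) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }
```

Computed per sentence or per paragraph, such values can be mapped to color to show a writer where a draft is hardest to read, which is the visual idea behind tools of this kind.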
Article
Behavioral economics tells us that emotions can profoundly affect individual behavior and decision-making. Does this also apply to societies at large, i.e., can societies experience mood states that affect their collective decision making? By extension, is the public mood correlated with, or even predictive of, economic indicators? Here we investigate whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time. We analyze the text content of daily Twitter feeds by two mood tracking tools, namely OpinionFinder that measures positive vs. negative mood and Google-Profile of Mood States (GPOMS) that measures mood in terms of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). We cross-validate the resulting mood time series by comparing their ability to detect the public's response to the presidential election and Thanksgiving day in 2008. A Granger causality analysis and a Self-Organizing Fuzzy Neural Network are then used to investigate the hypothesis that public mood states, as measured by the OpinionFinder and GPOMS mood time series, are predictive of changes in DJIA closing values. Our results indicate that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others. We find an accuracy of 87.6% in predicting the daily up and down changes in the closing values of the DJIA and a reduction of the Mean Average Percentage Error by more than 6%.
Article
Large amounts of data are only available in textual form. However, due to the semi-structured nature of text and the impressive flexibility and complexity of natural language, the development of automatic methods for text analysis is a challenging task. The presented work is centered around a framework for analyzing documents and document collections that takes the whole document analysis process into account. Central to this framework is the idea that most analysis tasks do not require a full text understanding. Instead, one or several semantic aspects of the text (called quasi-semantic properties) can be identified that are relevant for answering the analysis task. This makes it possible to search specifically for combinations of (measurable) text features that are able to approximate the specific semantic aspect. Those approximations are then used to solve the analysis task computationally or to support the analysis of a document collection visually. The thesis discusses the above-mentioned framework theoretically and presents concrete application examples from four different domains: literature analysis, readability analysis, the extraction of discriminating and overlapping terms, and finally sentiment and opinion analysis. Thereby, the advantages of working with the framework are shown. A focus is put on showing where and how visualization techniques can provide valuable support in the document analysis process. Novel visualizations are introduced and common ones are evaluated for their suitability in this context. Furthermore, several examples are given of how good approximations of semantic aspects of a document can be found and how given measures can be evaluated and improved.
Conference Paper
An association rule in data mining is an implication of the form X→Y, where X is a set of antecedent items and Y is the consequent item. For years researchers have developed many tools to visualize association rules. However, few of these tools can handle more than a few dozen rules, and none of them can effectively manage rules with multiple antecedents. Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available. This paper presents a novel visualization technique to tackle many of these problems. We apply the technology to a text mining study on large corpora. The results indicate that our design can easily handle hundreds of multiple-antecedent association rules in a three-dimensional display with minimum human interaction, a low occlusion percentage, and no screen swapping.
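Rule sets of this kind are typically produced by level-wise Apriori mining: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent, and rules X→Y are then read off frequent itemsets by confidence. A compact sketch under those definitions; the function names and thresholds are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: count candidate itemsets, keep those whose
    support (fraction of transactions containing them) meets the
    threshold, then join and prune to form the next level."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = {i for t in sets for i in t}
    freq = {}
    candidates = [frozenset([i]) for i in sorted(items)]
    k = 1
    while candidates:
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        freq.update(level)
        k += 1
        prev = list(level)
        # Join step: merge frequent (k-1)-itemsets into k-candidates.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return freq

def rules(freq, min_conf):
    """Emit (X, Y, confidence) with confidence = sup(X ∪ Y) / sup(X)."""
    out = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                conf = sup / freq[x]
                if conf >= min_conf:
                    out.append((x, itemset - x, conf))
    return out
```

On a handful of transactions this already yields the multiple-antecedent rules whose sheer number motivates the visualization described above.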
Article
A sequential pattern in data mining is a finite series of elements such as A B C D where A, B, C, and D are elements of the same domain. The mining of sequential patterns is designed to find patterns of discrete events that frequently happen in the same arrangement along a timeline. Like association and clustering, the mining of sequential patterns is among the most popular knowledge discovery techniques that apply statistical measures to extract useful information from large datasets. As our computers become more powerful, we are able to mine bigger datasets and obtain hundreds of thousands of sequential patterns in full detail. With this vast amount of data, we argue that neither data mining nor visualization by itself can manage the information and reflect the knowledge effectively. Subsequently, we apply visualization to augment data mining in a study of sequential patterns in large text corpora. The result shows that we can learn more and more quickly in an integrated visual...
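The statistical core of such mining is counting how many sequences contain a candidate pattern, where a pattern matches as an order-preserving but not necessarily contiguous subsequence. A minimal support counter under that definition (the function names are illustrative):

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as an order-preserving
    (not necessarily contiguous) subsequence, e.g. A B C D in
    A x B y C z D."""
    it = iter(sequence)
    # Each pattern element must be found after the previous match;
    # the shared iterator enforces the ordering.
    return all(any(p == s for s in it) for p in pattern)

def pattern_support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)
```

A miner keeps the patterns whose support exceeds a threshold; with large corpora this easily produces the hundreds of thousands of patterns whose scale motivates the visual augmentation argued for above.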
Keim DA, Kohlhammer J, Ellis G, et al. Mastering the information age: solving problems with visual analytics. Eurographics Association, 2010.
Schreck T, Bernard J, Tekušová T, et al. Visual cluster analysis of trajectory data with interactive Kohonen maps.
Aigner W, Miksch S, Schumann H, et al. Visualization of time-oriented data (Human-Computer Interaction Series). London: Springer, 2011.
Investopedia, "Efficient Market Hypothesis - EMH." http://www.investopedia.com/terms/e/efficientmarkethypothesis.asp (2013). [Online; accessed 17-July-2013].
About.com, "Grammar & Composition." http://grammar.about.com/ (2013). [Online; accessed 22-July-2013].