Di Wu

The Chinese University of Hong Kong, Hong Kong, Hong Kong

Publications (8) · 1.46 Total impact

  • ABSTRACT: We study the problem of detecting priming events based on a time series index and an evolving document stream. We define a priming event as an event that triggers abnormal movements of the time series index, e.g., the Iraq war with respect to the presidential approval index of President Bush. Existing solutions focus either on organizing coherent keywords from a document stream into events or on identifying correlated movements between keyword frequency trajectories and the time series index. In this paper, we tackle the problem in two major steps. (1) We identify the elements that form a priming event. Each such element, called an influential topic, consists of a set of coherent keywords. We extract influential topics by examining the correlation between keyword trajectories and the time series index of interest at a global level. (2) We extract priming events by detecting and organizing the bursty influential topics at a micro level. We evaluate our algorithms on a real-world dataset, and the results confirm that our method is able to discover priming events effectively.
    01/2012;
  • ABSTRACT: Nowadays, the World Wide Web is full of rich information, including text data, XML data, multimedia data, time series data, etc. The web is usually represented as a large graph, and PageRank is computed to rank the importance of web pages. In this paper, we study the problem of ranking evolving time series and discovering leaders among them by analyzing lead-lag relations. A time series is considered to be one of the leaders if its rise or fall impacts the behavior of many other time series. At each time point, we compute the lagged correlation between each pair of time series and model these correlations in a graph. Then, the leadership rank is computed from the graph, which brings order to the time series. Based on the leadership ranking, the leaders of the time series are extracted. However, the problem poses great challenges, since the dynamic nature of time series results in a highly evolving graph in which the relationships between time series are modeled. We propose an efficient algorithm that is able to track the lagged correlation and compute the leaders incrementally, while still achieving good accuracy. Our experiments on real weather science data and stock data show that our algorithm is able to compute time series leaders efficiently in a real-time manner and that the detected leaders demonstrate high predictive power on the events of general time series entities, which can inform both weather monitoring and financial risk control. Keywords: PageRank, time series, lagged correlation, leadership rank, incremental correlation update
    World Wide Web 01/2011; 14(1):1-25. · 1.20 Impact Factor
  • ABSTRACT: Analyzing the relationships among time series is an important problem for many applications, including climate monitoring, stock investment, traffic control, etc. Existing research mainly focuses on studying the relationship between a pair of time series. In this paper, we study the problem of discovering leaders among a set of time series by analyzing lead-lag relations. A time series is considered to be one of the leaders if its rise or fall impacts the behavior of many other time series. At each time point, we compute the lagged correlation between each pair of time series and model these correlations in a graph. Then, the leadership rank is computed from the graph, which brings order to the time series. Based on the leadership ranking, the leaders of the time series are extracted. However, the problem poses great challenges as time goes by, since the dynamic nature of time series results in highly evolving relationships between time series. We propose an efficient algorithm that is able to track the lagged correlation and compute the leaders incrementally, while still achieving good accuracy. Our experiments on real climate science data and stock data show that our algorithm is able to compute time series leaders efficiently in a real-time manner and that the detected leaders demonstrate high predictive power on the events of general time series entities, which can inform both climate monitoring and financial risk control.
    Database Systems for Advanced Applications, 15th International Conference, DASFAA 2010, Tsukuba, Japan, April 1-4, 2010, Proceedings, Part I; 01/2010
  • ABSTRACT: Due to the fast delivery of news articles by news providers on the Internet and/or via news datafeeds, predicting the risk of stocks by utilizing such textual information, in addition to the time series information, has become an important research issue. In the literature, the issue of predicting stock price up/down trends based on news articles has been studied. In this paper, we study a new problem, which is to predict the risk of stocks from the news about their corresponding companies. We discuss the unique challenges of volatility prediction, volatility ranking and volatility index construction. A new feature selection approach is proposed to select bursty volatility features. Such selected features can accurately represent/simulate volatility bursts. A volatility prediction method is then proposed based on random walk by considering both the direct impacts of bursty volatility features on the stocks and the propagated impacts through correlation between stocks. Finally, we construct a volatility index, called VN-index, which is a time series of predicted stock volatility. Moreover, stocks are ranked based on the predicted volatility values. Such information provides investors with knowledge of how widely a stock price is dispersed from the average, an important indicator of stock risk in a stock market. We conducted extensive experimental studies using real datasets and report our findings in this paper.
    Database Technologies 2010, Twenty-First Australasian Database Conference (ADC 2010), Brisbane, Australia, 18-22 January, 2010, Proceedings; 01/2010
  • ABSTRACT: There are many real applications in which the decision making process depends on a model that is built by collecting information from different data sources. Let us take the stock market as an example. The decision making process depends on a model that is influenced by factors such as stock prices, exchange volumes, market indices (e.g., the Dow Jones Index), news articles, and government announcements (e.g., the increase of stamp duty). Nevertheless, modeling the stock market is a challenging task because (1) the process related to market states (rise state/drop state) is a stochastic process, which is hard to capture using a deterministic approach, and (2) the market state is invisible but is influenced by the visible market information, such as stock prices and news articles. In this paper, we propose an approach to model the stock market process by using a Non-homogeneous Hidden Markov Model (NHMM). It takes both stock prices and news articles into consideration when it is being computed. A unique feature of our approach is that it is event driven. When building the NHMM, we identify associated events for a specific stock using a set of bursty features (keywords) that have a significant impact on the stock price changes. We apply the model to predict the trend of future stock prices, and the encouraging results indicate that our proposed approach is practically sound and highly effective.
    Frontiers of Computer Science in China 01/2009; 3:145-157. · 0.27 Impact Factor
  • ABSTRACT: In many real world applications, decisions are usually made by collecting and judging information from multiple different data sources. Let us take the stock market as an example. We never make our decision based on just one single piece of advice, but always rely on a collection of information, such as the stock price movements, exchange volumes, and market index, as well as the information from news articles, expert comments and special announcements (e.g., the increase of stamp duty). Yet, modeling the stock market is difficult because: (1) the process related to market states (up and down) is a stochastic process, which is hard to capture by using a deterministic approach; and (2) the market state is invisible but is influenced by the visible market information, such as stock prices and news articles. In this paper, we try to model the stock market process by using a Non-homogeneous Hidden Markov Model (NHMM), which takes multiple sources of information into account when making a future prediction. Our model contains three major elements: (1) External event, which denotes the events happening within the stock market (e.g., the drop of the US interest rate); (2) Observed market state, which denotes the current market status (e.g., the rise in the stock price); and (3) Hidden market state, which conceptually exists but is invisible to the market participants. Specifically, we model the external events by using the information contained in the news articles, and model the observed market state by using the historical stock prices. Based on these two pieces of observable information and the previous hidden market state, we aim to identify the current hidden market state, so as to predict the immediate market movement. Extensive experiments were conducted to evaluate our work. The encouraging results indicate that our proposed approach is practically sound and effective.
    08/2008: pages 77-89;
  • ABSTRACT: In this paper, we propose a new model, called the co-movement model, for constructing financial portfolios by analyzing and mining the co-movement patterns among multiple time series. Unlike the existing approaches, where the portfolios’ expected risks are computed based on the covariances among the assets in the portfolios, we model their risks by considering the co-movement patterns of the time series. For example, consider two financial assets, A and B, where we know that whenever the price of A drops, the price of B will drop, and vice versa. Intuitively, it may not be appropriate to construct a portfolio that includes both A and B concurrently, as the exposure to loss will be increased. Yet, this kind of relationship cannot always be captured by covariance (i.e., traditional statistics). Apart from changing how risk is modeled, our proposed co-movement model also alters the computation of the portfolio’s expected return relative to the traditional perspective. Existing approaches compute the portfolio’s expected return by linearly combining the expected return of each individual asset with its contribution to the portfolio. This formulation ignores the dependence relationships among assets. In contrast, our co-movement model captures all dependence relationships. This can mimic the real life situation much better than the traditional approach. Extensive experiments are conducted to evaluate the effectiveness of our proposed model. The favorable experimental results indicate that the co-movement model is highly effective and feasible.
    Progress in WWW Research and Development, 10th Asia-Pacific Web Conference, APWeb 2008, Shenyang, China, April 26-28, 2008. Proceedings; 01/2008
  • ABSTRACT: In this paper, we study the problem of detecting shape anomalies. Our shape anomaly detection algorithm operates on the one-dimensional representation (time series) of shapes, whose similarity is modeled by a generalized segmental hidden Markov model (HMM) in a scaling-, translation- and rotation-invariant manner. Experimental results show that our proposed approach can find shape anomalies in a large collection of shapes effectively and efficiently.
    Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, México; 01/2008
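
Illustrative note on the lead-lag abstracts above (World Wide Web 2011 and DASFAA 2010): the idea of computing a lagged correlation between each pair of time series can be sketched as follows. This is only a minimal sketch under stated assumptions, not the authors' algorithm: it recomputes plain Pearson correlations from scratch rather than incrementally, and it replaces the graph-based leadership rank with a simple count of how many other series each series leads. The function names `pearson`, `best_lead` and `lead_counts`, and the parameters `max_lag` and `threshold`, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lead(x, y, max_lag):
    """Return (lag, corr) for the lag in 1..max_lag that maximizes
    |corr(x[t], y[t + lag])|, i.e. how strongly x leads y."""
    best = (0, 0.0)
    for lag in range(1, max_lag + 1):
        if len(x) - lag < 2:
            break  # too few overlapping points to correlate
        c = pearson(x[:-lag], y[lag:])
        if abs(c) > abs(best[1]):
            best = (lag, c)
    return best


def lead_counts(series, max_lag=3, threshold=0.8):
    """Assumption: a crude stand-in for a leadership rank.
    Counts, for each named series, how many other series it leads
    with absolute lagged correlation of at least `threshold`."""
    counts = {name: 0 for name in series}
    for a, xa in series.items():
        for b, xb in series.items():
            if a != b:
                _, c = best_lead(xa, xb, max_lag)
                if abs(c) >= threshold:
                    counts[a] += 1
    return counts


if __name__ == "__main__":
    # Toy data (made up): y repeats x with a delay of two time steps.
    x = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
    y = [0, 0] + x[:-2]
    print(best_lead(x, y, max_lag=3))
    print(lead_counts({"x": x, "y": y}, max_lag=3, threshold=0.99))
```

A count of led series is the simplest possible ordering; a PageRank-style score over the same lead-lag graph, as the abstracts describe, would additionally reward leading series that are themselves leaders.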