Conference Paper

An Effective Clustering Approach to Stock Market Prediction.

Department of Information Management, National Taiwan University, Taipei, Taiwan, ROC; Ming-Chih Lin, Department of Information Management, National Taiwan University, Taipei, Taiwan, ROC; Rung-Tai Kao, Department of Information Management, National Taiwan University, Taipei, Taiwan, ROC; Department of Accounting, National Taiwan University, Taipei, Taiwan
Conference: Pacific Asia Conference on Information Systems, PACIS 2010, Taipei, Taiwan, 9-12 July 2010
Source: DBLP

ABSTRACT: In this paper, we propose an effective clustering method, HRK (Hierarchical agglomerative and Recursive K-means clustering), to predict the short-term stock price movements after the release of financial reports. The proposed method consists of three phases. First, we convert each financial report into a feature vector and use the hierarchical agglomerative clustering method to divide the converted feature vectors into clusters. Second, for each cluster, we recursively apply the K-means clustering method to partition each cluster into sub-clusters so that most feature vectors in each sub-cluster belong to the same class. Then, for each sub-cluster, we choose its centroid as the representative feature vector. Finally, we employ the representative feature vectors to predict the stock price movements. The experimental results show the proposed method outperforms SVM in terms of accuracy and average profits.
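
The three phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the linkage criterion, the value of K, the stopping rule, or the prediction rule, so the following are all assumptions made for illustration: centroid linkage in phase 1, K = 2 with a purity threshold (`min_purity`) in phase 2, and nearest-representative voting in phase 3. The function names (`fit_hrk`, `predict`, and the helpers) and the toy data are hypothetical.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Mean vector of a cluster of (vector, label) pairs."""
    vectors = [v for v, _ in cluster]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def majority(cluster):
    """Return (most common label, its share of the cluster)."""
    label, count = Counter(l for _, l in cluster).most_common(1)[0]
    return label, count / len(cluster)

def agglomerative(points, n_clusters):
    """Phase 1 (assumed centroid linkage): merge the two closest clusters
    until only n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters

def two_means(cluster, iters=20):
    """K-means with K = 2 and a deterministic start: the first vector
    and the vector farthest from it."""
    c0 = cluster[0][0]
    c1 = max((v for v, _ in cluster), key=lambda v: dist(v, c0))
    left, right = cluster, []
    for _ in range(iters):
        left = [p for p in cluster if dist(p[0], c0) <= dist(p[0], c1)]
        right = [p for p in cluster if dist(p[0], c0) > dist(p[0], c1)]
        if not left or not right:
            break
        c0, c1 = centroid(left), centroid(right)
    return left, right

def recursive_kmeans(cluster, min_purity):
    """Phase 2: split until most vectors in a sub-cluster share one class."""
    if len(cluster) < 2 or majority(cluster)[1] >= min_purity:
        return [cluster]
    left, right = two_means(cluster)
    if not left or not right:  # cannot be split any further
        return [cluster]
    return recursive_kmeans(left, min_purity) + recursive_kmeans(right, min_purity)

def fit_hrk(points, n_clusters=2, min_purity=0.9):
    """Return one (centroid, majority label) representative per sub-cluster."""
    reps = []
    for cluster in agglomerative(points, n_clusters):
        for sub in recursive_kmeans(cluster, min_purity):
            reps.append((centroid(sub), majority(sub)[0]))
    return reps

def predict(reps, vector):
    """Phase 3 (assumed rule): label of the nearest representative."""
    return min(reps, key=lambda r: dist(r[0], vector))[1]

# Toy feature vectors standing in for converted financial reports.
points = [
    ([0, 0], "down"), ([0, 1], "down"), ([1, 0], "down"),
    ([3, 0], "up"), ([4, 0], "up"), ([3, 1], "up"),
    ([20, 20], "up"), ([21, 20], "up"), ([20, 21], "up"),
]
reps = fit_hrk(points, n_clusters=2, min_purity=0.9)
print(len(reps), predict(reps, [0.5, 0.5]), predict(reps, [3.5, 0.5]))
```

In this toy data the first phase groups the "down" vectors near the origin together with the nearby "up" vectors, so the resulting cluster is mixed. The recursive split in the second phase separates them, which is why the sketch ends with one representative per class-consistent sub-cluster rather than one per top-level cluster.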

  • Source
    ABSTRACT: Hierarchical and K-means clustering are two major analytical tools for unsupervised analysis of microarray datasets. However, both have innate disadvantages. Hierarchical clustering cannot represent distinct clusters with similar expression patterns. Also, as clusters grow in size, the actual expression patterns become less relevant. K-means clustering requires the number of clusters to be specified in advance and chooses initial centroids randomly; in addition, it is sensitive to outliers. We present a novel hybrid approach that combines the merits of the two and discards the disadvantages mentioned above. It differs from the existing method, which carries out hierarchical clustering first to decide the location and number of clusters in a first round and runs K-means clustering in another round. In brief, we cluster around half of the data through hierarchical clustering and follow with K-means for the remaining half in one single round. Our approach also provides a mechanism to handle outliers. Compared with the existing hybrid clustering approach and with K-means clustering under 2 different distance measures on Eisen's yeast microarray data, our method always generates much higher quality clusters.
    Fourth International IEEE Computer Society Computational Systems Bioinformatics Conference Workshops & Poster Abstracts (CSB 2005 Workshops), 8-11 August 2005, Stanford, CA, USA; 01/2005
  • Source
    ABSTRACT: Previous research indicates that the narrative disclosures in company annual reports can be used to assist in assessing the company's short-term financial prospects. However, not much effort has been made to systematically and automatically assess the predictive potential of such reports using text classification, information retrieval, and machine learning techniques. In this study, we built SVM-based predictive models with different feature selection methods from ten years of annual reports of 30 companies. We used feature selection methods to reduce the term space and studied the class-related vocabulary. Predictive accuracy is evaluated with cross validation and t-tests for significance. We compare different models' performance and analyze misclassification rates by year and by industry. We identify the strengths and weaknesses of each model. Our results support the feasibility of automatically predicting next-year company financial performance from the current year's report. We suggest text features can be further studied to understand their roles as indicators of a company's future performance. This research paves the way for large-scale automatic analysis of the relationship between annual reports and short-term performance, as well as the identification of interesting signals within annual reports.
    Proceedings of the American Society for Information Science and Technology 01/2006;
  • Source
    ABSTRACT: Our research examines a predictive machine learning approach for financial news article analysis using several different textual representations: Bag of Words, Noun Phrases, and Named Entities. Through this approach, we investigated 9,211 financial news articles and 10,259,042 stock quotes covering the S&P 500 stocks during a five-week period. We applied our analysis to estimate a discrete stock price twenty minutes after a news article was released. Using a Support Vector Machine (SVM) derivative specially tailored for discrete numeric prediction and models containing different stock-specific variables, we show that the model containing both article terms and stock price at the time of article release had the best performance in closeness to the actual future stock price (MSE 0.04261), the same direction of price movement as the future price (57.1% directional accuracy) and the highest return using a simulated trading engine (2.06% return). We further investigated the different textual representations and found that a Proper Noun scheme performs better than the de facto standard of Bag of Words in all three metrics.
    ACM Trans. Inf. Syst. 01/2009; 27.
