Shih-Wen Ke’s research while affiliated with University of Sunderland and other places


Publications (2)


PERC: A personal email classifier
  • Conference Paper

April 2006 · 54 Reads · 14 Citations

Lecture Notes in Computer Science

Shih-Wen Ke


Improving the accuracy of assigning new email messages to small folders can reduce the likelihood of users creating duplicate folders for some topics. In this paper we present a hybrid classification model, PERC, and use the Enron Email Corpus to investigate the performance of kNN, SVM and PERC in a simulation of a real-time situation. Our results show that PERC is significantly better at assigning messages to small folders. The effects of different parameter settings for the classifiers are discussed.


Mining Personal Data Collections to Discover Categories and Category Labels.

January 2005 · 18 Reads · 2 Citations

This paper describes the text mining of personal document collections in order to learn the categories of the documents in the collection, and to assign a suitable text label to each category. In the first experiment we make use of a pre-classified collection of documents from which we extract a text label for each category. In the second experiment we use the k-means clustering algorithm to automatically learn which categories are present in a collection, then generate short text labels to describe each category. The technique, which is essentially a form of automatic indexing, can be used whenever class exemplars are generated as part of the document clustering process.
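The labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the categories are already known (as in the first experiment), treats each category as one bag of tokens, and scores terms with a common TF.IDF variant, term count times log(N / number of categories containing the term). The function name `label_clusters`, the `top_n` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def label_clusters(clusters, top_n=2):
    """Pick the top_n highest-scoring terms of each cluster as its label.

    clusters: dict mapping a cluster name to a list of tokenised documents.
    The scoring formula is an assumed TF.IDF variant, not taken from the paper.
    """
    # Term frequency: how often each term occurs across a cluster's documents.
    tf = {
        name: Counter(tok for doc in docs for tok in doc)
        for name, docs in clusters.items()
    }
    # Cluster frequency: in how many clusters each term appears at least once.
    cf = Counter(term for counts in tf.values() for term in counts)
    n = len(clusters)

    labels = {}
    for name, counts in tf.items():
        scores = {t: c * math.log(n / cf[t]) for t, c in counts.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[name] = ranked[:top_n]
    return labels


# Toy, hand-made data (hypothetical): two pre-classified folders.
example = {
    "a": [["meeting", "budget", "report"], ["budget", "forecast", "report"]],
    "b": [["football", "match", "report"], ["match", "tickets", "football"]],
}
labels = label_clusters(example)
print(labels)  # {'a': ['budget', 'forecast'], 'b': ['football', 'match']}
```

The term "report" occurs in both folders, so its score is zero and it is never chosen as a label. For the second experiment, the same function could be applied to the groups produced by a k-means run instead of pre-classified folders.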

Citations (2)


... When it comes to existing automations, automatic classification is one that is widely offered in email clients. Many email clients automatically classify and prioritize emails using machine learning techniques [25,29,57]. Beyond spam classification, clients like Gmail offer additional classifications by default such as "Social" or "Promotions". ...

Reference:

Opportunities for Automating Email Processing: A Need-Finding Study
PERC: A personal email classifier

Lecture Notes in Computer Science

... For instance, Maqbool and Babri (2006) used the TF.IDF (Term Frequency Inverse Document Frequency) measure to identify the words extracted from comments in computer code which best characterized clusters of code found by hierarchical agglomerative clustering. Similarly, Ke et al. (2005) selected words from email messages using TF.IDF to characterise clusters of emails found by k-means clustering. The TF.IDF weighting W for a particular label i (such as a word) with respect to a particular cluster j is given by a formula [not preserved in this excerpt]. In this paper, we use an alternative technique based on the chi-squared measure for finding the vocabulary associated with clusters, which is better suited for labeling small numbers of clusters (Manning, Raghavan, & Schütze, 2009). ...

Mining Personal Data Collections to Discover Categories and Category Labels.