Archived project

PhD at RMIT University, Melbourne, Australia

Project log

Wahyu Catur Wibowo
Categorisation of digital documents is useful for organisation and retrieval. While document categories can be a flat set of unstructured labels, some are organised hierarchically. This paper investigates automatic hierarchical categorisation and, specifically, the role of features in developing more effective categorisers. We show that a good machine learning-based hierarchical categoriser can be built using a small number of features drawn from pre-categorised training documents. Overall, we show that using only a few terms improves categorisation accuracy substantially: by up to 8.6% for unstructured leaf-level categorisation and by up to 12% for top-down hierarchical categorisation. In addition, unlike other feature selection models, which typically require different feature selection parameters for categories at different levels of the hierarchy, our technique works equally well for all categories in a hierarchical structure. We conclude that, in general, more accurate hierarchical categorisation is possible with our simple feature selection technique.
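To make "top-down hierarchical categorisation with a few terms" concrete, here is a minimal sketch in Python. It is not the feature selection technique from the paper, which the abstract does not specify. The scoring rule (a term's document frequency inside a child category minus its frequency in the sibling categories), the toy hierarchy, the training documents, and the names `select_terms` and `classify` are all hypothetical. A single `k` is used at every level. At each internal node, the document descends to the child whose selected terms it shares most.

```python
from collections import Counter

# Hypothetical toy hierarchy: internal node -> children. Leaves have no entry.
HIERARCHY = {
    "root": ["science", "sport"],
    "science": ["physics", "biology"],
    "sport": ["football", "tennis"],
}

# Hypothetical pre-categorised training documents (already tokenised), per leaf.
TRAIN = {
    "physics": [["quantum", "energy", "particle"], ["energy", "field", "quantum"]],
    "biology": [["cell", "gene", "protein"], ["gene", "cell", "evolution"]],
    "football": [["goal", "match", "striker"], ["goal", "league", "match"]],
    "tennis": [["serve", "match", "racket"], ["racket", "serve", "set"]],
}


def leaves_under(node, hierarchy):
    """Return every leaf category in the subtree rooted at node."""
    children = hierarchy.get(node)
    if not children:
        return [node]
    return [leaf for c in children for leaf in leaves_under(c, hierarchy)]


def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def select_terms(train, hierarchy, k):
    """For each (parent, child) edge, keep the k terms that best separate the
    child's documents from its siblings' (assumed scoring rule)."""
    selected = {}
    for parent, children in hierarchy.items():
        # Pool the training documents of each child's whole subtree.
        pools = {
            c: [d for leaf in leaves_under(c, hierarchy) for d in train.get(leaf, [])]
            for c in children
        }
        for c in children:
            others = [d for o in children if o != c for d in pools[o]]
            df_in, df_out = doc_freq(pools[c]), doc_freq(others)
            n_in, n_out = max(len(pools[c]), 1), max(len(others), 1)
            score = {t: df_in[t] / n_in - df_out[t] / n_out for t in df_in}
            # Highest score first; ties broken alphabetically for determinism.
            ranked = sorted(score, key=lambda t: (-score[t], t))
            selected[(parent, c)] = set(ranked[:k])
    return selected


def classify(doc, hierarchy, selected):
    """Descend from the root, at each level picking the child whose selected
    terms overlap most with the document (first child wins ties)."""
    node, tokens = "root", set(doc)
    while node in hierarchy:
        parent = node
        node = max(hierarchy[parent], key=lambda c: len(selected[(parent, c)] & tokens))
    return node


if __name__ == "__main__":
    sel = select_terms(TRAIN, HIERARCHY, k=3)
    print(classify(["racket", "serve", "match"], HIERARCHY, sel))  # tennis
```

Because every decision is local to one parent node, the same small `k` is applied unchanged at the top level and at the leaf level. In this toy setting no per-level tuning is needed, which loosely mirrors the property the abstract claims for the paper's own technique.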