James A. Thom

RMIT University | RMIT · School of Computer Science and Information Technology

PhD

About

142
Publications
39,681
Reads
2,029
Citations
Additional affiliations
February 1987 - present
RMIT University

Publications (142)
Article
Full-text available
Emotions can be evoked in humans by images. Previous reports on Recognition of Emotions induced by Visual Content of images (REVC) mainly focused on numerous features to improve recognition performance. To devise a more robust REVC system, this paper examines the performance of a wide range of classifiers using color histogram as a single feature....
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on-or even over-the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Article
Intra-prediction modes (IPMs) of H.264/AVC, as visual feature components, are easily extracted from the compressed domain. However, in spite of their efficiency and effectiveness, they can be sensitive to encoder settings such as encoder-type and encoding-profile. In content-based video retrieval using different encoders between a video and its cop...
Conference Paper
Using color histograms in automatic emotion recognition systems faces different issues. One of the important challenges is to determine the appropriate number of bins in the color histogram to achieve the highest recognition performance possible with minimal computations. This research focuses on emotion recognition induced by visual contents of im...
Conference Paper
Full-text available
This paper investigates how social images and image change detection techniques can be applied to identify the damage caused by natural disasters for disaster assessment. We propose a framework that takes advantage of near duplicate image detection and robust boundary matching for change detection in disasters. First we perform the near dupli...
Conference Paper
Full-text available
Sustainability indicators are increasingly being used to measure the economic, environmental and social properties of complex systems across different temporal and spatial scales. This motivates their inclusion in open distributed knowledge systems such as the Semantic Web. The diversity of such indicator sets provides considerable choice but also...
Article
Searching for images is an everyday activity. Nevertheless, even a highly skilled searcher often struggles to find what they are looking for. This article studies the factors that affect users’ online web image search behaviour, investigating (1) the use of criteria in making image relevance judgements and (2) the effect of familiarity, difficulty...
Article
Full-text available
The segmentation into acts of a circus performance video is challenging as the content has similar characteristics to other performance videos but is quite different from movies, TV programs, and home videos. Segmentation is useful as a long duration circus show usually contains several shorter segments that are acts. We propose a new method for de...
Article
Many organizations often need to share semantic knowledge base content with selected members of other organizations. However, sharing semantic knowledge across different organizations is a critical problem. This is because the differences in the vocabulary utilized by the organizations have to be resolved before knowledge can be shared. Also, if se...
Article
Full-text available
This paper presents a software application for sustainability reporting where a multi-agent system is an integral part of the overall architecture. We describe the social science philosophy and approach on which the application is based, and the ways in which an agent-based system is able to support these. In particular, we explore how the pro-acti...
Conference Paper
There have been several studies of online newspapers that use web server logs to analyze traffic and user behavior, but most of these studies required a demographic profile of the users. Our study adds to the literature by empirically examining user behavior using web server logs in the absence of demographic information, expl...
Conference Paper
Full-text available
This paper introduces a global descriptor from the compressed video domain (H.264) for near duplicate video copy detection tasks. The proposed descriptor uses a spatial-temporal feature structure in an ordinal pattern distribution format. The proposed descriptor is constructed from Intra-Prediction Modes (IPM) of key frames (IDR & I slices) and ex...
Article
Full-text available
Purpose – The purpose of this paper is to examine the history of the standardisation of two largely overlapping electronic document formats between 2005 and 2008, and its implications for future IT standards development. Design/methodology/approach – The document format controversy is researched as an exemplary case study of the institutional riva...
Article
During times of crisis microblogging platforms such as Twitter have played an important role as a communication channel to distribute information. Particularly, disaster-related tweets are valuable resources when tagged with their location for detecting unexpected events. However, they often contain different types of location and one of the main c...
Article
Full-text available
Purpose – Previous work highlights two key challenges in searching for information about individual entities (such as persons, places and organisations) over semantic data: query ambiguity and redundant attributes. The purpose of this paper is to consider these challenges and propose the Attribute Importance Model (AIM) for clustering and ranking...
Article
The success of an enterprise information retrieval system is determined by interactions among three key entities: the search engine employed; the service provider who delivers, modifies, and maintains the engine; and the users of the service within the organization. Evaluations of an enterprise search have predominately focused on the effectiveness...
Conference Paper
Using eye tracking in the evaluation of web search interfaces can provide rich information on users' information search behaviour, particularly in the matter of user interaction with different informative components on a search results screen. One of the main issues affecting the use of eye tracking in research is the quality of captured eye moveme...
Article
Semantic models help in achieving semantic interoperability among sources of data and applications. The necessity to efficiently manage these types of objects has increased the number of specialized repositories, usually referred to as semantic databases. An increasing number of project initiatives have been recorded that choose to formalize applic...
Conference Paper
In the last decade, online newspapers have become a viable alternative to conventional hardcopy papers. Many studies have shown that digital media have increased their share of Internet audience. In this study, we use Web usage and Web content mining techniques to recommend news articles to users. We are using Web server logs from a Malaysian newsp...
Article
Full-text available
Existing approaches to sustainability assessment are typically characterized as being either “top–down” or “bottom–up.” While top–down approaches are commonly adopted by businesses, bottom–up approaches are more often adopted by civil society organizations and communities. Top–down approaches clearly favor standardization and commensurability betwe...
Article
Semantic models help in achieving semantic interoperability among sources of data and applications. The necessity to efficiently manage these types of objects has increased the number of specialized repositories, usually referred to as semantic databases. An increasing number of project initiatives have been recorded that choose to formalize applic...
Conference Paper
Full-text available
We present an ontology to represent the key concepts of sustainability indicators that are increasingly being used to measure the economic, environmental and social properties of complex systems. There have been few efforts to represent multiple indicators formally, in spite of the fact that comparison of indicators and measurements across reportin...
Conference Paper
Full-text available
We have developed a software application for a new, emerging approach to sustainability reporting, where a multi-agent system is an integral part of the overall architecture. The agent-oriented approach readily achieves the functionality required for this application, and the Belief Desire Intention (BDI) agent framework assisted in clarifying syst...
Article
Full-text available
Information retrieval has a strong foundation of empirical investigation: based on the position of relevant resources in a ranked answer list, a variety of system performance metrics can be calculated. One of the most widely reported measures, mean average precision (MAP), provides a single numerical value that aims to capture the overall performa...
Article
Full-text available
The use of cloud computing has increased rapidly in many organizations. Cloud computing provides many benefits in terms of low cost and accessibility of data. Ensuring the security of cloud computing is a major factor in the cloud computing environment, as users often store sensitive information with cloud storage providers but these providers may...
Article
Full-text available
The value of a single dataset is increased when it is linked to combinations of datasets to provide users with more information. Linked Data is a style of publishing data on the Web by using a structured machine-readable format, RDF, and semantically typed relations to connect related data. Its structured representation opens up new possibilities...
Conference Paper
This report describes our participation in the Snippet retrieval track. Snippets were constructed by first selecting sentences according to the occurrence of query terms. We also used a pseudo-relevance feedback approach in order to expand the original query. Results showed that a large number of extra terms may harm sentence selection for short su...
Conference Paper
Maintaining strictness in dimensions is important in integration of data warehouses. A dimension that satisfies all of its roll-up constraints is said to be strict, a property that is required for correct aggregation. Existing work on instance matching does not address the problem of enforcing the strictness of roll-up constraints. In this paper, w...
Conference Paper
Full-text available
In this paper, we introduce an approach for training a Named Entity Recognizer (NER) from a set of seed entities on the web. Creating training data for NERs is tedious, time consuming, and becomes more difficult with a growing set of entity types that should be learned and recognized. Named Entity Recognition is a building block in natural language...
Conference Paper
Star schemas describe the structure and properties of multidimensional sources such as data marts and data warehouses. They have a simple structure and a predictable topology. We propose StarMod, a representation of the star schema model described in UML, and infer its instances from relational schemas. StarMod includes a comprehensive set of properties...
Article
Full-text available
INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2010 evaluation campaign, which consisted of a wide range of tracks: Ad Hoc, Book, Data Centric, Interactive, Q...
Conference Paper
Full-text available
In this paper, we introduce the approach we used for the TRECVID 2011 Content-Based Copy Detection (CCD) task. It was RMIT University's first experience in TRECVID and, given the team's background in image processing and in using global features in this field of research, we preferred to follow the same technique in the video copy detection task in th...
Article
Full-text available
Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction where the goal is to tag names of entities in documents, entity ranking is primarily focused on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed,...
Conference Paper
Full-text available
Integrating heterogeneous data marts is an important problem in building enterprise data warehouses. In particular, identifying compatible dimensions is crucial to successful integration. Existing notions of dimension compatibility rely on given and exact dimension hierarchy information being available. In this paper, we propose to infer...
Article
Full-text available
INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2009 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking,...
Conference Paper
Full-text available
This paper gives an overview of the INEX 2009 Ad Hoc Track. The main goals of the Ad Hoc Track were three-fold. The first goal was to investigate the impact of the collection scale and markup, by using a new collection that is again based on the Wikipedia but is over 4 times larger, with longer articles and additional semantic annotations. For th...
Article
Full-text available
This paper reports on the RMIT group's approach to XML retrieval while participating in INEX 2003. We indexed XML documents using Lucy, a compact and fast text search engine designed and written by the Search Engine Group at RMIT University. For each INEX topic, up to 1000 highly ranked documents were then loaded and indexed by eXist, an open sourc...
Conference Paper
Full-text available
Systems that filter web search results to return open educational resources need evaluation. The Cranfield method, which is widely used in information retrieval evaluation, can be used as the basis of a model for evaluating such systems. The Cranfield method requires a collection of resources with associated judgments. In this paper, we describe an...
Conference Paper
The Web Service Discovery track aims to investigate techniques for discovery of Web services based on searching service descriptions provided in Web Services Description Language (WSDL). Participating groups contributed to topic development and to the evaluation, which allows them to compare the effectiveness of their XML retrieval techniques for t...
Conference Paper
Successful information search requires a joint effort from both syntactic matching provided by current search engines and semantic matching performed by human users. Word-based syntactic matching schemes work well for tasks such as homepage finding or fact finding, but they are less effective in supporting exploratory search tasks such as learning...
Article
With ever-increasing amounts of information on the World Wide Web, an effective interface for displaying search results is required. Recent studies have developed various novel approaches for visual summaries, aiming to improve the effectiveness of search results. In this study we evaluate the effectiveness of four types of visual summary: thumbnai...
Article
This paper reports the result of an exploratory user study investigating criteria that are important to users when judging relevance while performing an image search. Data was collected from 12 participants using questionnaires and screen capture recordings. Users were required to perform three image search tasks which are specific, general and abs...
Chapter
Full-text available
This paper describes a system for entity extraction from the web. The system uses three different extraction techniques which are tightly coupled with mechanisms for retrieving entity rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques and a more precise entity ex...
Article
Many applications benefit from the use of a suitable ontology but it can be difficult to determine which ontology is best suited to a particular application. Although ontology evaluation techniques are improving as more measures and methodologies are proposed, the literature contains few specific examples of cohesive evaluation activity that links...
Conference Paper
Full-text available
Many different ranking algorithms based on content and context have been used in web search engines to find pages based on a user query. Furthermore, to achieve better performance some new solutions combine different algorithms. In this paper we use simulated click-through data to learn how to combine many content and context features of web pages....
Conference Paper
Full-text available
The utility of an enterprise search system is determined by three key players: the information retrieval (IR) system (the search engine), the enterprise users, and the service provider who delivers the tailored IR service to its designated enterprise users. Currently, evaluations of enterprise search have been focused largely on the IR system effec...
Conference Paper
Full-text available
Search result organisation and presentation is an important component of a Web search system; it can have a substantial impact on the ability of users to find useful information. In this study we compare the effectiveness of three publicly available search interfaces for supporting navigational search tasks. The three interfaces vary primarily in t...
Article
Full-text available
The organisation, content and presentation of document surrogates has a substantial impact on the effectiveness of web search result interfaces. Most interfaces include textual information, including for example the document title, URL, and a short query-biased summary of the content. Other interfaces include additional browsing features, suc...
Conference Paper
Full-text available
This paper describes the RMIT group's participation in the book retrieval task of the INEX booktrack in 2008. Our results suggest that for the book retrieval task, using a page-based index and ranking books based on the number of pages retrieved may be more effective than directly indexing and ranking whole books. This paper describes the participati...
Article
Service matching approaches trade precision for recall, creating the need for users to choose the correct services, which obviously is a major obstacle for automating the service discovery and aggregation processes. Our approach to overcome this problem is to eliminate the appearance of false positives by returning only the correct services. As...
Conference Paper
Full-text available
Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In...
Conference Paper
The traditional entity extraction problem lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named ent...
Conference Paper
Full-text available
Data analysts would benefit greatly from the ability to navigate and view combined multidimensional data from multiple sources, a key requirement of which is the conformity between their dimensions. The strict requirements of conformity restrict navigating to related multidimensional data from unseen or unfamiliar sources. In this paper we make a d...
Conference Paper
Full-text available
A common approach to content-based image retrieval is to use example images as queries; images in the collection that have low-level features similar to the query examples are returned in response to the query. In this paper, we explore the use of image regions as query examples. We compare the retrieval effectiveness of using whole images, single...
Article
The paper describes and evaluates a system for extracting knowledge from the web that uses a domain independent fact extraction approach and a self supervised learning algorithm. Using a trust algorithm, the precision of the system is improved to over 70% compared with a baseline of 52%.
Conference Paper
Full-text available
Many realistic user tasks involve the retrieval of specific entities instead of just any type of documents. Examples of information needs include 'Countries where one can pay with the euro' or 'Impressionist art museums in The Netherlands'. The Initiative for Evaluation of XML Retrieval (INEX) started the XML Entity Ranking track (INEX-XER) to crea...
Conference Paper
Full-text available
This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the examples (when provided) to improve the e...
Article
Full-text available
Wikipedia is a useful source of knowledge that has many applications in language processing and knowledge representation. The Wikipedia category graph can be compared with the class hierarchy in an ontology; it has some characteristics in common as well as some differences. In this paper, we present our approach for answering entity ranking queries...
Article
The traditional entity extraction problem lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named entities; we...
Article
Full-text available
The application of machine learning techniques to image and video search has been shown to boost the performance of multimedia retrieval systems, and promises to lead to more generalized semantic search approaches. In particular, the availability of large training collections allows model-driven search using a substantial number of semantic concept...
Article
Full-text available
Focused retrieval, identified by question answering, passage retrieval, and XML element retrieval, is becoming increasingly important within the broad task of information retrieval. In this paper, we present a taxonomy of text retrieval tasks based on the structure of the answers required by a task. Of particular importance are the in context tasks...
Conference Paper
Full-text available
Statistical learning methods are commonly applied in content-based video and image retrieval. Such methods require a large number of examples which are usually obtained through a manual annotation process, that is human raters review images and assign semantic concept labels. The human judgement, however, cannot be regarded as the ultimate tr...
Conference Paper
Full-text available
Ontology evaluation is a maturing discipline with methodologies and measures being developed and proposed. However, evaluation methods that have been proposed have not been applied to specific examples. In this paper, we present the state-of-the-art in ontology evaluation - current methodologies, criteria and measures, analyse appropriate evalu...
Article
Full-text available
Content-based image retrieval has been used in various application domains, but the semantic gap problem remains a challenge to be overcome. One possible way to overcome this problem is to represent the knowledge extracted from the low-level image features through semantic concepts. In this paper we describe how we use an image ontology to this end...
Conference Paper
Full-text available
Use of XML offers a structured approach for representing information while maintaining separation of form and content. XML information retrieval is different from standard text retrieval in two aspects: the XML structure may be of interest as part of the query; and the information does not have to be text. In this paper, we describe an investigatio...
Conference Paper
Full-text available
All Information Technology (IT) systems have an architecture, and these architectures are developed by people, frequently called IT architects. These people vary in their capabilities and this directly affects the systems they work with. This research investigates whether some previously identified capabilities (intuitive cognitive style, problem solv...
Article
Full-text available
The role of IT Architect is important in the development and successful implementation of Information Technology systems across the world. The people performing the role are critical to the success of the systems. This paper reports on the results of an experiment aimed at developing two key IT architect capabilities within the context of a post gr...