S.M.M. Tahaghoghi

Microsoft · Bing

PhD

46
Publications
12,913
Reads
932
Citations

Publications (46)
Conference Paper
Preference-based methods for collecting relevance data for information retrieval (IR) evaluation have been shown to lead to better inter-assessor agreement than the traditional method of judging individual documents. However, little is known as to why preference judging reduces assessor disagreement and whether better agreement among assessors also...
Conference Paper
Test collections are powerful mechanisms for the evaluation and optimization of information retrieval systems. However, there is reported evidence that experiment outcomes can be affected by changes to the judging guidelines or changes in the judge population. This paper examines such effects in a web search setting, comparing the judgments of four...
Article
In information retrieval, relevance judgments play an important role as they are required both for evaluating the quality of retrieval systems and for training learning to rank algorithms. In recent years, numerous papers have been published using judgments obtained from a commercial search engine by researchers in industry. As typically no informa...
Conference Paper
Full-text available
Whole page relevance defines how well the surface-level representation of all elements on a search result page and the corresponding holistic attributes of the presentation respond to users' information needs. We introduce a method for evaluating the whole-page relevance of Web search engine results pages. Our key contribution is that the metho...
Article
Full-text available
We introduce a method for evaluating the relevance of all visible components of a Web search results page, in the context of that results page. Contrary to Cranfield-style evaluation methods, our approach recognizes that a user's initial search interaction is with the result page produced by a search system, not the landing pages linked from it. O...
Conference Paper
Full-text available
Detecting whether computer program code is a student's original work or has been copied from another student or some other source is a major problem for many universities. Detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files, but require appropriate similarity f...
Conference Paper
Full-text available
A common approach to content-based image retrieval is to use example images as queries; images in the collection that have low-level features similar to the query examples are returned in response to the query. In this paper, we explore the use of image regions as query examples. We compare the retrieval effectiveness of using whole images, single...
Article
Full-text available
The application of machine learning techniques to image and video search has been shown to boost the performance of multimedia retrieval systems, and promises to lead to more generalized semantic search approaches. In particular, the availability of large training collections allows model-driven search using a substantial number of semantic concept...
Conference Paper
Full-text available
There are several well-known approaches to parsing Arabic text in preparation for indexing and retrieval. Techniques such as stemming and stopping have been shown to improve search results on written newswire dispatches, but few comparisons are available on other data sources. In this paper, we apply several alternative stemming and stopping approa...
Conference Paper
Full-text available
Among the vast numbers of images on the web are many duplicates and near-duplicates, that is, variants derived from the same original image. Such near-duplicates appear in many web image searches and may represent infringements of copyright or indicate the presence of redundancy. While methods for identifying near-duplicates have been investiga...
Article
Full-text available
Transliteration of a word into another language often leads to multiple spellings. Unless an information retrieval system recognises different forms of transliterated words, a significant number of documents will be missed when users specify only one spelling variant. Using two different datasets, we evaluate several approaches to finding variants...
Article
The copying of programming assignments is a widespread problem in academic institutions. Manual plagiarism detection is time-consuming, and current popular plagiarism detection systems are not scalable to large code repositories. While there are text-based plagiarism detection systems capable of handling millions of student papers, comparable sys...
Article
Users of search engines express their needs as queries, typically consisting of a small number of terms. The resulting search engine query logs are valuable resources that can be used to predict how people interact with the search system. In this paper, we introduce two novel applications of query logs, in the context of distributed information ret...
Article
Full-text available
Stemming words to (usually) remove suffixes has applications in text search, machine translation, document summarization, and text classification. For example, English stemming reduces the words "computer," "computing," "computation," and "computability" to their common morphological root, "comput-." In text search, this permits a search for "compu...
Conference Paper
Full-text available
Statistical learning methods are commonly applied in content-based video and image retrieval. Such methods require a large number of examples which are usually obtained through a manual annotation process, that is human raters review images and assign semantic concept labels. The human judgement, however, cannot be regarded as the ultimate tr...
Article
Full-text available
Plagiarism and copyright infringement are major problems in academic and corporate environments. Existing solutions for detecting infringements in structured text such as source code are restricted to textual similarity comparisons of two pieces of work. In this paper, we examine authorship attribution as a means for tackling plagiarism detection....
Article
Full-text available
Content-based image retrieval has been used in various application domains, but the semantic gap problem remains a challenge to be overcome. One possible way to overcome this problem is to represent the knowledge extracted from the low-level image features through semantic concepts. In this paper we describe how we use an image ontology to this end...
Conference Paper
Full-text available
Use of XML offers a structured approach for representing information while maintaining separation of form and content. XML information retrieval is different from standard text retrieval in two aspects: the XML structure may be of interest as part of the query; and the information does not have to be text. In this paper, we describe an investigatio...
Conference Paper
Full-text available
Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy ha...
Conference Paper
Full-text available
The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out of vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be a...
Conference Paper
Full-text available
Plagiarism is a widespread problem in assessment tasks; in computing courses, students often plagiarise source code. For all but the smallest classes, manual detection of such plagiarism is impractical, and, while automated tools are available, none has been applied to detect inter-lingual plagiarism, where source code is copied from one language t...
Conference Paper
Full-text available
Two common approaches in retrieving images from a collection are retrieval by text keywords and retrieval by visual content. However, it is widely recognised that it is impossible for keywords alone to fully describe visual content. This paper reports on the participation of the RMIT University group in the INEX 2005 multimedia track, where we inve...
Conference Paper
Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English, most affixes in Arabic are difficult to discrimina...
Conference Paper
Full-text available
Stemming words to (usually) remove suffixes has applications in text search, machine translation, document summarization, and text classification. For example, English stemming reduces the words "computer," "computing," "computation," and "computability" to their common morphological root, "comput-." In text search, this permits a search for "compu...
Conference Paper
Full-text available
Segmentation is the first step in managing data for many information retrieval tasks. Automatic audio transcriptions and digital video footage are typically continuous data sources that must be pre-processed for segmentation into logical entities that can be stored, queried, and retrieved. Shot boundary detection is a common low-level video segment...
Article
Full-text available
Results: We compare our alignment schemes, using different window sizes and penalty values, with results obtained by documents indexed using a search engine called Zettair. The following table shows that our alignment method can differentiate between parallel and non-parallel documents when compared to a search engine baseline. This differentiation...
Conference Paper
Full-text available
The widespread adoption of XML necessitates structure-aware systems that can effectively retrieve information from XML document collections. This paper reports on the participation of the RMIT group in the INEX 2004 ad hoc track, where we investigate different aspects of the XML retrieval task. Our preliminary analysis of CO and VCAS relevance ass...
Conference Paper
Full-text available
The copying of programming assignments is a widespread problem in academic institutions. Manual plagiarism detection is time-consuming, and current popular plagiarism detection systems are not scalable to large code repositories. While there are text-based plagiarism detection systems capable of handling millions of student papers, comparable syste...
Conference Paper
Full-text available
Segmenting digital video into its constituent basic semantic entities, or shots, is an important step for effective management and retrieval of video data. Recent automated techniques for detecting transitions between shots are highly effective on abrupt transitions. However, automated detection of gradual transitions, and the precise determination...
Conference Paper
Full-text available
Indonesia is the fourth most populous country and a close neighbour of Australia. However, despite media and intelligence interest in Indonesia, little work has been done on evaluating Information Retrieval techniques for Indonesian, and no standard testbed exists for such a purpose. An effective testbed should include a collection of documents, re...
Article
Full-text available
Digital video is widely used in multimedia databases and requires effective retrieval techniques. Shot boundary detection is a common first step in analysing video content. The effective detection of gradual transitions is an especially difficult task. Building upon our past research work, we have designed a novel decision stage for detection of g...
Conference Paper
Content-Based Image Retrieval (CBIR) is the practical class of techniques used for information retrieval from large image collections. Many CBIR systems allow users to specify their information need by providing an example image. This query-by-example paradigm can be extended to support multiple example images. In this work, we present a large-scal...
Chapter
Content Based Image Retrieval (CBIR) systems that are able to “retrieve images of Clinton with Lewinsky” are unrealistic at present. However, this area has seen much research and development activity since IBM’s QBIC announcement in 1994. The CHITRA CBIR system, under development at RMIT and Monash Universities, addresses the need for a test bed...
Conference Paper
Full-text available
A major hurdle in practical content based image retrieval (CBIR) is conveying the user's information need to the system. One common method of query specification is to express the query using one or more example images. The authors consider whether using more examples improves the effectiveness of CBIR in meeting a user's information need. We show...
Conference Paper
Full-text available
Different scenarios of XML retrieval are analysed in the INEX 2005 ad hoc track, which reflect different query interpretations and user behaviours that may be observed during XML retrieval. The RMIT University group’s participation in the INEX 2005 ad hoc track investigates these XML retrieval scenarios. Our runs follow a hybrid XML retrieval appro...
Article
Full-text available
Plagiarism is a longstanding problem faced by academics. Detecting and processing cases of student plagiarism is a tedious and time-consuming task, and difficult to manage for the large class sizes common in the modern tertiary education environment. More importantly, policing does not address the underlying causes of academic dishonesty. In this p...
Article
Full-text available
Run overview: We participated in the shot boundary detection and video search tasks. This page provides a summary of our experiments. Shot Boundary Detection: Our approach uses the moving query window technique [17, 18, 21, 22]. We applied the system that we used in 2004 [22] and varied algorithm parameters around the optimal settings that we obtain...
Article
With the large number of images available on the Internet, illegal distribution of images has become an issue for many digital artists. Because it is possible for the format or dimensions of the image to be altered during the duplication process, conventional means of identifying duplicates, such as comparing file names or hash values, are inadequa...
Article
We investigated image retrieval using texture segmentation by genetic programming. In this study, we are interested in two textures: sky and grass. Single-step texture classification by genetic programming was used. Based on the result of texture segmentation, an image will be labelled as having the textures of interest or not. Then, the...
Article
Full-text available
Run overview: We participated in the Shot Boundary Detection task. This page provides a summary of: (1) the approaches tested in the submitted runs; (2) differences in results between the runs; (3) the overall relative contribution of the techniques; and (4) our overall conclusions. 1. Our approach to shot boundary detection uses the moving query w...
