Conference Paper

Abstract

Environmental data are of utmost importance for human life, since weather conditions, air quality and pollen are strongly related to health issues and affect everyday activities. This paper addresses the problem of discovering air quality and pollen forecast Web resources, which usually present their forecasts in the form of heatmaps (i.e. graphical representations of matrix data with colors). To solve this problem, we propose a discovery methodology that builds upon a general-purpose search engine and a novel post-processing heatmap recognition layer. The first step generates domain-specific queries, which are submitted to the search engine, while the second applies image classification based on low-level visual features to identify Web sites containing heatmaps. Experimental results comparing various combinations of visual features show that relevant environmental sites can be recognized and retrieved efficiently.
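A minimal sketch of the two-step pipeline described in the abstract: domain-specific queries are generated and submitted to a search engine, and the results are post-filtered by an image classifier trained to recognize heatmaps. The search function `search_web`, the color-histogram feature and the SVM setup below are illustrative assumptions, not the paper's exact configuration.

# Sketch of the discovery pipeline: (1) submit domain-specific queries,
# (2) post-filter results with a heatmap image classifier.
# `search_web` stands in for whatever search API is available (hypothetical);
# the color-histogram feature and SVM are illustrative only.
from itertools import product

import numpy as np
from PIL import Image
from sklearn.svm import SVC


def generate_queries():
    """Combine environmental topics with presentation terms into queries."""
    topics = ["air quality", "pollen", "ozone"]
    forms = ["forecast map", "concentration map", "index heatmap"]
    return [f"{t} {f}" for t, f in product(topics, forms)]


def color_histogram(image_path, bins=8):
    """A simple global RGB histogram as a stand-in low-level visual feature."""
    img = np.asarray(Image.open(image_path).convert("RGB"))
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-9)


def train_heatmap_classifier(pos_paths, neg_paths):
    """Train a binary SVM on example heatmap / non-heatmap images."""
    X = [color_histogram(p) for p in pos_paths + neg_paths]
    y = [1] * len(pos_paths) + [0] * len(neg_paths)
    return SVC(kernel="rbf", probability=True).fit(X, y)


def discover(search_web, clf, threshold=0.5):
    """Keep only result pages whose representative image looks like a heatmap."""
    kept = []
    for query in generate_queries():
        for url, image_path in search_web(query):   # hypothetical search API
            score = clf.predict_proba([color_histogram(image_path)])[0, 1]
            if score >= threshold:
                kept.append(url)
    return kept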




... In the environmental domain, resource discovery has previously been addressed through the application of techniques from both categories. Early techniques of the first category (e.g., [19]) have relied only on textual evidence for the classification of the retrieval results, while more recent approaches [18] have used visual evidence for performing this post-retrieval filtering; however, the combination of multimedia evidence has not been considered. On the other hand, recent focussed crawling approaches [25] have taken into account both textual and visual evidence for selecting the links to follow during their traversal of the Web graph. ...
... This work proposes a framework for the discovery of Web resources that provide air quality measurements and forecasts based on techniques of the first category that combine multimedia evidence at several steps of the process. In particular, given the frequent presence of multimedia items relevant to the topic within such resources, namely heatmaps, it proposes the submission of domain-specific queries not only to the Web search component of general-purpose search engines (as done so far [22,20,19,18]), but also to their Image search component, and the fusion of the two result sets. The submitted queries are either formulated manually from empirically identified domain-specific terms, or additionally expanded automatically by applying machine learning techniques that extract domain-specific expressions (referred to as 'keyword spices' [22]) from positive and negative samples of such Web resources. ...
... Regarding heatmaps, research has mainly focussed on information extraction from them [7]. More recently, a method for heatmap recognition that uses SVMs to build classifiers on several visual features (MPEG-7, SIFT, AHDH) has been investigated [18]. Our approach considers further state-of-the-art visual features (including SURF and SIFT descriptors with VLAD encoding, as well as features extracted from Convolutional Neural Networks) and employs both SVMs and Logistic Regression classifiers. ...
Conference Paper
This work proposes a framework for the discovery of environmental Web resources providing air quality measurements and forecasts. Motivated by the frequent occurrence of heatmaps in such Web resources, it exploits multimedia evidence at different stages of the discovery process. Domain-specific queries, generated using empirical information and machine learning driven query expansion, are submitted both to the Web and Image search services of a general-purpose search engine. Post-retrieval filtering is performed by combining textual and visual (heatmap-related) evidence in a supervised machine learning framework. Our experimental results indicate improved effectiveness when heatmap recognition is based on SURF and SIFT descriptors with VLAD encoding and when multimedia evidence is combined in the discovery process.
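As a sketch of the VLAD representation mentioned above, the snippet below aggregates residuals of local SIFT descriptors to a k-means codebook using OpenCV and scikit-learn; the codebook size and normalization choices are illustrative, not the paper's parameters. The resulting vectors could then be fed to an SVM or logistic regression classifier.

# Sketch of VLAD encoding over local SIFT descriptors (illustrative parameters).
import cv2
import numpy as np
from sklearn.cluster import KMeans


def local_descriptors(image_path):
    """Extract SIFT descriptors from a grayscale image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 128), dtype=np.float32)


def build_codebook(all_descriptors, k=16):
    """Cluster descriptors pooled from many training images into k codewords."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)


def vlad(descriptors, codebook):
    """Aggregate residuals of descriptors to their nearest codeword."""
    k, d = codebook.cluster_centers_.shape
    encoding = np.zeros((k, d), dtype=np.float64)
    if len(descriptors):
        assignments = codebook.predict(descriptors)
        for i in range(k):
            members = descriptors[assignments == i]
            if len(members):
                encoding[i] = (members - codebook.cluster_centers_[i]).sum(axis=0)
    encoding = encoding.ravel()
    encoding = np.sign(encoding) * np.sqrt(np.abs(encoding))   # power normalization
    norm = np.linalg.norm(encoding)
    return encoding / norm if norm > 0 else encoding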
... Heat maps are the primary representation in Kohonen's Self-Organizing Maps, which are employed e.g. in bioinformatics [18]. One of the most widespread applications is for environmental maps, ranging from weather visualization (temperature, atmospheric pressure) to terrain height [10]. ...
Conference Paper
Grammar inference methods allow us to create grammars and automata from provided data. Those automata can be utilized as classifiers for yet unknown strings. Fuzzy set theory allows a gradual degree of string membership, makes effective error handling feasible and renders automata classifiers more applicable to real-life tasks. In this paper, we reverse the currently existing approach: instead of focusing on whole-string membership, we determine the membership distribution across a string's letters and visualize it using a heat map.
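A small sketch of the visualization idea: per-letter membership values for several strings are arranged in a matrix and drawn as a heat map with matplotlib. The membership function below is a toy stand-in and not the fuzzy-automaton output described in the abstract.

# Visualize per-letter membership values of strings as a heat map.
import matplotlib.pyplot as plt
import numpy as np


def letter_memberships(string, alphabet="ab"):
    """Toy per-letter membership: high for in-alphabet letters, low otherwise."""
    return [1.0 if ch in alphabet else 0.2 for ch in string]


strings = ["abab", "abba", "abcx", "xxxx"]
width = max(len(s) for s in strings)
grid = np.full((len(strings), width), np.nan)   # NaN pads strings of unequal length
for row, s in enumerate(strings):
    grid[row, :len(s)] = letter_memberships(s)

fig, ax = plt.subplots()
im = ax.imshow(grid, cmap="viridis", vmin=0.0, vmax=1.0)
ax.set_yticks(range(len(strings)))
ax.set_yticklabels(strings)
ax.set_xlabel("letter position")
fig.colorbar(im, label="membership")
plt.show()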
... Heatmapping is a technique for representing data distinctly via color mapping that depends on the weight of each data point within a distribution. Heatmap projection such as [12] has shown potential for overlaying on choropleth mapping, especially when each geometry is restricted to predefined political geographic boundaries. As shown in figure 1, the density of color ranges from blue (low density) to red (high density). ...
Conference Paper
Full-text available
Urban planners and policy makers often rely on data visualization and spatial data mapping tools to perceive overall urban trends. The accumulation of historical and real-time urban data from many government and private organizations provides the opportunity for an integrated visual analytic platform. Data management and retrieval for geospatial visualization, correlations, and analysis of multiple data dimensions over a map constitute some of the main challenges when dealing with the heterogeneity of urban data from a variety of sources. In this paper, spatiotemporal aggregation strategies and approaches to accelerate the retrieval of spatial data are presented. The methods are tested on visualizing multivariate urban datasets from two cities in Australia that are aggregated from heterogeneous federated urban data providers. The aggregated spatial or temporal features can be visualized as a choropleth heatmap or extrusion on a map. Dynamic spatial window queries in our visual analytics tool allow extraction of flat geometry objects optimized through materialized views from a database. Given the robust and scalable orchestration of geometry retrieval, this enables urban planners to perform interactive and dynamic multidimensional visual exploration over a map.
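A minimal pandas sketch of the two operations discussed above: spatiotemporal aggregation of point records and a rectangular "spatial window" filter. The paper performs this in a database with materialized views; the sketch only illustrates the aggregation logic on an in-memory table with invented data.

# Spatiotemporal aggregation and a bounding-box window filter on toy records.
import pandas as pd

records = pd.DataFrame({
    "region":    ["A", "A", "B", "B", "B"],
    "timestamp": pd.to_datetime(["2024-01-01 10:05", "2024-01-01 10:40",
                                 "2024-01-01 10:10", "2024-01-01 11:20",
                                 "2024-01-01 11:45"]),
    "lon": [144.95, 144.96, 145.10, 145.12, 145.11],
    "lat": [-37.81, -37.82, -37.90, -37.91, -37.90],
    "value": [12.0, 14.0, 30.0, 28.0, 26.0],
})

# Hourly mean per region: the kind of pre-aggregated value a choropleth
# heatmap would color.
hourly = (records
          .groupby(["region", pd.Grouper(key="timestamp", freq="1h")])["value"]
          .mean()
          .reset_index())


def window(df, lon_min, lon_max, lat_min, lat_max):
    """Dynamic spatial window query: keep records inside a bounding box."""
    return df[df.lon.between(lon_min, lon_max) & df.lat.between(lat_min, lat_max)]


print(hourly)
print(window(records, 145.0, 145.2, -38.0, -37.85))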
... To this end, we exploit an existing general-purpose search engine (Yahoo BOSS) by submitting to it domain-specific queries. The results are post-processed using supervised classification based on textual (Moumtzidou, Vrochidis, Tonelli, Kompatsiaris, & Pianta, 2012) and visual information (Moumtzidou, Vrochidis, Chatzilari, & Kompatsiaris, 2013). The discovery is optimized by an administrative user, who interacts with the system to improve and validate the results. ...
Article
Data on observed and forecasted environmental conditions, such as weather, air quality and pollen, are offered in great variety on the Web and serve as a basis for decisions taken by a wide range of the population. However, the value of these data is limited because their quality varies largely and because the burden of interpreting them in the light of a specific context and of the specific needs of a user is left to the user herself. To remove this burden from the user, we propose an environmental Decision Support System (DSS) model with an ontology-based knowledge base as its integrative core. The availability of an ontological knowledge representation allows us to encode in a uniform format all knowledge that is involved (environmental background knowledge, the characteristic features of the user's profile, the formal description of the user request, measured or forecasted environmental data, etc.) and to apply advanced reasoning techniques to it. The result is an advanced DSS that provides high-quality environmental information for personalized decision support.
... Such visualization is very popular for showing dependencies between data in Kohonen's Self-Organizing Maps, which are widely used in bioinformatics [1]. They are particularly useful in environmental data visualization [2], where the X and Y coordinate axes indicate the geographical longitude and latitude of every map point for a specific geographic projection. Environmental characteristics such as the concentration values of several air pollutants or demographic information are expressed by color. ...
Conference Paper
An automatic method for information extraction from heatmaps based on OCR, image processing and image recognition techniques is proposed. It is composed of a sequence of steps. First, the heatmap area is separated from the other elements of the heatmap image. Next, the key and axes are recognized. To produce quick answers to a user query, the heatmap is stored in the form of a tree. The method was tested on several diverse heatmaps. The results are promising.
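A sketch of the OCR step in such a pipeline: given a template saying where the axes and the color key lie inside the heatmap image, their labels are read with Tesseract. The region coordinates and the file name are placeholders; the automatic separation of the heatmap area and the tree storage described in the abstract are not reproduced here.

# Read axis and legend labels from templated regions of a heatmap image.
from PIL import Image
import pytesseract

# Hypothetical template: pixel boxes (left, upper, right, lower) for the
# parts of the image that carry text.
TEMPLATE = {
    "x_axis": (40, 460, 600, 490),
    "y_axis": (0, 40, 35, 440),
    "legend": (610, 40, 700, 440),
}


def read_regions(image_path, template=TEMPLATE):
    """Run OCR on each templated region and return the recognized strings."""
    image = Image.open(image_path)
    return {name: pytesseract.image_to_string(image.crop(box)).strip()
            for name, box in template.items()}


if __name__ == "__main__":
    labels = read_regions("forecast_heatmap.png")   # placeholder file name
    for name, text in labels.items():
        print(f"{name}: {text}")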
Thesis
The often-cited information explosion is not limited to volatile network traffic and massive multimedia capture data. Structured and high-quality data from diverse fields of study have become easily and freely available, too. This is due to crowd-sourced data collections, better sharing infrastructure, or, more generally speaking, the user-generated content of the Web 2.0 and the popular transparency and open data movements. While data generation is shifting to everyday casual users, data analysis is often still reserved for large companies specialized in content analysis and distribution, such as today's internet giants Amazon, Google, and Facebook. Here, fully automatic algorithms analyze metadata and content to infer interests and beliefs of their users and present only matching navigation suggestions and advertisements. Besides the problem of creating a filter bubble, in which users never see conflicting information due to the reinforcing nature of history-based navigation suggestions, fully automatic approaches have inherent problems, e.g. being unable to find the unexpected and to adapt to changes, which led to the introduction of the Visual Analytics (VA) agenda.

If users intend to perform their own analysis on the available data, they are often faced with either generic toolkits that cover a broad range of applicable domains and features or specialized VA systems that focus on one domain. Neither is suited to support casual users in their analysis, as they do not match the users' goals and capabilities. The former tend to be complex and targeted at analysis professionals due to the large range of supported features and programmable visualization techniques. The latter trade general flexibility for improved ease of use and interaction optimized for a specific domain requirement.

This work describes two approaches that build on interactive visualization to reduce this gap between generic toolkits and domain-specific systems. The first builds upon the idea that most data relevant to casual users are collections of entities with attributes. This least common denominator is commonly employed in faceted browsing scenarios and filter/flow environments. Thinking in sets of entities is natural, allows for very direct visual interaction with the analysis subject, and serves as common ground for adding analysis functionality to domain-specific visualization software. Encapsulating the interaction with sets of entities in a filter/flow graph component makes it possible to record analysis steps and intermediate results in an explicit structure that supports collaboration, reporting, and reuse of filters and result sets. This generic analysis functionality is provided as a plug-in component and was integrated into several domain-specific data visualization and analysis prototypes. In this way, the plug-in benefits from the implicit domain knowledge of the host system (e.g. selection semantics and domain-specific visualization) while being used to structure and record the user's analysis process.

The second approach directly exploits encoded domain knowledge in order to help casual users interact with very specific domain data. By observing the interrelations in the ontology, the user interface can automatically be adjusted to indicate problems with invalid user input and to transform the system's output to explain its relation to the user. Here, the domain-related visualizations are personalized and orchestrated for each user based on user profiles and ontology information.
In conclusion, this thesis introduces novel approaches at the boundary of generic analysis tools and their domain-specific context to extend the usage of visual analytics to casual users by exploiting domain knowledge for supporting analysis tasks, input validation, and personalized information visualization.
Article
Full-text available
There is a large amount of meteorological and air quality data available online. Often, different sources provide deviating and even contradicting data for the same geographical area and time. This implies that users need to evaluate the relative reliability of the information and then trust one of the sources. We present a novel data fusion method that merges the data from different sources for a given area and time, ensuring the best data quality. The method is a unique combination of land-use regression techniques, statistical air quality modelling and a well-known data fusion algorithm. We show experiments where a fused temperature forecast outperforms individual temperature forecasts from several providers. We also demonstrate that the local hourly NO2 concentration can be estimated accurately with our fusion method while a more conventional extrapolation method falls short. The method forms part of the prototype web-based service PESCaDO, designed to deliver personalized environmental information to users.
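To make the fusion idea concrete, the sketch below shows a generic inverse-variance weighting of several providers' forecasts. This is not the PESCaDO fusion method itself (which also involves land-use regression and air quality modelling); it only illustrates the basic principle of weighting sources by their historical error.

# Fuse forecasts from several providers by inverse-variance weighting.
import numpy as np


def fuse(forecasts, error_variances):
    """Weight each provider's forecast by the inverse of its error variance."""
    forecasts = np.asarray(forecasts, dtype=float)
    weights = 1.0 / np.asarray(error_variances, dtype=float)
    return float(np.sum(weights * forecasts) / np.sum(weights))


# Example: three temperature forecasts (deg C) with provider error variances
# estimated from past observations (invented numbers).
print(fuse([14.2, 15.1, 13.8], [1.0, 4.0, 2.25]))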
Conference Paper
Focussed crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic based on evidence obtained from the already downloaded pages. This work proposes a classifier-guided focussed crawling approach that estimates the relevance of a hyperlink to an unvisited Web resource based on the combination of textual evidence representing its local context, namely the textual content appearing in its vicinity in the parent page, with visual evidence associated with its global context, namely the presence of images relevant to the topic within the parent page. The proposed focussed crawling approach is applied towards the discovery of environmental Web resources that provide air quality measurements and forecasts, since such measurements (and particularly the forecasts) are not only provided in textual form, but are also commonly encoded as multimedia, mainly in the form of heatmaps. Our evaluation experiments indicate the effectiveness of incorporating visual evidence in the link selection process applied by the focussed crawler over the use of textual features alone, particularly in conjunction with hyperlink exploration strategies that allow for the discovery of highly relevant pages that lie behind apparently irrelevant ones.
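The sketch below illustrates the link-selection idea in this abstract: an unvisited hyperlink is scored by combining a text score from its local context (anchor text plus surrounding words) with a visual score indicating whether the parent page contains topic-relevant images such as heatmaps. The toy training data, the image scores and the mixing weight alpha are placeholders, not the paper's trained models or settings.

# Combine textual and visual evidence to score a hyperlink before following it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def score_link(context, parent_image_scores, text_clf, alpha=0.7):
    """Relevance estimate in [0, 1] for following a hyperlink."""
    text_score = text_clf.predict_proba([context])[0, 1]
    visual_score = max(parent_image_scores, default=0.0)
    return alpha * text_score + (1.0 - alpha) * visual_score


if __name__ == "__main__":
    # Toy text classifier over anchor contexts.
    contexts = ["hourly air quality forecast map", "ozone concentration levels",
                "contact us page", "terms of service"]
    labels = [1, 1, 0, 0]
    text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(contexts, labels)

    # Image scores would come from a heatmap classifier applied to the images
    # of the parent page; here they are supplied directly.
    print(score_link("view the pollen forecast map", [0.9, 0.1], text_clf))
    print(score_link("read our privacy policy", [], text_clf))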
Conference Paper
Full-text available
It is common practice to disseminate Chemical Weather (air quality and meteorology) forecasts to the general public via the internet in the form of pre-processed images, which differ in format, quality and presentation, without other forms of access to the original data. As the number of online available Chemical Weather (CW) forecasts is increasing, many geographical areas are covered by different models, and their data cannot be combined, compared, or used in any synergetic way by the end user, due to the aforementioned heterogeneity. This paper describes a series of methods for extracting and reconstructing data from heterogeneous air quality forecast images coming from different data providers, to allow for their unified harvesting, processing, transformation, storage and presentation in the Chemical Weather portal.
Conference Paper
Full-text available
Extraction and analysis of environmental information is very important, since it strongly affects everyday life. Nowadays there are already many free services providing environmental information in several formats, including multimedia (e.g. map images). Although such presentation formats might be very informative for humans, they complicate the automatic extraction and processing of the underlying data. A characteristic example is air quality and pollen forecasts, which are usually encoded in image maps, while the initial (numerical) pollutant concentrations remain unavailable. This work proposes a framework for the semi-automatic extraction of such information based on a template configuration tool, Optical Character Recognition (OCR) techniques and methodologies for data reconstruction from images. The system is tested with different air quality and pollen forecast heatmaps, demonstrating promising results.
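A sketch of the data-reconstruction step: once the color key of a forecast heatmap is known (supplied by hand below; in the paper it would come from the template configuration tool), each map pixel can be mapped back to an approximate numeric value by nearest-color lookup. The legend colors and values are invented for illustration.

# Reconstruct approximate numeric values from a heatmap by nearest legend color.
import numpy as np
from PIL import Image

# Hypothetical legend: RGB color -> pollutant concentration it encodes.
LEGEND = {
    (0, 0, 255): 10.0,     # blue   -> low
    (0, 255, 0): 40.0,     # green  -> moderate
    (255, 255, 0): 80.0,   # yellow -> high
    (255, 0, 0): 150.0,    # red    -> very high
}


def reconstruct(image_path, legend=LEGEND):
    """Return an array of estimated values, one per pixel."""
    pixels = np.asarray(Image.open(image_path).convert("RGB"), dtype=float)
    colours = np.array(list(legend.keys()), dtype=float)          # (k, 3)
    values = np.array(list(legend.values()))                      # (k,)
    # Squared distance of every pixel to every legend color.
    dists = ((pixels[..., None, :] - colours) ** 2).sum(axis=-1)  # (h, w, k)
    return values[dists.argmin(axis=-1)]                          # (h, w)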
Conference Paper
Full-text available
Ontology learning has become a major area of research whose goal is to facilitate the construction of ontologies by decreasing the amount of effort required to produce an ontology for a new domain. However, few studies attempt to automate the entire ontology learning process, from the collection of domain-specific literature, to text mining to build new ontologies or enrich existing ones. In this paper, we present an ontology learning framework that retrieves documents from the Web using focused crawling in a biological domain, amphibian morphology. We use an SVM (support vector machine) classifier to identify domain-specific documents and perform text mining in order to extract useful information for the ontology enrichment process. This paper reports on the overall system architecture and our initial experiments on the focused crawler and document classification.
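A minimal scikit-learn sketch of the SVM document filter mentioned here: TF-IDF features and a linear SVM separating domain-specific pages from off-topic ones. The tiny training set is invented and purely illustrative.

# TF-IDF + linear SVM filter for domain-specific documents (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "cranial osteology and limb morphology of anuran amphibians",
    "skeletal development in salamander larvae",
    "cheap flights and hotel booking deals",
    "latest football transfer news and results",
]
train_labels = [1, 1, 0, 0]

classifier = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
classifier.fit(train_docs, train_labels)

print(classifier.predict(["morphological variation in frog hindlimbs",
                          "weekend weather and travel updates"]))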
Conference Paper
Full-text available
Shot boundary detection: The shot boundary detection system in 2007 is basically the same as that of last year. We make three major modifications in this year's system. First, the CUT detector and the GT detector use block-based RGB color histograms with different parameters instead of the same ones. Secondly, we add a motion detection module to the GT detector to remove the false alarms caused by camera motion or large object movements. Finally, we add a post-processing module based on the SIFT feature after both the CUT and the GT detector. The evaluation results show that all these modifications bring performance improvements to the system. A brief description of each run follows:
Thu01: Baseline system: RGB4_48 for CUT and GT detector, no motion detector, no SIFT post-processing, only using the development set of 2005 as training set
Thu02: Same algorithm as Thu01, but with RGB16_48 for the CUT detector, RGB4_48 for the GT detector
Thu03: Same algorithm as Thu02, but with SIFT post-processing for CUT
Thu04: Same algorithm as Thu03, but with a motion detector for GT
Thu05: Same algorithm as Thu04, but with SIFT post-processing for GT
Thu06: Same algorithm as Thu05, but no SIFT post-processing for CUT
Thu09: Same algorithm as Thu05, but with different parameters
Thu11: Same algorithm as Thu05, but with different parameters
Thu13: Same algorithm as Thu05, but with different parameters
Thu14: Same algorithm and parameters as Thu05, but trained with all the development data from 2003-2006
High-level feature extraction: We try a novel approach, Multi-Label Multi-Feature learning (MLMF learning), to learn a joint-concept distribution at the regional level as an intermediate representation. Besides, we improve our Video Diver indexing system by designing new features, comparing learning algorithms and exploring novel fusion algorithms. Based on these efforts to improve feature, learning and fusion algorithms, we achieve top results in HFE this year.
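As a rough illustration of the cut-detection part, the sketch below compares block-based RGB histograms of consecutive frames and declares a cut when the distance exceeds a threshold. The grid size, bin counts and threshold are placeholders and do not correspond to the RGB4_48/RGB16_48 runs listed above; the gradual-transition detector, motion detector and SIFT post-processing are not shown.

# Cut detection via block-based RGB color histograms (illustrative parameters).
import numpy as np


def block_rgb_histogram(frame, grid=4, bins=4):
    """Concatenated per-block RGB histograms for one frame (H x W x 3 uint8)."""
    h, w, _ = frame.shape
    feats = []
    for by in range(grid):
        for bx in range(grid):
            block = frame[by * h // grid:(by + 1) * h // grid,
                          bx * w // grid:(bx + 1) * w // grid]
            hist, _ = np.histogramdd(block.reshape(-1, 3),
                                     bins=(bins,) * 3, range=((0, 256),) * 3)
            feats.append(hist.ravel())
    feats = np.concatenate(feats)
    return feats / feats.sum()


def detect_cuts(frames, threshold=0.4):
    """Indices i where a cut is declared between frames[i-1] and frames[i]."""
    cuts = []
    prev = block_rgb_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = block_rgb_histogram(frame)
        if 0.5 * np.abs(cur - prev).sum() > threshold:   # L1 histogram distance
            cuts.append(i)
        prev = cur
    return cuts


if __name__ == "__main__":
    shot_a = [np.full((120, 160, 3), 60, dtype=np.uint8)] * 5
    shot_b = [np.full((120, 160, 3), 200, dtype=np.uint8)] * 5
    print(detect_cuts(shot_a + shot_b))   # expected: a cut at index 5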
Article
Full-text available
The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress.
Conference Paper
Full-text available
The authors present an automated method for the digitization, vectorization and storage of land record maps. Map data are input into a vision system as raw data using a solid-state camera. After noise removal, the processed data are converted to vector data by an efficient algorithm. These point data, which represent the key points, can be stored on peripheral storage devices for future reconstruction and modification of the map.
Article
Full-text available
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
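The margin-maximization idea can be written in its standard hard-margin form (textbook notation, not necessarily that of this paper): for training patterns $x_i$ with labels $y_i \in \{-1, +1\}$,

\begin{aligned}
  &\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2} \\
  &\text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n,
\end{aligned}

which maximizes the margin $2 / \lVert w \rVert$ between the two classes. The solution takes the form $w = \sum_i \alpha_i\, y_i\, x_i$, where $\alpha_i > 0$ only for the supporting patterns, i.e. the training patterns closest to the decision boundary.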
Conference Paper
Analysis and processing of environmental information is considered of utmost importance for humanity. This article addresses the problem of discovering web resources that provide environmental measurements. To solve this domain-specific search problem, we combine state-of-the-art search techniques with advanced textual processing and supervised machine learning. Specifically, we generate domain-specific queries using empirical information and machine-learning-driven query expansion in order to enhance the initial queries with domain-specific terms. Multiple variations of these queries are submitted to a general-purpose web search engine in order to achieve high recall, and we employ a post-processing module based on supervised machine learning to improve the precision of the final results. In this work, we focus on the discovery of weather forecast websites and we evaluate our technique by discovering weather nodes for southern Finland.
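A sketch of machine-learning-driven query expansion: candidate terms are ranked by how strongly they discriminate positive example pages (relevant weather sites) from negative ones, and the top terms are appended to the seed queries. This illustrates the general idea only, not the exact expansion procedure of the paper; the example pages and seed queries are invented.

# Rank expansion terms by chi-squared association with positive pages.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

positive_pages = ["hourly temperature and wind forecast for the region",
                  "5 day weather forecast with precipitation probability"]
negative_pages = ["online store for outdoor clothing and gear",
                  "museum opening hours and ticket prices"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(positive_pages + negative_pages)
y = [1] * len(positive_pages) + [0] * len(negative_pages)

scores, _ = chi2(X, y)
terms = vectorizer.get_feature_names_out()
expansion = [t for _, t in sorted(zip(scores, terms), reverse=True)[:3]]

seed_queries = ["weather forecast Finland", "air temperature observations"]
expanded = [f"{q} {' '.join(expansion)}" for q in seed_queries]
print(expanded)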
Conference Paper
Raster map images (e.g., USGS) provide much information in digital form; however, the color assignments and pixel labels leave many serious ambiguities. A color histogram classification scheme is described, followed by the application of a tensor voting method to classify linear features in the map as well as intersections in linear feature networks. The major result is an excellent segmentation of roads, with road intersections detected at about 93% recall and 66% precision.
Article
This paper proposes a method for binary image retrieval in which the black-and-white image is represented by a novel feature named the adaptive hierarchical density histogram, which exploits the distribution of the image points over a two-dimensional area. This adaptive hierarchical decomposition technique estimates point density histograms of image regions, which are determined by a pyramidal grid that is recursively updated through the calculation of image geometric centroids. The extracted descriptor combines global and local properties and can be used with various types of binary image databases. The validity of the introduced method, which demonstrates high accuracy, low computational cost and scalability, is shown both theoretically and experimentally, and comparison with several other prevailing approaches demonstrates its performance.
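A simplified sketch of the idea behind this descriptor: the set of black pixels is recursively split at its geometric centroid, and the fraction of points falling into each of the four resulting quadrants is recorded as a feature. The number of levels and the handling of empty regions follow the spirit of the description above rather than the exact algorithm.

# Recursive centroid-based density histogram for a binary image (sketch).
import numpy as np


def ahdh_like(points, levels=3):
    """points: (n, 2) array of (x, y) coordinates of black pixels."""
    features = []
    if levels == 0 or len(points) == 0:
        return features
    cx, cy = points.mean(axis=0)                      # geometric centroid
    quadrants = [
        points[(points[:, 0] < cx) & (points[:, 1] < cy)],
        points[(points[:, 0] >= cx) & (points[:, 1] < cy)],
        points[(points[:, 0] < cx) & (points[:, 1] >= cy)],
        points[(points[:, 0] >= cx) & (points[:, 1] >= cy)],
    ]
    features.extend(len(q) / len(points) for q in quadrants)  # density histogram
    for q in quadrants:
        features.extend(ahdh_like(q, levels - 1))
    return features


if __name__ == "__main__":
    image = np.zeros((64, 64), dtype=bool)
    image[10:30, 5:25] = True                          # a toy black region
    ys, xs = np.nonzero(image)
    pts = np.column_stack([xs, ys]).astype(float)
    print(ahdh_like(pts))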
Conference Paper
VLFeat is an open and portable library of computer vision algorithms. It aims at facilitating fast prototyping and reproducible research for computer vision scientists and students. It includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization. The source code and interfaces are fully documented. The library integrates directly with MATLAB, a popular language for computer vision research.
Conference Paper
The separation of overlapping text and graphics is a challenging problem in document image analysis. This paper proposes a specific method of detecting and extracting characters that touch graphics. It is based on the observation that the constituent strokes of characters are usually short segments in comparison with those of graphics. It combines line continuation with the line-width feature to decompose and reconstruct segments underlying the region of intersection. Experimental results showed that the proposed method significantly improved the percentage of correctly detected text as well as the accuracy of character recognition.
Geographical information systems (GIS) provide capabilities for the mapping, management and analysis of cartographic information. Unlike most other disciplines, GIS technology was born from specialized applications. A comprehensive theory relating the various techniques used in these applications is only now emerging. By organizing the set of analytic methods into a mathematical structure, a generalized framework for cartographic modelling is developed. Within this framework, users logically order primitive operators on map variables in a manner analogous to traditional algebra and statistics. This paper describes the fundamental classes of operations used in computer-assisted map analysis. Several of the procedures are demonstrated using a fourth-generation computer language for personal computers.
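A tiny numpy illustration of the map-algebra idea described here: raster layers are arrays, and analysis is expressed by composing primitive operators on them, for example a "local" operator (cell-by-cell combination of two layers) and a "focal" operator (a neighbourhood mean). The layer names, values and 3x3 window are invented for illustration.

# Local and focal map-algebra operators on small raster layers.
import numpy as np

elevation = np.array([[10, 12, 13, 15],
                      [11, 14, 16, 18],
                      [12, 15, 19, 21],
                      [13, 16, 20, 24]], dtype=float)
rainfall = np.array([[2, 2, 3, 3],
                     [2, 3, 3, 4],
                     [3, 3, 4, 4],
                     [3, 4, 4, 5]], dtype=float)

# Local operator: combine layers cell by cell.
runoff_index = rainfall * (elevation / elevation.max())

# Focal operator: mean over a 3x3 neighbourhood (edges handled by padding).
padded = np.pad(elevation, 1, mode="edge")
smoothed = np.zeros_like(elevation)
for i in range(elevation.shape[0]):
    for j in range(elevation.shape[1]):
        smoothed[i, j] = padded[i:i + 3, j:j + 3].mean()

print(runoff_index)
print(smoothed)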
Article
The Barnes (1973) objective map analysis scheme is employed to develop an interactive analysis package for assessing the impact of satellite-derived data on analyses of conventional meteorological data sets. The method permits modification of the values of input parameters in the objective analysis within objectively determined, internally set limits. The effects of the manipulations are rapidly displayed, and methods are included for assimilating the spatially clustered characteristics of satellite data and the various horizontal resolutions of the data types. Data sets from the SESAME rawinsonde wind data with uniform spatial distribution, with the same data set plus satellite cloud motion data, and a data set from the atmospheric sounder radiometer on the GOES satellite were analyzed as examples. The scheme is demonstrated to recover details after two iterations through the data.
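For orientation, the sketch below implements a common textbook formulation of a Barnes-type analysis: a Gaussian-weighted first-guess field on a grid, followed by a correction pass with a sharper weight function that restores detail from the residuals at the observation points. The parameters kappa and gamma, the station locations and the grid are illustrative and are not those of the interactive package described above.

# Barnes-type objective analysis: Gaussian-weighted first guess plus correction passes.
import numpy as np


def barnes(obs_xy, obs_val, grid_xy, kappa=1.0, gamma=0.3, passes=2):
    obs_xy, obs_val, grid_xy = map(np.asarray, (obs_xy, obs_val, grid_xy))

    def weights(targets, k):
        d2 = ((targets[:, None, :] - obs_xy[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / k)

    def interpolate(targets, values, k):
        w = weights(targets, k)
        return (w * values).sum(axis=1) / w.sum(axis=1)

    grid_field = interpolate(grid_xy, obs_val, kappa)      # first pass
    obs_field = interpolate(obs_xy, obs_val, kappa)
    for _ in range(passes - 1):                            # correction passes
        residual = obs_val - obs_field
        grid_field = grid_field + interpolate(grid_xy, residual, gamma * kappa)
        obs_field = obs_field + interpolate(obs_xy, residual, gamma * kappa)
    return grid_field


if __name__ == "__main__":
    stations = [(0.0, 0.0), (1.0, 0.5), (0.2, 1.0)]
    temps = [15.0, 17.0, 14.0]
    grid = [(x, y) for x in np.linspace(0, 1, 3) for y in np.linspace(0, 1, 3)]
    print(barnes(stations, temps, grid).round(2))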
Article
The MPEG-7 visual standard under development specifies content-based descriptors that allow users or agents (or search engines) to measure similarity in images or video based on visual criteria, and can be used to efficiently identify, filter, or browse images or video based on visual content. More specifically, MPEG-7 specifies color, texture, object shape, global motion, or object motion features for this purpose. This paper outlines the aim, methodologies, and broad details of the MPEG-7 standard development for visual content description.
Article
MPEG-7, formally known as the Multimedia Content Description Interface, includes standardized tools (descriptors, description schemes, and language) enabling structural, detailed descriptions of audio-visual information at different granularity levels (region, image, video segment, collection) and in different areas (content description, management, organization, navigation, and user interaction). It aims to support and facilitate a wide range of applications, such as media portals, content broadcasting, and ubiquitous multimedia. We present a high-level overview of the MPEG-7 standard. We first discuss the scope, basic terminology, and potential applications. Next, we discuss the constituent components. Then, we compare the relationship with other standards to highlight its capabilities.
Trecvid-2007 high-level feature task overview
  • W Kraaij
  • P Over
  • G Awad