Conference Paper

Extracting spatial information from social media in support of agricultural management decisions


Abstract

Farmers face pressure to respond to unpredictable weather, the spread of pests, and other variable events on their farms. This paper proposes a framework for data aggregation from diverse sources that extracts named places impacted by events relevant to agricultural practices. Our vision is to couple natural language processing, geocoding, and existing geographic information retrieval techniques to increase the value of already-available data through aggregation, filtering, validation, and notifications, helping farmers make timely and informed decisions with greater ease.
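As a rough illustration of the proposed pipeline (aggregation, place extraction, filtering, and notification), the following Python sketch strings these steps together; the gazetteer, event keywords, and area of interest are invented placeholders, not data from the paper.

# A minimal, illustrative sketch of the aggregate -> extract -> filter -> notify
# idea described in the abstract. The gazetteer, keyword list, and area of
# interest below are hypothetical placeholders, not data from the paper.
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    lat: float
    lon: float

# Toy gazetteer mapping place names to coordinates (assumed data).
GAZETTEER = {
    "goleta": Place("Goleta", 34.44, -119.83),
    "santa ynez": Place("Santa Ynez", 34.61, -120.08),
}

EVENT_KEYWORDS = {"frost", "aphids", "hail"}           # events of interest
AREA_OF_INTEREST = (34.3, 34.7, -120.2, -119.7)        # lat/lon bounding box

def extract_places(text: str):
    """Very naive place extraction: look for gazetteer entries in the text."""
    lowered = text.lower()
    return [p for name, p in GAZETTEER.items() if name in lowered]

def relevant(text: str) -> bool:
    """Keyword filter standing in for a real event classifier."""
    return any(k in text.lower() for k in EVENT_KEYWORDS)

def in_area(place: Place, box) -> bool:
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= place.lat <= lat_max and lon_min <= place.lon <= lon_max

def notify(posts):
    for text in posts:
        if not relevant(text):
            continue
        for place in extract_places(text):
            if in_area(place, AREA_OF_INTEREST):
                print(f"ALERT: '{text}' mentions {place.name}")

notify(["Frost warning tonight near Goleta orchards",
        "Great farmers market in Santa Ynez today"])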


... This key can be used to provide a time series of mortar-shell falls. Geocoding forms a fundamental part of spatial analysis in a variety of research disciplines, especially in extracting spatial coordinates from social media [Golubovic, 2017]. The algorithm used to geocode the data is similar to the gazetteer algorithm [Churches, 2002; Hill, 2000]. ...
Article
The paper analyzes the use of social media data in geographical information systems to map the areas most affected by mortar shells in Damascus, the capital of Syria, using geocoded and parsed social media data. The paper describes an algorithm created to collect and store data from social media sites. For data storage, both a NoSQL database (to save JSON documents) and an RDBMS (to save other spatial data types) are used. A Python script was written to collect social media data based on keywords related to the search. A geocoding algorithm was developed to locate social media posts by normalizing, standardizing, and tokenizing their text. The result is a set of year-by-year maps, from 2013 to 2018, of mortar-shell impact locations in Damascus. These layers give an overview of how the number of mortar-shell falls changed and support hot-spot analysis for the city. Finally, social media data can prove useful when mapping dynamic social phenomena, such as the locations of mortar-shell falls in Damascus, Syria. Moreover, social media provide easily accessible, massive, and timestamped data, which makes such phenomena easier to study.
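The normalization, standardization, and tokenization step followed by a gazetteer lookup might look roughly like the following Python sketch; the place names and abbreviation table are hypothetical examples, not the authors' data.

# Illustrative sketch of the normalize/standardize/tokenize step described
# above, followed by a gazetteer lookup. Place names and abbreviations are
# hypothetical examples, not the authors' data.
import re
import unicodedata

GAZETTEER = {"jaramana": (33.486, 36.346), "bab touma": (33.515, 36.317)}
ABBREVIATIONS = {"st.": "street", "rd.": "road"}   # assumed standardization table

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[^\w\s.]", " ", text)

def standardize(tokens):
    return [ABBREVIATIONS.get(t, t) for t in tokens]

def geocode(text: str):
    tokens = standardize(normalize(text).split())
    joined = " ".join(tokens)
    # Simple substring match against the gazetteer entries.
    return [(name, coords) for name, coords in GAZETTEER.items() if name in joined]

print(geocode("Reports of shelling near Bab Touma st. this morning"))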
... In the same paper, Larson defines spatial queries (also known as geographic or geospatial queries) as queries with spatial relationships to entities that are geometrically defined and spatially localised. GIS have only been made available to the general public in the last fifteen years, by companies such as Google and Microsoft, which introduced in 2005 the most used online GIR services, Google Maps and Live Search Maps (today Bing Maps), respectively. The huge growth of the Internet and of available data has led to many possibilities for GIR research and applications, such as georeferenced media or geotagged pictures posted on social networks. ...
... Similarly, the authors in [11,25] exploited the Flickr database to geolocalise videos. Finally, in the last few years, efforts have been made, using GIR systems, to extract spatial information from users' Internet searches and from social media [6], and to integrate heterogeneous data sources to develop efficient multimodal systems [23]. ...
Article
Full-text available
The availability of videos has grown rapidly in recent years. Finding and browsing relevant information automatically extracted from videos is not an easy task, but today it is an indispensable feature given the immense number of digital products available. In this paper, we present a system which provides a process to automatically extract information from videos. We describe a system solution that uses a re-trained OpenNLP model to locate all the places and famous people included in a specific video. The system obtains information from the Google Knowledge Graph related to relevant named entities such as places or famous people. We also present the Automatic Georeferencing Video (AGV) system developed by RAI Teche (RAI, Radiotelevisione italiana, is the national public broadcasting company of Italy, owned by the Ministry of Economy and Finance) for the European project “La Città Educante” (The Educating City: teaching and learning processes in a cross-media ecosystem). Our system contributes to The Educating City project by providing the technological environment to create statistical models for automatic named entity recognition (NER), and has been implemented in the field of education, initially in Italian. The system has been applied to the learning challenges facing the world of educational media and has demonstrated how beneficial combining topical news content with scientific content can be in education.
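A simplified sketch of the extract-then-enrich idea follows, using spaCy as a stand-in for the paper's re-trained OpenNLP model and a mocked lookup in place of the Google Knowledge Graph API.

# Sketch of the extract-entities-then-enrich step. spaCy stands in for the
# paper's re-trained OpenNLP model; the Knowledge Graph lookup is mocked.
import spacy

nlp = spacy.load("en_core_web_sm")   # pretrained English pipeline

def enrich(entity_text: str) -> dict:
    # Placeholder for a Knowledge Graph query; fields are assumptions.
    return {"name": entity_text, "description": "looked up externally"}

def entities_from_transcript(transcript: str):
    doc = nlp(transcript)
    wanted = {"PERSON", "GPE", "LOC", "FAC"}      # people and places
    return [enrich(ent.text) for ent in doc.ents if ent.label_ in wanted]

print(entities_from_transcript("The documentary follows Garibaldi from Nice to Rome."))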
... Spatial information from social media is also being used to understand agricultural management decisions (Golubovic et al., 2016). ...
Technical Report
Full-text available
In 2014, the Agency initially became aware of lepidopteran resistance to Bt traits in corn and cotton in the continental U.S. from unconfirmed published reports by academic scientists. The reports of resistance were observed specifically for fall armyworm (Spodoptera frugiperda), corn earworm (Helicoverpa zea), and western bean cutworm (Striacosta albicosta). In 2016, industry registrants of Bt PIPs submitted annual monitoring data to the Agency which confirmed lepidopteran resistance for southwestern corn borer (Diatraea grandiosella). Based on the information included with the monitoring data, the Agency concluded that there were major risk factors responsible for Bt resistance in these cases: 1) lack of available high dose traits; 2) use of single mode of action in Bt corn year-after-year; 3) use of corn seed blends in the southern U.S.; 4) poor refuge compliance for Bt corn in southern states; 5) continuous selection with the same traits expressed in Bt corn and Bt cotton in a given year; 6) shortcomings in current EPA recommended methodological approaches for monitoring for resistant field populations; and 7) challenges with identifying resistance with current diet bioassay methods. The FIFRA SAP was charged with providing recommendations to the Agency in considering options to reduce resistance risks for lepidopteran pests, increase the longevity of currently functional Bt traits and future technologies, and improve the current insect resistance management program for lepidopteran pests of Bt corn and cotton. The Panel addressed eight charge questions divided into resistance reports for lepidopteran pests of Bt PIPs (question 1); resistance monitoring for non-high dose pests (question 2); resistance risk of seed blend corn in the southern U.S. (question 3); Bt traits expressed in corn and cotton (question 4); resistance management for S. albicosta (question 5); mitigation of resistance (question 6); grower non-compliance with refuges in the southern U.S. (question 7); and new IRM framework for lepidopteran pests of Bt (question 8). The Panel provided the following overall summary of the major conclusions and recommendations detailed in the report.
... Although such devices are increasingly integrated into IoT solutions for agriculture (e.g. providing alerts, irrigation control, communication of sensor data [19,15,25,18,5,30]), there are no studies of which we are aware that use the devices themselves as thermometers. ...
Conference Paper
Full-text available
In this paper, we investigate using CPU temperature from small, low-cost, single-board computers to predict outdoor temperature in IoT-based precision-agriculture settings. Temperature is a key metric in these settings, used to inform and actuate farm operations such as irrigation scheduling, frost-damage mitigation, and greenhouse management. Using cheap single-board computers as temperature sensors can drive down the cost of sensing in these applications and make it possible to monitor a large number of micro-climates concurrently. We have developed a system in which devices communicate their CPU measurements to an on-farm edge cloud. The edge cloud uses a combination of calibration, smoothing (noise removal), and linear regression to predict the outdoor temperature at each device. We evaluate the accuracy of this approach for different temperature sensors, devices, and locations, as well as different training and calibration durations.
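On synthetic data, the smoothing-plus-linear-regression idea reads roughly as follows; the window size, coefficients, and data are illustrative assumptions, not the paper's values.

# Minimal sketch, on synthetic data, of smoothing followed by a linear fit
# from (smoothed) CPU temperature to outdoor temperature. All numbers are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
outdoor = 15 + 8 * np.sin(np.linspace(0, 4 * np.pi, 200))          # synthetic ground truth
cpu = 25 + 0.9 * outdoor + rng.normal(0, 1.5, size=outdoor.size)   # noisy CPU readings

def smooth(x, window=10):
    """Simple moving average to remove sensor noise."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

cpu_s = smooth(cpu)

# Fit outdoor = a * cpu_s + b on a "calibration" prefix, predict on the rest.
split = 100
a, b = np.polyfit(cpu_s[:split], outdoor[:split], deg=1)
pred = a * cpu_s[split:] + b

mae = np.mean(np.abs(pred - outdoor[split:]))
print(f"fit: outdoor ~= {a:.2f} * cpu + {b:.2f}, MAE = {mae:.2f} C")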
... There are also papers that have proposed "smart agriculture frameworks". Golubovic et al. (2016) proposed a framework to aggregate data inputs and issue notifications when key events are likely to occur, such as difficult weather conditions or the spread of pests. In their framework, users can specify areas and events of interest by keywords. ...
Preprint
Full-text available
Nowadays, precision agriculture, combined with modern information and communications technologies, is becoming more common in agricultural activities such as automated irrigation systems, precision planting, variable rate applications of nutrients and pesticides, and agricultural decision support systems. In the latter, crop management data analysis, based on machine learning and data mining, focuses mainly on how to efficiently forecast and improve crop yield. In recent years, raw and semi-processed agricultural data are usually collected using sensors, robots, satellites, weather stations, farm equipment, farmers and agribusinesses, while the Internet of Things (IoT) should deliver on the promise of wirelessly connecting objects and devices in the agricultural ecosystem. Agricultural data typically captures information about farming entities and operations. Every farming entity encapsulates an individual farming concept, such as field, crop, seed, soil, temperature, humidity, pest, and weed. Agricultural datasets are spatial, temporal, complex, heterogeneous, non-standardized, and very large. In particular, agricultural data is considered Big Data in terms of volume, variety, velocity and veracity. Designing and developing a data warehouse for precision agriculture is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. Some of the requirements for such an agricultural data warehouse are privacy, security, and real-time access among its stakeholders (e.g., farmers, farm equipment manufacturers, agribusinesses, co-operative societies, customers and possibly Government agencies). However, currently there are very few reports in the literature that focus on the design of efficient data warehouses with the view of enabling Agricultural Big Data analysis and data mining. In this paper ...
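A toy star schema for sensor readings, sketched in SQLite, illustrates the kind of dimensional design such a warehouse builds on; the table and column names are hypothetical and not the design proposed in the paper.

# Tiny illustrative star schema for farm sensor readings in SQLite.
# Tables, columns, and values are hypothetical, not the paper's design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_field (field_id INTEGER PRIMARY KEY, name TEXT, soil_type TEXT);
CREATE TABLE dim_crop  (crop_id  INTEGER PRIMARY KEY, species TEXT, variety TEXT);
CREATE TABLE dim_time  (time_id  INTEGER PRIMARY KEY, date TEXT, season TEXT);
CREATE TABLE fact_reading (
    field_id INTEGER REFERENCES dim_field(field_id),
    crop_id  INTEGER REFERENCES dim_crop(crop_id),
    time_id  INTEGER REFERENCES dim_time(time_id),
    soil_moisture REAL,
    air_temp REAL
);
""")
conn.execute("INSERT INTO dim_field VALUES (1, 'North block', 'loam')")
conn.execute("INSERT INTO dim_crop  VALUES (1, 'strawberry', 'Albion')")
conn.execute("INSERT INTO dim_time  VALUES (1, '2019-05-01', 'spring')")
conn.execute("INSERT INTO fact_reading VALUES (1, 1, 1, 0.23, 18.4)")

# Typical analytical query: average soil moisture per field and season.
rows = conn.execute("""
    SELECT f.name, t.season, AVG(r.soil_moisture)
    FROM fact_reading r
    JOIN dim_field f USING (field_id)
    JOIN dim_time  t USING (time_id)
    GROUP BY f.name, t.season
""").fetchall()
print(rows)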
... More recently, many efforts about GIR systems have been conducted on resolving semantic ambiguities on place names [31], on extracting spatial information from the internet searches of users and from social media [22,45] and, on identification of spatial features in textual documents [57]. ...
Article
Full-text available
In this paper, a system providing an efficient integration between Content-Based Image Retrieval (CBIR) and Geographic Information Retrieval (GIR) is presented. Over the years, many CBIR systems have been proposed to provide a solution for the efficient use of multimedia/visual contents and for other issues such as performance, quality of retrieval, data heterogeneity, and multimodal information integration. The aim of the proposed approach is to show that the use of geographic data can improve the results obtained by an image-matching system based only on visual data. Our framework is composed of three parts, each described in detail in this paper: the first part is dedicated to CBIR, with an experimental comparison of a large number of different multimedia features to choose the ones to use in the system implementation; the second part shows the methodology to integrate geographic and multimedia data; the last part presents a GIR system implementation using a “points of interest” search. An Android application has been developed for the client side, using Apache Solr as the server-side provider of the information retrieval functionalities. An experimental evaluation is carried out to demonstrate the effective improvement given by the combination of geographic and multimedia data. Our results have been obtained using a real dataset composed of artworks located in Naples's museums.
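The core idea of combining visual and geographic evidence can be sketched as a simple re-ranking step; the candidate items, scores, and weighting are assumptions for illustration, not the system's actual components.

# Illustrative re-ranking that combines a visual-similarity score with
# geographic proximity to the query location. Items, scores, and the
# weight alpha are assumptions for illustration.
import math

def geo_distance_km(a, b):
    """Haversine distance between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def rerank(candidates, query_location, alpha=0.7, scale_km=5.0):
    """score = alpha * visual similarity + (1 - alpha) * geographic proximity."""
    scored = []
    for name, visual_score, location in candidates:
        proximity = math.exp(-geo_distance_km(query_location, location) / scale_km)
        scored.append((alpha * visual_score + (1 - alpha) * proximity, name))
    return sorted(scored, reverse=True)

candidates = [
    ("Caravaggio, Flagellation", 0.81, (40.8630, 14.2629)),   # in Naples
    ("Similar-looking artwork",  0.84, (48.8606, 2.3376)),    # far from Naples
]
print(rerank(candidates, query_location=(40.8522, 14.2681)))  # query issued in Naples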
Article
This paper presents a model to collect, save, geocode, and analyze social media data. The model is used to collect and process social media data concerning the ISIS terrorist group (the Islamic State in Iraq and Syria) and to map the areas in Syria most affected by ISIS according to that data. The mapping process consists of the automated compilation of a density map from the geocoded tweets. Data mined from social media (e.g., Twitter and Facebook) are recognized as a dynamic and easily accessible resource that can be used as a data source in spatial analysis and geographical information systems. Social media data can be represented as topic data and geocoding data based on the mined text, processed using Natural Language Processing (NLP) methods. NLP is a subdomain of artificial intelligence concerned with programming computers to analyze natural human language and texts. NLP allows identifying the words used as input by the developed geocoding algorithm. In this study, the needed words were identified using NLP with two corpora. The first corpus contained the names of populated places in Syria. The second corpus was composed from a statistical analysis of the number of tweets, picking the words that carry a location meaning (e.g., schools, temples). After identifying the words, the algorithm used the Google Maps Geocoding API to obtain coordinates for the posts.
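The final geocoding step can be sketched as a call to the Google Maps Geocoding API; the API key below is a placeholder, and quota and error handling are omitted.

# Minimal sketch of geocoding an extracted location word via the Google Maps
# Geocoding API. The API key is a placeholder; error handling is omitted.
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_API_KEY"  # placeholder

def geocode(place_name: str, region: str = "sy"):
    resp = requests.get(GEOCODE_URL, params={
        "address": place_name,
        "region": region,          # bias results towards Syria
        "key": API_KEY,
    })
    results = resp.json().get("results", [])
    if not results:
        return None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

print(geocode("Aleppo"))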
Conference Paper
Determining the type of places in location-based social networks will contribute to the success of various downstream tasks such as POI recommendation, location search, automatic place name database creation, and data cleaning. In this paper, we propose a multi-objective ensemble learning framework that (i) allows the accurate tagging of places into one of the three categories: public, private, or virtual, and (ii) identifying a set of solutions thus offering a wide range of possible applications. Based on the check-in records, we compute two types of place features from (i) specific patterns of individual places and (ii) latent relatedness among similar places. The features extracted from specific patterns (SP) are derived from all check-ins at a specific place. The features from latent relatedness (LR) are computed by building a graph of related places where similar types of places are connected by virtual edges. We conduct an experimental study based on a dataset of over 2.7M check-in records collected by crawling Foursquare-tagged tweets from Twitter. Experimental results demonstrate the effectiveness of our approach to this new problem and show the strength of taking various methods into account in feature extraction. Moreover, we demonstrate how place type tagging can be beneficial for place name recommendation services.
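A much-simplified stand-in for this approach is sketched below: a few "specific pattern" features per place feed a single classifier, whereas the paper uses a multi-objective ensemble plus graph-based relatedness features; all data shown is synthetic.

# Simplified stand-in: derive a few check-in features per place and train a
# single classifier. The paper's actual method is a multi-objective ensemble
# with additional graph-based features; the data here is synthetic.
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def place_features(checkins):
    """checkins: list of (user_id, hour_of_day) tuples for one place."""
    users = Counter(u for u, _ in checkins)
    hours = [h for _, h in checkins]
    return [
        len(checkins),                                   # total check-ins
        len(users),                                      # unique visitors
        max(users.values()) / len(checkins),             # dominance of top visitor
        sum(9 <= h <= 18 for h in hours) / len(hours),   # daytime share
    ]

train = [
    ([("u1", 12), ("u2", 13), ("u3", 20), ("u4", 11)], "public"),
    ([("u9", 22), ("u9", 23), ("u9", 7),  ("u9", 8)],  "private"),
]
X = [place_features(c) for c, _ in train]
y = [label for _, label in train]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([place_features([("u5", 10), ("u6", 14), ("u7", 15)])]))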
Article
Full-text available
This article presents an approach to place reference corpus building and application of the approach to a Geo-Microblog Corpus that will foster research and development in the areas of microblog/twitter geoparsing and geographic information retrieval. Our corpus currently consists of 6000 tweets with identified and georeferenced place names. 30% of the tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is developed to be generally applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and present an analysis of causes of disagreement between the lay persons performing place identification in our crowdsourcing approach.
Article
Full-text available
Gazetteers are key components of georeferenced information systems, including applications such as Web-based mapping services. Existing gazetteers lack the capabilities to fully integrate user-contributed and vernacular geographic information, as well as to support complex queries. To address these issues, a next generation gazetteer should leverage formal semantics, harvesting of implicit geographic information (such as geotagged photos) as well as models of trust for contributors. In this paper, we discuss these requirements in detail. We elucidate how existing standards can be integrated to realize a gazetteer infrastructure allowing for bottom-up contribution as well as information exchange between different gazetteers. We show how to ensure the quality of user-contributed information and demonstrate how to improve querying and navigation using semantics-based information retrieval.
Conference Paper
Full-text available
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.
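The flavour of the technique can be conveyed with a tiny Gibbs sampler over a label sequence whose score adds a non-local label-consistency bonus; the tokens, local scores, and bonus weight below are toy assumptions.

# Toy Gibbs sampler over a label sequence with a non-local consistency bonus
# (identical tokens prefer identical labels). Scores and tokens are assumptions.
import math
import random

random.seed(0)
TOKENS = ["Jordan", "visited", "Jordan", "yesterday"]
LABELS = ["PER", "LOC", "O"]

# Toy "local" (CRF-like) scores per token and label.
LOCAL = {
    "Jordan":    {"PER": 1.0,  "LOC": 1.0,  "O": -1.0},
    "visited":   {"PER": -1.0, "LOC": -1.0, "O": 1.5},
    "yesterday": {"PER": -1.0, "LOC": -1.0, "O": 1.5},
}
CONSISTENCY = 0.8   # bonus per other identical token sharing the label

def score(i, label, assignment):
    s = LOCAL[TOKENS[i]][label]
    s += CONSISTENCY * sum(
        1 for j, other in enumerate(assignment)
        if j != i and TOKENS[j] == TOKENS[i] and other == label)
    return s

def gibbs(sweeps=50):
    assignment = [random.choice(LABELS) for _ in TOKENS]
    for _ in range(sweeps):
        for i in range(len(TOKENS)):
            weights = [math.exp(score(i, lab, assignment)) for lab in LABELS]
            assignment[i] = random.choices(LABELS, weights=weights)[0]
    return assignment

print(list(zip(TOKENS, gibbs())))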
Conference Paper
With the rise of human sensor observation as a major source of geospatial information, the traditional assessment of information quality based on parameters like accuracy, consistency and completeness is shifting to new measures. In volunteered geographic information (VGI) these conventional parameters are either lacking or not explicit. Regarding human observation quality as fitness for purpose, we propose to use trust and reputation as proxy measures of it. Trustworthy observations then take precedence over less trustworthy observations. Further, we propose that trust and reputation have spatial and temporal dimensions and we build computational models of trust for quality assessment including these dimensions. We present the case study of the H2.0 VGI project for water quality management. Through agent based modeling, the study has established the validity of a spatio-temporal trust model for assessing the trustworthiness and hence the quality of human observations. We first introduce a temporally sensitive trust model and then discuss the extension of the temporal model with spatial dimensions and their effects on the computational trust model.
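One simple way to realize a spatio-temporally sensitive trust score is to weight a contributor's past agreements by exponential decay in both time and distance, as in the sketch below; the decay constants and the history are illustrative assumptions, not the project's model.

# Minimal sketch of a spatio-temporally weighted trust score: recent, nearby
# agreement counts more. Decay constants and data are illustrative assumptions.
import math

def trust(observations, tau_days=30.0, rho_km=10.0):
    """observations: list of (agreement in [0,1], age_days, distance_km)."""
    num = den = 0.0
    for agreement, age_days, distance_km in observations:
        w = math.exp(-age_days / tau_days) * math.exp(-distance_km / rho_km)
        num += w * agreement
        den += w
    return num / den if den else 0.5   # fall back to a neutral prior

history = [(1.0, 2, 1.5), (0.0, 90, 40.0), (1.0, 10, 3.0)]
print(round(trust(history), 3))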
Conference Paper
The successful execution of location-based and feature-based queries on spatial databases requires the construction of spatial indexes on the spatial attributes. This is not simple when the data is unstructured as is the case when the data is a collection of documents such as news articles, which is the domain of discourse, where the spatial attribute consists of text that can be (but is not required to be) interpreted as the names of locations. In other words, spatial data is specified using text (known as a toponym) instead of geometry, which means that there is some ambiguity involved. The process of identifying and disambiguating references to geographic locations is known as geotagging and involves using a combination of internal document structure and external knowledge, including a document-independent model of the audience's vocabulary of geographic locations, termed its spatial lexicon. In contrast to previous work, a new spatial lexicon model is presented that distinguishes between a global lexicon of locations known to all audiences, and an audience-specific local lexicon. Generic methods for inferring audiences' local lexicons are described. Evaluations of this inference method and the overall geotagging procedure indicate that establishing local lexicons cannot be overlooked, especially given the increasing prevalence of highly local data sources on the Internet, and will enable the construction of more accurate spatial indexes.
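The preference for an audience's local lexicon over the global one can be sketched as a two-level lookup; the lexicon contents below are hypothetical examples, not the paper's inferred lexicons.

# Illustrative two-level toponym resolution: a local (audience-specific)
# lexicon outranks the global lexicon. Lexicon contents are hypothetical.
GLOBAL_LEXICON = {"paris": ("Paris, France", 48.857, 2.352)}
LOCAL_LEXICONS = {
    "texas_news_site": {"paris": ("Paris, Texas", 33.661, -95.556)},
}

def resolve(toponym: str, audience: str):
    key = toponym.lower()
    local = LOCAL_LEXICONS.get(audience, {})
    # A local interpretation, when one exists, outranks the global default.
    return local.get(key) or GLOBAL_LEXICON.get(key)

print(resolve("Paris", "texas_news_site"))   # -> Paris, Texas
print(resolve("Paris", "national_wire"))     # -> Paris, France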
Kessler, C., Janowicz, K., and Bishr, M. (2009). An agenda for the next generation gazetteer: Geographic information contribution and retrieval. Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.
Wallgrun, J. O., Hardisty, F., MacEachren, A. M., Karimzadeh, M., Ju, Y., and Pezanowski, S. (2014). Construction and First Analysis of a Corpus for the Evaluation and Training of Microblog/Twitter Geoparsers. Proceedings of the 8th Workshop on Geographic Information Retrieval.