Chapter

Specifying Spatial and Temporal Characteristics of Increased Activity of Users of E-Participation Services

Abstract

Electronic participation data has great potential for application in studies of urban processes and systems. The popularity of e-participation services, which citizens use to report problems of urban improvement, housing and communal services, has led to the formation of massive datasets describing civic activity and subjective evaluations of urban environment quality. E-participation data has certain features that distort the results of some studies, e.g., those concerned with the indirect evaluation of environmental or socioeconomic characteristics. One source of such distortions is superusers: a small group of users whose activity is abnormally high and has a significant impact on the distribution of messages. It is well known that the activity of e-participation service users is unequally distributed; however, a universal method for excluding the abnormal activity of superusers has never been proposed. This paper studies the distribution of activity of e-participation service users and proposes several methods for identifying and excluding peaks of increased superuser activity in different territories and time intervals. The proposed methods were tested on data from the Russian portal “Our Saint Petersburg”. As a result, we defined the optimal approaches to processing e-participation datasets for studies that are sensitive to the unequal distribution of subjective data.
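A minimal sketch of one possible way to flag and damp superuser peaks, under stated assumptions: messages are already binned into territory cells and time windows, a cell is flagged when a single user wrote more than a hypothetical 30% of its messages (with at least 10 messages in the cell), and a hypothetical per-user cap is applied when counting. These rules and all function names are illustrative only and are not the methods proposed in this paper.

```python
from collections import Counter, defaultdict

def _counts_per_cell(messages):
    """Group messages by (territory, time_window) and count them per user."""
    per_cell = defaultdict(Counter)
    for user, territory, window in messages:
        per_cell[(territory, window)][user] += 1
    return per_cell

def flag_superuser_peaks(messages, share_threshold=0.3, min_messages=10):
    """Return {(territory, window): user_id} for cells dominated by one user.

    messages: iterable of (user_id, territory_id, time_window) tuples.
    A cell is flagged when it has at least `min_messages` messages and its
    most active user wrote more than `share_threshold` of them (hypothetical rule).
    """
    flagged = {}
    for cell, counts in _counts_per_cell(messages).items():
        total = sum(counts.values())
        user, top = counts.most_common(1)[0]
        if total >= min_messages and top / total > share_threshold:
            flagged[cell] = user
    return flagged

def capped_counts(messages, cap=1):
    """Message count per cell when each user contributes at most `cap` messages."""
    return {cell: sum(min(n, cap) for n in counts.values())
            for cell, counts in _counts_per_cell(messages).items()}

# Usage: one user dominates territory "A" in May 2019.
msgs = [("u1", "A", "2019-05")] * 12 + [("u2", "A", "2019-05"), ("u3", "A", "2019-05"),
                                        ("u4", "B", "2019-05"), ("u5", "B", "2019-05")]
print(flag_superuser_peaks(msgs))  # {('A', '2019-05'): 'u1'}
print(capped_counts(msgs))         # {('A', '2019-05'): 3, ('B', '2019-05'): 2}
```

Capping at one message per user turns the cell count into a count of distinct authors, which removes the superuser peak entirely; a larger cap keeps more of the signal while still limiting any single user's weight.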

... These layers were used to count the number of points of each dataset within overlapping greenery polygons, separately for each type of place and message. In addition, we counted the number of users of the “Nash Sankt-Peterburg” portal to identify cases of superuser influence in the data [35], and calculated the area of the polygons. ...
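The counting step described in this excerpt can be sketched with a ray-casting point-in-polygon test and the shoelace area formula. Planar (projected) coordinates, the polygon representation, and the helper names are assumptions for illustration. Comparing the message count with the number of distinct authors in each polygon is one simple way to spot polygons dominated by a few users.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle when a horizontal ray cast to the right of (x, y) crosses this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_area(polygon):
    """Shoelace formula for the area of a simple polygon in planar units."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1] - polygon[(i + 1) % n][0] * polygon[i][1]
            for i in range(n))
    return abs(s) / 2.0

def summarize_polygons(messages, polygons):
    """messages: list of (user_id, x, y); polygons: {polygon_id: vertices}.
    Returns {polygon_id: (message_count, distinct_users, area)}."""
    summary = {}
    for pid, vertices in polygons.items():
        authors = [u for u, x, y in messages if point_in_polygon(x, y, vertices)]
        summary[pid] = (len(authors), len(set(authors)), polygon_area(vertices))
    return summary

# Usage: a hypothetical square park; three of its four messages come from one user.
park = [(0, 0), (10, 0), (10, 10), (0, 10)]
messages = [("u1", 1, 1), ("u1", 2, 2), ("u1", 3, 3), ("u2", 5, 5), ("u3", 20, 20)]
print(summarize_polygons(messages, {"park": park}))  # {'park': (4, 2, 100.0)}
```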
Article
Condition monitoring and timely repair of residential buildings is an important task when ensuring a comfortable life in cities. In the case of large metropolitan areas, it is a difficult task to perform continuous objective condition monitoring for tens of thousands of residential buildings by efforts of experts. However, residential infrastructure health can be predicted on the basis of indirect data. These can be objective building parameters or subjective data on citizens’ complaints about deterioration. In cities today, it is possible to collect such data in machine-readable form from various information systems. This article proposes a method to predict external deterioration of buildings on the basis of indirect data, using machine learning and SMILE Low-coding platform. Based on the results of method approbation, which used data of a metropolis, the significance of electronic participation data and objective parameters of objects for façade deterioration forecast was assessed. Options for further research are proposed to improve the quality of deterioration predicting by using data on citizens’ complaints about infrastructure damage.
Article
Social media (SM) can be an invaluable resource for understanding and managing the effects of catastrophic disasters. In order to use SM platforms for public participatory (PP) mapping of emergency management activities, a bias investigation should be undertaken for the data related to the study area (urban, regional, national, etc.) to determine its spatial data dynamics. Such an investigation makes it possible to determine how SM data can be used and interpreted for PP. In this study, the city of Istanbul was chosen as the social media research area, as it is one of the most crowded cities in the world and is expecting a major earthquake. The methodology for the data investigation is: 1. obtain data and perform sampling; 2. identify the representation and temporal biases in the data and normalize it in response to representation bias; 3. identify general anomalies and spatial anomalies; 4. manipulate the trend of the dataset with the discretization of anomalies; and 5. examine the spatiotemporal bias. Using this bias investigation methodology, citizen footprint dynamics in the city were determined and reference maps (such as regional anomaly maps, representation maps and time-space bias maps) were produced. The outcomes of the study can be summarized in four points. First, highly active users generate the majority of the data, and removing their data as a general approach within a pseudo-cleaning process means concealing a large amount of data. Second, normalizing the data in terms of activity levels changes the anomaly outcome that results from the diverse representation levels of users. Third, spatiotemporally normalized data show a strong spatial anomaly tendency in some parts of the central area. Fourth, trend data are dense in the central area, and the spatiotemporal bias assessments show that data density varies by time of day, day of week and season of the year.
The methodology proposed in this study can be used to extract unbiased daily routines from regional social media data on normal days, which can then serve as a reference for detecting changes or impacts during emergencies or unexpected events.
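One simple form of the activity-level normalization mentioned above is to weight each post by 1/n_u, where n_u is the total number of posts by its author, so that every user contributes the same total weight. This is a minimal sketch under that assumption; the study's exact normalization scheme is not given in the abstract, and the cell identifiers are hypothetical.

```python
from collections import Counter, defaultdict

def user_normalized_density(posts):
    """posts: list of (user_id, cell_id).

    Each post is weighted by 1 / n_u, where n_u is its author's total post
    count, so every user contributes a total weight of 1 regardless of how
    active they are. Returns {cell_id: summed weight}.
    """
    per_user = Counter(user for user, _ in posts)
    density = defaultdict(float)
    for user, cell in posts:
        density[cell] += 1.0 / per_user[user]
    return dict(density)

# Usage: user "a" posts 10 times, users "b" and "c" once each.
posts = [("a", "c1")] * 8 + [("a", "c2")] * 2 + [("b", "c2"), ("c", "c3")]
print(user_normalized_density(posts))
```

With raw counts, cell `c1` looks eight times busier than `c3`; after normalization its weight (about 0.8) is lower than that of `c3` (1.0), because all of its activity comes from a single highly active user.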
Article
Many social media researchers and data scientists have collected geo-tagged tweets to conduct spatial analysis or to identify spatiotemporal patterns of messages filtered by specific topics or events. This paper provides a systematic view to illustrate the characteristics (data noise, user biases, and system errors) of geo-tagged tweets from the Twitter Streaming API. First, we found that a small percentage (1%) of active Twitter users can create a large portion (16%) of geo-tagged tweets. Second, a significant amount (57.3%) of geo-tagged tweets are located outside the Twitter Streaming API's bounding box in San Diego. Third, we can detect spam, bot, and cyborg tweets (data noise) by examining the "source" metadata field. The proportion of data noise in geo-tagged tweets is significant (29.42% in San Diego, CA, and 53.47% in Columbus, OH) in our case study. Finally, the majority of geo-tagged tweets are not created by the generic Twitter apps on Android or iPhone devices, but by other platforms, such as Instagram and Foursquare. We recommend a multi-step procedure to remove this noise for future research projects utilizing geo-tagged tweets.
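Two of the checks described above can be sketched in a few lines: the share of tweets produced by the most active 1% of users, and filtering on the `source` field, which in the classic tweet JSON is an HTML anchor such as `<a href="...">Twitter for iPhone</a>`. The whitelist of allowed apps is a hypothetical choice; which sources count as noise depends on the research question.

```python
import re
from collections import Counter

def top_user_share(user_ids, fraction=0.01):
    """Share of all items created by the most active `fraction` of users."""
    counts = sorted(Counter(user_ids).values(), reverse=True)
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)

# Hypothetical whitelist of "generic" client apps; adjust per study.
ALLOWED_SOURCES = {"Twitter for iPhone", "Twitter for Android"}

def source_name(source_html):
    """Strip the anchor tag from a tweet's `source` value, leaving the app name."""
    return re.sub(r"<[^>]+>", "", source_html).strip()

def filter_by_source(tweets, allowed=ALLOWED_SOURCES):
    """Keep tweets (dicts with a 'source' key) posted from whitelisted apps."""
    return [t for t in tweets if source_name(t.get("source", "")) in allowed]

# Usage: one very active account among 100 users.
users = ["bot"] * 50 + [f"u{i}" for i in range(99)]
print(round(top_user_share(users), 3))  # 0.336
```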
Article
Cities across the United States are implementing information communication technologies in an effort to improve government services. One such innovation in e-government is the creation of 311 systems, offering a centralized platform where citizens can request services, report non-emergency concerns, and obtain information about the city via hotline, mobile, or web-based applications. The NYC 311 service request system represents one of the most significant links between citizens and city government, accounting for more than 8,000,000 requests annually. These systems are generating massive amounts of data that, when properly managed, cleaned, and mined, can yield significant insights into the real-time condition of the city. Increasingly, these data are being used to develop predictive models of citizen concerns and problem conditions within the city. However, predictive models trained on these data can suffer from biases in the propensity to make a request that can vary based on socio-economic and demographic characteristics of an area, cultural differences that can affect citizens' willingness to interact with their government, and differential access to Internet connectivity. Using more than 20,000,000 311 requests - together with building violation data from the NYC Department of Buildings and the NYC Department of Housing Preservation and Development; property data from NYC Department of City Planning; and demographic and socioeconomic data from the U.S. Census American Community Survey - we develop a two-step methodology to evaluate the propensity to complain: (1) we predict, using a gradient boosting regression model, the likelihood of heating and hot water violations for a given building, and (2) we then compare the actual complaint volume for buildings with predicted violations to quantify discrepancies across the City.
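Step (2) of this methodology can be sketched as follows, assuming the step-(1) gradient boosting model has already produced a violation probability for each building. The 0.5 decision threshold and the area identifiers are hypothetical. For each area, the sketch reports the share of buildings predicted to have a violation that actually generated a complaint; low values suggest a lower propensity to complain.

```python
from collections import defaultdict

def complaint_propensity(buildings, threshold=0.5):
    """buildings: iterable of (area_id, predicted_violation_prob, complaint_count).

    Among buildings predicted to have a violation (prob >= threshold), return
    the share in each area that generated at least one complaint.
    """
    predicted = defaultdict(int)
    complained = defaultdict(int)
    for area, prob, complaints in buildings:
        if prob >= threshold:
            predicted[area] += 1
            if complaints > 0:
                complained[area] += 1
    return {area: complained[area] / n for area, n in predicted.items()}

# Usage with hypothetical predictions and complaint counts.
buildings = [("A", 0.9, 3), ("A", 0.8, 0), ("A", 0.2, 5), ("B", 0.7, 0), ("B", 0.95, 0)]
print(complaint_propensity(buildings))  # {'A': 0.5, 'B': 0.0}
```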
Article
New forms of data are now widely used in social sciences, and much debate surrounds their ideal application to the study of crime problems. Limitations associated with these data, including subjective bias in reporting, are often a point of contention in this debate. In this article, we argue that by re-conceptualizing such data and focusing on their mode of production, crowdsourcing, this bias can be understood as a reflection of people’s subjective experiences with their environments. To illustrate, we apply the theoretical framework of signal crimes to an empirical analysis of crowdsourced data from an online problem reporting website. We show how this approach facilitates new insight into people’s experiences and discuss implications for advancing research on the perception of crime and place.
Article
Location-based social networks (LBSN) have provided new possibilities for researchers to gain knowledge about human spatiotemporal behavior, and to make predictions about how people might behave through space and time in the future. An important requirement for successfully utilizing LBSN in these regards is a thorough understanding of the respective datasets, including their inherent potential as well as their limitations. Specifically, when it comes to predictions, we must know what we can actually expect from the data, and how we could maximize their usefulness. Yet, this knowledge is still largely lacking from the literature. Hence, this work explores one particular aspect, which is the theoretical predictability of LBSN datasets. The uncovered predictability is represented as an interval. The lower bound of the interval corresponds to the amount of regular behavior that can easily be anticipated, and represents the correct prediction rate that any algorithm should be able to achieve. The upper bound corresponds to the amount of information that is contained in the dataset, and represents the maximum correct prediction rate that cannot be exceeded by any algorithm. Three Foursquare datasets from three American cities are studied as an example. It is found that, within our investigated datasets, the lower bound of predictability of human spatiotemporal behavior is 27%, and the upper bound is 92%. Hence, the inherent potential of the datasets for predicting human spatiotemporal behavior is clarified, and the revealed interval allows a realistic assessment of the quality of predictions and thus of the associated algorithms. Additionally, in order to provide further insight into the practical use of the dataset, the relationship between predictability and check-in frequency is investigated from three different perspectives. It was found that the individual perspective provides no significant correlation between predictability and check-in frequency.
In contrast, the same two quantities are found to be negatively correlated from the temporal and spatial perspectives. Our study further indicates that heavily frequented contexts and some extraordinary geographic features, such as airports, could be good starting points for effective improvements of prediction algorithms. In general, this research provides novel knowledge regarding the nature of LBSN datasets and practical insights for a more reasonable utilization of such data.
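A common way to turn entropy into an upper bound on prediction accuracy is Fano's inequality: for an entropy $S$ (in bits) over $N$ possible locations, the maximum predictability $\Pi_{\max}$ solves $S = H(\Pi_{\max}) + (1 - \Pi_{\max})\log_2(N - 1)$, where $H$ is the binary entropy. The sketch below assumes that approach, uses the frequency-based (temporally uncorrelated) entropy as a simplification, and takes the share of visits to the most frequent venue as a stand-in for the "regular behavior" lower bound. The abstract does not state which estimators the study actually uses.

```python
import math
from collections import Counter

def binary_entropy(p):
    """H(p) in bits, with 0 * log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def shannon_entropy(visits):
    """Frequency-based entropy (bits) of a sequence of visited locations."""
    total = len(visits)
    return -sum((c / total) * math.log2(c / total) for c in Counter(visits).values())

def regularity(visits):
    """Accuracy of always guessing the most frequent location (lower-bound proxy)."""
    return max(Counter(visits).values()) / len(visits)

def max_predictability(entropy_bits, n_locations, tol=1e-9):
    """Solve Fano's equation for Pi_max by bisection on [1/N, 1].

    The right-hand side H(Pi) + (1 - Pi) * log2(N - 1) decreases on this
    interval, from log2(N) down to 0.
    """
    if n_locations < 2:
        return 1.0
    target = min(entropy_bits, math.log2(n_locations))
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) + (1 - mid) * math.log2(n_locations - 1) > target:
            lo = mid  # entropy bound still too high: the solution lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# Usage with a hypothetical check-in sequence.
visits = ["home", "work", "home", "cafe", "home", "work"]
lower = regularity(visits)  # 0.5
upper = max_predictability(shannon_entropy(visits), len(set(visits)))
```

Because "always guess the most frequent venue" is itself a predictor, Fano's bound computed from the same frequency distribution can never fall below it, so `lower <= upper` holds by construction.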
Article
While urban systems demonstrate high spatial heterogeneity, many urban planning, economic and political decisions rely heavily on a deep understanding of local neighborhood contexts. We show that the structure of 311 Service Requests offers one possible way to build a unique signature of the local urban context, and can thus serve as a low-cost decision support tool for urban stakeholders. Considering the examples of New York City, Boston and Chicago, we demonstrate how 311 Service Requests recorded and categorized by type in each neighborhood can be used to generate a meaningful classification of locations across the city, based on distinctive socioeconomic profiles. Moreover, the 311-based classification of urban neighborhoods can provide sufficient information to model various socioeconomic features. Finally, we show that these characteristics are capable of predicting future trends in comparative local real estate prices. We demonstrate that 311 Service Request data can be used to monitor and predict the socioeconomic performance of urban neighborhoods, allowing urban stakeholders to quantify the impacts of their interventions.
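A neighborhood "signature" of the kind described above can be approximated, as a minimal sketch, by the share of each request category among that neighborhood's requests; two neighborhoods can then be compared with cosine similarity before any clustering or modeling. The category list and the choice of cosine similarity are assumptions for illustration, not the paper's exact procedure.

```python
import math
from collections import Counter

def signature(request_types, categories):
    """Share of each category (in the given order) among a neighborhood's requests."""
    counts = Counter(request_types)
    total = sum(counts.values())
    return [counts[c] / total for c in categories]

def cosine_similarity(a, b):
    """Cosine of the angle between two signature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Usage with hypothetical request categories.
categories = ["noise", "heat", "pothole"]
sig_a = signature(["noise", "noise", "heat", "pothole"], categories)  # [0.5, 0.25, 0.25]
sig_b = signature(["pothole", "pothole", "heat"], categories)
print(round(cosine_similarity(sig_a, sig_b), 3))
```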
Article
The relevance of geographic information to mobile users must be evaluated by taking the usage context into account. This paper assumes that emerging Location-based Social Networks (LBSNs) contain contextual information rich enough to contextualize such an evaluation process. This assumption is demonstrated through an exploratory analysis of a Foursquare check-in dataset, which reveals the impacts of two contextual factors, temporal and spatial, on mobility patterns. This paper then proposes an approach that may be used to contextualize the evaluation of geographic information’s relevance. The proposed algorithm links a priori relevance to contextualized relevance using the hidden impacts of contextual factors. The improved performance observed in the experiments confirms the validity of the proposed approach, as well as the benefits of utilizing contextual information within the relevance evaluation process.
Article
The collection of large-scale administrative records in electronic form by many cities provides a new opportunity for the measurement and longitudinal tracking of neighborhood characteristics, but one that will require novel methodologies that convert such data into research-relevant measures. The authors illustrate these challenges by developing measures of “broken windows” from Boston’s constituent relationship management (CRM) system (aka 311 hotline). A 16-month archive of the CRM database contains more than 300,000 address-based requests for city services, many of which reference physical incivilities (e.g., graffiti removal). The authors carry out three ecometric analyses, each building on the previous one. Analysis 1 examines the content of the measure, identifying 28 items that constitute two independent constructs, private neglect and public denigration. Analysis 2 assesses the validity of the measure by using investigator-initiated neighborhood audits to examine the “civic response rate” across neighborhoods. Indicators of civic response were then extracted from the CRM database so that measurement adjustments could be automated. These adjustments were calibrated against measures of litter from the objective audits. Analysis 3 examines the reliability of the composite measure of physical disorder at different spatiotemporal windows, finding that census tracts can be measured at two-month intervals and census block groups at six-month intervals. The final measures are highly detailed, can be tracked longitudinally, and are virtually costless. This framework thus provides an example of how new forms of large-scale administrative data can yield ecometric measurement for urban science while illustrating the methodological challenges that must be addressed.
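The trade-off in Analysis 3 (smaller units need longer windows) can be illustrated with the standard ecometric reliability of a unit-level mean based on $n$ reports, $\lambda = \tau^2 / (\tau^2 + \sigma^2 / n)$, where $\tau^2$ is the between-unit variance and $\sigma^2$ the within-unit variance. This sketch assumes that reports accumulate linearly with window length and uses a hypothetical target reliability of 0.7; the variance values and report rates below are made up for illustration.

```python
def ecometric_reliability(tau2, sigma2, n):
    """Reliability of a unit-level mean from n observations:
    lambda = tau^2 / (tau^2 + sigma^2 / n)."""
    return tau2 / (tau2 + sigma2 / n)

def shortest_window(tau2, sigma2, reports_per_month, target=0.7, max_months=24):
    """Smallest window (in whole months) whose expected report count reaches
    the target reliability; assumes reports accumulate linearly in time.
    Returns None if no window up to `max_months` is reliable enough."""
    for months in range(1, max_months + 1):
        n = reports_per_month * months
        if ecometric_reliability(tau2, sigma2, n) >= target:
            return months
    return None

# Usage: a larger unit with many reports vs. a smaller unit with few.
print(shortest_window(1.0, 9.0, reports_per_month=10))  # 3
print(shortest_window(1.0, 9.0, reports_per_month=2))   # 11
```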
Conference Paper
Location-sharing services such as Foursquare provide a rich source of information about users' visits to locations. In the case of Foursquare, users voluntarily ‘check in’ to places they visit using a mobile application. An analysis of these data may reveal differences in users' personalities in terms of their mobility habits, preferred places, and action and location patterns. This knowledge about user behaviour can be used, in addition to information about their preferences, to improve current recommendation systems for mobile platforms.
Chapter
Electronic participation tools are becoming increasingly popular among citizens as a means of communication with the authorities. Residents actively use dedicated websites and mobile applications to send electronic appeals on problems of the urban environment. In some big cities, there are hundreds of thousands or even millions of such messages annually. These big data allow for the analysis of civic activity and of residents' subjective perception of the environment.
Chapter
The active development of information and communication technologies over recent decades has significantly affected modern life in cities. Civil society has obtained e-participation tools, such as tools for sending electronic appeals on problems of the urban environment to the authorities to initiate their resolution. At the same time, local web communities have emerged and are now widespread due to social media. This research hypothesized that the number and concentration of appeals on problems of the urban environment in individual local areas could be directly related to the existence of active web communities of residents in these territories. To test this hypothesis, appeals sent by residents of St. Petersburg over 2 years were analyzed. As a result, zones of high and low civic activity were identified, and local communities existing within the boundaries of these zones were studied. The hypothesis regarding the connection between the existence of local web communities in a certain urban area and civic activity was confirmed. At the same time, no clear correlation was identified between the size (number of members) of local web communities and the number of appeals sent by residents of the territories to which these communities belong. Further studies using alternative sources of appeals and new territories as examples will make it possible to supplement the existing data and to obtain a more accurate assessment of the relationship between the described factors.
Article
Civic technologies leverage advances in digital tools to promote civic engagement, giving people a voice in public decision-making. While democratizing participation, the use of such civic tech also leaves behind a digital trace of its users' behavior. This article uses such a digital trace to explore spatial patterns in active guardianship of public space. Through mapping people’s participation in a platform for reporting neighborhood concerns (a form of digitally enabled guardianship), the spatial range of guardianship is unpacked using exploratory spatial data analysis. Typologies of guardianship behavior are then created using k-means clustering. Results provide insight into the heterogeneity of spatial behavior of different groups of guardians outside the home environment. Guardians do not appear to be limited to activity within a neighborhood; instead, they cover a larger awareness space with nodes and paths and show distinct patterns, indicating heterogeneity in guardianship behavior. It is recommended to operationalize guardians as heterogeneous and active across their entire activity space, rather than as homogeneous groups assigned as crime prevention forces to a residential area.
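The two analysis steps above can be sketched as: (1) summarize each user's spatial range, here with the radius of gyration of their report locations, and (2) cluster users with plain Lloyd's k-means. The feature choice (radius of gyration plus report count), planar coordinates and the seeded random initialization are assumptions for illustration. In practice the features should be standardized first, since k-means is sensitive to scale.

```python
import math
import random

def radius_of_gyration(points):
    """Root-mean-square distance of a user's report locations from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns a cluster label per vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

# Usage: hypothetical (radius of gyration in km, number of reports) per user.
features = [(0.1, 2), (0.2, 3), (0.15, 2), (5.0, 40), (5.5, 45), (6.0, 50)]
labels = kmeans(features, k=2)
```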
Article
Local governments operate 311 service request lines across the United States, and the publicly available data from these lines provide a continuously measured, geographically fine-grained, and non-self-reported measure of citizens’ interactions with government. It seems a promising measure of neighborhood political participation. However, these data are empirically and theoretically different from many common citizen-level participation measures. We compare geographically aggregated 311 call data with three other measures of political and civic participation: voter turnout, political donations, and census return rates. We show that rates of 311 calls are negatively related to lower cost activities (voter turnout and census return rates), but positively related to the high-cost activity of campaign donation. We caution against interpreting 311 data as a generic measure of political engagement or participation, at least in the absence of high-quality controls for neighborhood condition. However, we argue that these data are still potentially useful for researchers, because they are by definition a measure of the service demands that neighborhoods place on city governments.
Article
The paper introduces a methodology for using a city’s administrative data to study custodianship, or those behaviors aimed at preventing physical disorder in the public space. Custodianship was operationalized through requests for city services regarding public maintenance (e.g., potholes), provided by the database generated by Boston, MA’s hotline for requesting city services (i.e., 311). Users can register with the system, permitting analysis of individual differences in custodianship (N = 12,361), including frequency of reports, variety of issues reported, and geographical range of reports. Home location (available for N = 7,433) was combined with Census statistics to infer demographic characteristics. Most users (76%) reported only one case. There was no evidence that individuals specialize in a single type of issue. Most reported issues over a narrow geographic range (80% within 2 blocks of home). Homeowners were three times more likely than renters to report public issues. A technique for estimating the home locations of users is also tested.
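The per-user measures listed above (frequency, variety, and geographic range relative to home) can be sketched as follows. Coordinates are assumed to be in projected metres, and the 200 m radius is a hypothetical stand-in for "2 blocks", whose length varies by city; function names are illustrative.

```python
import math
from collections import defaultdict

def custodianship_profile(reports, homes, radius=200.0):
    """reports: list of (user_id, issue_type, x, y) in projected metres;
    homes: {user_id: (x, y)} for users whose home location is known.

    Returns {user_id: {"frequency", "variety", "near_home_share"}}, where
    near_home_share is the share of reports within `radius` metres of home
    (None when the home location is unknown).
    """
    by_user = defaultdict(list)
    for user, issue, x, y in reports:
        by_user[user].append((issue, x, y))
    profile = {}
    for user, items in by_user.items():
        near_share = None
        if user in homes:
            hx, hy = homes[user]
            near = sum(1 for _, x, y in items if math.hypot(x - hx, y - hy) <= radius)
            near_share = near / len(items)
        profile[user] = {
            "frequency": len(items),
            "variety": len({issue for issue, _, _ in items}),
            "near_home_share": near_share,
        }
    return profile

def share_single_reporters(profile):
    """Fraction of users who filed exactly one report."""
    return sum(1 for p in profile.values() if p["frequency"] == 1) / len(profile)

# Usage with hypothetical reports.
reports = [("u1", "pothole", 0, 0), ("u1", "graffiti", 100, 0),
           ("u1", "pothole", 1000, 0), ("u2", "streetlight", 5, 5)]
profile = custodianship_profile(reports, homes={"u1": (0, 0)})
```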
Book
The Concise Encyclopedia of Statistics presents the essential information about statistical tests, concepts, and analytical methods in language that is accessible to practitioners and students of the vast community using statistics in medicine, engineering, physical science, life science, social science, and business/economics.
Article
In large-scale online multi-user communities, the phenomenon of 'participation inequality' has been described as generally following a more or less 90-9-1 rule [9]. In this paper, we examine crowdsourcing participation levels inside the enterprise (within a company's firewall) and show that it is possible to achieve a more equitable distribution of 33-66-1. Accordingly, we propose a SCOUT ((S)uper Contributor, (C)ontributor, and (OUT)lier) model for describing user participation based on quantifiable effort-level metrics. In support of this framework, we present an analysis that measures the quantity of contributions and correlates it with responses to motivation and incentives. In conclusion, SCOUT provides task-based categories to characterize the participation inequality evident in online communities and, crucially, also demonstrates the inequality curve (and associated characteristics) in the enterprise domain.
Noise and the city: leveraging crowdsourced big data to examine the spatiotemporal relationship between urban development and noise annoyance
  • A Hong
  • B Kim
  • M Widener
Smart-city concept: functioning of feedback mechanisms in the context of e-participation of citizens
  • A Chugunov