Article

MixedTrails: Bayesian Hypotheses Comparison on Heterogeneous Sequential Data

Authors:
  • GESIS - Leibniz Institute of the Social Sciences

Abstract and Figures

Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding factors explaining the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of a website visit, or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes of heterogeneous sequence data. Each hypothesis represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city, a hypothesis assuming tourists to be more likely to move towards points of interest than locals can be shown to be more plausible with observed data than a hypothesis assuming the opposite. Our approach incorporates these beliefs as Bayesian priors in a generative mixed transition Markov chain model, and compares their plausibility utilizing Bayes factors. We discuss analytical and approximate inference methods for calculating the marginal likelihoods for Bayes factors, give guidance on interpreting the results, and illustrate our approach with several experiments on synthetic and empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind of analysis for studying sequential data in many application areas.
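The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes a first-order Markov chain with an independent Dirichlet prior on each row of the transition matrix, for which the marginal likelihood has a closed form. The function name `log_evidence`, the toy counts, and the two prior matrices are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import lgamma

def log_evidence(counts, alpha):
    """Log marginal likelihood of transition counts for a first-order
    Markov chain with an independent Dirichlet prior on every row."""
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        # Normalising constants of the prior and the posterior Dirichlet.
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        # Per-cell contribution of counts on top of the pseudo-counts.
        total += sum(lgamma(a + n) - lgamma(a) for a, n in zip(a_row, n_row))
    return total

# Hypothetical toy data: observed transition counts between two states.
counts = [[8, 2],
          [3, 7]]

# Two hypothetical hypotheses, already expressed as Dirichlet pseudo-counts.
alpha_stay = [[5.0, 1.0], [1.0, 5.0]]    # belief: the next state equals the current one
alpha_switch = [[1.0, 5.0], [5.0, 1.0]]  # belief: the next state differs

# The log Bayes factor is the difference of the two log marginal likelihoods.
log_bf = log_evidence(counts, alpha_stay) - log_evidence(counts, alpha_switch)
print(f"log Bayes factor (stay vs. switch): {log_bf:.3f}")
```

A positive log Bayes factor favors the first hypothesis and a negative one the second; how strong the evidence is depends on its magnitude.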
Hypotheses for heterogeneous sequence data. In MixedTrails, we formulate hypotheses about heterogeneous sequence data. E.g., in the soccer example, we define two hypotheses: the homogeneous hypothesis $H_\text{hom}$ (a) assumes that players just randomly pass the ball around; the heterogeneous hypothesis $H_\text{het}$ (b) assumes an offensive strategy in the first half of the game and a defensive strategy in the second half, cf. Fig. 1. This is formalized based on two components: group assignment probabilities $\boldsymbol{\gamma}$, i.e., probability distributions over a set of groups for each transition, and a belief matrix of group transition probabilities $\boldsymbol{\phi}_g$ for each group $g$. The soccer example features a special case, where group assignments are deterministic, i.e., the probabilities are either 0 or 1
… 
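For the deterministic special case mentioned in the caption above, the marginal likelihood can be written in closed form under one common assumption: an independent Dirichlet prior on each row of each group's transition matrix. The notation below is introduced here for illustration only and is not taken from the paper: $n_{g,i,j}$ is the number of observed transitions from state $i$ to state $j$ assigned to group $g$, and $\alpha_{g,i,j}$ is the corresponding Dirichlet pseudo-count derived from the belief matrix of group $g$.

```latex
P(D \mid H) \;=\; \prod_{g} \prod_{i}
  \frac{\Gamma\!\left(\sum_{j} \alpha_{g,i,j}\right)}
       {\Gamma\!\left(\sum_{j} \left(\alpha_{g,i,j} + n_{g,i,j}\right)\right)}
  \prod_{j}
  \frac{\Gamma\!\left(\alpha_{g,i,j} + n_{g,i,j}\right)}
       {\Gamma\!\left(\alpha_{g,i,j}\right)}
```

Under this assumption each group contributes an independent factor, so a heterogeneous hypothesis with deterministic assignments amounts to evaluating one Markov chain model per group on that group's transitions.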
Synthetic data results. We compare homogeneous ($H_\text{link}$) and heterogeneous hypotheses ($H_\text{link-colored}$, $H_\text{color}$ and $H_\text{mem}$) on three synthetic datasets ($D_\text{link}$, $D_\text{color}$ and $D_\text{mem}$).
We observe that the hypotheses that are fitting the respective datasets work best, illustrating that the MixedTrails approach can identify the correct ordering of the defined hypotheses. For details on interpreting the plots, see Sect. 4.1
… 
Data Min Knowl Disc (2017) 31:1359–1390
DOI 10.1007/s10618-017-0518-x
MixedTrails: Bayesian hypothesis comparison on
heterogeneous sequential data
Martin Becker¹ · Florian Lemmerich² · Philipp Singer² · Markus Strohmaier² · Andreas Hotho¹
Received: 18 December 2016 / Accepted: 5 June 2017 / Published online: 7 July 2017
© The Author(s) 2017
Abstract Sequential traces of user data are frequently observed online and offline, e.g., as sequences of visited websites or as sequences of locations captured by GPS. However, understanding factors explaining the production of sequence data is a challenging task, especially since the data generation is often not homogeneous. For example, navigation behavior might change in different phases of browsing a website or movement behavior may vary between groups of users. In this work, we tackle this task and propose MixedTrails, a Bayesian approach for comparing the plausibility of hypotheses regarding the generative processes of heterogeneous sequence data. Each hypothesis is derived from existing literature, theory, or intuition and represents a belief about transition probabilities between a set of states that can vary between groups of observed transitions. For example, when trying to understand human movement in a city and given some data, a hypothesis assuming tourists to be more likely
Responsible editors: Kurt Driessens, Dragi Kocev, Marko Robnik-Šikonja, Myra Spiliopoulou
Martin Becker
becker@informatik.uni-wuerzburg.de
Florian Lemmerich
florian.lemmerich@gesis.org
Philipp Singer
me@philippsinger.com
Markus Strohmaier
markus.strohmaier@gesis.org
Andreas Hotho
hotho@informatik.uni-wuerzburg.de
¹ Data Mining and Information Retrieval Group, University of Würzburg, Würzburg, Germany
² GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
... To establish if one of the hypotheses is better than another we compare the marginal likelihood (the higher, the better) across a range of concentration factors (scaled by the number of states). For more details we refer to [2,9]. ...
... For future work, it may be interesting to extend the transition models as applied in this work, for example, by further investigating the influence of the road network in the context of the data provided by OpenStreetMap, or by formulating heterogeneous hypotheses to explain the overall behavior of users during the APIC campaign [2]. Finally, by employing data from other participatory sensing campaigns as well as subjective information from users may provide the necessary background information to formulate and compare hypotheses that enable further insights into human navigation behavior as well as their incentives and goals in the context of environmental studies. ...
Conference Paper
Air pollution in urban areas has become a major issue and has attracted significant public attention. As a consequence, many citizens have started campaigns for measuring the air quality of their personal environment using mobile devices. In this study, we adapt HypTrails-a Bayesian method for comparing hypotheses about human trails-in order to investigate mobility patterns from such campaigns. In particular, we derive an approach to apply HypTrails to continuous, temporally dense navigation paths as is characteristic for GPS tracks. This allows us to directly study the behavioral processes of participants. We showcase our method on the citizen science campaign APIC (the AirProbe International Challenge) yielding promising results: We find differing mobility patterns of users in restricted and unrestricted environments, and extend previous work by showing that roads and road types play an important role explaining the observed paths. This gives first insights into movement patterns of urban air quality exploration. Ultimately, we believe that our approach can help to better interpret data collected in the context of participatory sensing campaigns, and to develop new theories about the motivational processes of volunteers.
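The citing excerpt above compares marginal likelihoods across a range of concentration factors scaled by the number of states. The following sketch shows one simplified way such a factor could turn a belief matrix into Dirichlet pseudo-counts. It is an assumption-laden illustration, not the elicitation procedure of the cited papers: the function `elicit_dirichlet`, the uniform base prior of 1 per cell, and the use of fractional pseudo-counts are all choices made here for brevity.

```python
def elicit_dirichlet(belief, kappa):
    """Turn a belief matrix into Dirichlet pseudo-counts (simplified sketch).

    Assumption: each row receives kappa * n_states pseudo-counts, split in
    proportion to the row-normalised belief, on top of a uniform prior of 1
    per cell. Fractional pseudo-counts are allowed.
    """
    n_states = len(belief)
    budget = kappa * n_states
    alpha = []
    for row in belief:
        row_sum = sum(row)
        if row_sum == 0:
            # No belief expressed for this state: keep the uniform prior.
            alpha.append([1.0] * len(row))
        else:
            alpha.append([1.0 + budget * b / row_sum for b in row])
    return alpha

# Hypothetical belief matrix over three states (unnormalised scores).
belief = [[0.0, 3.0, 1.0],
          [1.0, 0.0, 1.0],
          [2.0, 2.0, 0.0]]

# Larger concentration factors express stronger belief in the hypothesis.
for kappa in (0, 1, 10, 100):
    alpha = elicit_dirichlet(belief, kappa)
    print(kappa, [round(a, 2) for a in alpha[0]])
```

With a factor of 0 every hypothesis collapses to the same uniform prior, which is why comparisons of this kind are usually read as curves over increasing factors rather than at a single value.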
... This is called sequential pattern mining, where the goal is to develop algorithms that quickly find the most frequent subsequences in large sequence data [1,24,18,38]. Some work addresses this problem based on statistical methods, e.g., using Markov modeling techniques [22,36,45,10,44,34,23], hypothesis testing [39,6,43], or information-theoretic methods to detect "surprising" subsequences [25,13,7]. Applications include the detection of common patterns in user trajectories [44,35], testing hypotheses about generative processes of trajectory data [39], or finding clusters in sequence data [10,34]. ...
... In the below experiments, we compare HYPA to a simple frequency-based anomaly detection (FBAD) of our own design. We note that despite similar problem settings, the methods for hypothesis testing on human trails presented in [39,6] are not directly comparable with our work because the output is Bayesian evidence for a hypothesis on an entire dataset (a single number), whereas we are interested in edge-level analysis. Further, we did not compare with a method like [36] because, while based on detecting significant deviations from a Markov chain model, this method assumes that the data is given as one long sequence and detects anomalous subsequences, which does not correspond to any of the datasets we analyze here. ...
... This is closely related to sequential pattern mining [1], e.g., algorithms to quickly find the most frequent subsequences in large sequence data [20,43]. Other works address this problem based on statistical methods, e.g., using Markov modelling techniques [12,25,37,39,50,53], hypothesis testing [8,44], or information-theoretic methods to detect "surprising" subsequences [9,15,27]. Applications include the detection of common patterns in user trajectories [38,50], testing hypotheses about generative processes of trajectory data [44], or finding clusters in click streams and other sequence data [12,37]. ...
... In the below experiments, we compare HYPA to a simple frequency-based anomaly detection (FBAD) of our own design. We note that despite similar problem settings, the methods for hypothesis testing on human trails presented in [8,44] are not directly comparable with our work because the output is Bayesian evidence for a hypothesis on an entire dataset (a single number), whereas we are interested in edge-level analysis. However, in future work we could use HYPA to generate hypotheses to be tested using these methods. ...
Article
The unsupervised detection of anomalies in time series data has important applications, e.g., in user behavioural modelling, fraud detection, and cybersecurity. Anomaly detection has been extensively studied in categorical sequences, however we often have access to time series data that contain paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies we must account for the fact that such data contain a large number of independent observations of short paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem we introduce a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph, which provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.
... Like Lemmerich et al. (2016); Song (2017); Bueno et al. (2020), we take an approach where candidate subgroups are evaluated using a bottom-up, heuristic search through the descriptive space. In comparison, Becker et al. (2017) take a top-down approach where the subgroups are hypothesised beforehand based on theory and evaluated using Bayes Factors. Kiseleva et al. (2013) hypothesise two groups of sequences based on descriptive information and distributional characteristics. ...
Article
Full-text available
Discrete Markov chains are frequently used to analyse transition behaviour in sequential data. Here, the transition probabilities can be estimated using varying order Markov chains, where order k specifies the length of the sequence history that is used to model these probabilities. Generally, such a model is fitted to the entire dataset, but in practice it is likely that some heterogeneity in the data exists and that some sequences would be better modelled with alternative parameter values, or with a Markov chain of a different order. We use the framework of Exceptional Model Mining (EMM) to discover these exceptionally behaving sequences. In particular, we propose an EMM model class that allows for discovering subgroups with transition behaviour of varying order. To that end, we propose three new quality measures based on information-theoretic scoring functions. Our findings from controlled experiments show that all three quality measures find exceptional transition behaviour of varying order and are reasonably sensitive. The quality measure based on Akaike’s Information Criterion is most robust for the number of observations. We furthermore add to existing work by seeking for subgroups of sequences, as opposite to subgroups of transitions. Since we use sequence-level descriptive attributes, we form subgroups of entire sequences, which is practically relevant in situations where you want to identify the originators of exceptional sequences, such as patients. We show this relevance by analysing sequences of blood glucose values of adult persons with diabetes type 2. In the experiments, we find subgroups of patients based on age and glycated haemoglobin (HbA1c), a measure known to correlate with average blood glucose values. Clinicians and domain experts confirmed the transition behaviour as estimated by the fitted Markov chain models.
... Zhang et al. [121] propose a learning scheme that offers a recursive algorithm to explore the distribution of class density for the Bayesian estimation and an automated approach to select powerful discriminant functions for the classification of high-dimensional data, while Celotto [122] proposes a unified visual approach to compare and classify a large subset of Bayesian confirmation measures. In the work of Becker et al. [123], analytical and approximate inference methods are discussed to calculate the marginal probabilities of Bayes factors, providing guidance on the interpretation of results and offering new types of analysis to study sequential data in many application areas. ...
Article
Full-text available
Data mining is a technological and scientific field that, over the years, has been gaining more importance in many areas, attracting scientists, developers, and researchers around the world. The reason for this enthusiasm derives from the remarkable benefits of its usefulness, such as the exploitation of large databases and the use of the information extracted from them in an intelligent way through the analysis and discovery of knowledge. This document provides a review of the predictive data mining techniques used for the diagnosis and detection of faults in electric equipment, which constitutes the pillar of any industrialized country. Starting from the year 2000 to the present, a revision of the methods used in the tasks of classification and regression for the diagnosis of electric equipment is carried out. Current research on data mining techniques is also listed and discussed according to the results obtained by different authors.
... In order to extract the exceptional transition behaviors, the proposed algorithm mines for subgroups whose fitted Markov transition matrix significantly differs (using an adapted Manhattan distance) from the one computed over the entire population. Similarly, HypTrails (Singer et al., 2015), extended to MixedTrails (Becker et al., 2017), operationalizes Bayesian model comparison on simple Markov chains (HypTrails) and heterogeneous mixed comparison Markov chains (MixedTrails). However, these methods do not consider descriptive attributes to extract groups but rather evaluate the transition behavior of an input group. ...
Thesis
Full-text available
With the rapid proliferation of data platforms collecting and curating data related to various domains such as governments data, education data, environment data or product ratings, more and more data are available online. This offers an unparalleled opportunity to study the behavior of individuals and the interactions between them. In the political sphere, being able to query datasets of voting records provides interesting insights for data journalists and political analysts. In particular, such data can be leveraged for the investigation of exceptionally consensual/controversial topics. Consider data describing the voting behavior in the European Parliament (EP). Such a dataset records the votes of each member (MEP) in voting sessions held in the parliament, as well as information on the parliamentarians (e.g., gender, national party, European party alliance) and the sessions (e.g., topic, date). This dataset offers opportunities to study the agreement or disagreement of coherent subgroups, especially to highlight unexpected behavior. It is to be expected that on the majority of voting sessions, MEPs will vote along the lines of their European party alliance. However, when matters are of interest to a specific nation within Europe, alignments may change and agreements can be formed or dissolved. For instance, when a legislative procedure on fishing rights is put before the MEPs, the island nation of the UK can be expected to agree on a specific course of action regardless of their party alliance, fostering an exceptional agreement where strong polarization exists otherwise. In this thesis, we aim to discover such exceptional (dis)agreement patterns not only in voting data but also in more generic data, called behavioral data, which involves individuals performing observable actions on entities. We devise two novel methods which offer complementary angles of exceptional (dis)agreement in behavioral data: within and between groups. 
These two approaches, called Debunk and Deviant, ideally enable the implementation of a sufficiently comprehensive tool to highlight, summarize and analyze exceptional comportments in behavioral data. We thoroughly investigate the qualitative and quantitative performances of the devised methods. Furthermore, we motivate their usage in the context of computational journalism.
Article
Full-text available
The tracking of tourist movements is an essential aspect in the management of sustainable tourist destinations. The current information and communication technologies provide innovative ways of collecting data on tourist movements, but it is still necessary to evaluate tools and methods of study for this challenge. At this point, mobile technologies are the best candidate for this task. Given the relevance of the topic, this paper proposes a mapping science analysis of publications on “movement of tourists” and “traceability.” It has been carried out in the two main sources WOS and SCOPUS. The term “traceability” is brought from industry and technology areas to be applied to the tourist movement/mobility tracking and management. The methodological scheme is based on a selection of search criteria with combinations of terms. The sources of specialized information in applied social sciences and technology were then selected. From there, the searches have been executed for their subsequent analysis in three stages—(I) relevance analysis filtering the results to obtain the most pertinent; (II) analysis of articles with similarity thematic, authors, journals or citations; (III) analysis of selected papers as input for the mapping analysis using Citespace. The automatic naming of clusters under the selected processing confirms that the analysis of movements is a valid scientific trend but research-oriented from the perspective of traceability is non-existent, so this approach is novel and complementary to existing ones and a potential contribution to knowledge about tourist movements. Finally, a set of methodological considerations and a classification of information capture tools are proposed. In this classification, mobile technology is the best option to enable tourist movement analysis.
Conference Paper
While a plethora of hypertext links exist on the Web, only a small amount of them are regularly clicked. Starting from this observation, we set out to study large-scale click data from Wikipedia in order to understand what makes a link successful. We systematically analyze effects of link properties on the popularity of links. By utilizing mixed-effects hurdle models supplemented with descriptive insights, we find evidence of user preference towards links leading to the periphery of the network, towards links leading to semantically similar articles, and towards links in the top and left-side of the screen. We integrate these findings as Bayesian priors into a navigational Markov chain model and by doing so successfully improve the model fits. We further adapt and improve the well-known classic PageRank algorithm that assumes random navigation by accounting for observed navigational preferences of users in a weighted variation. This work facilitates understanding navigational click behavior and thus can contribute to improving link structures and algorithms utilizing these structures.
Conference Paper
Which song will Smith listen to next? Which restaurant will Alice go to tomorrow? Which product will John click next? These applications have in common the prediction of user trajectories that are in a constant state of flux over a hidden network (e.g. website links, geographic location). Moreover, what users are doing now may be unrelated to what they will be doing in an hour from now. Mindful of these challenges we propose TribeFlow, a method designed to cope with the complex challenges of learning personalized predictive models of non-stationary, transient, and time-heterogeneous user trajectories. TribeFlow is a general method that can perform next product recommendation, next song recommendation, next location prediction, and general arbitrary-length user trajectory prediction without domain-specific knowledge. TribeFlow is more accurate and up to 413x faster than top competitors.
Conference Paper
HypTrails is a Bayesian approach for comparing different hypotheses about human trails on the web. While a standard implementation exists, it exposes performance issues when working with large-scale data. In this paper, we propose a distributed implementation of HypTrails based on Apache Spark taking advantage of several structural properties inherent to HypTrails. The performance improves substantially. Our implementation is publicly available.
Conference Paper
We present a new method for detecting interpretable subgroups with exceptional transition behavior in sequential data. Identifying such patterns has many potential applications, e.g., for studying human mobility or analyzing the behavior of internet users. To tackle this task, we employ exceptional model mining, which is a general approach for identifying interpretable data subsets that exhibit unusual interactions between a set of target attributes with respect to a certain model class. Although exceptional model mining provides a well-suited framework for our problem, previously investigated model classes cannot capture transition behavior. To that end, we introduce first-order Markov chains as a novel model class for exceptional model mining and present a new interestingness measure that quantifies the exceptionality of transition subgroups. The measure compares the distance between the Markov transition matrix of a subgroup and the respective matrix of the entire data with the distance of random dataset samples. In addition, our method can be adapted to find subgroups that match or contradict given transition hypotheses. We demonstrate that our method is consistently able to recover subgroups with exceptional transition models from synthetic data and illustrate its potential in two application examples. Our work is relevant for researchers and practitioners interested in detecting exceptional transition behavior in sequential data.
Conference Paper
Understanding human movement trajectories represents an important problem that has implications for a range of societal challenges such as city planning and evolution, public transport or crime. In this paper, we focus on geo-temporal photo trails from four different cities (Berlin, London, Los Angeles, New York) derived from Flickr that are produced by humans when taking sequences of photos in urban areas. We apply a Bayesian approach called HypTrails to assess different explanations of how the trails are produced. Our results suggest that there are common processes underlying the photo trails observed across the studied cities. Furthermore, information extracted from social media, in the form of concepts and usage statistics from Wikipedia, allows for constructing explanations for human movement trajectories.