Chapter

The Infinite Hidden Markov Model

Authors:

Abstract

The proceedings of the 2001 Neural Information Processing Systems (NIPS) Conference. The annual conference on Neural Information Processing Systems (NIPS) is the flagship conference on neural computation. The conference is interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, vision, speech and signal processing, reinforcement learning and control, implementations, and diverse applications. Only about 30 percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. These proceedings contain all of the papers that were presented at the 2001 conference. Bradford Books imprint


... Here we present a methodology that allows us to infer network structures that may change between observations in a nonparametric framework whilst modelling the sequential nature of the data. To that end we have employed the infinite hidden Markov model (iHMM) of Beal et al. [1], also known as the hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) [25], in particular the "Sticky" extension of Fox et al. [5], in conjunction with a Bayesian network model of the gene regulatory network structure. The HDP-HMM allows the number of different states of the network structure to adapt as necessary to explain the observed data, including a potentially infinite number of states, of course restricted in practice by the finite number of experimental observations. ...
... To model a hidden state sequence that evolves over time we apply the methodology first introduced in Beal et al. [1] whereby a finite state Hidden Markov Model, consisting of a set of hidden states $s_1, \ldots, s_n$ over some alphabet 1 . . . ...
... The Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) [1,25,26] instead applies a Dirichlet Process prior to the transition probabilities $\pi_{k\cdot}$ out of each of the states, and uses a hierarchical structure to couple the distributions between the individual states to ensure a shared set of potential states into which transitions can be made across all of the $\pi$. This allows for an unlimited number of potential states, of course limited in practice by the number of observed data points. ...
Preprint
When analysing gene expression time series data, an often overlooked but crucial aspect of the model is that the regulatory network structure may change over time. Whilst some approaches have addressed this problem previously in the literature, many are not well suited to the sequential nature of the data. Here we present a method that allows us to infer regulatory network structures that may vary between time points, utilising a set of hidden states that describe the network structure at a given time point. To model the distribution of the hidden states we have applied the Hierarchical Dirichlet Process Hidden Markov Model, a nonparametric extension of the traditional Hidden Markov Model that does not require us to fix the number of hidden states in advance. We apply our method to existing microarray expression data and demonstrate its efficacy on simulated test data.
... Bayesian non-parametric treatment of HMMs/HSMMs automates the selection of the number of states by Bayesian inference in a model with an infinite number of states (Beal et al. 2002; Johnson and Willsky 2013). Niekum et al. (2012) used the Beta Process Autoregressive HMM for learning from unstructured demonstrations. ...
... Interested readers can find details of DPs and HDPs for specifying an infinite set of conditional transition distribution priors in Teh et al. (2006). HDP-HMM (Beal et al. 2002; Van Gael et al. 2008) is an infinite state Bayesian non-parametric generalization of the HMM with HDP prior on the transition distribution. In this model, the state transition distribution for each state follows a Dirichlet process $G_i \sim \mathrm{DP}(\alpha, G_0)$ with concentration parameter $\alpha$ and shared base distribution $G_0$, such that $G_0$ is the global Dirichlet process $G_0 \sim \mathrm{DP}(\gamma, H)$ with concentration parameter $\gamma$ and base distribution $H$. ...
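The two-level construction described in the excerpt above (a global process G_0 ~ DP(γ, H) shared by per-state processes G_i ~ DP(α, G_0)) can be illustrated with a finite truncation. The sketch below is not taken from any of the cited papers: it assumes a weak-limit style approximation with a hypothetical truncation level K, in which the global state weights are drawn from a symmetric Dirichlet and each row of the transition matrix is drawn from a Dirichlet centred on those weights. The function names and parameter values are illustrative only.

```python
import numpy as np

def sample_hdp_hmm_transitions(K, alpha, gamma, rng):
    """Draw global weights and a K x K transition matrix.

    Finite (truncated) stand-in for the HDP prior; K is an assumed
    truncation level, not part of the infinite model itself.
    """
    # Global weights over the K candidate states: finite stand-in for G_0 ~ DP(gamma, H).
    beta = rng.dirichlet(np.full(K, gamma / K))
    # Each state's outgoing distribution is centred on beta: stand-in for G_i ~ DP(alpha, G_0).
    # A tiny floor keeps every Dirichlet parameter strictly positive.
    concentration = np.maximum(alpha * beta, 1e-12)
    pi = np.vstack([rng.dirichlet(concentration) for _ in range(K)])
    return beta, pi

def sample_state_sequence(pi, T, rng):
    """Simulate T hidden states from the transition matrix pi (uniform initial state)."""
    K = pi.shape[0]
    states = np.empty(T, dtype=int)
    states[0] = rng.integers(K)
    for t in range(1, T):
        states[t] = rng.choice(K, p=pi[states[t - 1]])
    return states

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta, pi = sample_hdp_hmm_transitions(K=10, alpha=5.0, gamma=3.0, rng=rng)
    states = sample_state_sequence(pi, T=200, rng=rng)
    # Typically only a subset of the K candidate states is visited.
    print("distinct states visited:", np.unique(states).size)
```

Because every row shares the same global weights, states that are popular under `beta` tend to be reachable from all other states, which is the coupling the hierarchical prior is meant to provide. Smaller values of `gamma` concentrate mass on fewer states.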
Preprint
Small variance asymptotics is emerging as a useful technique for inference in large scale Bayesian non-parametric mixture models. This paper analyses the online learning of robot manipulation tasks with Bayesian non-parametric mixture models under small variance asymptotics. The analysis yields a scalable online sequence clustering (SOSC) algorithm that is non-parametric in the number of clusters and the subspace dimension of each cluster. SOSC groups the new datapoint in its low dimensional subspace by online inference in a non-parametric mixture of probabilistic principal component analyzers (MPPCA) based on Dirichlet process, and captures the state transition and state duration information online in a hidden semi-Markov model (HSMM) based on hierarchical Dirichlet process. A task-parameterized formulation of our approach autonomously adapts the model to changing environmental situations during manipulation. We apply the algorithm in a teleoperation setting to recognize the intention of the operator and remotely adjust the movement of the robot using the learned model. The generative model is used to synthesize both time-independent and time-dependent behaviours by relying on the principles of shared and autonomous control. Experiments with the Baxter robot yield parsimonious clusters that adapt online with new demonstrations and assist the operator in performing remote manipulation tasks.
... Hidden Markov Models (HMMs) have been particularly successful for learning temporal dynamics of an underlying process [17]. Several modifications for HMMs have been proposed, such as integrated HMM (IHMM) [18] which integrated several parameters to three hyper-parameters to model countably infinite hidden state sequences, integrated hierarchical HMM (IHHMM) [19] extended HMMs to an infinite number of hierarchical levels, and [20] applied a forward-backward algorithm to reduce model complexity through the order of operations. However, Markov Models with hidden states usually rely on iterative learning algorithms that may be computationally expensive. ...
Preprint
This paper presents a novel data-driven technique based on the spatiotemporal pattern network (STPN) for energy/power prediction for complex dynamical systems. Built on symbolic dynamic filtering, the STPN framework is used to capture not only the individual system characteristics but also the pair-wise causal dependencies among different sub-systems. For quantifying the causal dependency, a mutual information based metric is presented. An energy prediction approach is subsequently proposed based on the STPN framework. For validating the proposed scheme, two case studies are presented, one involving wind turbine power prediction (supply side energy) using the Western Wind Integration data set generated by the National Renewable Energy Laboratory (NREL) for identifying the spatiotemporal characteristics, and the other, residential electric energy disaggregation (demand side energy) using the Building America 2010 data set from NREL for exploring the temporal features. In the energy disaggregation context, convex programming techniques beyond the STPN framework are developed and applied to achieve improved disaggregation performance.
... Variable K implicitly depends on parameters $\tau$, $\gamma$ and $\varsigma$ that have the following standard weakly informative priors: $\tau \sim G(1, 0.01)$, $\gamma \sim G(1, 0.01)$ and $\varsigma \sim B(1, 1)$, where $G(\cdot, \cdot)$ indicates the gamma distribution in terms of shape and scale and $B(\cdot, \cdot)$ is the beta distribution. These priors allow the latent discrete time series $\{z_t\}_{t=1}^{T}$ and all parameters of the sHDP-HMM to be updated using only Gibbs steps (see Fox et al., 2011; Beal et al., 2002). ...
Preprint
Winds from the North-West quadrant and lack of precipitation are known to lead to an increase of PM10 concentrations over a residential neighborhood in the city of Taranto (Italy). In 2012 the local government prescribed a reduction of industrial emissions by 10% every time such meteorological conditions are forecasted 72 hours in advance. Wind forecasting is addressed using the Weather Research and Forecasting (WRF) atmospheric simulation system by the Regional Environmental Protection Agency. In the context of distributions-oriented forecast verification, we propose a comprehensive model-based inferential approach to investigate the ability of the WRF system to forecast the local wind speed and direction allowing different performances for unknown weather regimes. Ground-observed and WRF-forecasted wind speed and direction at a relevant location are jointly modeled as a 4-dimensional time series with an unknown finite number of states characterized by homogeneous distributional behavior. The proposed model relies on a mixture of joint projected and skew normal distributions with time-dependent states, where the temporal evolution of the state membership follows a first order Markov process. Parameter estimates, including the number of states, are obtained by a Bayesian MCMC-based method. Results provide useful insights on the performance of WRF forecasts in relation to different combinations of wind speed and direction.
... A number of Bayesian nonparametric approaches embrace the infinite-dimensional nature of the problem using extensions of Dirichlet processes (Ferguson, 1973). Among the most well-known, the hierarchical Dirichlet process (Beal et al., 2002; Teh et al., 2006) builds a hierarchical model using the Dirichlet process, allowing it to share information between samples. Tomlinson and Escobar (1999) provide an early description of a similar model. ...
Preprint
Bayesian hierarchical models are used to share information between related samples and obtain more accurate estimates of sample-level parameters, common structure, and variation between samples. When the parameter of interest is the distribution or density of a continuous variable, a hierarchical model for continuous distributions is required. A number of such models have been described in the literature using extensions of the Dirichlet process and related processes, typically as a distribution on the parameters of a mixing kernel. We propose a new hierarchical model based on the Pólya tree, which allows direct modeling of densities and enjoys some computational advantages over the Dirichlet process. The Pólya tree also allows more flexible modeling of the variation between samples, providing more informed shrinkage and permitting posterior inference on the dispersion function, which quantifies the variation among sample densities. We also show how the model can be extended to cluster samples in situations where the observed samples are believed to have been drawn from several latent populations.
... Strand prior: p = 1.000, µ = −2.0, ν = 2.5, κ_1 = 5.33, κ_2 = 21.33, λ = 5.33. Note that our model is not to be confused with the hidden Markov Dirichlet process (HMDP) proposed by Xing and Sohn (2007). The HMDP is an implementation of a hidden Markov model with an infinite state space, originally proposed by Beal, Ghahramani and Rasmussen (2002). Their model is an instance of the Hierarchical Dirichlet Process (HDP) of Teh et al. (2006), whereas our DPM-HMM is a standard Dirichlet process with a novel centering distribution. ...
Preprint
By providing new insights into the distribution of a protein's torsion angles, recent statistical models for this data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is in structure prediction for the highly variable loop and turn regions. Such modeling is difficult due to the fact that the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data is "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.
... 2 Though not mentioned by Mochihashi et al. (2009) or Neubig et al. (2010), this construction is not exact, since transitions in a Bayesian HMM are exchangeable but not independent (Beal et al., 2001): if a word occurs twice in an utterance, its probability is slightly higher the second time. For single utterances, this bias is small and easy to correct for using a Metropolis-Hastings acceptance check (Börschinger and Johnson, 2012) using the path probability from the HMM as the proposal. ...
... The parameters of the rest of the DGP depend on the discrete state ψ_t. The objective is to infer the sequence of underlying discrete states that best "explains" the observed data (Ostendorf et al., 1996; Ghahramani & Hinton, 2000; Beal et al., 2001; Fox et al., 2007; Van Gael et al., 2008; Linderman et al., 2017). In this context, non-stationarity arises from the switching behaviour of the underlying discrete process. ...
Preprint
Full-text available
We propose a unifying framework for methods that perform Bayesian online learning in non-stationary environments. We call the framework BONE, which stands for (B)ayesian (O)nline learning in (N)on-stationary (E)nvironments. BONE provides a common structure to tackle a variety of problems, including online continual learning, prequential forecasting, and contextual bandits. The framework requires specifying three modelling choices: (i) a model for measurements (e.g., a neural network), (ii) an auxiliary process to model non-stationarity (e.g., the time since the last changepoint), and (iii) a conditional prior over model parameters (e.g., a multivariate Gaussian). The framework also requires two algorithmic choices, which we use to carry out approximate inference under this framework: (i) an algorithm to estimate beliefs (posterior distribution) about the model parameters given the auxiliary variable, and (ii) an algorithm to estimate beliefs about the auxiliary variable. We show how this modularity allows us to write many different existing methods as instances of BONE; we also use this framework to propose a new method. We then experimentally compare existing methods with our proposed new method on several datasets; we provide insights into the situations that make one method more suitable than another for a given task.
... The Viterbi algorithm [29] is a closer approach to ours; it estimates the actual value of the hidden state x by leveraging dynamic programming. While the state space X in an HMM can be extended to a set of countably many states [30], the HMM still focuses on modeling the transition of discrete states, while this paper considers a set of uncountably many states. ...
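The dynamic-programming idea mentioned in the excerpt above can be sketched for a finite-state HMM as follows. This is a generic textbook-style Viterbi decoder written in log space, not code from the cited work; the function name, argument layout and the toy two-state model in the usage section are assumptions made for illustration.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most probable hidden-state path of a finite-state HMM.

    log_init:  shape (K,),   log p(state_0 = k)
    log_trans: shape (K, K), log p(state_t = j | state_{t-1} = i) at [i, j]
    log_emit:  shape (T, K), log p(observation_t | state_t = k)
    """
    T, K = log_emit.shape
    delta = np.empty((T, K))            # best log-probability of any path ending in each state
    back = np.zeros((T, K), dtype=int)  # best predecessor of each state at each time step
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: arrive in j from i
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

if __name__ == "__main__":
    # Toy model (illustrative values): two "sticky" states that mostly emit their own symbol.
    init = np.array([0.5, 0.5])
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])
    emit = np.array([[0.9, 0.1], [0.1, 0.9]])  # emit[k, symbol]
    obs = [0, 0, 1, 1]
    log_emit = np.log(emit[:, obs]).T          # shape (T, K)
    print(viterbi(np.log(init), np.log(trans), log_emit))
```

Working in log space avoids numerical underflow on long sequences. The cost is O(T K^2), which is why a finite number of states K is needed for this exact recursion.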
Article
Full-text available
This study delves into the domain of dynamical systems, specifically the forecasting of dynamical time series defined through an evolution function. Traditional approaches in this area predict the future behavior of dynamical systems by inferring the evolution function. However, these methods may confront obstacles due to the presence of missing variables, which are usually attributed to challenges in measurement and a partial understanding of the system of interest. To overcome this obstacle, we introduce the autoregressive with slack time series (ARS) model, that simultaneously estimates the evolution function and imputes missing variables as a slack time series. Assuming time-invariance and linearity in the (underlying) entire dynamical time series, our experiments demonstrate the ARS model’s capability to forecast future time series. From a theoretical perspective, we prove that a 2-dimensional time-invariant and linear system can be reconstructed by utilizing observations from a single, partially observed dimension of the system.
... This makes it ill-suited to characterising the dynamic and idiosyncratic progression through training. To address these issues, we adopted the HMM framework to capture abrupt changes, except that (i) along with the motivational factors, the latent states can describe what the animal knows about the task at any point; that (ii) we used a semi-Markov model so that latent states can persist for non-exponentially distributed numbers of trials; and that (iii) the states come from a Bayesian non-parametric structure, allowing for a degree of behavioural complexity that is only constrained by an inbuilt Occam's razor, and enabling the introduction of new states for suddenly appearing new behaviours (Beal, Ghahramani, & Rasmussen, 2001; Gershman & Blei, 2012; Heald, Lengyel, & Wolpert, 2021; Johnson & Willsky, 2013; Teh, Jordan, Beal, & Blei, 2006). ...
Preprint
Full-text available
Learning to exploit the contingencies of a complex experiment is not an easy task for animals. Individuals learn in an idiosyncratic manner, revising their approaches multiple times as they are shaped, or shape themselves, and potentially end up with different strategies. Their long-run learning curves are therefore a tantalizing target for the sort of individualized quantitative characterizations that sophisticated modelling can provide. However, any such model requires a flexible and extensible structure which can capture radically new behaviours as well as slow changes in existing ones. To this end, we suggest a dynamic input-output infinite hidden semi-Markov model, whose latent states are associated with specific components of behaviour. This model includes an infinite number of potential states and so has the capacity to describe substantially new behaviours by unearthing extra states; while dynamics in the model allow it to capture more modest adaptations to existing behaviours. We individually fit the model to data collected from more than 100 mice as they learned a contrast detection task over tens of sessions and around fifteen thousand trials each. Despite large individual differences, we found that most animals progressed through three major stages of learning, the transitions between which were marked by distinct additions to task understanding. We furthermore showed that marked changes in behaviour are much more likely to occur at the very beginning of sessions, i.e. after a period of rest, and that response biases in earlier stages are not predictive of biases later on in this task.
... One of the ways of examining crowd behavior was used in [72]. The authors proposed a way for detecting abnormal behavior from sensor data using a Hidden Markov Model [73], which is a statistical method based on a stochastic model used to model randomly changing systems. ...
Article
Full-text available
Recently, our world witnessed major events that attracted a lot of attention towards the importance of automatic crowd scene analysis. For example, the COVID-19 outbreak and public events require an automatic system to manage, count, secure, and track a crowd that shares the same area. However, analyzing crowd scenes is very challenging due to heavy occlusion, complex behaviors, and posture changes. This paper surveys deep learning-based methods for analyzing crowded scenes. The reviewed methods are categorized as (1) crowd counting and (2) crowd action recognition. Moreover, crowd scene datasets are surveyed. In addition to the above surveys, this paper proposes an evaluation metric for crowd scene analysis methods. This metric estimates the difference between the calculated crowd count and the actual count in crowd scene videos.
Preprint
I suggest an approach that helps the online marketers to target their Gamification elements to users by modifying the order of the list of tasks that they send to users. It is more realistic and flexible as it allows the model to learn more parameters when the online marketers collect more data. The targeting approach is scalable and quick, and it can be used over streaming data.
Article
Full-text available
Due to the emergence of graph convolutional networks (GCNs), the skeleton-based action recognition has achieved remarkable results. However, the current models for skeleton-based action analysis treat skeleton sequences as a series of graphs, aggregating features of the entire sequence by alternately extracting spatial and temporal features, i.e., using a 2D (spatial features) plus 1D (temporal features) approach for feature extraction. This undoubtedly overlooks the complex spatiotemporal fusion relationships between joints during motion, making it challenging for models to capture the connections between different temporal frames and joints. In this paper, we propose a Multimodal Graph Self-Attention Network (MGSAN), which combines GCNs with self-attention to model the spatiotemporal relationships between skeleton sequences. Firstly, we design graph self-attention (GSA) blocks to capture the intrinsic topology and long-term temporal dependencies between joints. Secondly, we propose a multi-scale spatio-temporal convolutional network for channel-wise topology modeling (CW-TCN) to model short-term smooth temporal information of joint movements. Finally, we propose a multimodal fusion strategy to fuse joint, joint movement, and bone flow, providing the model with a richer set of multimodal features to make better predictions. The proposed MGSAN achieves state-of-the-art performance on three large-scale skeleton-based action recognition datasets, with accuracy of 93.1% on NTU RGB+D 60 cross-subject benchmark, 90.3% on NTU RGB+D 120 cross-subject benchmark, and 97.0% on the NW-UCLA dataset. Code is available at https://github.com/lizaowo/MGSAN.
Article
Full-text available
In this study, we investigate the construction of hidden Markov models (HMMs) with copulas serving as the emission distributions. Additionally, we relax the traditional assumption that the number of hidden states must be predetermined before model fitting. Instead, in our approach, the number of states is estimated simultaneously with other model parameters of the copula-HMM when datasets are applied. This is achieved by incorporating the hierarchical Dirichlet process as a prior during the Bayesian inference procedure. We provide a comprehensive algorithm for this methodology, including a detailed implementable example using the t-copula, which is a novel contribution not previously available in the copula literature. The proposed estimator was validated through simulation studies, which demonstrated its superiority over traditional BIC-based approaches for model selection in HMMs. Furthermore, we applied this methodology to real data to examine the dependence structure among stock markets.
Article
Full-text available
Regular expressions (short form regex) find their application in program script synthesis, machine translation, information extraction and web applications, such as input validations. Their expressiveness and flexibility make them decidedly the best tool for many challenging text extraction tasks. Writing regex manually has been labeled as a laborious, time consuming and error prone task even for skilled programmers. An abundance of regex generation from text queries at online platforms mainly Stackoverflow and Quora signifies the automatic regex synthesis problem. Despite their popularity, a criminal lack of comprehensive literature study on the problem has also been observed. We intend to perform a detailed review of a variety of methods available for regex synthesis, repair, and learn beneficial lessons for appropriate datasets with one earnest goal: to synthesize resource efficient and correct regexes for given textual description.
Article
Full-text available
Nonhuman primates (NHPs) exhibit complex and diverse behavior that typifies advanced cognitive function and social communication, but quantitative and systematic measurement of this natural nonverbal processing has been a technical challenge. Specifically, a method is required to automatically segment time series of behavior into elemental motion motifs, much like finding meaningful words in character strings. Here, we propose a solution called SyntacticMotionParser (SMP), a general-purpose unsupervised behavior parsing algorithm using a nonparametric Bayesian model. Using three-dimensional posture-tracking data from NHPs, SMP automatically outputs an optimized sequence of latent motion motifs classified into the most likely number of states. When applied to behavioral datasets from common marmosets and rhesus monkeys, SMP outperformed conventional posture-clustering models and detected a set of behavioral ethograms from publicly available data. SMP also quantified and visualized the behavioral effects of chemogenetic neural manipulations. SMP thus has the potential to dramatically improve our understanding of natural NHP behavior in a variety of contexts.
Article
Full-text available
Time series segmentation, which aims to divide a time series into segments that each reflect a state of the monitored object, has attracted growing interest in recent years. Although there have been many surveys on time series segmentation, most of them focus on change point detection (CPD) methods and overlook the advances in boundary detection (BD) and state detection (SD) methods. In this paper, we categorize time series segmentation methods into CPD, BD, and SD methods, with a specific focus on recent advances in BD and SD methods. Within the scope of BD and SD, we subdivide the methods based on their underlying models/techniques and focus on the milestones that have shaped the development trajectory of each category. In conclusion, we found that: (1) Existing methods fail to provide sufficient support for online operation, with only a few methods supporting online deployment; (2) Most existing methods require the specification of parameters, which hinders their ability to work adaptively; (3) Existing SD methods do not attach importance to accurate detection of boundary points in evaluation, which may lead to limitations in boundary point detection. We highlight the ability to work online and adaptively as an important attribute of segmentation methods, and boundary detection accuracy as a neglected metric for SD methods.
Article
Full-text available
Dynamic functional connectivity investigates how the interactions among brain regions vary over the course of an fMRI experiment. Such transitions between different individual connectivity states can be modulated by changes in underlying physiological mechanisms that drive functional network dynamics, e.g., changes in attention or cognitive effort. In this paper, we develop a multi-subject Bayesian framework where the estimation of dynamic functional networks is informed by time-varying exogenous physiological covariates that are simultaneously recorded in each subject during the fMRI experiment. More specifically, we consider a dynamic Gaussian graphical model approach where a non-homogeneous hidden Markov model is employed to classify the fMRI time series into latent neurological states. We assume the state-transition probabilities to vary over time and across subjects as a function of the underlying covariates, allowing for the estimation of recurrent connectivity patterns and the sharing of networks among the subjects. We further assume sparsity in the network structures via shrinkage priors, and achieve edge selection in the estimated graph structures by introducing a multi-comparison procedure for shrinkage-based inferences with Bayesian false discovery rate control. We evaluate the performances of our method vs alternative approaches on synthetic data. We apply our modeling framework on a resting-state experiment where fMRI data have been collected concurrently with pupillometry measurements, as a proxy of cognitive processing, and assess the heterogeneity of the effects of changes in pupil dilation on the subjects’ propensity to change connectivity states. The heterogeneity of state occupancy across subjects provides an understanding of the relationship between increased pupil dilation and transitions toward different cognitive states.
Article
As the digital world continues to expand, traditional search engines face difficulties in providing users with excellent and rapid information retrieval. Additionally, users have shown a low tolerance for retrieving irrelevant information and increasingly expect to be served personalized information that meets their needs and preferences. Personalized environments offer many advantages to applications such as e-commerce and research, which need to provide the best possible results to users as fast as possible. Personalized web environments that customize search results to specific users based on their behavior and preferences while searching are becoming increasingly popular to solve this problem. Short-term and long-term user preferences are constructed from data on actions and top queries, visited content, and user preferences. Many websites already use this kind of personalized search, and it is making a big difference in how people use the internet.
Article
Multi-function radars (MFRs) are sophisticated types of sensors with the capabilities of complex agile inter-pulse modulation implementation and dynamic work mode scheduling. The developments in MFRs pose great challenges to modern electronic reconnaissance systems or radar warning receivers for recognition and inference of MFR work modes. To address this issue, this paper proposes an online processing framework for parameter estimation and change point detection of MFR work modes. First, this paper designs a fully-conjugate Bayesian Non-Parametric Hidden Markov Model with a designed prior distribution (agile BNP-HMM) to represent the MFR pulse agility characteristics. The proposed framework then consists of two main parts. The first part is the agile BNP-HMM model for automatically inferring the number of HMM hidden states and the emission distribution of the corresponding hidden states. An error lower bound is derived for estimation performance and the proposed algorithm is shown to be closer to the bound compared with baseline methods. The second part combines streaming Bayesian updating to facilitate computation with an online work mode change detection framework based upon the weighted sequential probability ratio test. We demonstrate that the proposed framework is consistently highly effective and robust to baseline methods on diverse simulated radar signal data and real-life benchmark datasets. The source code is available at https://github.com/JiadiBao/Agile-BNP-HMM.
Article
In this work, we offer a new approach to integrating the Amazigh language, which is a less-resourced language, into an isolated speech recognition system by exploiting the Kaldi open-source platform. Our designed system is able to recognize the first ten Amazigh digits and ten frequently used Amazigh isolated words, which present typical syllabic structure and which are considered a good representative sample of the Amazigh language. The designed speech system was implemented using Hidden Markov Models (HMMs) with different numbers of Gaussian distributions. In addition, we evaluated the performance of our system by varying the feature extraction methods in order to determine the optimal method for maximum performance. The best result, 93.96%, was obtained with the Mel Frequency Cepstral Coefficients (MFCCs) technique.
Article
Optimal control of general nonlinear systems is a central challenge in automation. Enabled by powerful function approximators, data-driven approaches to control have recently successfully tackled challenging applications. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, thus limiting our ability to understand closed-loop behavior. This paper adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. We consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine models with nonlinear transition boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract local polynomial feedback controllers from nonlinear experts via behavioral cloning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid models and optimizes a set of time-invariant piecewise feedback controllers derived from a piecewise polynomial approximation of a global state-value function.
Article
Prediction has a central role in the foundations of Bayesian statistics and is now the main focus in many areas of machine learning, in contrast to the more classical focus on inference. We argue that, in the basic setting of random sampling (that is, in the Bayesian approach, exchangeability), the uncertainty expressed by the posterior distribution and credible intervals can indeed be understood in terms of prediction. The posterior law on the unknown distribution is centred on the predictive distribution, and we prove that it is marginally asymptotically Gaussian with variance depending on the predictive updates, i.e. on how the predictive rule incorporates information as new observations become available. This makes it possible to obtain asymptotic credible intervals based only on the predictive rule (without having to specify the model and the prior law), sheds light on frequentist coverage as related to the predictive learning rule, and, we believe, opens a new perspective towards a notion of predictive efficiency that seems to call for further research. This article is part of the theme issue ‘Bayesian inference: challenges, perspectives, and prospects’.
Article
Powered by advances in information and internet technologies, network-based applications have developed rapidly, and cybersecurity has grown more critical. Inspired by Reinforcement Learning (RL) success in many domains, this paper proposes an Intrusion Detection System (IDS) to improve cybersecurity. The IDS based on two RL algorithms, i.e., Deep Q-Learning and Policy Gradient, is carefully formulated, strategically designed, and thoroughly evaluated at the packet-level and flow-level using the CICDDoS2019 dataset. Compared to other research work in a similar line of research, this paper is focused on providing a systematic and complete design paradigm of IDS based on RL algorithms, at both the packet and flow levels. For the packet-level RL-based IDS, first, the session data are transformed into images via an image embedding method proposed in this work. A comparison between 1D-Convolutional Neural Networks (1D-CNN) and CNN for extracting features from these images (for further RL agent training) is drawn from the quantitative results. In addition, an anomaly detection module is designed to detect unknown network traffic. For flow-level IDS, a Conditional Generative Adversarial Network (CGAN) and the ε-greedy strategy are adopted in designing the exploration module for RL agent training. To improve the robustness of the intrusion detection, a sample agent with a complement reward policy of the RL agent is introduced for the purpose of adversarial training. The experimental results of the proposed RL-based IDS show improved results over the state-of-the-art algorithms presented in the literature for packet-level and flow-level IDS.
Article
Information about glacier hydrology is important for understanding glacier and ice sheet dynamics. However, our knowledge about water pathways and pressure remains limited, as in situ observations are sparse and methods for direct area-wide observations are limited due to the extreme and hard-to-access nature of the environment. In this paper, we present a method that allows for in situ data collection in englacial channels using sensing drifters. Furthermore, we demonstrate a model that takes the collected data and reconstructs the planar subsurface water flow paths providing spatial reference to the continuous water pressure measurements. We showcase this method by reconstructing the 2D topology and the water pressure distribution of a free-flowing englacial channel in Austre Brøggerbreen (Svalbard). The approach uses inertial measurements from submersible sensing drifters and reconstructs the water flow path between given start and end coordinates. Validation of the method was done on a separate supraglacial channel, showing an average error of 3.9 m and the total channel length error of 29 m (6.5 %). At the englacial channel, the average error is 12.1 m; the length error is 107 m (11.6 %); and the water pressure standard deviation is 3.4 hPa (0.3 %). Our method allows for mapping of subsurface water flow paths and spatially referencing the pressure distribution within. Further, our method would be extendable to the reconstruction of other, previously underexplored subsurface fluid flow paths such as pipelines or karst caves.
Article
According to the World Health Organization (WHO), more than 5% of people around the world are deaf and have severe difficulties communicating with hearing people. They face a real challenge in expressing anything without an interpreter for their signs. Nowadays, there are many studies on Sign Language Recognition (SLR) that aim to reduce this gap between deaf and hearing people, as SLR can replace the need for an interpreter. However, sign recognition systems face many challenges, such as low accuracy, complicated gestures, high-level noise, and the need to operate under varying circumstances, where a system may either generalize or remain locked to such limitations. Hence, many researchers have proposed different solutions to overcome these problems. Each language has its own signs, and it can be very challenging to cover the signs of all languages. The objectives of the current study are: (i) presenting a dataset of 20 Arabic words, and (ii) proposing a deep learning (DL) architecture that combines a convolutional neural network (CNN) and a recurrent neural network (RNN). The suggested architecture reported 98% accuracy on the presented dataset. It also reported 93.4% and 98.8% for the top-1 and top-5 accuracies on the UCF-101 dataset.
Article
This paper aims to provide a comprehensive critical overview of how entities and their interactions in Complex Networked Systems (CNS) are modelled across disciplines as they approach their ultimate goal of creating a Digital Twin (DT) that perfectly matches reality. We propose four complexity dimensions for the network representation and five generations of models for the dynamics modelling to describe the increasing complexity level of the CNS that will be developed towards achieving a DT (e.g. CNS dynamics modelled offline in the 1st generation vs. CNS dynamics modelled simultaneously with two-way real-time feedback between reality and the CNS in the 5th generation). Based on that, we propose a new framework to conceptually compare diverse existing modelling paradigms from different perspectives and create unified assessment criteria to evaluate their respective capabilities of reaching such an ultimate goal. Using the proposed criteria, we also appraise how far the reviewed state-of-the-art approaches are from the idealised DTs. Finally, we identify and propose potential directions and ways of building a DT-orientated CNS based on the convergence and integration of CNS and DT, utilising a variety of cross-disciplinary techniques.
Article
Skeleton representation has attracted a great deal of attention recently as an extremely robust feature for human action recognition. However, its non-Euclidean structural characteristics raise new challenges for conventional solutions. Recent studies have shown that there is a native superiority in modeling spatiotemporal skeleton information with a Graph Convolutional Network (GCN). Nevertheless, the skeleton graph modeling normally focuses on the physical adjacency of the elements of the human skeleton sequence, which contrasts with the requirement to provide a perceptually meaningful representation. To address this problem, in this paper, we propose a perceptually-enriched graph learning method by introducing innovative features to spatial and temporal skeleton graph modeling. For the spatial information modeling, we incorporate a Local-Global Graph Convolutional Network (LG-GCN) that builds a multifaceted spatial perceptual representation. This helps to overcome the limitations caused by over-reliance on the spatial adjacency relationships in the skeleton. For temporal modeling, we present a Region-Aware Graph Convolutional Network (RA-GCN), which directly embeds the regional relationships conveyed by a skeleton sequence into a temporal graph model. This innovation mitigates the deficiency of the original skeleton graph models. In addition, we strengthened the ability of the proposed channel modeling methods to extract multi-scale representations. These innovations result in a lightweight graph convolutional model, referred to as Graph2Net, that simultaneously extends the spatial and temporal perceptual fields, and thus enhances the capacity of the graph model to represent skeleton sequences. We conduct extensive experiments on NTU-RGB+D 60&120, Northwestern-UCLA, and Kinetics-400 datasets to show that our results surpass the performance of several mainstream methods while limiting the model complexity and computational overhead.
Article
Full text available from: https://www.uni-due.de/imperia/md/content/srs/forschung/veroeffentlichungen/deng11cop.pdf || Current research and development in recognizing and predicting driving behaviors plays an important role in the development of Advanced Driver Assistance Systems (ADAS). For this reason, many machine learning approaches have been developed and applied. The Hidden Markov Model (HMM) is a suitable approach due to its ability to handle time series data and describe state transitions. Therefore, this contribution focuses on a review of the HMM and its applications. The aim of this contribution is to analyze the current state of various driving behavior models and related HMM-based algorithms. By examining the currently available approaches, a review is provided that: i) identifies influencing factors of driving behaviors corresponding to the research objectives of different driving models, ii) summarizes HMM-related methods applied to driving behavior studies, and iii) discusses limitations, issues, and potential future work for HMM-based algorithms. Conclusions with respect to the development of intelligent driving assistance systems and vehicle dynamics control systems are given. || Link to our repository: https://www.uni-due.de/srs/veroeffentlichungen-srs.php?Jahr=alle
Chapter
The Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) has been used widely as a natural Bayesian nonparametric extension of the classical Hidden Markov Model for learning from sequential and time-series data. A sticky extension of the HDP-HMM has been proposed to strengthen the self-persistence probability in the HDP-HMM. However, the sticky HDP-HMM entangles the strength of the self-persistence prior and transition prior together, limiting its expressiveness. Here, we propose a more general model: the disentangled sticky HDP-HMM (DS-HDP-HMM). We develop novel Gibbs sampling algorithms for efficient inference in this model. We show that the disentangled sticky HDP-HMM outperforms the sticky HDP-HMM and HDP-HMM on both synthetic and real data, and apply the new approach to analyze neural data and segment behavioral video data.
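To make the self-persistence idea in the abstract above concrete, here is a minimal sketch of how a sticky HDP-HMM transition prior is commonly approximated with a finite (weak-limit) truncation. It illustrates the standard sticky prior only, not the disentangled DS-HDP-HMM or its Gibbs samplers. The function name, the truncation level `num_states`, the hyperparameter values, and the small numerical floor on the concentrations are all assumptions made for this illustration.

```python
import numpy as np

def sample_sticky_transitions(num_states, gamma, alpha, kappa, rng):
    """Draw a transition matrix from a weak-limit sticky HDP-HMM prior.

    Hypothetical helper for illustration:
      beta ~ Dirichlet(gamma / L, ..., gamma / L)
      pi_j ~ Dirichlet(alpha * beta + kappa * e_j)
    where L = num_states and e_j is the j-th unit vector.
    """
    # Shared global state weights (finite approximation of the GEM prior).
    beta = rng.dirichlet(np.full(num_states, gamma / num_states))
    # Row j gets extra mass kappa on its own state: the "sticky" bias.
    conc = alpha * beta[None, :] + kappa * np.eye(num_states)
    # Assumed numerical guard: keep every concentration strictly positive.
    conc = np.maximum(conc, 1e-6)
    pi = np.vstack([rng.dirichlet(row) for row in conc])
    return beta, pi

rng = np.random.default_rng(0)
_, pi_plain = sample_sticky_transitions(10, gamma=3.0, alpha=5.0, kappa=0.0, rng=rng)
_, pi_sticky = sample_sticky_transitions(10, gamma=3.0, alpha=5.0, kappa=50.0, rng=rng)

# Average self-transition probability without and with the sticky bias.
print(np.diag(pi_plain).mean(), np.diag(pi_sticky).mean())
```

In this parameterization a single pair `(alpha, kappa)` controls both how variable each row is and how strongly it favours self-transitions, which is the coupling the abstract describes as entangled.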
Article
The evaluation of pilot brain activity is very important for flight safety. This study proposes a hidden semi-Markov model (HSMM) with a hierarchical prior to detect brain activity under different flight tasks. A dynamic Student mixture model is proposed to detect outliers in the emission probabilities of the HSMM. Instantaneous spectrum features are also extracted from EEG signals. Compared with other latent variable models, the proposed model shows excellent performance for the automatic inference of the brain cognitive activity of pilots. The results indicate that combining the hierarchical model with emission probabilities based on a $t$ mixture model improves recognition performance for pilots' fatigue cognitive level.
Chapter
While handwritten character recognition has reached a point of maturity, the recognition of handwritten mathematics is still a challenging problem. The problem usually consists of three major parts: stroke segmentation, single symbol recognition and structural analysis. In this paper, we present a review of handwritten mathematical expression recognition to show how recognition techniques have developed. In particular, we put emphasis on the differences between systems.
Chapter
Continuous collection of physiological data from wearable sensors enables temporal characterization of individual behaviors. Understanding the relation between an individual’s behavioral patterns and psychological states can help identify strategies to improve quality of life. One challenge in analyzing physiological data is extracting the underlying behavioral states from the temporal sensor signals and interpreting them. Here, we use a non-parametric Bayesian approach to model sensor data from multiple people and discover the dynamic behaviors they share. We apply this method to data collected from sensors worn by a population of hospital workers and show that the learned states can cluster participants into meaningful groups and better predict their cognitive and psychological states. This method offers a way to learn interpretable compact behavioral representations from multivariate sensor signals.
Article
Appropriate time series modeling of complex diffusion in soft matter systems on the microsecond time scale can provide a path towards inferring transport mechanisms and predicting bulk properties characteristic of much longer time scales. In this work we apply nonparametric Bayesian time series analysis, more specifically the sticky hierarchical Dirichlet process autoregressive hidden Markov model (HDP-AR-HMM), to solute center-of-mass trajectories generated from long molecular dynamics (MD) simulations in a cross-linked inverted hexagonal phase lyotropic liquid crystal (LLC) membrane in order to automatically detect a variety of solute dynamical modes. We can better understand the mechanisms controlling these dynamical modes by grouping the states identified by the HDP-AR-HMM into clusters based on multiple metrics aimed at distinguishing solute behavior based on their fluctuations, dwell times in each state and position within the inhomogeneous membrane structure. We analyze the predominant clusters in order to relate their dynamical parameters to physical interactions between solutes and the membrane. Along with the parameters of individual states, the HDP-AR-HMM simultaneously infers a transition matrix, which allows us to stochastically propagate solute behavior from all of the independent trajectories onto arbitrary-length time scales while still preserving the qualitative behavior characteristic of the MD trajectories. This affords a direct connection to important macroscopic observables used to characterize performance, such as solute flux and selectivity. Overall, this work provides a promising way to simultaneously identify transport mechanisms in nanoporous materials and project complex diffusive behavior onto long time scales. Our enhanced understanding of the diverse range of solute behavior allows us to hypothesize design changes to LLC monomers aimed at controlling the rates of solute passage, thus improving the selective performance of LLC membranes.