IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society

Published by Institute of Electrical and Electronics Engineers
Online ISSN: 1941-0492
Print ISSN: 1083-4419
This paper develops a stability analysis and controller synthesis methodology for discrete affine fuzzy systems based on convex optimization techniques. In the analysis, a condition under which the affine fuzzy system is quadratically stable is derived. This condition is then recast as a set of linear matrix inequalities (LMIs) and addressed numerically. The emphasis of this paper, however, is on the synthesis of a fuzzy controller based on the derived stability condition. In the synthesis, the stabilizability condition turns out to take the form of nonconvex matrix inequalities and is solved numerically in an iterative manner. A discrete iterative LMI (ILMI) approach is proposed to obtain a feasible solution for the synthesis of the affine fuzzy system. Finally, the applicability of the suggested methodology is demonstrated via examples and computer simulations.
To date, most of the existing bargaining models are designed for supporting negotiation in only one market involving only two types of participants (buyers and sellers). This work devises a complex negotiation mechanism that supports negotiation activities among three types of participants in multiple interrelated markets. The complex negotiation mechanism consists of: 1) a bargaining-position-estimation (BPE) strategy for the multilateral negotiations between consumer and broker agents in a service market and 2) a regression-based coordination (RBC) strategy for concurrent negotiations between broker and provider agents in multiple resource markets. The negotiation outcomes between broker and provider agents in a resource market can potentially influence the negotiation outcomes between broker and consumer agents in a service market. Empirical results show that agents adopting the BPE strategy can better respond to different market conditions than agents adopting the time-dependent strategy because they do not make excessive (respectively, inadequate) amounts of concessions in favorable (respectively, unfavorable) markets. In the concurrent negotiations in multiple resource markets, empirical results show that broker agents adopting the RBC strategy achieved significantly higher utilities, higher success rates, and faster negotiation speed than broker agents adopting the utility-oriented and patient coordination strategies.
This paper discusses the trade-off between accuracy, reliability, and computing time in global optimization. Particular compromises provided by traditional methods (quasi-Newton and Nelder-Mead simplex methods) and genetic algorithms are addressed and illustrated by a particular application in the field of nonlinear system identification. Subsequently, new hybrid methods are designed, combining principles from genetic algorithms and "hill-climbing" methods in order to find a better compromise in the trade-off. Inspired by biology, and especially by the manner in which living beings adapt to their environment, these hybrid methods involve two interwoven levels of optimization, namely evolution (genetic algorithms) and individual learning (quasi-Newton), which cooperate in a global process of optimization. One of these hybrid methods appears to join the group of state-of-the-art global optimization methods: it combines the reliability properties of genetic algorithms with the accuracy of the quasi-Newton method, while requiring a computation time only slightly higher than that of the latter.
In this paper, we first present a simple but effective L1-norm-based two-dimensional principal component analysis (2DPCA). The traditional L2-norm-based least squares criterion is sensitive to outliers, while the newly proposed L1-norm 2DPCA is robust. Experimental results demonstrate its advantages.
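L1 objectives of this kind are typically maximized by a simple sign-flipping fixed-point iteration. The following is an illustrative sketch (our own code and naming, not the paper's algorithm) of computing one L1-norm 2DPCA projection vector:

```python
import numpy as np

def l1_2dpca_component(images, n_iter=100, rng=None):
    """One projection vector of L1-norm 2DPCA via a sign-flipping
    fixed-point iteration. `images` is an (n, h, w) array of centered
    image matrices; the routine maximizes sum_i ||A_i v||_1 subject to
    ||v|| = 1. A sketch: names and the stopping rule are illustrative.
    """
    rng = np.random.default_rng(rng)
    n, h, w = images.shape
    v = rng.standard_normal(w)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        s = np.sign(images @ v)          # polarity of each projected row
        s[s == 0] = 1.0
        v_new = np.einsum('nhw,nh->w', images, s)  # sign-weighted sum
        v_new /= np.linalg.norm(v_new)
        if np.allclose(v_new, v):        # fixed point reached
            break
        v = v_new
    return v
```

On data whose rows share a dominant direction, the iteration recovers that direction (up to sign) while the absolute-value criterion damps the influence of outlying rows.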
Automatic analysis of human facial expression is a challenging problem with many applications. Most of the existing automated systems for facial expression analysis attempt to recognize a few prototypic emotional expressions, such as anger and happiness. Instead of representing another approach to machine analysis of prototypic facial expressions of emotion, the method presented in this paper attempts to handle a large range of human facial behavior by recognizing facial muscle actions that produce expressions. Virtually all of the existing vision systems for facial muscle action detection deal only with frontal-view face images and cannot handle temporal dynamics of facial actions. In this paper, we present a system for automatic recognition of facial action units (AUs) and their temporal models from long, profile-view face image sequences. We exploit particle filtering to track 15 facial points in an input face-profile sequence, and we introduce facial-action-dynamics recognition from continuous video input using temporal rules. The algorithm performs both automatic segmentation of an input video into facial expressions pictured and recognition of temporal segments (i.e., onset, apex, offset) of 27 AUs occurring alone or in a combination in the input face-profile video. A recognition rate of 87% is achieved.
Measurement of volume and surface area of the frontal, parietal, temporal and occipital lobes from magnetic resonance (MR) images shows promise as a method for use in diagnosis of dementia. This article presents a novel computer-aided system for automatically segmenting the cerebral lobes from 3T human brain MR images. Until now, the anatomical definition of the cerebral lobes on the cerebral cortex has been somewhat vague for use in automatic delineation of boundary lines, and there has been no definition of the cerebral lobes in the interior of the cerebrum. Therefore, we have developed a new method for defining cerebral lobes on the cerebral cortex and in the interior of the cerebrum. The proposed method determines the boundaries between the lobes by deforming initial surfaces. The initial surfaces are automatically determined based on user-given landmarks. They are smoothed and deformed so that the deforming boundaries run along the hourglass portion of the three-dimensional shape of the cerebrum with fuzzy rule-based active contour and surface models. The cerebrum is divided into the cerebral lobes according to the boundaries determined using this method. The reproducibility of our system was assessed by examining the variability of volume and surface area in three healthy subjects, with measurements performed by three novice users and one expert user. The experimental results show that our system segments the cerebral lobes with high reproducibility.
Due to the simplicity of their implementations, the least squares support vector machine (LS-SVM) and the proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework of LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built. ELM works for the "generalized" single-hidden-layer feedforward networks (SLFNs), but the hidden layer (also called the feature mapping) in ELM need not be tuned. Such SLFNs include, but are not limited to, SVMs, polynomial networks, and conventional feedforward neural networks. This paper shows the following: 1) ELM provides a unified learning platform with a wide variety of feature mappings and can be applied in regression and multiclass classification applications directly; 2) from the optimization-method point of view, ELM has milder optimization constraints than LS-SVM and PSVM; 3) in theory, compared to ELM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity; and 4) in theory, ELM can approximate any target continuous function and classify any disjoint regions. As verified by the simulation results, ELM tends to have better scalability and to achieve similar (for regression and binary-class cases) or much better (for multiclass cases) generalization performance at much faster learning speed (up to thousands of times) than traditional SVM and LS-SVM.
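A minimal sketch of the ELM training step described above, with a random untuned hidden layer and output weights from regularized least squares, might look as follows; the function names and the sigmoid activation are our illustrative choices:

```python
import numpy as np

def elm_train(X, T, n_hidden=50, C=1.0, rng=None):
    """Basic ELM: random hidden layer, closed-form output weights.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets
    (one-hot for classification). C plays the role of the regularization
    parameter in the unified formulation. Illustrative sketch.
    """
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # sigmoid hidden layer
    # regularized least squares: beta = (I/C + H^T H)^{-1} H^T T
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

The point of the framework is visible in the code: nothing in the hidden layer is tuned, so training reduces to one linear solve.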
The remarkable capability of living organisms to adapt to unknown environments is due to learning mechanisms that are totally different from the current artificial machine-learning paradigm. Computational media composed of identical elements that have simple activity rules play a major role in biological control, such as the activities of neurons in brains and the molecular interactions in intracellular control. As a result of integrations of the individual activities of the computational media, new behavioral patterns emerge to adapt to changing environments. We previously implemented this feature of biological control in a form of machine learning and succeeded in realizing bipedal walking without a robot model or trajectory planning. Despite the success of bipedal walking, it remained a puzzle why the individual activities of the computational media could achieve the global behavior. In this paper, we answer this question by taking a statistical approach that connects the individual activities of computational media to global network behaviors. We show that the individual activities can generate optimized behaviors from a particular global viewpoint, i.e., autonomous rhythm generation and learning of balanced postures, without using global performance indices.
A novel fuzzy neural network, the pseudo-outer-product-based fuzzy neural network using the singleton fuzzifier together with the approximate analogical reasoning schema, is proposed in this paper. The network, referred to as the singleton fuzzifier POPFNN-AARS, employs the approximate analogical reasoning schema (AARS) instead of the commonly used truth-value restriction (TVR) method. This makes the structure and learning algorithms of the singleton fuzzifier POPFNN-AARS simpler and conceptually clearer than those of the POPFNN-TVR model. Different similarity measures (SMs) and modification functions (MFs) for AARS are investigated. The structure and learning algorithms of the proposed singleton fuzzifier POPFNN-AARS are presented. Several sets of real-life data are used to test the performance of the singleton fuzzifier POPFNN-AARS, and the experimental results are presented for detailed discussion.
The general task of abduction is to infer a hypothesis that best explains a set of data. A typical subtask of this is to synthesize a composite hypothesis that best explains the entire data from elementary hypotheses which can explain portions of it. The synthesis subtask of abduction is computationally expensive, more so in the presence of certain types of interactions between the elementary hypotheses. In this paper, we first formulate the abduction task as a nonmonotonic constrained-optimization problem. We then consider a special version of the general abduction task that is linear and monotonic. Next, we describe a neural network based on the Hopfield model of computation for the special version of the abduction task. The connections in this network are symmetric, the energy function contains product forms, and the minimization of this function requires a network of order greater than two. We then discuss another neural architecture which is composed of functional modules that reflect the structure of the abduction task. The connections in this second-order network are asymmetric. We conclude with a discussion of how the second architecture may be extended to address the general abduction task.
In this paper, we introduce a model of generalization and specialization of information granules. The information granules themselves are modeled as fuzzy sets or fuzzy relations. Generalization is realized by OR-ing fuzzy sets, while specialization is completed through the logical AND operation. These two logic operators are realized using triangular norms (that is, t-norms and s-norms). We elaborate on two (top-down and bottom-up) strategies for constructing information granules that arise as results of generalization and specialization. Various triangular norms are experimented with, and some conclusions based on numeric studies are derived.
In this paper, we investigate how to design an efficient heuristic algorithm under the guideline of the backbone and the fat, in the context of the p-median problem. Given a problem instance, the backbone variables are defined as the variables shared by all optimal solutions, and the fat variables are defined as the variables that are absent from every optimal solution. Identification of the backbone (fat) variables is essential for the heuristic algorithms exploiting such structures. Since the existing exact identification method, i.e., limit crossing (LC), is time consuming and sensitive to the upper bounds, it is hard to incorporate LC into heuristic algorithm design. In this paper, we develop the accelerated-LC (ALC)-based multilevel algorithm (ALCMA). In contrast to LC which repeatedly runs the time-consuming Lagrangian relaxation (LR) procedure, ALC is introduced in ALCMA such that LR is performed only once, and every backbone (fat) variable can be determined in O(1) time. Meanwhile, the upper bound sensitivity is eliminated by a dynamic pseudo upper bound mechanism. By combining ALC with the pseudo upper bound, ALCMA can efficiently find high-quality solutions within a series of reduced search spaces. Extensive empirical results demonstrate that ALCMA outperforms existing heuristic algorithms in terms of the average solution quality.
In this paper, a cyclic-motion generation (CMG) scheme at the acceleration level is proposed to remedy the joint-angle drift phenomenon of redundant robot manipulators which are controlled at the joint-acceleration or torque level. To achieve this, a cyclic-motion criterion at the joint-acceleration level is exploited. This criterion, together with the joint-angle limits, joint-velocity limits, and joint-acceleration limits, is incorporated into the scheme formulation. In addition, the neural-dynamic method of Zhang et al. is employed to explain and analyze the effectiveness of the proposed criterion. Then, the scheme is reformulated as a quadratic program, which is solved by a primal-dual neural network. Furthermore, four path-tracking simulations verify the effectiveness and accuracy of the proposed acceleration-level CMG scheme. Moreover, comparisons between the proposed acceleration-level CMG scheme and the velocity-level scheme demonstrate that the former is safer and more applicable. An experiment on a physical robot system further verifies the physical realizability of the proposed acceleration-level CMG scheme.
Fuzzy systems are excellent approximators of known functions and of the dynamic responses of physical systems. We propose a new approach to approximating any known function by a Takagi-Sugeno-Kang fuzzy system with a guaranteed upper bound on the approximation error. The new approach is also used to approximately represent the behavior of a dynamic system from its input-output pairs, using experimental data with known error bounds. We provide sufficient conditions for this class of fuzzy systems to be universal approximators with specified error bounds. The new conditions require a smaller number of membership functions than all previously published conditions. We illustrate the new results and compare them with published error bounds through numerical examples.
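To make the flavor of such guaranteed bounds concrete, here is a toy zero-order TSK approximator with triangular membership functions on a uniform grid. With this construction the output reduces to piecewise-linear interpolation of the sampled function, whose error is classically bounded by h²/8 · max|f''| for grid spacing h. This is an illustrative sketch under our own assumptions, not the paper's construction:

```python
import numpy as np

def tsk_approximate(f, a, b, n_rules, x):
    """Zero-order Takagi-Sugeno-Kang approximation of a known function
    f on [a, b]: triangular membership functions centered on a uniform
    grid (forming a partition of unity) with constant rule consequents
    f(c_k). Evaluates the fuzzy system at the points in x.
    """
    centers = np.linspace(a, b, n_rules)
    h = centers[1] - centers[0]
    # triangular memberships: mu_k(x) = max(0, 1 - |x - c_k| / h)
    mu = np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / h, 0.0, 1.0)
    # weighted average of rule consequents (denominator is 1 on [a, b])
    return (mu * f(centers)).sum(axis=1) / mu.sum(axis=1)
```

For f = sin on [0, 2π] with 21 rules, h ≈ 0.314, so the guaranteed error bound is h²/8 ≈ 0.0123; halving h quarters the bound, which is the kind of membership-function-count/error trade-off the abstract refers to.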
Stochastic discrimination (SD) depends on a discriminant function for classification. An improved SD is introduced to reduce the error rate of the standard SD in the context of a two-class classification problem. The learning procedure of the improved SD consists of two stages. Initially, a standard SD, but with a shorter learning period, is carried out to identify an important space in which all the misclassified samples are located. Then the standard SD is modified by 1) restricting sampling to the important space and 2) introducing a new discriminant function for samples in the important space. It is shown by mathematical derivation that the new discriminant function has the same mean as, but a smaller variance than, that of the standard SD for samples in the important space. It is also shown that the smaller the variance of the discriminant function, the lower the error rate of the classifier. Consequently, the proposed improved SD improves on the standard SD through its capability of achieving higher classification accuracy. Illustrative examples are provided to demonstrate the effectiveness of the proposed improved SD.
The generation of three-dimensional (3-D) digital models produced by optical technologies in some cases involves metric errors. This happens when small high-resolution 3-D images are assembled together in order to model a large object. In some applications, such as the 3-D modeling of cultural heritage, metric accuracy is a major issue, and no methods are currently available for enhancing it. The authors present a procedure by which the metric reliability of the 3-D model, obtained through iterative alignments of many range maps, can be guaranteed to a known acceptable level. The goal is the integration of the 3-D range camera system with a close-range digital photogrammetry technique. The basic idea is to generate a global coordinate system determined by the digital photogrammetric procedure, measuring the spatial coordinates of optical targets placed around the object to be modeled. Such coordinates, set as reference points, allow the proper rigid motion of a few key range maps, each including a portion of the targets, in the global reference system defined by photogrammetry. The other 3-D images are then aligned around these locked images with the usual iterative algorithms. Experimental results on an anthropomorphic test object, comparing the conventional and the proposed alignment methods, are finally reported.
Traditional robot calibration employs model-based and modeless methods. In the modeless method, position-error compensation moves the end-effector of the robot to the target position in the workspace and finds the position error at that target by bilinear interpolation of the errors at the four neighboring grid points around it. A camera or other measurement device can be used to measure this position error, which is then compensated with the interpolation result. This paper provides a novel fuzzy interpolation method that improves on the compensation accuracy obtained with bilinear interpolation. A dynamic online fuzzy inference system is implemented to meet the needs of a fast real-time control system and calibration environment. Simulation results show that the compensation accuracy can be greatly improved by this fuzzy interpolation method compared with the bilinear interpolation method.
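The baseline bilinear step can be sketched as follows; the grid layout and names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def bilinear_error(p, grid, err):
    """Estimate the position error at target point p from the errors
    measured at the four surrounding grid nodes, as in modeless
    compensation. `grid` is (xs, ys), the coordinates of the grid
    lines; `err[i, j]` is the measured error at node (xs[i], ys[j]).
    """
    xs, ys = grid
    # locate the cell containing p
    i = np.clip(np.searchsorted(xs, p[0]) - 1, 0, len(xs) - 2)
    j = np.clip(np.searchsorted(ys, p[1]) - 1, 0, len(ys) - 2)
    tx = (p[0] - xs[i]) / (xs[i + 1] - xs[i])   # fractional position in cell
    ty = (p[1] - ys[j]) / (ys[j + 1] - ys[j])
    # blend the four neighboring errors
    return ((1 - tx) * (1 - ty) * err[i, j]
            + tx * (1 - ty) * err[i + 1, j]
            + (1 - tx) * ty * err[i, j + 1]
            + tx * ty * err[i + 1, j + 1])
```

Bilinear interpolation reproduces any error field that varies linearly across a cell exactly; the paper's fuzzy inference system targets the residual error that remains when the true field is nonlinear between grid points.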
An evolutionary approach to designing accurate classifiers with a compact fuzzy-rule base using a scatter partition of feature space is proposed, in which all the elements of the fuzzy classifier design problem are encoded as parameters of a complex optimization problem. An intelligent genetic algorithm (IGA) is used to effectively solve the design problem of fuzzy classifiers with many tuning parameters. The merits of the proposed method are threefold: 1) it has high search ability to efficiently find fuzzy rule-based systems with high fitness values; 2) the obtained fuzzy rules have high interpretability; and 3) the obtained compact classifiers have high classification accuracy on unseen test patterns. The sensitivity of the control parameters of the proposed method is empirically analyzed to show the robustness of the IGA-based method. Performance comparison and statistical analysis of experimental results using ten-fold cross-validation show that the IGA-based method without heuristics is efficient in designing accurate and compact fuzzy classifiers, using 11 well-known data sets with numerical attribute values.
We consider the microaggregation problem (MAP), which involves partitioning a set of individual records in a microdata file into a number of mutually exclusive and exhaustive groups. This problem, which seeks the best partition of the microdata file, is known to be NP-hard and has been tackled with many heuristic solutions. In this paper, we present the first reported fixed-structure-stochastic-automata-based solution to this problem. The newly proposed method leads to a lower value of the information loss (IL), obtains a better tradeoff between the IL and the disclosure risk (DR) when compared with state-of-the-art methods, and leads to a superior value of the scoring index, a criterion that combines the IL and the DR. The scheme has been implemented, tested, and evaluated for different real-life and simulated data sets. The results clearly demonstrate the applicability of learning automata to the MAP and their ability to yield a solution that obtains the best tradeoff between IL and DR when compared with the state of the art.
Ant colony optimization (ACO) has been widely applied to combinatorial optimization problems in recent years. There are few studies, however, of its convergence time, which reflects how many iterations ACO algorithms need to converge to the optimal solution. Based on an absorbing Markov chain model, we analyze the ACO convergence time in this paper. First, we present a general result for the estimation of convergence time that reveals the relationship between convergence time and pheromone rate. This general result is then extended to a two-step analysis of the convergence time, which includes the following: 1) the iteration time that the pheromone rate needs to reach the objective value and 2) the convergence time calculated with the objective pheromone rate in expectation. Furthermore, four brief ACO algorithms are investigated as case studies using the proposed theoretical results. Finally, the conclusion of the case studies, that the pheromone rate and its deviation determine the expected convergence time, is numerically verified with experimental results for four one-ant and four ten-ant ACO algorithms.
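The pheromone-rate/convergence-time relationship can be made concrete with a toy one-ant, MMAS-style ACO on the OneMax problem. This sketch (our own, with illustrative parameter names, not one of the paper's four algorithms) counts the iterations until the pheromone on the optimal solution's components reaches a target rate:

```python
import numpy as np

def one_ant_aco_onemax(n_bits=10, rho=0.2, tau_min=0.05, tau_max=0.95,
                       target_rate=0.9, max_iter=2000, rng=0):
    """Minimal one-ant ACO on OneMax with best-so-far reinforcement and
    pheromone bounds. Returns the number of iterations until the
    optimum is found and every pheromone value reaches target_rate.
    """
    rng = np.random.default_rng(rng)
    tau = np.full(n_bits, 0.5)        # pheromone for setting each bit to 1
    best, best_fit = None, -1
    for t in range(1, max_iter + 1):
        x = (rng.random(n_bits) < tau).astype(int)   # construct a solution
        if x.sum() >= best_fit:
            best, best_fit = x, int(x.sum())
        # evaporate, then reinforce the best-so-far solution
        tau = np.clip((1 - rho) * tau + rho * best, tau_min, tau_max)
        if best_fit == n_bits and tau.min() >= target_rate:
            return t                                  # converged
    return max_iter
```

Raising the evaporation rate rho drives the pheromone rate toward the reinforced solution faster, shortening the second of the two phases the abstract distinguishes, at the cost of less exploration in the first.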
This paper investigates the joint-structured-sparsity-based methods for transient acoustic signal classification with multiple measurements. By joint structured sparsity, we not only use the sparsity prior for each measurement but we also exploit the structural information across the sparse representation vectors of multiple measurements. Several different sparse prior models are investigated in this paper to exploit the correlations among the multiple measurements with the notion of the joint structured sparsity for improving the classification accuracy. Specifically, we propose models with the joint structured sparsity under different assumptions: same sparse code model, common sparse pattern model, and a newly proposed joint dynamic sparse model. For the joint dynamic sparse model, we also develop an efficient greedy algorithm to solve it. Extensive experiments are carried out on real acoustic data sets, and the results are compared with the conventional discriminative classifiers in order to verify the effectiveness of the proposed method.
We introduce a method for strategy acquisition in nonzero-sum n-player games and empirically validate it by applying it to a well-known benchmark problem in this domain, namely, the double-auction market. Many existing approaches to strategy acquisition focus on attempting to find strategies that are robust in the sense that they are good all-round performers against all comers. We argue that, in many economic and multiagent scenarios, the robustness criterion is inappropriate; in contrast, our method focuses on searching for strategies that are likely to be adopted by participating agents, which is formalized as the size of a strategy's basin of attraction under the replicator dynamics.
This paper presents an imitation learning system capable of learning tasks in a complex dynamic real-time environment. We argue that social learning should be thought of as a special case of general skill learning and that the biases it introduces to the skill-learning problem radically simplify learning for species with sufficient innate predisposition to harness this power. We decompose skill learning into four subproblems and then show how a modification of Roy's CELL system can address all of these problems simultaneously. Our system is demonstrated working in the domain of a real-time virtual-reality game, Unreal Tournament.
Compared to object-based registration, feature-based registration is much less complex. However, for feature-based registration to work, the two image stacks under consideration must have the same acquisition tilt angle and the same anatomical location, two requirements that are not always fulfilled. In this paper, we propose a technique that reconstructs two sets of medical images acquired at different acquisition angles and anatomical cross sections into one set of images with identical scanning orientations and positions. The spatial correlation information between the two image stacks is first extracted and is used to correct the tilt-angle and anatomical-position differences between the stacks. Satisfactory reconstruction results are presented to validate the approach.
[Figure captions] The trend as the regional feature of a surface; exploration of the object surface; a curve on the sectional plane; the next viewpoint and the 3-D surface obtained.
The iterative modeling process is illustrated in Fig. 5, in which the symbols Si (i = 1, 2, ..., 7) represent the following states: S1: acquisition of a view; S2: reconstruction of the 3-D local model; S3: registration and fusion with the global model; S4: model analysis and completion-condition checking; S5: ranking for selection of exploration directions; S6: computing the trend surface and determining the next view; S7: moving the robot to the new viewpoint.
A novel method is proposed in this paper for automatic acquisition of three-dimensional (3-D) models of unknown objects by an active vision system, in which the vision sensor is moved from one viewpoint to the next around the target to obtain its complete model. In each step, sensing parameters are determined automatically for incrementally building the 3-D target models. The method is developed by analyzing the target's trend surface, which is the regional feature of a surface for describing the global tendency of change. While previous approaches to trend analysis are usually focused on generating polynomial equations for interpreting regression surfaces in three dimensions, this paper proposes a new mathematical model for predicting the unknown area of the object surface. A uniform surface model is established by analyzing the surface curvatures. Furthermore, a criterion is defined to determine the exploration direction, and an algorithm is developed for determining the parameters of the next view. The method is implemented, and experimental results validate the proposed approach.
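Classical trend-surface analysis, which the abstract contrasts with its predictive model, fits a low-order polynomial surface to scattered samples by least squares. A generic sketch (not the paper's method; names are ours):

```python
import numpy as np

def fit_trend_surface(x, y, z, order=2):
    """Fit a polynomial trend surface z = f(x, y) of the given total
    order by least squares and return a predictor for new points.
    """
    def design(xv, yv):
        # monomial basis x^i * y^j with i + j <= order
        return np.column_stack([xv**i * yv**j
                                for i in range(order + 1)
                                for j in range(order + 1 - i)])
    coef, *_ = np.linalg.lstsq(design(x, y), z, rcond=None)
    def predict(xq, yq):
        return design(xq, yq) @ coef
    return predict
```

A fitted trend surface can be extrapolated slightly beyond the observed region, which is the basic idea behind using surface trends to predict unseen areas when planning the next view.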
We present a new approach to online incremental word acquisition and grammar learning by humanoid robots. Using no data set provided in advance, the proposed system grounds language in a physical context, as mediated by its perceptual capacities. Learning is carried out through show-and-tell procedures in interaction with a human partner, and the procedure is open-ended for new words and multiword utterances. These facilities are supported by a self-organizing incremental neural network, which can perform online unsupervised classification and topology learning. Embodied with mental imagery, the system also learns, through both top-down and bottom-up processes, the syntactic structures contained in utterances; thereby, it performs simple grammar learning. Under such a multimodal scheme, the robot is able to describe online a given physical context (both static and dynamic) through natural language expressions. It can also perform actions through verbal interactions with its human partner.
In this research, a novel vehicle-borne system for measuring three-dimensional (3-D) urban data using single-row laser range scanners is proposed. Two single-row laser range scanners are mounted on the roof of a vehicle, performing horizontal and vertical profiling, respectively. As the vehicle moves ahead, a horizontal and a vertical range profile of the surroundings are captured at each odometer trigger. The degrees of freedom of the vehicle's motion are reduced from six to three by assuming that the ground surface is flat and smooth, so that the vehicle moves on almost the same horizontal plane. Horizontal range profiles, which have an overwhelming overlap between successive ones, are registered to trace the vehicle's location and attitude. Vertical range profiles are aligned to the coordinate system of the horizontal one according to the physical geometry between the pair of laser range scanners, and subsequently to a global coordinate system to make up the 3-D data. An experiment is conducted in which 3-D data of a real urban scene are obtained by registering and integrating 2412 horizontal and vertical range profiles. Two ground truths are used for evaluation: the outputs of a GPS/INS/odometer-based positioning system and a 1:500 digital map of the test site. The accuracy and efficiency of the method in measuring a 3-D urban scene are demonstrated.
Gait has been known as an effective biometric feature for identifying a person at a distance. However, variation in walking speed may lead to significant changes in human walking patterns, which causes many difficulties for gait recognition. A comprehensive analysis is carried out in this paper to identify such effects. Based on the analysis, Procrustes shape analysis is adopted for gait signature description and relevant similarity measurement. To tackle the challenges raised by speed change, this paper proposes a higher-order shape configuration for gait shape description, which deliberately conserves discriminative information in the gait signatures and is still able to tolerate varying walking speeds. Instead of simply measuring the similarity between two gaits by treating them as two unified objects, a differential composition model (DCM) is constructed. The DCM differentiates the effects that walking speed changes have on various human body parts. At the same time, it also balances well the different discriminabilities of each body part in the overall gait similarity measurement. In this model, the Fisher discriminant ratio is adopted to calculate weights for each body part. Comprehensive experiments based on widely adopted gait databases demonstrate that our proposed method is efficient for cross-speed gait recognition and outperforms other state-of-the-art methods.
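Procrustes shape analysis compares landmark configurations after removing translation, scale, and rotation. A textbook sketch of the full Procrustes distance for 2-D shapes (our illustration, not the paper's implementation):

```python
import numpy as np

def full_procrustes_distance(X, Y):
    """Full Procrustes distance between two 2-D shapes given as
    (k, 2) landmark arrays, using the standard complex representation.
    Translation, scale, and rotation are factored out.
    """
    u = X[:, 0] + 1j * X[:, 1]
    v = Y[:, 0] + 1j * Y[:, 1]
    u = u - u.mean()                     # remove translation
    v = v - v.mean()
    u = u / np.linalg.norm(u)            # remove scale
    v = v / np.linalg.norm(v)
    # the optimal rotation cancels the phase of the complex inner product
    d2 = 1.0 - np.abs(np.vdot(u, v))**2
    return np.sqrt(max(d2, 0.0))
```

Two configurations that differ only by a similarity transform have distance zero, which is what makes Procrustes signatures suitable for comparing gait shapes across viewing and scaling changes.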
The conveyance and recognition of affect and emotion partially determine how people interact with others and how they perform in their day-to-day activities. Hence, it is becoming necessary to endow technology with the ability to recognize users' affective states in order to increase its effectiveness. This paper makes three contributions to this research area. First, we demonstrate recognition models that automatically recognize affective states and affective dimensions from non-acted body postures instead of acted postures. The scenario selected for training and testing the automatic recognition models is a body-movement-based video game. Second, when attributing affective labels and dimension levels to the postures represented as faceless avatars, observers reached a level of agreement above chance. Finally, using the labels and affective dimension levels assigned by the observers as ground truth and the observers' level of agreement as base rate, automatic recognition models grounded in low-level posture descriptions were built and tested for their ability to generalize to new observers and postures using random repeated subsampling validation. The automatic recognition models achieve recognition percentages comparable to the human base rates, as hypothesized.
This paper presents our response to the first international challenge on facial emotion recognition and analysis. We propose to combine different types of features to automatically detect action units (AUs) in facial images. We use one multikernel support vector machine (SVM) for each AU we want to detect. The first kernel matrix is computed using local Gabor binary pattern histograms and a histogram intersection kernel. The second kernel matrix is computed from active appearance model coefficients and a radial basis function kernel. During the training step, we combine these two types of features using the recently proposed SimpleMKL algorithm. SVM outputs are then averaged to exploit temporal information in the sequence. To evaluate our system, we perform extensive experiments on several key issues: the influence of features and kernel functions in histogram-based SVM approaches, the influence of spatially independent information versus geometric local appearance information and the benefits of combining both, sensitivity to training data, and the benefit of temporal context adaptation. We also compare our results with those of the other participants and try to explain why our method performed best during the facial expression recognition and analysis challenge.
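The histogram intersection kernel used for the first kernel matrix admits a one-line sketch (toy normalized histograms; `hist_intersection_kernel` is an illustrative name, not the challenge code):

```python
def hist_intersection_kernel(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two toy normalized histograms standing in for LGBP histograms.
h1 = [0.2, 0.5, 0.3]
h2 = [0.4, 0.4, 0.2]
k = hist_intersection_kernel(h1, h2)  # min-per-bin: 0.2 + 0.4 + 0.2
```

For normalized histograms the kernel value lies in [0, 1] and reaches 1 only when the two histograms are identical.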
In expression recognition and many other computer vision applications, the recognition performance is greatly improved by adding a layer of nonlinear texture filters between the raw input pixels and the classifier. The function of this layer is typically known as feature extraction. Popular filter types for this layer are Gabor energy filters (GEFs) and local binary patterns (LBPs). Recent work [1] suggests that adding a second layer of nonlinear filters on top of the first layer may be beneficial. However, the best architecture of layers and selection of filters remain unclear. In this paper, we present a thorough empirical analysis of the performance of single-layer and dual-layer texture-based approaches for action unit recognition. For the single-hidden-layer case, GEFs perform consistently better than LBPs, which may be due to their robustness to jitter and illumination noise as well as their ability to encode texture at multiple resolutions. For the dual-layer case, we confirm that, while small, the benefit of adding this second layer is reliable and consistent across data sets. Interestingly, for this second layer, LBPs appear to perform better than GEFs.
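A basic 8-neighbor LBP operator, the kind of nonlinear texture filter such a layer computes, can be sketched as follows (a minimal illustration on a single 3x3 patch; real pipelines aggregate the codes into histograms over image regions):

```python
def lbp_code(patch):
    """Basic 8-neighbor LBP code for the center pixel of a 3x3 patch.

    Each neighbor is thresholded against the center pixel; the resulting
    0/1 decisions, read clockwise from the top-left, form an 8-bit code.
    """
    center = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbors):
        if n >= center:
            code |= 1 << bit
    return code

# Toy grayscale patch: the code captures which neighbors are at least as bright
# as the center, independent of the absolute intensity level.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
```

Because only the ordering relative to the center matters, the code is invariant to monotonic illumination changes, which is one reason LBPs are popular for this layer.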
In a clinical setting, pain is reported either through patient self-report or via an observer. Such measures are problematic because they are: 1) subjective and 2) provide no specific timing information. Coding pain as a series of facial action units (AUs) can avoid these issues, as it can be used to gain an objective measure of pain on a frame-by-frame basis. Using video data from patients with shoulder injuries, in this paper, we describe an active appearance model (AAM)-based system that can automatically detect the frames of video in which a patient is in pain. This pain data set highlights the many challenges associated with spontaneous emotion detection, particularly expression and head movement due to the patient's reaction to pain. In this paper, we show that the AAM can deal with these movements and can achieve significant improvements in both AU and pain detection performance compared to the current state-of-the-art approaches, which utilize similarity-normalized appearance features only.
Conventional human action recognition algorithms cannot work well when the amount of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domain to assist recognition tasks in the target domain. The TTM is well characterized by two aspects: 1) it uses the bag-of-words model trained from the auxiliary domain to represent videos in the target domain; and 2) it assumes each human action is a mixture of a set of topics and uses the topics learned from the auxiliary domain to regularize the topic estimation in the target domain, wherein the regularization is the summation of Kullback-Leibler divergences between topic pairs of the two domains. The utilization of the auxiliary domain knowledge improves the generalization ability of the learned topic model. Experiments on Weizmann and KTH human action databases suggest the effectiveness of the proposed TTM for cross-domain human action recognition.
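The Kullback-Leibler regularizer that couples the two domains' topics can be sketched as follows (toy distributions; in the TTM this term is summed over all topic pairs of the two domains):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between discrete topic distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy topic-over-words distributions from the auxiliary and target domains.
aux_topic = [0.7, 0.2, 0.1]
tgt_topic = [0.6, 0.3, 0.1]

# Penalizing this term keeps the target-domain topic close to its auxiliary counterpart.
reg = kl_divergence(aux_topic, tgt_topic)
```

The divergence is zero only when the two distributions coincide, so minimizing it pulls the target-domain topic estimates toward the auxiliary-domain topics without forcing exact equality.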
A common viewpoint-free framework that fuses pose recovery and classification for action and gait recognition is presented in this paper. First, a markerless pose recovery method is adopted to automatically capture the 3-D human joint and pose parameter sequences from volume data. Second, multiple configuration features (combination of joints) and movement features (position, orientation, and height of the body) are extracted from the recovered 3-D human joint and pose parameter sequences. A hidden Markov model (HMM) and an exemplar-based HMM are then used to model the movement features and configuration features, respectively. Finally, actions are classified by a hierarchical classifier that fuses the movement features and the configuration features, and persons are recognized from their gait sequences with the configuration features. The effectiveness of the proposed approach is demonstrated with experiments on the Institut National de Recherche en Informatique et Automatique Xmas Motion Acquisition Sequences data set.
Overview of the setup.
Graphical representation of the model. The affordance network is represented by three different sets of variables: actions (A), object features (F_i), and effects (E_i). Each word w_i may depend on any subset of A, F_i, and E_i.
We address the problem of bootstrapping language acquisition for an artificial system, similarly to what is observed in experiments with human infants. Our method works by associating meanings to words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, robot perceptions, and the perceived effects of these actions upon objects. We extend the affordance model to incorporate spoken words, which allows us to ground the verbal symbols to the execution of actions and the perception of the environment. The model takes verbal descriptions of a task as the input and uses temporal co-occurrence to create links between speech utterances and the involved objects, actions, and effects. We show that the robot is able to form useful word-to-meaning associations, even without considering grammatical structure in the learning process and in the presence of recognition errors. These word-to-meaning associations are embedded in the robot's own understanding of its actions. Thus, they can be directly used to instruct the robot to perform tasks and also allow context to be incorporated into the speech recognition task. We believe that these encouraging results suggest that our approach may afford robots the capacity to acquire language descriptors in their operating environment, as well as shed some light on how this challenging process develops in human infants.
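The temporal co-occurrence counting that links utterances to actions, objects, and effects can be sketched as follows (hypothetical episodes and meaning labels, purely illustrative of the idea, not the paper's affordance network):

```python
from collections import defaultdict

# Each episode pairs the words heard with the co-occurring action, object, and effect.
episodes = [
    ({"grasp", "ball"}, {"action:grasp", "object:ball", "effect:moved"}),
    ({"tap", "ball"},   {"action:tap", "object:ball", "effect:moved"}),
    ({"grasp", "box"},  {"action:grasp", "object:box", "effect:still"}),
]

counts = defaultdict(lambda: defaultdict(int))
word_totals = defaultdict(int)
for words, meanings in episodes:
    for w in words:
        word_totals[w] += len(meanings)
        for m in meanings:
            counts[w][m] += 1

# Normalizing the counts gives a rough association strength P(meaning | word).
p_meaning_given_word = {w: {m: c / word_totals[w] for m, c in ms.items()}
                        for w, ms in counts.items()}
```

Even with this crude counting, "grasp" becomes most strongly associated with the grasp action across episodes, which is the kind of word-to-meaning link the model exploits.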
Algorithmic process flow diagram for the integrated P-A learning framework. 
Typical symbol processing module output, with all visualizable predicates (signs, cars, road topologies) represented by icons with alpha-blending proportional to fuzzy confidence (expected path trajectories are indicated by yellow arrows, intentions by red arrows).
Perception-action (P-A) learning is an approach to cognitive system building that seeks to reduce the complexity associated with conventional environment-representation/action-planning approaches. Instead, actions are directly mapped onto the perceptual transitions that they bring about, eliminating the need for intermediate representation and significantly reducing training requirements. We here set out a very general learning framework for cognitive systems in which online learning of the P-A mapping may be conducted within a symbolic processing context, so that complex contextual reasoning can influence the P-A mapping. In utilizing a variational calculus approach to define a suitable objective function, the P-A mapping can be treated as an online learning problem via gradient descent using partial derivatives. Our central theoretical result is to demonstrate top-down modulation of low-level perceptual confidences via the Jacobian of the higher levels of a subsumptive P-A hierarchy. Thus, the separation of the Jacobian as a multiplying factor between levels within the objective function naturally enables the integration of abstract symbolic manipulation in the form of fuzzy deductive logic into the P-A mapping learning. We experimentally demonstrate that the resulting framework achieves significantly better accuracy than using P-A learning without top-down modulation. We also demonstrate that it permits novel forms of context-dependent multilevel P-A mapping, applying the mechanism in the context of an intelligent driver assistance system.
The geometry of hand pointing and camera projection.
Bilocal image line tracing in the stereo pair through head and hand localization. Black pixels indicate background. 
(a) The image stripe summarizing the whole fresco of the “Cappella dei Magi” at Palazzo Medici-Riccardi, Florence. (b) Chapel’s ground-plan: each wall contains a part of the fresco. 
Example of display for interactive fruition of large size art images in the Palazzo Medici Riccardi. (a) Portion of the fresco and its selectable regions. (b) Hypertextual information displayed in response to a selection. 
We present a nonintrusive system based on computer vision for human-computer interaction in three-dimensional (3-D) environments controlled by hand pointing gestures. Users are allowed to walk around in a room and manipulate information displayed on its walls by using their own hands as pointing devices. Once captured and tracked in real-time using stereo vision, hand pointing gestures are remapped onto the current point of interest, thus reproducing in an advanced interaction scenario the "drag and click" behavior of traditional mice. The system, called PointAt (patent pending), enjoys a careful modeling of both user and optical subsystem, and visual algorithms for self-calibration and adaptation to both user peculiarities and environmental changes. The concluding sections provide an insight into system characteristics, performance, and relevance for real applications.
This paper addresses the problem of human-action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. An appropriate distance metric between two collections of spatiotemporal salient points is introduced, which is based on the chamfer distance and an iterative linear time-warping technique that deals with time-expansion or time-compression issues. A classification scheme based on relevance vector machines and the proposed distance measure is introduced. Results on real image sequences from a small database depicting people performing 19 aerobic exercises are presented.
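The symmetric chamfer distance between two collections of spatiotemporal salient points can be sketched as follows (toy (x, y, t) points; the paper's additional linear time-warping step is omitted here):

```python
def chamfer_distance(points_a, points_b):
    """Symmetric chamfer distance between two collections of (x, y, t) points:
    the average nearest-neighbor distance from A to B plus that from B to A."""
    def dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

    def directed(src, dst):
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)

    return directed(points_a, points_b) + directed(points_b, points_a)

# Two toy collections of salient points; each point in one collection lies at
# unit distance (along t) from its nearest neighbor in the other.
a = [(0, 0, 0), (1, 1, 1)]
b = [(0, 0, 1), (1, 1, 2)]
d = chamfer_distance(a, b)
```

Summing both directed terms makes the measure symmetric, so neither collection is privileged as the "template".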
This paper presents an activation scheme for use with Hopfield neural network algorithms that guarantees a valid solution for a particular category of problems. The technique monitors the appropriate neurons and heuristically controls their activation function. As a result, it has been possible to eliminate several constraint terms from the energy function that would normally have been required to drive the network toward a valid solution. This saves time and eliminates the need to empirically determine a larger number of constants. The technique has been applied to the combinatorial optimization problem called hierarchical digraph visualization, which arises in many application areas where it is necessary to visually realize the relationships between entities in complex systems. Results are presented that compare this new approach with a more traditional neural network approach as well as heuristic approaches; performance improvements in terms of both solution quality and execution time were achieved relative to both alternative techniques.
Enhancing the robustness and interpretability of a multilayer perceptron (MLP) with a sigmoid activation function is a challenging topic. As a particular MLP, the additive TS-type MLP (ATSMLP) can be interpreted based on single-stage fuzzy IF-THEN rules, but its robustness degrades as the number of intermediate layers increases. This paper presents a new MLP model called the cascaded ATSMLP (CATSMLP), in which ATSMLPs are organized in a cascaded way. The proposed CATSMLP is a universal approximator and is also proven to be functionally equivalent to a fuzzy inference system based on syllogistic fuzzy reasoning. Therefore, the CATSMLP may be interpreted based on syllogistic fuzzy reasoning in a theoretical sense. Meanwhile, because syllogistic fuzzy reasoning has a distinct advantage over single-stage IF-THEN fuzzy reasoning in robustness, this paper proves in an indirect way that the CATSMLP is more robust than the ATSMLP in an upper-bound sense. Several experiments were conducted to confirm this claim.
The results of traditional clustering methods are usually unreliable, as there is no guidance from data labels, whereas class labels can be predicted more reliably by semisupervised learning when the labels of part of the data are given. In this paper, we propose an actively self-training clustering method in which samples are actively selected as the training set to minimize an estimated Bayes error, and semisupervised learning is then explored to perform clustering. Traditional graph-based semisupervised learning methods are not convenient for estimating the Bayes error, so we develop a specific regularization framework on the graph to perform semisupervised learning, in which the Bayes error can be effectively estimated. In addition, the proposed clustering algorithm can be readily applied in a semisupervised setting with partial class labels. Experimental results on toy data and real-world data sets demonstrate the effectiveness of the proposed clustering method in both the unsupervised and the semisupervised settings. It is worth noting that the proposed clustering method is free of initialization, whereas traditional clustering methods are usually dependent on initialization.
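A generic graph-based semisupervised propagation step, of the kind such regularization frameworks build on, can be sketched as follows (a simple harmonic-style iteration on a toy chain graph; this illustrates the general idea, not the paper's specific framework):

```python
def propagate(adjacency, labels, iters=100):
    """Iterative label propagation: labeled nodes stay clamped, and each unlabeled
    node repeatedly takes the weighted average of its neighbors' scores.

    adjacency: dict node -> {neighbor: edge_weight}; labels: dict node -> clamped value.
    """
    scores = {n: labels.get(n, 0.0) for n in adjacency}
    for _ in range(iters):
        for n in adjacency:
            if n in labels:
                continue  # keep the given labels fixed
            total_w = sum(adjacency[n].values())
            scores[n] = sum(w * scores[m] for m, w in adjacency[n].items()) / total_w
    return scores

# Chain graph 0-1-2-3: node 0 is labeled +1 and node 3 is labeled -1;
# the unlabeled nodes 1 and 2 converge to a smooth interpolation.
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
scores = propagate(adj, {0: 1.0, 3: -1.0})
```

On this chain the fixed point is linear interpolation between the two clamped labels, the smoothest labeling consistent with the supervision.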
We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
In this paper, a model of cerebellar function is implemented and evaluated in the control of a robot eye actuated by pneumatic artificial muscles. The investigated control problem is stabilization of the visual image in response to disturbances. This is analogous to the vestibuloocular reflex (VOR) in humans. The cerebellar model is structurally based on the adaptive filter, and the learning rule is computationally analogous to least-mean squares, where parameter adaptation at the parallel fiber/Purkinje cell synapse is driven by the correlation of the sensory error signal (carried by the climbing fiber) and the motor command signal. Convergence of the algorithm is first analyzed in simulation on a model of the robot and then tested online in both one and two degrees of freedom. The results show that this model of neural function successfully works on a real-world problem, providing empirical evidence for validating: 1) the generic cerebellar learning algorithm; 2) the function of the cerebellum in the VOR; and 3) the signal transmission between functional neural components of the VOR.
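The least-mean-squares learning rule, in which each weight adapts along the correlation of the error signal with its input, can be sketched as follows (scalar toy signals and an illustrative step size; the climbing-fiber/parallel-fiber mapping is only an analogy):

```python
def lms_step(weights, inputs, error, mu=0.05):
    """One least-mean-squares update: each weight moves opposite the product of
    the error signal (climbing-fiber analog) and its input (parallel-fiber analog)."""
    return [w - mu * error * x for w, x in zip(weights, inputs)]

# Drive a two-weight linear filter so that its output matches a target signal.
weights = [0.0, 0.0]
inputs = [1.0, 0.5]
target = 1.0
for _ in range(200):
    output = sum(w * x for w, x in zip(weights, inputs))
    error = output - target  # sensory error signal
    weights = lms_step(weights, inputs, error)

final_output = sum(w * x for w, x in zip(weights, inputs))
```

With a sufficiently small step size the error decays geometrically, which is the convergence behavior analyzed in simulation before the online robot tests.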
This paper focuses on the development of adaptive fuzzy neural network control (AFNNC), including indirect and direct frameworks for an n-link robot manipulator, to achieve high-precision position tracking. In general, it is difficult to adopt a model-based design to achieve this control objective due to the uncertainties in practical applications, such as friction forces, external disturbances, and parameter variations. In order to cope with this problem, an indirect AFNNC (IAFNNC) scheme and a direct AFNNC (DAFNNC) strategy are investigated without the requirement of prior system information. In these model-free control topologies, a continuous-time Takagi-Sugeno (T-S) dynamic fuzzy model with online learning ability is constructed to represent the system dynamics of an n-link robot manipulator. In the IAFNNC, an FNN estimator is designed to tune the nonlinear dynamic function vector in fuzzy local models, and then, the estimative vector is used to indirectly develop a stable IAFNNC law. In the DAFNNC, an FNN controller is directly designed to imitate a predetermined model-based stabilizing control law, and then, the stable control performance can be achieved by only using joint position information. All the IAFNNC and DAFNNC laws and the corresponding adaptive tuning algorithms for FNN weights are established in the sense of Lyapunov stability analyses to ensure the stable control performance. Numerical simulations and experimental results of a two-link robot manipulator actuated by dc servomotors are given to verify the effectiveness and robustness of the proposed methodologies. In addition, the superiority of the proposed control schemes is indicated in comparison with proportional-differential control, fuzzy-model-based control, T-S-type FNN control, and robust neural fuzzy network control systems.
This paper deals with the reliable linear quadratic (LQ) fuzzy control problem for continuous-time nonlinear systems with actuator faults. The Takagi-Sugeno (T-S) fuzzy model is employed to represent a nonlinear system. By using multiple Lyapunov functions, an improved linear matrix inequality (LMI) method for the design of reliable LQ fuzzy controllers is investigated, which reduces the conservatism of using a single Lyapunov function. The different upper bounds on the LQ performance cost function for the normal and different actuator fault cases are provided. A suboptimal reliable LQ fuzzy controller is given by means of an LMI optimization procedure, which can not only guarantee the stability of the closed-loop overall fuzzy system for all cases, but also provide an optimized upper bound on a weighted average LQ performance cost function. Finally, numerical simulations on the chaotic Lorenz system are given to illustrate the application of the proposed design method.
This paper addresses two scalability problems related to the cognitive map of packets in ad hoc cognitive packet networks and proposes a solution. Previous works have included latency as part of the routing goal of smart packets, which requires packets to collect their arrival time at each node in a path. This requirement results in a packet overhead proportional to the path length. The second problem is that the multiplicative form of path availability, which was employed to measure resources, loses accuracy on long paths. To solve these problems, new goals are proposed in this paper. These goals are linear functions of low-overhead metrics and can provide similar performance at lower cost. One direct result shown in simulation is that smart packets driven by a linear function of path length and buffer occupancy can effectively balance the traffic of multiple flows without the large overhead that would be needed if round-trip delay were used. In addition, energy-aware routing is also studied under this scheme, as well as link selection based on expected security levels.
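A linear routing goal of the kind described, trading off path length against buffer occupancy, can be sketched as follows (the coefficients and path data are illustrative, not taken from the paper):

```python
def goal(path_length, buffer_occupancy, alpha=1.0, beta=2.0):
    """Hypothetical linear routing goal: a weighted sum of two low-overhead
    metrics, hop count and average buffer occupancy along the path."""
    return alpha * path_length + beta * buffer_occupancy

# Candidate paths as (hop_count, avg_buffer_occupancy) pairs.
paths = {"short_congested": (2, 0.9), "long_clear": (4, 0.1)}
scores = {name: goal(*metrics) for name, metrics in paths.items()}
best = min(scores, key=scores.get)  # smart packets prefer the lower-cost path
```

Because the goal is linear in metrics that nodes already hold locally, packets need not accumulate per-node timestamps, which is what removes the overhead proportional to path length.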
Top-cited authors
Guang-Bin Huang
  • Nanyang Technological University
Xiaojian Ding
Hongming Zhou
  • Nanyang Technological University
Guanrong Chen
  • City University of Hong Kong
Wenwu Yu
  • Southeast University