Article

Characterizing data patterns with core-periphery network modeling


Abstract

Traditional classification techniques usually classify data samples according to the physical organization of the data features, such as similarity, distance, and distribution, and lack a general and explicit mechanism to represent data classes by semantic data patterns. The incorporation of data pattern formation into classification therefore remains a challenging problem. Moreover, data classification techniques work well only when the data features present a high level of similarity within each class in the feature space. This hypothesis is not always satisfied: in real-world applications we frequently encounter situations in which the data samples of some classes (usually representing the normal cases) present well-defined patterns, while the data features of other classes (usually representing abnormal cases) present large variance, i.e., low similarity within each class. Such a situation makes data classification a difficult task. In this paper, we present a novel solution to the above-mentioned problems based on the mesostructure of a complex network built from the original data set. Specifically, we construct a core–periphery network from the training data set in such a way that the normal class is represented by the core sub-network and the abnormal class is characterized by the peripheral sub-network. A testing data sample is classified into the core class if it receives a high coreness value; otherwise, it is classified into the periphery class. The proposed method is tested on an artificial data set and then applied to classify x-ray images for COVID-19 diagnosis, where it achieves high classification precision. In this way, we introduce a novel method to describe the data pattern of data “without pattern” through a network approach, contributing to a general solution of the classification problem.
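The following sketch illustrates the classification scheme described above, assuming a k-nearest-neighbour construction for the network and the k-core index as a simple stand-in for the coreness value; the helper names, the neighbourhood size k, and the threshold tau are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of coreness-based classification (assumptions flagged above).
import numpy as np
import networkx as nx

def build_knn_graph(X, k=5):
    """Connect each training sample to its k nearest neighbours (Euclidean)."""
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for i in range(len(X)):
        for j in np.argsort(D[i])[1:k + 1]:   # skip the sample itself
            G.add_edge(i, int(j))
    return G

def classify(G, X, x_new, k=5, tau=3):
    """Insert x_new into the network and label it by its k-core index."""
    H = G.copy()
    t = H.number_of_nodes()
    H.add_node(t)
    for j in np.argsort(np.linalg.norm(X - x_new, axis=-1))[:k]:
        H.add_edge(t, int(j))
    coreness = nx.core_number(H)[t]           # proxy for the coreness value
    return "core (normal)" if coreness >= tau else "periphery (abnormal)"
```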


... It is demonstrated in the transmission of rumors or information in social networks, [11,12] in the organization of the human connectome during neurodevelopment, [13,14] in the transportation networks of airline flights, [15] and in the characterization of data patterns. [16] The core of the network is often regarded as comprising the densely interconnected high-degree nodes, which impact adaptability, flexibility, and controllability. [17,18] Many profiling methods have been proposed, based on optimizing a suitable fitness function using the coreness value to define the density of links inside the core, [2] referring to a quality index with respect to the size of the expected core and the fuzziness of the boundary, [19] or applying Markov chains to describe random walks that index the coreness of individual nodes. ...
... Then, we used the objective function to check the core of the dolphin network. Via the simulation, we obtain the core set {2, 10, 14, 15, 16, 18, 21, 28, 30, 34, 37, 38, 39, 41, 43, 44, 46, 48, 51, 52, 55, 58}. The additional node is node 51, whose ...
Article
Full-text available
Many networks exhibit the core/periphery structure. Core/periphery structure is a type of meso-scale structure that consists of densely connected core nodes and sparsely connected peripheral nodes. Core nodes tend to be well-connected, both among themselves and to peripheral nodes, which tend not to be well-connected to other nodes. In this brief report, we propose a new method to detect the core of a network from the centrality of each node. We find that nodes with non-negative centralities often constitute the core of the network. Simulations are carried out on different real networks, and the results are verified with the objective function, supporting the effectiveness of centrality-based core detection on real networks. Furthermore, we discuss the characteristics of networks with a single core/periphery structure and point out the scope of application of our method at the end of this paper.
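As a hedged illustration of the centrality-thresholding idea, the sketch below takes nodes with non-negative standardized centrality as the core; the specific choice of eigenvector centrality and z-scoring is an assumption, since the report's exact centrality definition is not reproduced here.

```python
# Core detection by centrality sign: nodes whose standardized eigenvector
# centrality is non-negative form the candidate core (illustrative choice).
import numpy as np
import networkx as nx

def centrality_core(G):
    c = nx.eigenvector_centrality_numpy(G)
    vals = np.array([c[v] for v in G])
    z = (vals - vals.mean()) / vals.std()     # centre so the sign is meaningful
    return {v for v, s in zip(G, z) if s >= 0}

G = nx.karate_club_graph()
print(sorted(centrality_core(G)))             # candidate core node set
```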
Article
Full-text available
Detecting a community in a network is a matter of discerning the distinct features and connections of a group of members that are different from those in other communities. The ability to do this is of great significance in network analysis. However, beyond the classic spectral clustering and statistical inference methods, there have been significant developments with deep learning techniques for community detection in recent years, particularly when it comes to handling high-dimensional network data. Hence, a comprehensive review of the latest progress in community detection through deep learning is timely. To frame the survey, we have devised a new taxonomy covering different state-of-the-art methods, including deep learning models based on deep neural networks (DNNs), deep nonnegative matrix factorization, and deep sparse filtering. The main category, i.e., DNNs, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The popular benchmark datasets, evaluation metrics, and open-source implementations to address experimentation settings are also summarized. This is followed by a discussion on the practical applications of community detection in various domains. The survey concludes with suggestions of challenging topics that would make for fruitful future research directions in this fast-growing deep learning field.
Article
Full-text available
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as languages used to query and validate knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We conclude with high-level future research directions for knowledge graphs.
Article
Full-text available
Significance The problem of fitting low-dimensional manifolds to high-dimensional data has been extensively studied from both theoretical and computational perspectives. As datasets get more heterogeneous and complicated, so must the spaces that are used to approximate them. Stratified spaces, built out of manifold pieces coherently glued together, form natural candidates for such geometric models. The key difficulty encountered when fitting stratified spaces to data is that none of the sampled points can be expected to lie exactly on the low-dimensional singular strata. The present work uses local cohomology to overcome this difficulty. Here, we describe an efficient and practical framework for singularity detection from finite samples and demonstrate its ability to detect interfaces in real and simulated data.
Article
Full-text available
COVID-19 is a worldwide epidemic, as announced by the World Health Organization (WHO) in March 2020. Machine learning (ML) methods can play vital roles in identifying COVID-19 patients by visually analyzing their chest x-ray images. In this paper, a new ML method is proposed to classify chest x-ray images into two classes: COVID-19 patient or non-COVID-19 person. Features are extracted from the chest x-ray images using new Fractional Multichannel Exponent Moments (FrMEMs), and a parallel multi-core computational framework is utilized to accelerate the computational process. A modified Manta-Ray Foraging Optimization based on differential evolution is then used to select the most significant features. The proposed method is evaluated using two COVID-19 x-ray datasets, achieving accuracy rates of 96.09% and 98.09% for the first and second datasets, respectively.
Article
Full-text available
The core–periphery structure is one of the key concepts in the structural analysis of complex networks. It consists of a partitioning of the node set of a given graph or network into two groups, called core and periphery, where the core nodes induce a well-connected subgraph and share connections with peripheral nodes, while the peripheral nodes are loosely connected to the core nodes and other peripheral nodes. We propose a polynomial-time algorithm to detect core–periphery structures in networks having a symmetric adjacency matrix. The core set is defined as the solution of a combinatorial optimization problem, which has a pleasant symmetry with respect to graph complementation. We provide a complete description of the optimal solutions to that problem and an exact and efficient algorithm to compute them. The proposed approach is extended to networks with loops and oriented edges. Numerical simulations are carried out on both synthetic and real-world networks to demonstrate the effectiveness and practicability of the proposed algorithm.
Article
Full-text available
A network with core-periphery structure consists of core nodes that are densely interconnected. In contrast to community structure, which is a different meso-scale structure of networks, core nodes can be connected to peripheral nodes and peripheral nodes are not densely interconnected. Although core-periphery structure sounds reasonable, we argue that it is merely accounted for by heterogeneous degree distributions, if one partitions a network into a single core block and a single periphery block, which the famous Borgatti-Everett algorithm and many succeeding algorithms assume. In other words, there is a strong tendency that high-degree and low-degree nodes are judged to be core and peripheral nodes, respectively. To discuss core-periphery structure beyond the expectation of the node's degree (as described by the configuration model), we propose that one needs to assume at least one block of nodes apart from the focal core-periphery structure, such as a different core-periphery pair, community or nodes not belonging to any meso-scale structure. We propose a scalable algorithm to detect pairs of core and periphery in networks, controlling for the effect of the node's degree. We illustrate our algorithm using various empirical networks.
Article
Full-text available
With a core-periphery structure of networks, core nodes are densely interconnected, peripheral nodes are connected to core nodes to different extents, and peripheral nodes are sparsely interconnected. Core-periphery structure composed of a single core and periphery has been identified for various networks. However, analogous to the observation that many empirical networks are composed of densely interconnected groups of nodes, i.e., communities, a network may be better regarded as a collection of multiple cores and peripheries. We propose a scalable algorithm to detect multiple non-overlapping groups of core-periphery structure in a network. We illustrate our algorithm using synthesised and empirical networks. For example, we find distinct core-periphery pairs with different political leanings in a network of political blogs and separation between international and domestic subnetworks of airports in some single countries in a world-wide airport network.
Article
Full-text available
Core-periphery structure and community structure are two typical meso-scale structures in complex networks. Although community detection has been extensively investigated from different perspectives, the definition and detection of core-periphery structure have not attracted enough attention. Furthermore, the detection of core-periphery structure and of community structure were previously investigated separately. In this paper, we develop a unified framework to simultaneously detect core-periphery structure and community structure in complex networks. Moreover, our algorithm has several extra advantages: it can detect not only single but also multiple core-periphery structures; overlapping nodes belonging to different communities can be identified; and, by adjusting the size of the core, core-periphery structures at different scales can be detected. The good performance of the method has been validated on synthetic and real complex networks. We thus provide a basic framework to detect the two typical meso-scale structures: core-periphery structure and community structure.
Article
Full-text available
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Article
Full-text available
Recent studies uncovered important core/periphery network structures characterizing complex sets of cooperative and competitive interactions between network nodes, be they proteins, cells, species or humans. Better characterization of the structure, dynamics and function of core/periphery networks is a key step in understanding cellular functions, species adaptation, and social and market changes. Here we summarize the current knowledge of the structure and dynamics of "traditional" core/periphery networks, rich-clubs, nested, bow-tie and onion networks. Comparing core/periphery structures with network modules, we discriminate between global and local cores. The core/periphery network organization lies in the middle of several extreme properties, such as random/condensed structures, clique/star configurations, network symmetry/asymmetry, network assortativity/disassortativity, and network hierarchy/anti-hierarchy. These properties of high complexity, together with the large degeneracy of core pathways ensuring cooperation and providing multiple options of network flow re-channelling, greatly contribute to the high robustness of complex systems. Core processes enable a coordinated response to various stimuli, decrease noise, and evolve slowly. The integrative function of network cores is an important step in the development of a large variety of complex organisms and organizations. Despite these important features and several decades of research interest, studies on core/periphery networks still have a number of unexplored areas.
Article
Full-text available
Disclosing the main features of the structure of a network is crucial to understand a number of static and dynamic properties, such as robustness to failures, spreading dynamics, or collective behaviours. Among the possible characterizations, the core-periphery paradigm models the network as the union of a dense core with a sparsely connected periphery, highlighting the role of each node on the basis of its topological position. Here we show that the core-periphery structure can effectively be profiled by elaborating the behaviour of a random walker. A curve (the core-periphery profile) and a numerical indicator are derived, providing a global topological portrait. Simultaneously, a coreness value is attributed to each node, qualifying its position and role. The application to social, technological, economical, and biological networks reveals the power of this technique in disclosing the overall network structure and the peculiar role of some specific nodes.
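A simplified rendering of the random-walker profiling idea: grow a candidate set from the weakest node, always adding the node that keeps the set's persistence probability (the chance that a stationary walker inside the set stays inside it) minimal, and record that probability as each node's coreness. Normalization and weighted-graph details of the original method are omitted; this sketch assumes a small connected unweighted graph.

```python
# Greedy core-periphery profile via random-walk persistence (simplified).
import numpy as np
import networkx as nx

def cp_profile(G):
    A = nx.to_numpy_array(G)
    deg = A.sum(axis=1)
    pi = deg / deg.sum()                      # stationary distribution
    S = [int(np.argmin(deg))]                 # start from the weakest node
    coreness = {S[0]: 0.0}
    rest = set(range(len(A))) - set(S)
    while rest:
        best, best_alpha = None, np.inf
        for v in rest:
            T = S + [v]
            # persistence probability of the set T under the random walk
            alpha = (pi[T][:, None] * A[np.ix_(T, T)] / deg[T][:, None]).sum() / pi[T].sum()
            if alpha < best_alpha:
                best, best_alpha = v, alpha
        S.append(best); rest.remove(best)
        coreness[best] = best_alpha           # small: peripheral; near 1: core
    return coreness
```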
Article
Full-text available
A common but informal notion in social network analysis and other fields is the concept of a core/periphery structure. The intuitive conception entails a dense, cohesive core and a sparse, unconnected periphery. This paper seeks to formalize the intuitive notion of a core/periphery structure and suggests algorithms for detecting this structure, along with statistical tests for testing a priori hypotheses. Different models are presented for different kinds of graphs (directed and undirected, valued and nonvalued). In addition, the close relation of the continuous models developed to certain centrality measures is discussed.
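A minimal sketch of the discrete model in this formalization: a candidate core C is scored by the Pearson correlation between the observed adjacency matrix and the ideal core/periphery pattern (1 whenever at least one endpoint is in the core, 0 otherwise). The brute-force search below is for illustration only; practical algorithms search this space far more efficiently.

```python
# Borgatti-Everett style core/periphery fit for small undirected graphs.
from itertools import combinations
import numpy as np
import networkx as nx

def be_score(A, core):
    n = len(A)
    c = np.zeros(n); c[list(core)] = 1
    ideal = np.maximum.outer(c, c)            # 1 iff i or j is a core node
    iu = np.triu_indices(n, k=1)              # compare off-diagonal entries
    return np.corrcoef(A[iu], ideal[iu])[0, 1]

def best_core(G, size):
    A = nx.to_numpy_array(G)
    return max(combinations(range(len(A)), size),
               key=lambda C: be_score(A, C))

G = nx.krackhardt_kite_graph()                # small 10-node test graph
print(best_core(G, size=4))
```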
Article
Full-text available
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors, remove their contaminating effect on the data set and thereby purify the data for processing. The original outlier detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Article
Full-text available
... to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques, since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
Article
Full-text available
The quad tree is a data structure appropriate for storing information to be retrieved on composite keys. We discuss the specific case of two-dimensional retrieval, although the structure is easily generalised to arbitrary dimensions. Algorithms are given both for straightforward insertion and for a type of balanced insertion into quad trees. Empirical analyses show that the average time for insertion is logarithmic with the tree size. An algorithm for retrieval within regions is presented along with data from empirical studies which imply that searching is reasonably efficient. We define an optimized tree and present an algorithm to accomplish optimization in n log n time. Searching is guaranteed to be fast in optimized trees. Remaining problems include those of deletion from quad trees and merging of quad trees, which seem to be inherently difficult operations.
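An illustrative point quad tree with straightforward (unbalanced) insertion, in the spirit of the article; the quadrant ordering and field names are incidental choices.

```python
# Point quad tree: each node splits the plane into four quadrants around its
# point; insertion descends into the matching quadrant until a slot is free.
# Average insertion time is logarithmic in the tree size for random input.
class QuadNode:
    def __init__(self, x, y, data=None):
        self.x, self.y, self.data = x, y, data
        self.children = [None] * 4            # NE, NW, SW, SE

    def quadrant(self, x, y):
        if x >= self.x:
            return 0 if y >= self.y else 3    # NE or SE
        return 1 if y >= self.y else 2        # NW or SW

    def insert(self, x, y, data=None):
        q = self.quadrant(x, y)
        if self.children[q] is None:
            self.children[q] = QuadNode(x, y, data)
        else:
            self.children[q].insert(x, y, data)

root = QuadNode(50, 50)
for px, py in [(10, 80), (90, 20), (60, 60)]:
    root.insert(px, py)
```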
Article
Full-text available
We analytically describe the architecture of randomly damaged uncorrelated networks as a set of successively enclosed substructures: k-cores. The k-core is the largest subgraph where vertices have at least k interconnections. We find the structure of k-cores, their sizes, and their birthpoints, i.e., the bootstrap percolation thresholds. We show that in networks with a finite mean number ζ2 of second-nearest neighbors, the emergence of a k-core is a hybrid phase transition. In contrast, if ζ2 diverges, the networks contain an infinite sequence of k-cores which are ultrarobust against random damage.
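The k-core hierarchy can be recovered by repeatedly pruning vertices of degree less than k; a quick illustration with networkx, whose core_number returns each vertex's largest surviving k:

```python
# k-core (k-shell) decomposition of a scale-free test graph.
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3, seed=0)
core = nx.core_number(G)                      # vertex -> coreness index
k_max = max(core.values())
innermost = [v for v, k in core.items() if k == k_max]
print(f"deepest core: k = {k_max}, size = {len(innermost)}")
```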
Article
Full-text available
We study a map of the Internet (at the autonomous systems level), by introducing and using the method of k-shell decomposition and the methods of percolation theory and fractal geometry, to find a model for the structure of the Internet. In particular, our analysis uses information on the connectivity of the network shells to separate, in a unique (no parameters) way, the Internet into three subcomponents: (i) a nucleus that is a small (≈100 nodes), very well connected globally distributed subgraph; (ii) a fractal subcomponent that is able to connect the bulk of the Internet without congesting the nucleus, with self-similar properties and critical exponents predicted from percolation theory; and (iii) dendrite-like structures, usually isolated nodes that are connected to the rest of the network through the nucleus only. We show that our method of decomposition is robust and provides insight into the underlying structure of the Internet and its functional consequences. Our approach of decomposing the network is general and also useful when studying other complex networks.
Chapter
In real-world data classification tasks, we often face situations in which the data samples of the normal cases present a well-defined pattern while the features of abnormal data samples vary from one to another, i.e., do not show a regular pattern. Up to now, the general data classification hypothesis has required the data features within each class to present a certain level of similarity. Such real situations therefore violate the classic classification condition and make classification a hard task. In this paper, we present a novel solution for this kind of problem through a network approach. Specifically, we construct a core-periphery network from the training data set in such a way that the core node set is formed by the normal data samples and the peripheral node set contains the abnormal samples of the training data set. The classification is made by checking the coreness of the testing data samples. The proposed method is applied to classify radiographic images for COVID-19 diagnosis. Computer simulations show promising results of the method. The main contribution is to introduce a general scheme to characterize pattern formation of the data “without pattern”.
Article
The COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus 2, emerged into a world being rapidly transformed by artificial intelligence (AI) based on big data, computational power and neural networks. The gaze of these networks has in recent years turned increasingly towards applications in healthcare. It was perhaps inevitable that COVID-19, a global disease propagating health and economic devastation, should capture the attention and resources of the world's computer scientists in academia and industry. The potential for AI to support the response to the pandemic has been proposed across a wide range of clinical and societal challenges, including disease forecasting, surveillance and antiviral drug discovery. This is likely to continue as the impact of the pandemic unfolds on the world's people, industries and economy, but a surprising observation of the current pandemic has been the limited impact AI has had to date in the management of COVID-19. This correspondence explores potential reasons behind the lack of successful adoption of AI models developed for COVID-19 diagnosis and prognosis in front-line healthcare services. We highlight the moving clinical needs that models have had to address at different stages of the epidemic, and explain the importance of translating models to reflect local healthcare environments. We argue that both basic and applied research are essential to accelerate the potential of AI models, and this is particularly so during a rapidly evolving pandemic. This perspective on the response to COVID-19 may provide a glimpse into how the global scientific community should react to combat future disease outbreaks more effectively.
Article
Intermediate-scale (or "meso-scale") structures in networks have received considerable attention, as the algorithmic detection of such structures makes it possible to discover network features that are not apparent either at the local scale of nodes and edges or at the global scale of summary statistics. Numerous types of meso-scale structures can occur in networks, but investigations of such features have focused predominantly on the identification and study of community structure. In this paper, we develop a new method to investigate the meso-scale feature known as core-periphery structure, which entails identifying densely connected core nodes and sparsely connected peripheral nodes. In contrast to communities, the nodes in a core are also reasonably well-connected to those in a network's periphery. Our new method of computing core-periphery structure can identify multiple cores in a network and takes into account different possible core structures. We illustrate the differences between our method and several existing methods for identifying which nodes belong to a core, and we use our technique to examine core-periphery structure in examples of friendship, collaboration, transportation, and voting networks. For this new SIGEST version of our paper, we also discuss our work's relevance in the context of recent developments in the study of core-periphery structure.
Article
Data classification is a common task, which can be performed by both computers and human beings. However, a fundamental difference between them can be observed: computer-based classification considers only physical features (e.g., similarity, distance, or distribution) of input data; by contrast, brain-based classification takes into account not only physical features, but also the organizational structure of data. In this paper, we figure out the data organizational structure for classification using complex networks constructed from training data. Specifically, an unlabeled instance is classified by the importance concept characterized by Google’s PageRank measure of the underlying data networks. Before a test data instance is classified, a network is constructed from the vector-based data set and the test instance is inserted into the network in a proper manner. To this end, we also propose a measure, called spatio-structural differential efficiency, to combine the physical and topological features of the input data. Such a method allows the classification technique to capture a variety of data patterns using the unique importance measure. Extensive experiments demonstrate that the proposed technique has promising predictive performance on the detection of heart abnormalities.
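A hedged sketch of the importance-based idea: build one k-NN subnetwork per class, insert the test instance into each, and assign the class in which the instance earns the largest PageRank. The spatio-structural efficiency measure of the paper is simplified away here; the neighbourhood size k and the insertion rule are assumptions.

```python
# Classification by PageRank importance in per-class data networks (sketch).
import numpy as np
import networkx as nx

def knn_graph(X, k=3):
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for i in range(len(X)):
        for j in np.argsort(D[i])[1:k + 1]:
            G.add_edge(i, int(j))
    return G

def classify(train_by_class, x_new, k=3):
    scores = {}
    for label, X in train_by_class.items():
        G = knn_graph(X, k)
        t = G.number_of_nodes()
        G.add_node(t)
        for j in np.argsort(np.linalg.norm(X - x_new, axis=-1))[:k]:
            G.add_edge(t, int(j))
        scores[label] = nx.pagerank(G)[t]     # importance of the test node
    return max(scores, key=scores.get)
```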
Article
Resolving a network of hubs: Graphs are a pervasive tool for modeling and analyzing network data throughout the sciences. Benson et al. developed an algorithmic framework for studying how complex networks are organized by higher-order connectivity patterns (see the Perspective by Pržulj and Malod-Dognin). Motifs in transportation networks reveal hubs and geographical elements not readily achievable by other methods. A motif previously suggested as important for neuronal networks is part of a “rich club” of subnetworks. Science, this issue p. 163; see also p. 123.
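One concrete piece of such a framework, sketched under the assumption of the triangle motif: replace the adjacency matrix by a motif adjacency matrix whose entries count the triangles through each edge, and analyze that matrix instead of the plain adjacency.

```python
# Triangle-motif adjacency matrix (simplified element of motif analysis).
import numpy as np
import networkx as nx

def triangle_motif_matrix(G):
    A = nx.to_numpy_array(G)
    return (A @ A) * A        # length-2 paths i->j that close into a triangle

G = nx.karate_club_graph()
W = triangle_motif_matrix(G)
# Nodes with large row sums participate in many triangles (hub candidates).
print(np.argsort(W.sum(axis=1))[-5:])
```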
Article
Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
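The two ingredients in a minimal form (growth plus degree-proportional attachment); the seed clique and the requirement m >= 2 are implementation conveniences, not part of the model's statement.

```python
# Bare-bones preferential-attachment growth (assumes m >= 2).
import random

def preferential_attachment(n, m, seed=0):
    random.seed(seed)
    edges = [(i, j) for i in range(m) for j in range(i + 1, m)]  # seed clique
    pool = [v for e in edges for v in e]      # degree-weighted sampling pool
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(pool))   # pick proportionally to degree
        for t in chosen:
            edges.append((new, t))
            pool.extend([new, t])
    return edges

print(len(preferential_attachment(1000, 2)))
```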
Article
Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning and it has facility in identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by the extraction of features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances to the pattern formation of the data. Our study shows that the proposed technique not only can realize classification according to the pattern formation, but also is able to improve the performance of traditional classification techniques. Furthermore, as the class configuration's complexity increases, such as the mixture among different classes, a larger portion of the high level term is required to get correct classification. This feature confirms that the high level classification has a special importance in complex situations of classification. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it supplies an improvement in the overall pattern recognition rate.
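In generic notation, the hybrid rule can be written as a convex combination of the two terms; the symbols below are illustrative, not the paper's exact formula.

```latex
% F_i^{(c)}: final membership score of test instance i for class c;
% L_i^{(c)}: low-level (physical) term; H_i^{(c)}: high-level
% (pattern-conformity) term; \lambda \in [0, 1]: mixing weight, with
% larger \lambda needed as class configurations become more complex.
F_i^{(c)} = (1 - \lambda)\, L_i^{(c)} + \lambda\, H_i^{(c)},
\qquad \hat{c} = \operatorname*{arg\,max}_{c} F_i^{(c)}
```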
Article
Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.
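The rewiring experiment is easy to reproduce; interpolating p from 0 (regular lattice) to 1 (random graph) shows clustering staying high while the characteristic path length collapses.

```python
# Small-world interpolation: clustering vs. path length as rewiring grows.
import networkx as nx

for p in [0.0, 0.01, 0.1, 1.0]:
    G = nx.connected_watts_strogatz_graph(n=1000, k=10, p=p, seed=0)
    print(f"p={p}: C={nx.average_clustering(G):.3f}, "
          f"L={nx.average_shortest_path_length(G):.2f}")
```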
Article
Networks may, or may not, be wired to have a core that is both itself densely connected and central in terms of graph distance. In this study we propose a coefficient to measure if the network has such a clear-cut core-periphery dichotomy. We measure this coefficient for a number of real-world and model networks and find that different classes of networks have their characteristic values. Among other things we conclude that geographically embedded transportation networks have a strong core-periphery structure. We proceed to study radial statistics of the core, i.e., properties of the n-neighborhoods of the core vertices for increasing n. We find that almost all networks have unexpectedly many edges within n-neighborhoods at a certain distance from the core, suggesting an effective radius for nontrivial network processes.
Article
Geographical curves are so involved in their detail that their lengths are often infinite or, rather, undefinable. However, many are statistically "self-similar," meaning that each portion can be considered a reduced-scale image of the whole. In that case, the degree of complication can be described by a quantity D that has many properties of a "dimension," though it is fractional; that is, it exceeds the value unity associated with ordinary, rectifiable curves.
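The quantitative statement behind this observation is Richardson's scaling law as interpreted in the article: the measured length diverges as the measuring step shrinks, governed by the fractional dimension D.

```latex
% L(\epsilon): coastline length measured with step size \epsilon.
% For a rectifiable curve D = 1 and L converges; for the west coast
% of Britain the article reports D \approx 1.25, so halving \epsilon
% multiplies the measured length by 2^{D - 1} \approx 1.19.
L(\epsilon) \propto \epsilon^{\,1 - D}
```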