Article

Centrality measures-based algorithm to visualize a maximal common induced subgraph in large communication networks


Abstract

Communication networks are ubiquitous, increasingly complex, and dynamic. Predicting and visualizing common patterns in such huge graph data is essential for understanding the active patterns that evolve in the network. In this work, the problem of finding an active pattern in a communication network is modeled as the detection of a maximal common induced subgraph (CIS). The state of the communication network is captured as a time series of graphs, consisting of periodic snapshots of the logical communications within the network. A new centrality measure is proposed to assess the variation between successive graphs and to characterize the behavior of each node in the time series; it also guides the selection of a suitable candidate vertex for maximality at each step of the proposed algorithm. This paper is a pioneering attempt to use centrality measures to detect a maximal CIS of a huge graph database, yielding resultant graphs with a large number of vertices. The algorithm has polynomial time complexity, and its efficiency is demonstrated by a series of experiments on synthetic graph datasets of different orders. Results on real-time datasets further confirm the competence of the proposed algorithm.
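The paper's own centrality measure and selection rule are not reproduced in this abstract; the following is a minimal sketch of the general idea, assuming networkx graphs and using aggregate degree centrality as a stand-in ordering heuristic.

import networkx as nx

def maximal_common_induced_subgraph(snapshots):
    # Only vertices present in every snapshot can belong to a common CIS.
    common = set.intersection(*(set(g.nodes) for g in snapshots))
    # Aggregate degree centrality over the time series as an ordering
    # heuristic (the paper proposes its own centrality measure instead).
    score = {v: sum(nx.degree_centrality(g)[v] for g in snapshots)
             for v in common}
    chosen = []
    for v in sorted(common, key=score.get, reverse=True):
        # Keep v only if its neighbourhood among the chosen vertices is
        # identical in every snapshot, preserving the induced property.
        neighbourhoods = [{u for u in chosen if g.has_edge(v, u)}
                          for g in snapshots]
        if all(n == neighbourhoods[0] for n in neighbourhoods):
            chosen.append(v)
    return snapshots[0].subgraph(chosen).copy()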


... Centrality measures are already discussed in the literature (Nirmala et al., 2016). However, selecting the content router on the basis of an optimality feature has not been addressed. ...
Article
Full-text available
Content-centric networking (CCN) is gradually becoming an alternative to the traditional Internet architecture by organizing information (content) distribution on the Internet around content names. The growing rate of Internet traffic has motivated a content-centric architecture that better serves user requests for content. To enhance content delivery, ubiquitous in-network caching stores content in every router along the content delivery path. Studies have shown, however, that better performance can be achieved when caching is done by a subset of content routers (CRs) instead of all CRs on a delivery path. Motivated by this, we propose an adaptive neuro-fuzzy inference system-based caching (ANFIS-BC) scheme for CCN to improve cache performance. The proposed ANFIS-BC scheme uses centrality measures to select routers for caching in a network. Our results demonstrate that the ANFIS-BC scheme consistently achieves better caching gain across multiple network topologies.
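The ANFIS model itself is not detailed in the abstract; as a hedged sketch, the per-router centrality features such a scheme could feed into the inference system might be extracted as follows (the feature set is an assumption):

import networkx as nx

def router_centrality_features(topology):
    # One feature vector per content router; these could serve as crisp
    # inputs to a neuro-fuzzy caching decision (assumed feature choice).
    deg = nx.degree_centrality(topology)
    bet = nx.betweenness_centrality(topology)
    clo = nx.closeness_centrality(topology)
    return {r: (deg[r], bet[r], clo[r]) for r in topology.nodes}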
... Visualization of communication networks is becoming an increasingly important and challenging task as communication networks grow drastically in size; hence, efficient methods are required to analyze them [21]. In CCN, for efficient caching, CRs must be selected such that maximum throughput is achieved with minimal overhead. ...
Article
Full-text available
Content-centric networking (CCN) is gradually becoming an alternative to the conventional Internet architecture by distributing information (content) on the Internet by name. It has been shown that better performance can be achieved when caching is done on a subset of content routers instead of all routers along the content delivery path, and this subset must be selected so that maximum cache performance is achieved. Motivated by this, we propose a centrality-measures based algorithm (CMBA) for selecting appropriate content routers for caching. The centrality measures address the question: "Which content routers are the most important or central in the network for caching content?" We found that our novel CMBA improves content cache performance along the delivery path while using only a subset of the available content routers. Our results show that the proposed approach consistently achieves better caching gain across multiple network topologies.
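As an illustration only (not the published CMBA), selecting the most central routers on a delivery path might look like this, with betweenness centrality as an assumed choice of measure:

import networkx as nx

def caching_routers(topology, delivery_path, k=2):
    # Rank the routers on the path by betweenness centrality computed on
    # the whole topology, and cache at the top k of them.
    bet = nx.betweenness_centrality(topology)
    return sorted(delivery_path, key=bet.get, reverse=True)[:k]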
Article
Ranking molecular chemical compounds is a challenging task because of their degeneracy behavior. Applying graph-theory-based concepts to predict the rank of each molecular chemical graph is a novel idea that reduces the likelihood of more than one molecular compound receiving the same topological index. The proposed work introduces a cumulative centrality index for a graph, calculated from the degree, eigenvector, Katz, closeness, betweenness, harmonic, and subgraph centralities of each vertex (atom) of the input chemical graph. The obtained results indicate that the cumulative centrality index plays a significant role in ranking molecular chemical graphs. This work illustrates the ranking process on the seven identity constitutional graphs with six vertices.
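A per-vertex cumulative index built from those seven centralities can be sketched directly with networkx; the paper's exact weighting and normalization are assumptions here.

import networkx as nx

def cumulative_centrality(g):
    measures = [
        nx.degree_centrality(g),
        nx.eigenvector_centrality(g, max_iter=1000),
        nx.katz_centrality(g, alpha=0.05),
        nx.closeness_centrality(g),
        nx.betweenness_centrality(g),
        nx.harmonic_centrality(g),
        nx.subgraph_centrality(g),
    ]
    # Unweighted sum per vertex; summing these over all vertices would give
    # one graph-level index usable for ranking candidate molecular graphs.
    return {v: sum(m[v] for m in measures) for v in g.nodes}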
Article
The physical and biological properties of a chemical entity are related to its structure. A basic, widely accepted principle in chemistry is that compounds with similar structures frequently share similar physicochemical properties and biological activities. Finding structural similarities between the chemical structures of molecules helps identify their common behavior. A familiar approach to capturing the structural similarity between two chemical compounds is to detect a maximal common connected vertex-induced subgraph (CCS) in their molecular chemical graphs. The proposed algorithm detects a maximal CCS by checking the induced property of the vertices collected by a DFS search on the tensor product graph of the two input molecular chemical graphs. The DFS search starts from the node with the highest eigenvector centrality in the tensor product graph. The significance of the proposed work is that eigenvector centrality is used to predict the root node of the DFS search tree, so that the resulting subgraph contains more nodes (i.e., a large maximal CCS). Experimental results on synthetic and real chemical databases further confirm the competence of the proposed algorithm compared with existing works.
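A sketch of the stated pipeline, omitting the induced-property check that produces the actual CCS:

import networkx as nx

def dfs_order_on_product(g1, g2):
    # Tensor (categorical) product: nodes are compatible vertex pairs.
    product = nx.tensor_product(g1, g2)
    centrality = nx.eigenvector_centrality(product, max_iter=1000)
    root = max(centrality, key=centrality.get)
    # Candidate vertex pairs in the order such a search would visit them.
    return list(nx.dfs_preorder_nodes(product, source=root))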
Article
In this study, we create an emotion lexicon for the Hindi language called Hindi EmotionNet, which assigns emotional affinity to words in IndoWordNet. The lexicon contains 3,839 emotion words, with 1,246 positive and 2,399 negative words. We also introduce ambiguous (217 words) and neutral (95 words) emotion categories for Hindi. Positive emotion words cover nine types of positive emotions, negative emotion words cover eleven types of negative emotions, ambiguous emotion words cover seven types of ambiguous emotions, and neutral emotion words cover two neutral emotions. The proposed Hindi EmotionNet was then applied to opinion classification and emotion classification. We introduce a centrality-based approach for emotion classification that uses degree, closeness, betweenness, and PageRank as centrality measures. We also created a Hindi dataset based on screenplays, stories, and blogs in the language, and translated emotion data from SemEval 2017 into Hindi for further comparison. The proposed approach delivered promising results on opinion and emotion classification, with an accuracy of 85.78% for the former and 75.91% for the latter.
Article
In this paper, we propose a new unsupervised spectral feature selection model that embeds a graph regularizer into a joint sparse regression framework to preserve the local structure of the data. To do this, we first extract the bases of the training data with existing dictionary learning methods and then map the original data into the basis space to generate new representations, via a novel joint graph sparse coding (JGSC) model. In JGSC, we formulate the objective function by simultaneously taking subspace learning and joint sparse regression into account, design a new optimization procedure to solve the resulting objective function, and prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust variant (RJGSC) by replacing the least-squares loss with a robust loss function, achieving the same goals while avoiding the impact of outliers. Finally, experimental results on real datasets show that both JGSC and RJGSC outperform state-of-the-art algorithms in terms of k-nearest-neighbor classification performance.
Conference Paper
Full-text available
This paper considers the maximum common subgraph problem, which is to find a connected graph with the maximum number of edges that is isomorphic to a subgraph of each of the two input graphs. This paper presents a dynamic programming algorithm for computing the maximum common subgraph of two outerplanar graphs whose maximum vertex degree is bounded by a constant, where it is known that the problem is NP-hard even for outerplanar graphs of unbounded degree. Although the algorithm repeatedly modifies input graphs, it is shown that the number of relevant subproblems is polynomially bounded and thus the algorithm works in polynomial time.
Conference Paper
Full-text available
The Maximum Common Subgraph (MCS) problem appears in many guises and in a wide variety of applications. The usual goal is to take as inputs two graphs, of order m and n, respectively, and find the largest induced subgraph contained in both of them. MCS is frequently solved by reduction to the problem of finding a maximum clique in the order-mn association graph, which is a particular form of product graph built from the inputs. In this paper a new algorithm, termed “clique branching,” is proposed that exploits a special structure inherent in the association graph. This structure contains a large number of naturally-ordered cliques that are present in the association graph’s complement. A detailed analysis shows that the proposed algorithm requires O((m+1)^n) time, which is a superior worst-case bound to those known for previously-analyzed algorithms in the setting of the MCS problem.
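The clique-branching algorithm itself is not reproduced here, but the classical reduction it builds on is easy to sketch: compatible vertex pairs form the association graph's nodes, edges join consistent pairs, and a maximum clique maps back to a maximum common induced subgraph.

import networkx as nx
from itertools import combinations, product

def association_graph(g1, g2):
    a = nx.Graph()
    a.add_nodes_from(product(g1.nodes, g2.nodes))
    for (u1, u2), (v1, v2) in combinations(a.nodes, 2):
        # Consistent pairs: distinct endpoints whose adjacency agrees.
        if u1 != v1 and u2 != v2 and g1.has_edge(u1, v1) == g2.has_edge(u2, v2):
            a.add_edge((u1, u2), (v1, v2))
    return a

def mcs_size(g1, g2):
    # With weight=None every node counts 1, so this is a maximum clique.
    clique, _ = nx.max_weight_clique(association_graph(g1, g2), weight=None)
    return len(clique)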
Article
Full-text available
Properties of a chemical entity, both physical and biological, are related to its structure. Since compound similarity can be used to infer the properties of novel compounds, much attention in chemoinformatics has been paid to ways of calculating structural similarity. A useful metric for capturing the structural similarity between compounds is the relative size of the Maximum Common Subgraph (MCS), the largest substructure present in a pair of compounds when they are represented as graphs. In practice, however, it is difficult to employ such a metric, since calculation of the MCS becomes computationally intractable when the MCS is large. We propose a novel algorithm that significantly reduces the computation time for finding large MCSs compared to a number of state-of-the-art approaches. The use of this algorithm is demonstrated in an application predicting the transcriptional response of breast cancer cell lines to different drug-like compounds, at a scale that is challenging for the most efficient MCS algorithms to date. In this application 714 compounds were compared.
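The relative-size metric referred to here is typically a Tanimoto-style ratio over atom counts; the exact normalization used in the paper is an assumption in this sketch.

def mcs_similarity(n1, n2, n_mcs):
    # n1, n2: atom counts of the two molecules; n_mcs: atoms in their MCS.
    # Ranges from 0 (nothing shared) to 1 (identical graphs).
    return n_mcs / (n1 + n2 - n_mcs)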
Article
Full-text available
In managing huge enterprise communication networks, the ability to measure similarity is an important performance-monitoring function. Characterizing a computer network as a time series of graphs, with IP addresses as nodes and communications between them as edges, makes it possible to draw significant conclusions about effective network utilization. Measuring the similarity of graphs is a significant task in mining graph data for matching, comparing, and evaluating patterns in huge graph databases. The problem of finding the nodes in a communication network that are always active can be formulated as a Maximum Common Subgraph (MCS) detection problem. This paper presents a divisive clustering MCS detection algorithm (DC-MCS) to find all maximum common subgraphs of k graphs in a graph database. The uniqueness of this algorithm lies in the facts that it can handle any number of input graphs and that it scans the graph database only once. A series of experiments and a comparison of the empirical results with existing algorithms further confirm the efficiency of the proposed algorithm.
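The "always active" formulation has a direct set-theoretic reading, sketched below under the assumption that a node counts as active in a snapshot when it has at least one communication edge:

def always_active_nodes(snapshots):
    # A node is taken as active in a snapshot when it has at least one edge.
    def active(g):
        return {v for v in g.nodes if g.degree(v) > 0}
    return set.intersection(*(active(g) for g in snapshots))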
Conference Paper
Full-text available
A graph g is called a maximum common subgraph of two graphs g1 and g2 if there exists no other common subgraph of g1 and g2 with more nodes than g. Exact and inexact algorithms for the maximum common subgraph problem are known from the literature; nevertheless, until now no effort has been made to characterize their performance. In this paper, two exact algorithms for maximum common subgraph detection are described. Moreover, a database containing randomly connected pairs of graphs, each having a maximum common subgraph of at least two nodes, is presented, and the performance of the two algorithms is evaluated on this database.
Conference Paper
Full-text available
This paper presents an approximate Maximum Common Subgraph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have nice properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. It is linear in computational complexity with respect to the size of the graph. Experiments show that very large common subgraphs were found in graphs of up to 200,000 vertices within a few minutes, when a quarter or less of the graphs differ. The variation in run-time and quality of the result is low.
Article
Full-text available
Graphs are an extremely general and powerful data structure. In pattern recognition and computer vision, graphs are used to represent patterns to be recognized or classified. Detection of the maximum common subgraph (MCS) is useful for matching, comparing, and evaluating the similarity of patterns. MCS is a well-known NP-complete problem for which optimal and suboptimal algorithms are known from the literature. Nevertheless, until now no effort has been made to characterize their performance. The lack of a large database of graphs makes the task of comparing the performance of different graph matching algorithms difficult, and often the selection of an algorithm is made on the basis of the few experimental results available. In this paper, three optimal and well-known algorithms for maximum common subgraph detection are described. Moreover, a large database containing various categories of pairs of graphs (e.g., random graphs, meshes, bounded-valence graphs) is presented, and the performance of the three algorithms is evaluated on this database.
Article
Full-text available
In this paper we describe a classification method that allows the use of graph-based representations of data instead of traditional vector-based representations. We compare the vector approach combined with the k-Nearest Neighbor (k-NN) algorithm to the graph-matching approach when classifying three different web document collections, using the leave-one-out approach for measuring classification accuracy. We also compare the performance of different graph distance measures as well as various document representations that utilize graphs. The results show the graph-based approach can outperform traditional vector-based methods in terms of accuracy, dimensionality and execution time.
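One classical graph distance often used in such comparisons is the MCS distance of Bunke and Shearer; whether it is among the measures evaluated in this particular study is an assumption.

def mcs_distance(n1, n2, n_mcs):
    # n1, n2: node counts of two document graphs; n_mcs: nodes in their
    # maximum common subgraph. 0 means identical, 1 means nothing shared.
    return 1.0 - n_mcs / max(n1, n2)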
Article
Full-text available
This paper presents an approximate Maximum Common Subgraph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have nice properties: they are very sparse; have many different labels; and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. It is linear in computational complexity with respect to the size of the graph. Experiments show that very large common subgraphs were found in graphs of up to 200,000 vertices within a few minutes, when a quarter or less of the graphs differ. The variation in run-time and quality of the result is low.
Article
Measuring similarity of graphs is an important task in graph mining for matching, comparing, and evaluating patterns in huge graph databases. In managing huge-enterprise communication networks, the ability to measure similarity is an important performance monitoring function. It is possible to draw certain significant conclusions regarding effective utilization of networks by characterizing a computer network as a time series of graphs with IP addresses as nodes and communication between nodes as edges. The maximum common subnets of k network time series graphs give a measure of the utilization of network nodes at different intervals of time. The problem of finding the nodes in the communication network which are always active can be formulated as a Maximum Common Subgraph (MCS) detection problem which would be useful for various decision making tasks such as devising better routing algorithms. This paper presents a novel MCS detection algorithm that introduces a new heap-based data structure to find all MCS of k graphs in a graph database efficiently. The series of experiments performed and the comparison of empirical results with the existing algorithms further ensure the efficiency of the proposed algorithm.
Article
This text takes a focused and comprehensive look at mining data represented as a graph, providing the latest findings and applications in both theory and practice. Even if you have minimal background in analyzing graph data, with this book you'll be able to represent data as graphs, extract patterns and concepts from the data, and apply the methodologies presented in the text to real datasets. There is a misprint in the link to the accompanying Web page for this book. For readers who would like to experiment with the techniques found in this book or test their own ideas on graph data, the Web page for the book should be http://www.eecs.wsu.edu/MGD.
Article
Web data extractors are used to extract data from web documents in order to feed automated processes. In this article, we propose a technique that works on two or more web documents generated by the same server-side template and learns a regular expression that models the template and can later be used to extract data from similar documents. The technique builds on the hypothesis that the template introduces some shared patterns that do not provide any relevant data and can thus be ignored. We have evaluated and compared our technique to others in the literature on a large collection of web documents; our results demonstrate that our proposal performs better than the others and that input errors do not have a negative impact on its effectiveness; furthermore, its efficiency can easily be boosted by means of a couple of parameters, without sacrificing its effectiveness.
Article
Many applications demand finding the important changing areas in evolving graphs. In this paper, given a series of snapshots of an evolving graph, we model and develop algorithms to capture the most frequently changing component (MFCC). Motivated by the intuition that the MFCC should capture the densest area of changes in an evolving graph, we propose a simple yet effective model. Using only one parameter, users can control the tradeoff between the “density” of the changes and the size of the detected area. We verify the effectiveness and efficiency of our approach systematically on real data sets.
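The MFCC model is not reproduced here; a minimal sketch of its intuition is to count how often each edge flips between consecutive snapshots and then look for a dense region among the frequently changing edges.

from collections import Counter

def edge_change_counts(snapshots):
    changes = Counter()
    for g1, g2 in zip(snapshots, snapshots[1:]):
        e1 = {frozenset(e) for e in g1.edges}
        e2 = {frozenset(e) for e in g2.edges}
        changes.update(e1 ^ e2)  # edges inserted or deleted at this step
    return changes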
Article
Extracting information from web documents has become a research area in which new proposals sprout up year after year. This has motivated several researchers to work on surveys that attempt to provide an overall picture of the many existing proposals. Unfortunately, none of these surveys provides a complete picture, because they do not take region extractors into account. These tools are a kind of preprocessor: they help information extractors focus on the regions of a web document that contain relevant information. With the increasing complexity of web documents, region extractors are becoming a must for extracting information from many websites. Beyond information extraction, region extractors have also found their way into information retrieval, focused web crawling, topic distillation, adaptive content delivery, mashups, and metasearch engines. In this paper, we survey the existing proposals regarding region extractors and compare them side by side.
Article
The intuitive background for measures of structural centrality in social networks is reviewed, and existing measures are evaluated in terms of their consistency with intuitions and their interpretability. Three distinct intuitive conceptions of centrality are uncovered, and existing measures are refined to embody these conceptions. Three measures are developed for each concept: one absolute and one relative measure of the centrality of positions in a network, and one reflecting the degree of centralization of the entire network. The implications of these measures for the experimental study of small groups are examined.
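For the degree-based concept, the group-level centralization Freeman derives has a closed form: the summed differences from the most central position, normalized by the maximum possible sum, which for degree is (n - 1)(n - 2).

import networkx as nx

def degree_centralization(g):
    # Assumes n >= 3; a star graph attains the maximum sum (n - 1)(n - 2),
    # giving a centralization of exactly 1.
    n = g.number_of_nodes()
    degrees = [d for _, d in g.degree()]
    return sum(max(degrees) - d for d in degrees) / ((n - 1) * (n - 2))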
Article
A model-based handwritten Chinese character recognition (HCCR) system is proposed. The characters are represented by attributed relational graphs (ARGs), with strokes as ARG vertices. A number of vector relational attributes are also used in the representation to improve the performance of the translation- and scale-invariant, rotation-sensitive recognition system. Since the ETL-8 database is very noisy and broken strokes are commonly encountered, a suitable homomorphic energy function is proposed that allows the segments of a broken stroke of a test character to be matched to the corresponding model stroke. The homomorphic ARG matching energy is minimised using self-organising Hopfield neural networks [Suganthan, P.N., Teoh, E.K., Mital, D.P., A self-organising Hopfield network for attributed relational graph matching, Image and Vision Computing, 13(1) (1995) 61-73]. An effective formulation is introduced to determine the matching score; it does not penalise the matching scores of test characters with broken strokes. Experiments were performed on 100 classes of characters in the ETL-8 database, and 98.9% recognition accuracy was achieved.
Conference Paper
Hierarchical image structures are abundant in computer vision, and have been used to encode part structure, scale spaces, and a variety of multiresolution features. In this paper, we describe a unified framework for both indexing and matching such structures. First, we describe an indexing mechanism that maps the topological structure of a directed acyclic graph (DAG) into a low-dimensional vector space. Based on a novel eigenvalue characterization of a DAG, this topological signature allows us to efficiently retrieve a small set of candidates from a database of models. To accommodate occlusion and local deformation, local evidence is accumulated in each of the DAG’s topological subspaces. Given a small set of candidate models, we will next describe a matching algorithm that exploits this same topological signature to compute, in the presence of noise and occlusion, the largest isomorphic subgraph between the image structure and the candidate model structure which, in turn, yields a measure of similarity which can be used to rank the candidates. We demonstrate the approach with a series of indexing and matching experiments in the domains of 2-D and (view-based) 3-D generic object recognition.
Article
A special class of graphs is introduced in this paper. The graphs belonging to this class are characterised by the existence of unique node labels. A number of matching algorithms for graphs with unique node labels are developed. It is shown that problems such as graph isomorphism, subgraph isomorphism, maximum common subgraph (MCS) and graph edit distance (GED) have a computational complexity that is only quadratic in the number of nodes. Moreover, computing the median of a set of graphs is only linear in the cardinality of the set. In a series of experiments, it is demonstrated that the proposed algorithms run very fast in practice. The considered class makes the matching of large graphs, consisting of thousands of nodes, computationally tractable. We also discuss an application of the considered class of graphs and related matching algorithms to the classification and detection of abnormal events in computer networks.
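The quadratic bounds follow because unique labels turn subgraph search into set intersection; a minimal sketch, assuming each node carries a 'label' attribute that is unique within its graph:

import networkx as nx
from itertools import combinations

def mcs_unique_labels(g1, g2):
    # Map labels back to node identifiers in each graph.
    l1 = {g1.nodes[v]["label"]: v for v in g1}
    l2 = {g2.nodes[v]["label"]: v for v in g2}
    common = set(l1) & set(l2)
    mcs = nx.Graph()
    mcs.add_nodes_from(common)
    # Keep an edge only if it exists between the corresponding nodes in
    # both input graphs.
    for a, b in combinations(common, 2):
        if g1.has_edge(l1[a], l1[b]) and g2.has_edge(l2[a], l2[b]):
            mcs.add_edge(a, b)
    return mcs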
Article
Backtrack algorithms are applicable to a wide variety of problems. An efficient but readable version of such an algorithm is presented and its use in the problem of finding the maximal common subgraph of two graphs is described. Techniques available in this application area for ordering and pruning the backtrack search are discussed. This algorithm has been used successfully as a component of a program for analysing chemical reactions and enumerating the bond changes which have taken place.
Article
In the management of large enterprise communication networks, it becomes difficult to detect and identify causes of abnormal change in traffic distributions when the underlying logical topology is dynamic. This paper describes a novel approach to abnormal network change detection by representing periodic observations of logical communications within a network as a time series of graphs. A number of graph distance measures are proposed to assess the difference between successive graphs and identify abnormal behaviour. Localisation techniques have also been described to show where in the network most change occurred.
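The specific distance measures proposed in the paper are not reproduced; one simple instance of the kind of measure that can be applied to successive snapshots is the normalized symmetric difference of edge sets.

def edge_change_distance(g1, g2):
    e1 = {frozenset(e) for e in g1.edges}
    e2 = {frozenset(e) for e in g2.edges}
    union = e1 | e2
    # 0 when the snapshots share all edges, 1 when they share none.
    return len(e1 ^ e2) / len(union) if union else 0.0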
Article
This paper studies the problem of structured data extraction from arbitrary Web pages. The objective of the proposed research is to automatically segment data records in a page, extract data items/fields from these records, and store the extracted data in a database. Existing methods addressing the problem can be classified into three categories. Methods in the first category provide some languages to facilitate the construction of data extraction systems. Methods in the second category use machine learning techniques to learn wrappers (which are data extraction programs) from human labeled examples. Manual labeling is time-consuming and is hard to scale to a large number of sites on the Web. Methods in the third category are based on the idea of automatic pattern discovery. However, multiple pages that conform to a common schema are usually needed as the input. In this paper, we propose a novel and effective technique (called DEPTA) to perform the task of Web data extraction automatically. The method consists of two steps: 1) identifying individual records in a page and 2) aligning and extracting data items from the identified records. For step 1, a method based on visual information and tree matching is used to segment data records. For step 2, a novel partial alignment technique is proposed. This method aligns only those data items in a pair of records that can be aligned with certainty, making no commitment on the rest of the items. Experimental results obtained using a large number of Web pages from diverse domains show that the proposed two-step technique is highly effective
Article
A method for the recognition of multifont printed characters is proposed, giving emphasis to the identification of structural descriptions of character shapes using prototypes. Noise and shape variations are modeled as series of transformations from groups of features in the data to features in each prototype. Thus, the method manages systematically the relative distortion between a candidate shape and its prototype, accomplishing robustness to noise with less than two prototypes per class, on average. The method uses a flexible matching between components and a flexible grouping of the individual components to be matched. A number of shape transformations are defined, including filling of gaps, so that the method handles broken characters. Also, a measure of the amount of distortion that these transformations cause is given. Classification of character shapes is defined as a minimization problem among the possible transformations that map an input shape into prototypical shapes. Some tests with hand-printed numerals confirmed the method's high robustness level
Bunke, H., Foggia, P., Guidobaldi, C., Sansone, C., Vento, M.: A comparison of algorithms for maximum common subgraph on randomly connected graphs. In: Structural, Syntactic, and Statistical Pattern Recognition.