Article

Structure-oriented prediction in complex networks


Abstract

Complex systems are extremely hard to predict due to their highly nonlinear interactions and rich emergent properties. Thanks to the rapid development of network science, our understanding of the structure of real complex systems and the dynamics on them has deepened remarkably, which has in turn stimulated the growth of effective prediction approaches for these systems. In this article, we review different network-related prediction problems, summarize and classify the relevant prediction methods, analyze their advantages and disadvantages, and point out the frontiers as well as the critical challenges of the field.


... Nowadays, complex networks have become powerful tools for describing complicated real systems, as they can recognize and capture the complex structure among the variables of a phenomenon. Complex networks have been widely applied in many research areas (e.g., Sivakumar and Woldemeskel 2014; Ren et al. 2018). In the structure of a complex network, there are nodes whose mutual interactions have both some level of frequency and some arbitrariness (Ren et al. 2018). ...
... In developing a complex network, the most important step is to identify the nodes and links. In studies such as those of Sivakumar and Woldemeskel (2014), Halverson and Fleming (2015), Serinaldi and Kilsby (2016), Han et al. (2018), and Ren et al. (2018), complex network methods have been applied to model hydrological variables such as streamflow. Yasmin and Sivakumar (2018) applied the complex network method and phase space reconstruction (PSR) to monthly streamflow modeling using more than 50 years of data from 639 hydrometric stations over the contiguous USA. ...
Article
The quantitative analysis of rainfall provides an in-depth understanding of the spatio-temporal variation of rainfall patterns. The present study implements complex networks to study the temporal connections of monthly rainfall in different rainfall regimes of Turkey between 1977 and 2016. The rainfall data of 151 rain gauges were reconstructed in phase space, and the optimal embedding dimensions (OED) for the optimal delays were selected using the false nearest neighbors (FNN) approach. Subsequently, the reconstructed phase space (RPS) serves as a network, and the strength of each node of the network is calculated by applying a distance criterion. The results show the utility of the RPS-based network for studying temporal correlations in rainfall data. Moreover, the strength values reflect the regional characteristics and rainfall properties. The insights gained from the study provide baseline information for climate change adaptation and pave the way to similar applications on a global scale.
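The phase-space reconstruction step at the heart of this approach is straightforward to prototype. Below is a minimal sketch of time-delay embedding in Python; the embedding dimension m and delay tau are illustrative placeholders, whereas the study selects them with the FNN approach and an optimal-delay criterion.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Reconstruct the phase space of a scalar series x with embedding
    dimension m and time delay tau (time-delay embedding)."""
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Each row is one reconstructed state vector
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

# Toy monthly series; m and tau are placeholders, not the FNN-selected
# values used in the study.
rng = np.random.default_rng(0)
rain = np.sin(np.linspace(0, 20, 480)) + 0.1 * rng.random(480)
states = delay_embed(rain, m=3, tau=2)
print(states.shape)  # (476, 3)
```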
... Some scholars have argued that the ultimate goal of understanding a complex system is to accurately predict its future behavior [280]. Predicting the future evolution of a complex system can be difficult due to various factors, including chaotic behavior, our incomplete knowledge of the fundamental laws that describe the system's behavior, and our lack of knowledge of the system's state vector; we refer to [281] for a survey. ...
... Predicting the future evolution of a complex system can be difficult due to various factors, including chaotic behavior, our incomplete knowledge of the fundamental laws that describe the system's behavior, and our lack of knowledge of the system's state vector; we refer to [281] for a survey. The network science literature has developed a wide spectrum of techniques either to predict the future links that will appear in a network, or to infer the connections that are missing in noisy data (see [280,282,283] for recent reviews on the topic). The nested structure of socio-economic systems has motivated scholars to leverage this structure to predict the future behavior of the system, at the level of both individual links and nodes. ...
... Based on the methods and results in [71,72,335], in this review, centrality metrics for bipartite networks have been compared with respect to their ability to rank nodes by their structural importance (Section 6.1.2). We emphasize that centrality metrics are routinely used for different purposes (like the identification of influential nodes for diffusion processes [397], the identification of expert-selected significant nodes [291], the prediction of the nodes' future popularity [280]), and the results presented in Section 6.1.2 only refer to one particular application. ...
Article
Full-text available
The observed architecture of ecological and socio-economic networks differs significantly from that of random networks. From a network science standpoint, non-random structural patterns observed in real networks call for an explanation of their emergence and an understanding of their potential systemic consequences. This article focuses on one of these patterns: nestedness. Given a network of interacting nodes, nestedness can be described as the tendency for nodes to interact with subsets of the interaction partners of better-connected nodes. Known in biogeography for more than 80 years, nestedness has been found in systems as diverse as ecological mutualistic systems, world trade, and inter-organizational relations, among many others. This review article focuses on three main pillars: the existing methodologies to observe nestedness in networks; the main theoretical mechanisms conceived to explain the emergence of nestedness in ecological and socio-economic networks; and the implications of a nested topology of interactions for the stability and feasibility of a given interacting system. We survey results from variegated disciplines, including statistical physics, graph theory, ecology, and theoretical economics. Nestedness has been found to emerge both in bipartite networks and, more recently, in unipartite ones; this review is the first comprehensive attempt to unify both streams of studies, usually disconnected from each other. We believe that this truly interdisciplinary endeavor – while rooted in a complex systems perspective – may inspire new models and algorithms whose realm of application will undoubtedly transcend disciplinary boundaries.
... Complex networks are models used to represent physical, chemical, biological, and social systems whose elements can be represented by nodes (or vertices), and interactions between those elements can be represented by links (or edges). Efficient algorithms and statistical methods have been developed in order to extract hidden characteristics of large and very large networks [1,2], sometimes relying on smaller versions of such networks [3]. ...
... Network reduction has been used to map large networks into smaller ones so that the analysis and visualization can be simplified while still preserving properties of the original networks, such as the eigenvectors with the largest eigenvalues [23]. Fundamental strategies consist of grouping nodes with similar characteristics (or communities) and examining the relationships between the formed groups [3,24,25]. In public transportation systems, a similar approach - using the geographic location of stations to cluster them through the K-means algorithm - revealed differences in how city neighborhoods are served by bus routes [26]. ...
Article
Public transportation networks (PTNs) are represented as complex networks in order to analyze their robustness to node and link failures, to classify them into different theoretical network models, and to identify the characteristics of the underlying network. Usually, PTNs have a large number of 1- and 2-degree nodes that blur the analysis and their characterization as complex networks. Subway and train-based transport networks present long single lines that connect central stations to distant destinations, unlike airport networks, which usually have a few large airports (hubs) connecting a significant number of small airports. By focusing on relevant network nodes and links and allowing comparisons between PTNs of different transportation modes, this paper proposes the Reduced Model as a simple method of network reduction that preserves the network skeleton (backbone structure) by properly removing 2-degree nodes from weighted and unweighted network representations. Unlike other proposed methods, its simple formulation leads to mathematical expressions that show how the reduction affects fundamental network metrics (degree, path length, and clustering coefficient distributions). The Reduced Model is applied to four large real-world PTNs: (i) two Brazilian cities with bus-based transport; (ii) the Seoul metro network; (iii) a worldwide airport network. The results reveal a hub-based hierarchical structure when a large number of intermediary stops are present, and small-world properties that emphasize hub-hub connections after applying the Reduced Model. The Reduced Model therefore highlights characteristics of the networks that could be difficult to identify without reduction.
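The core operation of such a reduction, removing 2-degree nodes while keeping the backbone, can be sketched in a few lines. The snippet below is a generic illustration of contracting 2-degree nodes into weighted edges, not the paper's exact Reduced Model (which also treats 1-degree nodes and derives closed-form expressions for the affected metrics).

```python
import networkx as nx

def contract_degree2(g):
    """Replace every chain u - v - w through a 2-degree node v by a
    direct edge u - w whose weight sums the two removed edge weights.
    Sketch only: pairs that would create parallel edges are skipped."""
    g = g.copy()
    changed = True
    while changed:
        changed = False
        for v in list(g.nodes):
            nbrs = list(g.neighbors(v))
            if g.degree(v) != 2 or len(nbrs) != 2:
                continue
            u, w = nbrs
            if g.has_edge(u, w):
                continue
            weight = g[v][u].get("weight", 1) + g[v][w].get("weight", 1)
            g.add_edge(u, w, weight=weight)
            g.remove_node(v)
            changed = True
    return g

chain = nx.path_graph(6)  # 0-1-2-3-4-5: nodes 1..4 have degree 2
print(contract_degree2(chain).edges(data=True))
# [(0, 5, {'weight': 5})]
```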
... AUC (Area Under the receiver operating characteristic Curve) [36] is the most typical measure for performance evaluation in the link prediction task. The value of AUC ranges from 0 to 1; it equals the probability that a real link is given a higher score than a randomly chosen nonexistent link. ...
... Precision [36] is defined as the ratio of correct predictions among the top L predicted links: if there are m correct links among the top L links, the precision is m/L. ...
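Both measures are simple to implement once a similarity score is chosen. The sketch below assumes the standard protocol in which a fraction of links (test_edges) has been removed from the observed graph g; the sampling-based AUC and the top-L precision follow the definitions quoted above.

```python
import random
import networkx as nx

def auc_and_precision(g, test_edges, score, top_l=100, n_samples=10000):
    """AUC and precision for a link prediction score(u, v) function.
    test_edges are the held-out true links (already removed from g)."""
    nodes = list(g.nodes)
    test_set = {frozenset(e) for e in test_edges}
    # AUC: how often a removed true link outscores a random non-link
    hits = 0.0
    for _ in range(n_samples):
        u, v = random.choice(test_edges)
        while True:
            x, y = random.sample(nodes, 2)
            if not g.has_edge(x, y) and frozenset((x, y)) not in test_set:
                break
        s_t, s_f = score(u, v), score(x, y)
        hits += 1.0 if s_t > s_f else (0.5 if s_t == s_f else 0.0)
    # Precision: correct links among the top-L ranked candidate pairs
    ranked = sorted(nx.non_edges(g), key=lambda e: score(*e), reverse=True)
    correct = sum(frozenset(e) in test_set for e in ranked[:top_l])
    return hits / n_samples, correct / top_l

# Toy split with the common-neighbors score
g = nx.karate_club_graph()
test = list(g.edges)[:10]
g.remove_edges_from(test)
cn = lambda u, v: len(list(nx.common_neighbors(g, u, v)))
print(auc_and_precision(g, test, cn, top_l=10, n_samples=2000))
```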
Article
Full-text available
A network motif is a subgraph pattern that may reflect structural functions or reveal underlying mechanisms. Some scholars have used triadic motifs for link prediction and achieved convincing results, but most of them ignore the effectiveness of quad motifs in calculating the similarity of two endpoints. In this paper, a new method based on quad motifs is proposed. Two winner motifs are chosen among 199 subgraphs, and local information of the motifs is also taken into consideration. Experiments on nine real networks show that the proposed index can improve prediction accuracy compared with seven well-known measures.
... Complex networks are an effective tool for this purpose, because they allow predicting the occurrence of events in the environment, which, as a chaotic system, is characterized by high sensitivity to its initial conditions (i.e., it can be affected by the smallest disturbance) and by the emergence of new properties. This approach captures the complexity of environmental systems and enables the analysis of multidirectional interactions between all of their components, including those that generate impacts (Barabási, 2012; Ren et al. 2018). ...
... This is a logical and systemic approach that emerged from combinatorics and helps analyze and understand a system by providing a simple way to represent relationships, thus characterizing their organization or structure (Geetha and Sekar 2016). For a more thorough mathematical description of complex networks, see Boccaletti et al. (2006), Ch and Zhao (2016), Geetha and Sekar (2016), or Ren et al. (2018). ...
Article
This article proposes a complex network methodology for the process of Environmental Impact Assessment (EIA) that limits subjectivity and reduces uncertainty by incorporating elements of complex systems theory into the stages of identifying and assessing the significance of environmental impacts. The proposed methodology reduces the sources of uncertainty that emerge from the use of simplified models which analyse environment-activity interactions in a unidirectional fashion. This proposal determines the significance of environmental impacts through multidirectional or complex causal relationships. Likewise, it limits the subjectivity of the evaluator by using these causality relationships instead of criteria based on the impacts’ attributes. The application of the proposed methodology demonstrates the advantages of (i) prioritizing impacts according to their capacity to interact with other impacts, and (ii) redirecting environmental management plans towards the prevention of impacts of higher complexity while reducing the importance of derived impacts. The application also reveals that the percentage of irrelevant and moderate impacts is reduced, whereas the percentage of severe and critical impacts increases, in comparison to conventional methodologies.
... With the development of higher-order network theory, classical centrality measures have been extended to simplicial complexes [26]. Traditional centrality measures, such as degree, betweenness, and closeness, are widely used to assess node importance in complex networks [27,28]. However, these metrics focus primarily on pairwise interactions, limiting their effectiveness in higher-order network contexts. ...
Article
Full-text available
World trade networks are typically described exhaustively by pairwise interactions, overlooking the higher-order structure that results from collective interactions at the level of groups of nodes, such as multilateral trade agreements. To address this limitation, we collect multiplex world trade networks, including the bilateral regional trade agreement network, which represents pairwise interactions; the multilateral regional trade agreement network, which naturally represents a higher-order network structure; and the import and export trade network, which represents pairwise interactions and additional complexities. The analysis of simplicial centrality, including degree, closeness, and subgraph centrality at the 0-, 1-, and 2-simplex levels, reveals that intra-level correlations are high, while inter-level correlations may exhibit significant disparities. Nodes with low centrality at higher-order levels can influence network robustness due to the diversity of interactions and higher-order dependencies. Examining simplicial centrality and the robustness of multiplex world trade networks under random and targeted attacks reveals that the complex connectivity of the higher-order levels renders them more vulnerable after attacks. An optimization strategy based on rebalancing network centrality is proposed to enhance robustness, and simulations show that the risks posed to central nodes are minimized while opportunities for peripheral nodes to partake in global trade are broadened.
... In network science, link prediction heuristics are used to extract information from a network structure to quantify the probability of the existence of links between nodes [3,13-15]. For example, the preferential attachment (PA) index, which is based on node popularity, indicates that nodes with higher degrees have a greater probability of forming links with other nodes [16]. ...
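As a concrete illustration of the PA index described above, networkx ships a ready-made implementation that scores a node pair by the product of its endpoint degrees; a minimal example:

```python
import networkx as nx

g = nx.karate_club_graph()
# Preferential attachment: score(u, v) = deg(u) * deg(v), so pairs of
# high-degree nodes receive the highest link likelihood.
for u, v, p in nx.preferential_attachment(g, [(0, 33), (15, 18)]):
    print(f"({u}, {v}) -> {p}")
```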
Preprint
Full-text available
Link prediction has become a critical problem in network science and has thus attracted increasing research interest. Popularity and similarity are two primary mechanisms in the formation of real networks. However, the roles of popularity and similarity mechanisms in link prediction across various domain networks remain poorly understood. Accordingly, this study used orbit degrees of graphlets to construct multi-order popularity- and similarity-based network link predictors, demonstrating that traditional popularity- and similarity-based indices can be efficiently represented in terms of orbit degrees. Moreover, we designed a supervised learning model that fuses multiple orbit-degree-based features and validated its link prediction performance. We also evaluated the mean absolute Shapley additive explanations of each feature within this model across 550 real-world networks from six domains. We observed that the homophily mechanism, which is a similarity-based feature, dominated social networks, with its win rate being 91%. Moreover, a different similarity-based feature was prominent in economic, technological, and information networks. Finally, no single feature dominated the biological and transportation networks. The proposed approach improves the accuracy and interpretability of link prediction, thus facilitating the analysis of complex networks.
... Network topology information can be used to capture the patterns and properties of the network structure, such as degree distribution, clustering coefficient, or centrality measures, which can also influence link formation. Based on this idea, structural similarity methods use network topology information to define the similarity between nodes by extracting their local, semi-local, or global structural patterns [26]. The trade-off of using structural similarity methods lies in their higher computational resource demands due to leveraging structural information. ...
Article
Full-text available
During the process of link prediction, traditional resource allocation methods only consider the influence of common neighbor nodes as transmission paths, while ignoring the impact of the effective resource amount of the topology structure surrounding these common neighbor nodes on link prediction performance. To address this limitation, this paper proposes a complex network link prediction method based on resource broadcast. Firstly, the paper provides a detailed analysis of the topology structure between the source node and the target node, presenting four different transmission paths. Secondly, in order to characterize the initial resources, the paper defines the effective resource amount after transmission through these four paths as the resource broadcast amount between nodes. Lastly, the similarity between nodes is characterized bidirectionally by considering the resource broadcast amount between nodes. Experiments conducted on 9 real network datasets demonstrate that, when compared with 8 other similarity-based indicators, this method achieves better prediction results according to benchmark evaluation indicators.
... The theory of spreading dynamics can be used to analyze many aspects of life, including healthy behaviors [1-3], social recommendations [4-9], advertising and promotion [10,11], and fashion trends. The adoption of popular trends is strengthened by the reinforcement effect, which can lead to further expansion. ...
Article
Full-text available
In the spreading dynamics of previous fashion trends, adoption researchers have neglected to consider that some individuals may behave differently from popular tendencies, which is called opposite-trend adoption behavior. To explore the dissemination mechanisms of this behavior, we first establish the adoption-against-trend model. Additionally, an edge division theory based on the adoption of opposite trends is proposed to quantitatively analyze this unique dissemination mechanism. This study presents three different degrees of opposite trends, each highlighting unique spreading scenarios. In the case of a strong opposite trend, no spreading occurs. In the case of a weak opposite trend, limited contact will accelerate information spreading, but it will not alter the mode of spreading. Nevertheless, in the case of a moderately opposite trend, the degree of the opposite trend alters the mode of spreading, and a cross-phase transition occurs. The findings of this paper can be applied to various areas, including social media and commercial trades.
... Among all existing link prediction algorithms, local structural similarity-based algorithms, such as CN [9], AA [17], and RA [18], are more competitive than other algorithms. Firstly, local structural similarity-based algorithms are preferred over other structural similarity-based algorithms due to their low algorithmic complexity and high accuracy [19]. Additionally, their efficient representation of node similarity using structural information enables better performance in short-path networks compared to embedding-based algorithms [20]. ...
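For reference, the three local indices named in this excerpt can all be computed from the common neighbourhood alone; a minimal sketch:

```python
import math
import networkx as nx

def local_indices(g, u, v):
    """CN, AA and RA scores for the candidate pair (u, v), computed
    from the common neighbors of u and v."""
    common = list(nx.common_neighbors(g, u, v))
    cn = len(common)                                     # Common Neighbors
    aa = sum(1 / math.log(g.degree(w)) for w in common)  # Adamic-Adar
    ra = sum(1 / g.degree(w) for w in common)            # Resource Allocation
    return cn, aa, ra

g = nx.karate_club_graph()
print(local_indices(g, 0, 33))
# Any common neighbor has degree >= 2, so log(degree) never vanishes.
```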
Article
Full-text available
Link prediction plays a crucial role in discovering missing information and understanding evolutionary mechanisms in complex networks, so several algorithms have been proposed. However, existing link prediction algorithms usually rely only on structural information, limiting the potential for further accuracy improvement. Recently, the significance of node behavior synchronization in network reconstruction has emerged. Both link prediction and network reconstruction aim to reveal the underlying network structure, so node behavior synchronization has the potential to improve link prediction accuracy. In this study, we propose a mutual information-based method to quantitatively measure node behavior synchronization, which is more suitable for link prediction and yields more stable performance than methods based on the temporal similarity of node behavior. Further, we propose a link prediction algorithm that combines local structural similarity with node behavior synchronization. Experimental results on real-life networks show that the proposed method is competitive in accuracy with methods relying solely on network structure or exploiting information about node behavior. In addition, analysis of the prediction performance under different combination ratios reveals the role of node behavior synchronization in different types of real networks. Our study not only improves the performance of link prediction, but also helps to reveal the role of node behavior synchronization in different types of networks.
... The authors of the article in [11] discuss various forecasting problems associated with the Internet, summarize and classify the relevant forecasting methods, analyze their advantages and disadvantages and indicate cutting-edge and critical problems in this area. ...
Article
Full-text available
This paper explores the social dynamics of processes in complex systems involving humans by focusing on user activity in online media outlets. R/S analysis showed that the time series of the processes under consideration are fractal and anti-persistent (they have short-term memory and a Hurst exponent significantly less than 0.5). Following statistical processing, the observed data showed a small amount of asymmetry in the distribution of the amplitudes of user activity changes in news comments; the amplitude distribution is almost symmetrical, but it has a heavy tail, as the probability plots lie above the normal probability plot. The fractality of the time series for the observed processes could be due to the variables describing them (the time and level of a series), which are characterized by fractional variables of measurement. Therefore, when choosing approximating functions to determine the probability density of their parameters, it is advisable to use fractional differential equations, such as those of the diffusion type. This paper describes the development of such a model and uses the observed data to analyze and compare the modeling results.
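The R/S procedure behind these results is compact enough to sketch. The snippet below estimates the Hurst exponent by regressing log(R/S) against the log of the window size; the window sizes are illustrative assumptions, not those of the paper.

```python
import numpy as np

def hurst_rs(x, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis:
    regress log(R/S) on log(window size); the slope approximates H.
    H < 0.5 indicates anti-persistence, as reported in the study."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):
            w = x[start:start + n]
            z = np.cumsum(w - w.mean())   # cumulative deviations
            r = z.max() - z.min()         # range
            s = w.std()                   # standard deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

print(hurst_rs(np.random.randn(1024)))  # white noise: H near 0.5
```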
... Illustrative examples are spreading models for the propagation of epidemics [338], information, memes [339] or behaviors; such models give the theoretical background for predicting and controlling these processes [340]. In this regard, finding the paths that drive spreading on the network is crucial for implementing efficient strategies to either hinder dissemination (in the case of diseases) or speed up spreading (in the case of information) [341]. In CNs with a broad degree distribution [329,330,342], hubs are the key agents involved in fast spreading processes [208,342,343]. ...
Preprint
Full-text available
Persistence is an important characteristic of many complex systems in nature, related to how long the system remains at a certain state before changing to a different one. The study of complex systems' persistence involves different definitions and uses different techniques, depending on whether short-term or long-term persistence is considered. In this paper we discuss the most important definitions, concepts, methods, literature and latest results on persistence in complex systems. Firstly, the most used definitions of persistence in short-term and long-term cases are presented. The most relevant methods to characterize persistence are then discussed in both cases. A complete literature review is also carried out. We also present and discuss some relevant results on persistence, and give empirical evidence of performance in different detailed case studies, for both short-term and long-term persistence. A perspective on the future of persistence concludes the work.
... Complex networks (CNs) find a wide range of applications in multifarious fields such as power grids, pattern recognition, automatic control, computational science, biochemistry, and so forth [1-3]. A complex network consists of a large number of interconnected nodes, where each node corresponds to a non-linear system. ...
Article
Full-text available
This paper investigates exponential synchronization for T‐S fuzzy complex networks (TSFCNs) with discontinuous activations and mixed time‐varying delays. Based on IF‐THEN rules, the T‐S fuzzy model of complex networks is obtained to approximate non‐linear dynamic systems by interpolating certain local linear systems. A fuzzy sampled‐data control strategy is designed to achieve exponential stability for the TSFCN by constructing a bilateral time‐dependent Lyapunov–Krasovskii functional. Furthermore, according to Filippov discontinuity theory, the exponential synchronization criteria for TSFCNs with discontinuous activations and mixed time‐varying delays under linear matrix inequality constraints are obtained. Finally, two numerical simulations are provided to illustrate the effectiveness and feasibility of the proposed method.
... Link prediction aims to estimate the likelihood of links between any two nodes from known partial network structure information [9-11]. A series of link prediction methods based on local structural similarity have been proposed [12]. For example, the Adamic-Adar (AA) index [13] and the number of Common Neighbors (CN) [14], [15] achieved high performance in pioneering research on link prediction in scientific collaboration networks. ...
Article
Full-text available
Scientific collaboration is of great significance for knowledge production and scientific development, and predicting connections and their intensity in collaboration networks is essential for understanding collaboration relationships between scientists. In previous studies, most scholars used only local structural similarity to infer scientist collaboration modes, which fails to accurately predict collaboration relationships between scientists. In this study, we propose a prediction method that identifies missing links and link weights using multiple motif features. The experimental results show that the highest performance improvement is 13.5% in link prediction and 86.8% in weight prediction. In addition, correlation analysis of the multiple motif features reveals topological correlations between different scientist collaboration modes. Our findings help to predict link possibility and tie strength in scientist collaboration networks more accurately, and to understand more deeply the evolution patterns of collaboration networks among scientists.
... The proposed edge label detection problem aims to detect edges with new labels from existing ones. The problem differs from previously formulated problems that aim to predict old labels for sets of edges from the observed labeled edges, which have been widely studied in existing link annotation research [16-18], including sign prediction [19-23] and link prediction [24-26]. We mathematically describe the proposed problem, which has not been studied in existing research before, as follows. ...
Preprint
Networks representing complex systems in nature and society usually involve multiple interaction types. These types suggest essential information on the interactions between components, but not all of the existing types are usually discovered. Therefore, detecting the undiscovered edge types is crucial for deepening our understanding of the network structure. Although previous studies have discussed the edge label detection problem, we still lack effective methods for uncovering previously-undetected edge types. Here, we develop an effective technique to detect undiscovered new edge types in networks by leveraging a novel temporal network model. Both analytical and numerical results show that the prediction accuracy of our method is perfect when the model networks' time parameter approaches infinity. Furthermore, we find that when time is finite, our method is still significantly more accurate than the baseline.
... Such inference is not the subject of this paper; relevant discussions of network modeling can be found in [29,30]. Combining the inference of the network and its dynamics will be studied in future work. ...
Article
Full-text available
The degradation and recovery processes are multi-scale phenomena in many physical, engineering, biological, and social systems, and determine the aging of the entire system. Therefore, understanding the interplay between the two processes at the component level is the key to evaluate the reliability of the system. Based on the principle of maximum entropy, an approach is proposed to model and infer the processes at the component level, and is applied to repairable and non-repairable systems. By incorporating the reliability block diagram, this approach allows for integrating the information of network connectivity and statistical moments to infer the hazard or recovery rates of the degradation or recovery processes. The overall approach is demonstrated with numerical examples.
... This technique predicts whether links can exist between two nodes in the network from the known network structure [18]. It can also be used to predict undiscovered links and possible future links [19]. There are three main types of link prediction algorithms: graph topology algorithms, data mining classification algorithms, and probabilistic network-model algorithms. ...
Article
Full-text available
With the increasing complexity of scientific research and the expanding scale of projects, scientific research cooperation is an important trend in large-scale research. The analysis of co-authorship networks is a big data problem due to the expanding scale of the literature. Without sufficient data mining, research cooperation will be limited to a similar group, namely a “small group”, in the co-author networks. This “small group” limits the research results and openness. However, researchers are often not aware of the existence of other researchers due to insufficient big data support. Considering the importance of discovering communities and recommending potential collaborations from a large body of literature, we propose an enhanced clustering algorithm for detecting communities. It includes the selection of an initial central node and the redefinition of the distance and iteration of the central node. We also propose a method for recommending co-authors via link analysis that is based on the hilltop algorithm, an algorithm used in search engines. The co-author candidate set is improved by screening and scoring: in screening, the expert set formation of the hilltop algorithm is added, and the score is calculated from the duration and quantity of collaborations. Experiments show that communities can be extracted and co-authors can be recommended from the big data of the scientific research literature.
... A high clustering coefficient means that the nodes in a real-life network tend to form triangular structures [32], the simplest motif that can be used for sign or link prediction. Counting the number of triangular motifs is equivalent to the method of sign prediction based on common neighbors [33]. However, in a signed network, there is more than one single type of connection between nodes. ...
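The equivalence noted in this excerpt is easy to see in code: counting signed triangles through common neighbors yields a balance-theoretic vote for the sign of the target edge. The sketch below is a generic common-neighbor sign predictor, not the Naive Bayes models proposed in the paper.

```python
import networkx as nx

def predict_sign(g, u, v):
    """Each common neighbor w votes with the product of the signs of
    (u, w) and (w, v); ties are broken towards the positive sign."""
    vote = sum(g[u][w]["sign"] * g[w][v]["sign"]
               for w in nx.common_neighbors(g, u, v))
    return 1 if vote >= 0 else -1

g = nx.Graph()
g.add_edge("a", "w", sign=1)
g.add_edge("w", "b", sign=-1)   # triangle vote: 1 * -1 = -1
g.add_edge("a", "x", sign=-1)
g.add_edge("x", "b", sign=-1)   # triangle vote: -1 * -1 = +1
print(predict_sign(g, "a", "b"))  # votes sum to 0 -> predicts +1
```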
Article
Full-text available
Sign prediction is a significant research topic in signed social networks, and it has recently attracted increasing attention in the field of online social networks. Traditionally, the basic idea of motif-based predictive methods is to count the motifs on the predicted edge (i.e., the single edge-dependent motif based method) and then use a machine learning predictor for sign prediction. Although this intuition-based method can achieve great performance for sign prediction, its rationale has so far not been proved theoretically. Furthermore, counting edge-dependent motifs cannot distinguish the distinct role of each node in the neighborhood of the predicted edge. In this study, we first propose a Single Motif Naive Bayes (SMNB) model for sign prediction, which can not only explain why the single edge-dependent motif based method is efficient, but also quantify the role of each neighbor node (for 3-node predictors) or neighbor edge (for 4-node predictors) connected by the predicted edge in the task of sign prediction. Then, we extend SMNB by merging two types of motifs and propose a Two Motif Naive Bayes (TMNB) model. Experimental results on real-world networks indicate that the proposed algorithms outperform state-of-the-art approaches. Finally, we explore the intrinsic relationships among different motifs using the matrix of Maximal Information Coefficients (MIC). Our research not only extends traditional motif theory by proving the rationality of the method based on edge-dependent motifs and distinguishing a node's (or an edge's) contribution to sign prediction, but also helps to further understand the evolution mechanism of signed social networks based on the correlations among different motifs.
... Existing link prediction algorithms can be divided into two categories: structural similarity algorithms from the network domain [18] and network embedding algorithms from the field of machine learning [19]. Traditionally, link prediction methods in the network domain tend to use micro-scale topological properties, such as link existence [11], link weights [20], common neighbors [21], node degree [17], [22], and the clustering coefficient [23], to predict links. ...
Article
Full-text available
Existing studies of link prediction have mostly focused on using network topological properties to improve the accuracy of link prediction. More broadly, research on the role of community structures in link prediction has recently received increasing attention. In this study, we propose a succinct algorithm built on community structures to improve the performance of link prediction, and it has been verified on both synthetic benchmarks and real-world networks. More importantly, we introduce different null models to study the role of community structures in link prediction more carefully. Firstly, it is found that clearer community structures correspond to higher performance of link prediction algorithms that are based on community information. Secondly, the roles of links within a community and links between two communities are further distinguished. The edges within a community play a vital role in link prediction for the whole network, whereas the edges between two communities have a minimal effect. Finally, we reveal the relationship and dependence between this special meso-scale structure (community) and micro-scale structures of different orders (i.e., degree distribution, assortativity, and transitivity) for link prediction.
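One simple way to fold community information into a local similarity index, in the spirit of the study above though not its exact algorithm, is to boost the score of candidate pairs that fall inside the same detected community:

```python
import networkx as nx
from networkx.algorithms import community

def community_boosted_ra(g, boost=1.5):
    """Resource-allocation scores, multiplied by `boost` for node pairs
    inside the same detected community (illustrative combination)."""
    comms = community.greedy_modularity_communities(g)
    label = {v: i for i, c in enumerate(comms) for v in c}
    scores = {}
    for u, v, p in nx.resource_allocation_index(g, nx.non_edges(g)):
        scores[(u, v)] = p * (boost if label[u] == label[v] else 1.0)
    return scores

g = nx.karate_club_graph()
best = sorted(community_boosted_ra(g).items(), key=lambda kv: -kv[1])[:5]
print(best)
```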
... PINs change with time and conditions to ensure normal life activities. The dynamics of cells in time and space play a vital role in the replication and viability of organisms [111-114]. Traditional methods construct static networks to analyze complex networks under a single condition, but static networks cannot reveal the dynamics of PINs. ...
Article
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanisms of complex diseases, studying the minimal required genome for living cells, and developing new drug targets. As laboratory methods are often complicated, costly and time-consuming, a great many computational methods have been proposed to identify essential genes/proteins at the network level, driven by an in-depth understanding of network biology and the rapid development of biotechnologies. By analyzing the topological characteristics of essential genes/proteins in protein-protein interaction networks (PINs), integrating biological information, and considering the dynamic features of PINs, network-based methods have proved effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
... First, we introduce several important types of biological networks, including protein-protein interaction networks, isoform-isoform networks, genetic interactions, metabolic associations, neuron-neuron interactions, and drug-target interactions, and we also provide some data resources in Sec. 2. In Sec. 3, we review network structure in biological systems, focusing on two points: the structural properties of biological networks and the mapping between cell function and the topological structure of biological networks. In Sec. 4, we introduce the network-based methods in network biology, which can be roughly divided into network-structure-oriented methods [31] and machine-learning-based methods. In Sec. 5, we introduce the applications of computational network biology, concentrating on how to use computational network biology to interpret human disease, neuroscience, and drug development. ...
Article
Biological entities are involved in intricate and complex interactions, and uncovering biological information from network concepts is of great significance. Benefiting from the advances of network science and high-throughput biomedical technologies, studying biological systems from the perspective of network biology has attracted much attention in recent years, and networks have long been central to our understanding of biological systems, in the form of linkage maps among genotypes, phenotypes, and the corresponding environmental factors. In this review, we summarize the recent developments of computational network biology, first introducing various types of biological networks and network structural properties. We then review network-based approaches, ranging from network metrics to complicated machine-learning methods, and emphasize how to use these algorithms to gain new biological insights. Furthermore, we highlight applications in neuroscience, human disease, and drug development from the perspective of network science, and we discuss some major challenges and future directions. We hope that this review will draw increasing interdisciplinary attention from physicists, computer scientists, and biologists.
... The World Trade Network has attracted the attention of researchers in many fields and has become an important subject in studying the economic development of countries. By studying the structure and dynamics of trade networks, physicists have made it possible to explain the state of development and the potential of a country's economy from the complex interactions among nations [1]. Hausmann, Hidalgo et al. [2,3] proposed the Economic Complexity Index (ECI) to measure the diversification of a country and the ubiquity of a product. ...
Article
Full-text available
GDP is a classic indicator of the extent of national economic development. Research based on the World Trade Network has found that a country’s GDP depends largely on the products it exports. In order to increase a country's competitiveness and further increase its GDP, a crucial issue is finding the right direction in which to upgrade its industry. The proximity indicator measures the similarity between products and can be used to predict the probability that a country will develop a new industry. On the other hand, the Fitness–Complexity algorithm can help to identify important products and developing countries. In this paper, we find that the maximum of the proximity between a certain product and a country’s existing products is highly correlated with the probability that the country exports this new product in the next year. In addition, we find that the more products are related to a certain product, the higher the probability of the emergence of that new product. Finally, we combine the proximity indicator and the Fitness–Complexity algorithm to provide a recommendation list of new products that can help developing countries upgrade their industries. A few examples are given at the end.
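A sketch of the max-proximity recommendation described above, assuming a binary country-product export matrix M (rows: countries, columns: products); the proximity definition follows the Hidalgo et al. conditional-probability form, which is an assumption here rather than the paper's exact formulation.

```python
import numpy as np

def proximity(M):
    """phi[i, j] = min of the conditional probabilities that a country
    exporting product i also exports product j, i.e. co-export counts
    divided by the larger of the two product ubiquities."""
    co = M.T @ M                      # co-export counts between products
    ubiquity = M.sum(axis=0)          # number of exporters per product
    denom = np.maximum.outer(ubiquity, ubiquity)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = np.where(denom > 0, co / denom, 0.0)
    return phi

def recommend(M, country, top_k=3):
    """Rank products the country does not yet export by their maximum
    proximity to any product it already exports."""
    phi = proximity(M)
    exported = M[country].astype(bool)
    score = phi[:, exported].max(axis=1)
    score[exported] = -np.inf         # exclude already-exported products
    return np.argsort(score)[::-1][:top_k]

# Toy 3-country x 5-product export matrix (hypothetical data)
M = np.array([[1, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0]])
print(recommend(M, country=1))
```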
... In this sense, link prediction accuracy depends on the understanding of network specificities, that is, on whether the link prediction algorithm reflects the corresponding mechanisms of network organization. However, even the most recent reviews of prediction in complex networks [10,11] do not specifically consider link prediction in nested networks. Our goal here is to fill this gap and provide a comparison of how various link prediction methods perform in nested networks. ...
Article
Full-text available
Real networks typically studied in various research fields—ecology and economic complexity, for example—often exhibit a nested topology, which means that the neighborhoods of high-degree nodes tend to include the neighborhoods of low-degree nodes. Focusing on nested networks, we study the problem of link prediction in complex networks, which aims at identifying likely candidates for missing links. We find that a new method that takes network nestedness into account outperforms well-established link-prediction methods not only when the input networks are sufficiently nested, but also for networks where the nested structure is imperfect. Our study paves the way to search for optimal methods for link prediction in nested networks, which might be beneficial for World Trade and ecological network analysis.
Article
Full-text available
Introduction: This study explores the impact of global economic volatility, particularly influenced by the Russia-Ukraine and Israel-Palestine conflicts, on the ASEAN stock markets. The research aims to analyze stock price patterns and trends to support sustainable economic planning and improve market stability. Methods: The study employed non-hierarchical clustering techniques, including K-Means and K-Medoids, to analyze time series data from 18 ASEAN stocks over a 10-year period. Data preprocessing involved Min-Max normalization, and Principal Component Analysis (PCA) was utilized for dimensionality reduction. The clustering performance was evaluated using silhouette coefficients, and the Elbow Method determined the optimal number of clusters. Results: K-Means demonstrated superior clustering performance with a silhouette coefficient of 0.63362 compared to K-Medoids (0.37133). The K-Means method identified seven distinct clusters, effectively grouping stocks with similar temporal patterns. The results revealed significant trends in price stability and volatility across different sectors. Conclusions: The findings highlight the value of clustering techniques in understanding market dynamics and provide actionable insights for policymakers and investors. The study recommends the development of real-time market monitoring systems to mitigate price fluctuations and support sustainable economic growth in ASEAN. Future research could explore integrating machine learning models for enhanced market analysis.
Article
The limitations of classical node centralities such as degree, closeness, betweenness and eigenvector centrality are rooted in the network topology. For a deeper understanding, we tune two basic topological properties, the clustering coefficient and the assortativity coefficient, to study their effect on these four classical node centralities. To observe the structural diversity of complex networks, we first construct two types of growing scale-free networks with a tunable clustering coefficient and a tunable assortativity coefficient, respectively, and simulate three types of null models on ten real networks to adjust clustering and assortativity. The results indicate that the impact of varying clustering and assortativity on node centrality in complex networks is substantial. We should therefore pay close attention to the network topology when selecting node centralities to identify the significance or influence of nodes in complex networks.
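All four centralities, together with the two structural coefficients the study varies, are available in networkx; a minimal sketch on a growing scale-free (BA) network follows (the paper's tunable-clustering constructions and null models are not reproduced here).

```python
import networkx as nx

g = nx.barabasi_albert_graph(500, 3, seed=42)  # growing scale-free network

# The four classical centralities discussed above...
centralities = {
    "degree": nx.degree_centrality(g),
    "closeness": nx.closeness_centrality(g),
    "betweenness": nx.betweenness_centrality(g),
    "eigenvector": nx.eigenvector_centrality(g, max_iter=1000),
}

# ...and the two topological knobs the study varies
print("clustering:", nx.average_clustering(g))
print("assortativity:", nx.degree_assortativity_coefficient(g))
for name, c in centralities.items():
    top = max(c, key=c.get)
    print(f"{name}: top node {top} ({c[top]:.3f})")
```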
Article
Full-text available
Temporal network link prediction is an important task in the field of network science and has a wide range of applications in practical scenarios. Revealing the evolutionary mechanism of a network is essential for link prediction, and how to effectively utilize the historical information of temporal links and efficiently extract the high-order patterns of network structure remains a vital challenge. To address these issues, in this paper we propose a novel temporal link prediction model with an adjusted sigmoid function and 2-simplex structure (TLPSS). The adjusted sigmoid decay mode takes the active, decaying and stable states of edges into account, which properly fits the life cycle of information. Moreover, a latent matrix sequence composed of simplex high-order structures is introduced to enhance the performance of the link prediction method, since it is highly feasible in sparse networks. Combining the life cycle of information and the simplex high-order structure, the overall performance of TLPSS is achieved by satisfying the consistency of temporal and structural information in dynamic networks. Experimental results on six real-world datasets demonstrate the effectiveness of TLPSS, and our proposed model improves link prediction performance by an average of 15% compared to other baseline methods.
Article
Major infectious disease epidemics (MIDEs) pose a great threat to human survival and development. Figuring out the risk factors that may lead to MIDE outbreaks is critical to MIDE prevention and control, and MIDE risk identification is the starting point and basis of risk management. This study conducts MIDE risk identification based on complex network theory. To this end, we create a MIDE risk network and improve the classical LeaderRank algorithm by adopting the idea of biased random walks, proposing the SLR1 and SLR2 algorithms. SLR1 and SLR2 are compared with the LeaderRank and PageRank algorithms, and the better performing of the two is selected as the novel algorithm proposed in this study and used to complete the risk identification of MIDE. Results show that the MIDE risk network has small-world and scale-free properties, and exhibits high vulnerability under targeted attacks. Both SLR1 and SLR2 outperform the other two algorithms, and SLR2 demonstrates the best performance; SLR2 is therefore used to rank the importance of risk factors. Fifteen key risk factors are identified, relating to the vulnerability of personnel, equipment, resources, environment and management, and to risk receptor exposure. The validity of applying SLR2 to MIDE risk identification is verified from theoretical and practical perspectives. This study facilitates MIDE risk reduction and thus improves MIDE risk management. Moreover, the proposed SLR2 algorithm can be used for the risk identification of other disasters.
Article
Studying networked systems in a variety of domains, including biology, social science and the Internet of Things, has recently received a surge of attention. For a networked system, there are usually multiple types of interactions between its components, and such interaction type information is crucial since it is always associated with important features. However, some interaction types that actually exist in the network may not be observed in the metadata collected in practice. This paper proposes an approach to detect previously undiscovered interaction types (PUITs) in networked systems. The first step in our proposed PUIT detection approach is to answer the following fundamental question: is it possible to effectively detect PUITs without utilizing metadata other than the existing incomplete interaction type information and the connection information of the system? Here, we first propose a temporal network model which can be used to mimic any real network, and then discover that certain special networks which fit the model share a common topological property. Supported by this discovery, we develop a PUIT detection method for networks which fit the proposed model. Both analytical and numerical results show that this detection method is more effective than the baseline method, demonstrating that effectively detecting PUITs in networks is achievable. Further studies on PUIT detection are of significance and in great need, since this approach could be as essential as the detection of previously undiscovered node types, which has achieved great success in the field of biology.
Article
Currently, vaccination is the most effective means of preventing the spread of infectious diseases. In this paper, a novel SIRV-NI-EG (susceptible, infected, recovered, vaccinated - node importance - evolutionary game) model is established to analyze the evolution of vaccination strategies under a combination of mandatory and voluntary vaccination. For mandatory vaccination, a certain proportion of nodes with high importance are vaccinated first, according to the node importance ranking. The remaining nodes in the network voluntarily decide whether to vaccinate according to the surrounding situation, based on evolutionary game theory. Degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, PageRank, k-core, structural holes and WTOPSIS are used to evaluate node importance in the network, and node-deletion methods are used to further evaluate the importance of the initially vaccinated nodes. Finally, vaccination evolutionary game analysis based on the SIRV-NI-EG model is performed on three complex networks: the USAir network, the Facebook network and a BA scale-free network. The results show that all evaluation indicators perform better than random vaccination. Our conclusions can provide better vaccination strategies for government decision-making to control the spread of infectious diseases.
Conference Paper
Full-text available
This article considers current research prospects, theories and mathematical formalism related to the value of the outcomes of digitalisation, their informational value for actions, and the use of information for various types of activities in an array of systems (e.g. production, business, economic, social and organisational systems). Following a review of the literature on digitalisation, digitalisation in organisations, and the digital economy and society, theories and mathematical techniques for modelling the use of information in various kinds of system activity were identified, as were blind spots and gaps in research on information used in system actions. In particular, a multidisciplinary gap exists between the need to solve problems in using information for further actions in various systems (i.e. mathematical and theoretical systems-related problems) and the theoretical and mathematical means available to solve such problems. That gap, difficult to overcome given its multidisciplinary nature, lies at the border of systems theory, human action theory, economic theory, organisation studies, cybernetics, psychology, theory of mind and mathematics. In response, this article also considers the role of information actions within system action and proposes a mathematical theory of using information in such actions.
Article
A supply chain system can be considered as an interdependent supply chain network (ISCN) consisting of an undirected cyber-network and a directed physical-network. To survive disruptions, an ISCN needs to maintain operations and connectedness, referred to as robustness. Studies on the robustness of ISCNs that consider both functional and structural cascading failures are still scarce. In this paper, we first propose a cascading failure model that captures these two kinds of cascading failure simultaneously. We also present a model to generate ISCNs with different network types and interconnecting patterns. Using a transition threshold based on the proposed all-type connected sub-network, we can evaluate the robustness of ISCNs more properly. We then conduct numerical simulations to investigate how various parameters (e.g., network type, interconnecting pattern, and the distribution of different types of nodes) affect the robustness of ISCNs under random and targeted disruptions. The results show that the robustness of ISCNs is strongly affected by network type, interconnecting pattern, and disruption type, and that the more uniform the distribution of different node types, the more robust the corresponding ISCN, regardless of disruption type. Our results may help in building robust ISCNs.
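Since the paper's coupled functional-structural cascade model is not reproduced in the abstract, the following is only a minimal single-layer stand-in: a robustness curve tracking the giant-component fraction under random versus degree-targeted node removal. All names are hypothetical; the targeted attack uses initial degrees (adaptive recomputation is a common variant).

```python
import random
import networkx as nx

def robustness_curve(G, targeted=False, steps=20, seed=0):
    """Giant-component fraction as nodes are removed at random or by
    (initial) degree; a single-layer stand-in for disruption analysis."""
    rng = random.Random(seed)
    H = G.copy()
    n0 = H.number_of_nodes()
    order = (sorted(H, key=H.degree, reverse=True) if targeted
             else rng.sample(list(H), n0))
    curve = []
    chunk = max(1, n0 // steps)
    for i in range(0, n0, chunk):
        H.remove_nodes_from(order[i:i + chunk])
        if H.number_of_nodes() == 0:
            curve.append(0.0)
            continue
        giant = max(nx.connected_components(H), key=len)
        curve.append(len(giant) / n0)
    return curve

G = nx.erdos_renyi_graph(500, 0.02, seed=1)
random_curve = robustness_curve(G)                 # random disruption
attack_curve = robustness_curve(G, targeted=True)  # targeted disruption
```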
Article
The ubiquity of user-item interactions makes it essential, and challenging, to exploit the rich hidden structural and temporal information for effective and efficient recommendation. In this work, our goal is to address two limitations of existing research: (i) the inadequacy of popularity-trend prediction and temporal recommendation, and (ii) the failure to clarify how structural characteristics and temporal evolution influence recommendation. To this end, we first construct time sequences of items' growing popularity. We then propose a family of time-series predictive models to predict popularity trends, and exploit the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton optimization algorithm to adjust the predictive parameters adaptively. Moreover, to investigate how structural and temporal information interact to affect recommendation, we propose a novel Hybrid Network Adaptive Time Series recommendation framework (HNATS), which synchronously improves recommendation performance. Finally, we conduct comprehensive experiments on four real-world datasets of different sizes and time spans. The experimental results demonstrate that the proposed predictive models capture hidden temporal patterns and that HNATS surpasses the compared state-of-the-art temporal methods, including popularity-based, time-decay-based, and Markov-based baselines.
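As a hedged sketch of the parameter-adaptation idea, the snippet below fits a logistic growth curve, a stand-in for the paper's family of time-series predictive models, to a toy popularity series with SciPy's BFGS optimizer. Data, model form and names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy cumulative-popularity series (hypothetical data).
t = np.arange(30, dtype=float)
observed = 100.0 / (1.0 + np.exp(-0.4 * (t - 12.0)))
observed += np.random.default_rng(0).normal(0.0, 2.0, t.size)

def loss(theta):
    """Mean squared error of a logistic growth curve K / (1 + exp(-r(t - t0)))."""
    K, r, t0 = theta
    pred = K / (1.0 + np.exp(-r * (t - t0)))
    return np.mean((pred - observed) ** 2)

# BFGS adaptively adjusts the predictive parameters (K, r, t0).
fit = minimize(loss, x0=np.array([50.0, 0.1, 10.0]), method="BFGS")
K, r, t0 = fit.x
forecast_t40 = K / (1.0 + np.exp(-r * (40.0 - t0)))  # popularity forecast at t = 40
```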
Article
Much of physicists' strong interest in complex systems stems from the fact that their macroscopic properties typically arise from a myriad of interactions among the system constituents. Network science aims to simplify the study of a given complex system by representing it as a network, a collection of nodes and the edges interconnecting them. It is now widely recognized that some structural traits of networks are ubiquitous properties of real systems. The identification and prediction of node influence are of great theoretical and practical significance and have become a hot research topic in complex networks. Most current research, however, focuses on static networks or on snapshots of dynamic networks at a certain moment, whereas in practical application scenarios the complex networks extracted from society, biology, information and technology evolve dynamically. It is therefore more meaningful to evaluate a node's influence in a dynamic network and to predict its future influence, especially before the network structure changes. In this review, we survey advances in node-influence evaluation in dynamical networks around three tasks: algorithmic complexity and time bias in growing networks; algorithmic applicability in time-varying networks; and algorithmic robustness in dynamical networks with small or sharp perturbations. Furthermore, we overview the framework of economic complexity based on dynamical network structure. Lastly, we point out the forefront as well as critical challenges of the field.
Article
Over the years, quantifying the similarity of nodes has been a hot topic in network science, yet little is known about the distribution of node similarity. In this paper, we consider a typical measure of node similarity called the common neighbor based similarity (CNS). By means of the generating function, we propose a general framework for calculating the CNS distributions of node sets in various networks. In particular, we show that for the Erdős–Rényi random network, the CNS distribution of node sets of any size obeys the Poisson law. Furthermore, we connect the node-similarity distribution to the link prediction problem, and derive analytical solutions for two key evaluation metrics: (i) precision and (ii) the area under the receiver operating characteristic curve (AUC). We also use the similarity distributions to optimize link prediction by (i) deriving the expected prediction accuracy of similarity scores and (ii) providing the optimal prediction priority of unconnected node pairs. Simulation results confirm our theoretical findings and validate the proposed tools for evaluating and optimizing link prediction.
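The ER result can be checked numerically: in G(n, p) the number of common neighbors of a node pair is Binomial(n - 2, p^2), approximately Poisson with mean (n - 2)p^2. A minimal sketch (parameters are arbitrary):

```python
import numpy as np
import networkx as nx

n, p = 2000, 0.01
G = nx.fast_gnp_random_graph(n, p, seed=42)
rng = np.random.default_rng(42)

# sample random node pairs and count their common neighbors
pairs = rng.integers(0, n, size=(5000, 2))
cns = [len(set(G[u]) & set(G[v])) for u, v in pairs if u != v]

print("empirical mean :", np.mean(cns))
print("Poisson mean   :", (n - 2) * p**2)   # theoretical approximation
```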
Article
Link prediction has been widely applied in social network analysis. Existing studies on link prediction assume the network to be undirected, while most realistic social networks are directed. In this paper, we design a simple but effective method for link prediction in directed social networks based on common interest and local community. The proposed method quantifies the contributions of neighbors by analyzing the information exchange process among nodes, capturing both the essential motivation of link formation and the effect of local community in social networks. We validate the effectiveness of our method with comparative experiments on nine realistic networks. Empirical studies show that the proposed method achieves better prediction performance under three standard evaluation metrics, with great robustness to the size of the training set.
Article
As an elementary task in statistical physics and network science, link prediction has attracted great attention from researchers in many fields. While numerous similarity-based indices have been designed for undirected networks, link prediction in directed networks has not yet been thoroughly studied. Among several representative works, motif predictors such as the “feed-forward loop” and “bi-fan” predictors perform well in both accuracy and efficiency. Nevertheless, they fail to explicitly explain the linkage motivation of nodes, nor do they consider the unequal contributions of different neighbors between node pairs. In this paper, motivated by investment theory in economics, we propose a universal and explicable model to quantify the contributions of nodes to driving link formation. Based on an analysis of two typical investment relationships, namely “follow-up” and “co-follow”, an investment-profit index is designed for link prediction in directed networks. Empirical studies on 12 static networks and four temporal networks show that the proposed method outperforms eight mainstream baselines under three standard metrics. As a quasi-local index, it is also suitable for large-scale networks.
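As a hedged illustration of the two relationship patterns named here, the snippet below counts generic "follow-up" and "co-follow" motifs for a node pair. These plain counts are not the paper's investment-profit index, which additionally weights the unequal contributions of neighbors.

```python
import networkx as nx

def follow_up_count(G, u, v):
    """Number of two-hop directed paths u -> w -> v ('follow-up' pattern)."""
    return len(set(G.successors(u)) & set(G.predecessors(v)))

def co_follow_count(G, u, v):
    """Number of targets followed by both u and v ('co-follow' pattern)."""
    return len(set(G.successors(u)) & set(G.successors(v)))

G = nx.gnp_random_graph(200, 0.05, directed=True, seed=7)
score = follow_up_count(G, 0, 1) + co_follow_count(G, 0, 1)  # naive combined score
```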
Article
Link prediction can discover missing information and the evolution mechanism of complex networks, so a huge number of novel algorithms have been proposed recently. However, existing link prediction algorithms for directed signed networks rely only on motifs that satisfy status theory, and other types of motifs are rarely taken into account. In this study, we first propose a link prediction method based on the number of edge-dependent motifs, and explain it with a naive Bayes model. We then put forward a Signed Local Naive Bayes (SLNB) model based on two different kinds of motifs, which achieves higher prediction performance than considering a single motif alone. Finally, we combine all 3-node motifs into a motif family and use a machine learning framework for link prediction. The results show that motif families can greatly improve the performance of link prediction. Moreover, from the correlations between these predictors, the intrinsic relationships between different motifs can be discovered, and the computational complexity of link prediction can be reduced after feature selection. Our research can not only improve the performance of link prediction but also help uncover the evolutionary mechanism of signed social networks.
Article
Full-text available
Link prediction plays a significant role in various applications of complex networks. Existing link prediction methods can be divided into two categories: structural similarity algorithms from the network domain and network embedding algorithms from machine learning. However, few researchers have compared these two categories of algorithms or explored the intrinsic relationship between them. In this study, we systematically compare the two categories and study the shortcomings of network embedding algorithms. The results indicate that network embedding algorithms perform poorly in short-path networks. We explain this phenomenon by computing the Euclidean distance distribution of node pairs after a given network has been embedded into a vector space. In the vector space of a short-path network, the distance distributions of existent and nonexistent links are often barely distinguishable, which sharply reduces algorithmic performance. In contrast, structural similarity algorithms, which are not restricted by a distance function, can represent node similarity accurately in short-path networks. To address this pitfall of network embedding, we propose a novel link prediction method that supplements network embedding algorithms with local structural information. The experimental results suggest that our proposed algorithm yields significant performance improvements on many empirical networks, especially short-path ones: AUC and precision are improved by 36.7%–94.4% and 53.2%–207.2%, respectively.
Chapter
The scale of projects and literature has continuously expanded and grown more complex with the development of scientific research, and scientific cooperation has become an important trend. Analysis of the co-author network is a big-data problem: without sufficient data mining, research cooperation remains limited to the same groups, known as "small groups" in co-author networks, which reduces researchers' openness and limits scientific output. It is therefore important to recommend potential collaborators from the huge body of literature. We propose a method based on the Hilltop algorithm, a search-engine algorithm, to recommend co-authors by link analysis. The candidate set is screened and scored for recommendation. By setting certain rules, the expert-set formation of the Hilltop algorithm is added to the screening, and the score is calculated from the durations and frequencies of collaborations. Experiments show that co-authors can be extracted and recommended from large-scale scientific literature data.
Article
Identifying significant works in the arts and sciences should avoid age preference. Recent research has shown that time-balanced variants of the popular Google PageRank and of degree could provide an objective way to eliminate age preference when identifying significant works in growing networks. However, a fundamental question remains open: how well can time-balanced metrics identify significant nodes in growing networks? By investigating two large time-aggregated citation networks, one between movies procured from the Internet Movie Database and one between papers published in the journals of the American Physical Society, we compare the age preference of several time-balanced variants of PageRank and degree for identifying significant nodes.
Article
Full-text available
China has experienced an outstanding economic expansion during the past decades; however, literature on non-monetary metrics that reveal the status of China's regional economic development is still lacking. In this paper, we fill this gap by quantifying the economic complexity of China's provinces through the analysis of 25 years of firm data. First, we estimate the regional economic complexity index (ECI) and show that the overall time evolution of provinces' ECI is relatively stable and slow. Then, after linking ECI to economic development and income inequality, we find that the explanatory power of ECI is positive for the former but negative for the latter. Next, we compare different measures of economic diversity and explore their relationships with monetary macroeconomic indicators. Results show that the ECI index and the non-linear-iteration-based Fitness index are comparable, and both have stronger explanatory power than other benchmark measures. Further multivariate regressions suggest the robustness of our results after controlling for other socioeconomic factors. Our work moves a step forward towards a better understanding of China's regional economic development and of non-monetary macroeconomic indicators.
Article
Full-text available
The science of science (SOS) is a rapidly developing field that aims to understand, quantify and predict scientific research and its outcomes. The problem is related to almost all scientific disciplines and has thus attracted the attention of scholars from many backgrounds. Progress on SOS will lead to better solutions for many challenging issues, ranging from the selection of candidate faculty members by a university to the development of research fields to which a country should give priority. While different measurements have been designed to evaluate the scientific impact of scholars, journals and academic institutions, the multiplex structure, dynamics and evolution mechanisms of the whole system had been much less studied until recently. In this article, we review recent advances in SOS, covering empirical studies, network analysis, mechanistic models, ranking, prediction, and many important related issues. The results summarized in this review significantly deepen our understanding of the underlying mechanisms and statistical rules governing the science system. Finally, we review the forefront of SOS research and point out the specific difficulties as they arise in different contexts, so as to stimulate further efforts in this emerging interdisciplinary field.
Article
Full-text available
Recent studies on the controllability of complex systems offer a powerful mathematical framework to systematically explore the structure-function relationship in biological, social, and technological networks. Despite theoretical advances, we lack direct experimental proof of the validity of these widely used control principles. Here we fill this gap by applying a control framework to the connectome of the nematode Caenorhabditis elegans, allowing us to predict the involvement of each C. elegans neuron in locomotor behaviours. We predict that control of the muscles or motor neurons requires 12 neuronal classes, which include neuronal groups previously implicated in locomotion by laser ablation, as well as one previously uncharacterized neuron, PDB. We validate this prediction experimentally, finding that the ablation of PDB leads to a significant loss of dorsoventral polarity in large body bends. Importantly, control principles also allow us to investigate the involvement of individual neurons within each neuronal class. For example, we predict that, within the class of DD motor neurons, only three (DD04, DD05, or DD06) should affect locomotion when ablated individually. This prediction is also confirmed; single cell ablations of DD04 or DD05 specifically affect posterior body movements, whereas ablations of DD02 or DD03 do not. Our predictions are robust to deletions of weak connections, missing connections, and rewired connections in the current connectome, indicating the potential applicability of this analytical framework to larger and less well-characterized connectomes.
Article
Full-text available
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and which are their possible biases that may impair their effectiveness. Well-established ranking algorithms (such as the popular Google's PageRank) are static in nature and, as a consequence, they exhibit important shortcomings when applied to real networks that rapidly evolve in time. The recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits from including the temporal dimension for tasks such as prediction of real network traffic, prediction of future links, and identification of highly-significant nodes.
Article
Full-text available
A central question in the science of science concerns how time affects citations. Despite long-standing interest and its broad impact, we lack systematic answers to this simple yet fundamental question. By reviewing and classifying prior studies from the past 50 years, we find a significant lack of consensus in the literature, primarily due to the coexistence of retrospective and prospective approaches to measuring citation age distributions. These two approaches have been pursued in parallel, with no known connections between them. Here we develop a new theoretical framework that not only connects the two approaches through precise mathematical relationships, but also reconciles the interplay between the temporal decay of citations and the growth of science, helping us uncover new functional forms characterizing citation age distributions. We find that the retrospective distribution follows a lognormal distribution with an exponential cutoff, while the prospective distribution is governed by the interplay between a lognormal distribution and the growth in the number of references. Most interestingly, the two approaches can be connected once rescaled by the growth of publications and citations. We further validate our framework using both large-scale citation datasets and analytical models capturing citation dynamics. Together, this paper presents a comprehensive analysis of the time dimension of science, providing a new empirical and theoretical basis for future studies in this area.
Article
Full-text available
Locating sources of diffusion and spreading from minimal data is a significant problem in network science with great applied value to society. However, a general theoretical framework for optimal source localization has been lacking. Combining the controllability theory of complex networks with compressive sensing, we develop an efficient and robust framework for optimal source localization in arbitrary weighted networks with an arbitrary distribution of sources. We offer a minimum-output analysis to quantify source locatability through a minimal number of messenger nodes that produce sufficient measurements for fully locating the sources. Once the minimum messenger nodes are discerned, the problem of optimal source localization becomes one of sparse signal reconstruction, which can be solved using compressive sensing. Application of our framework to model and empirical networks demonstrates that sources in homogeneous and denser networks are more readily located. A surprising finding is that, for a connected undirected network with random link weights and weak noise, a single messenger node is sufficient for locating any number of sources. The framework deepens our understanding of the network source localization problem and offers efficient tools with broad applications.
Working Paper
Full-text available
Industrial development is the process by which economies learn how to produce new products and services. But how do economies learn? And whom do they learn from? The literature on economic geography and economic development has emphasized two learning channels: inter-industry learning, which involves learning from related industries, and inter-regional learning, which involves learning from neighboring regions. Here we use 25 years of data describing the evolution of China's economy between 1990 and 2015, a period when China multiplied its GDP per capita by a factor of ten, to explore how Chinese provinces diversified their economies. First, we show that the probability that a province will develop a new industry increases with the number of related industries already present in that province, a fact suggestive of inter-industry learning. We also show that the probability that a province will develop an industry increases with the number of neighboring provinces that are developed in that industry, a fact suggestive of inter-regional learning. Moreover, we find that the combination of these two channels exhibits diminishing returns, meaning that the contribution of either of these learning channels is redundant when the other one is present. Finally, we address endogeneity concerns by using the introduction of high-speed rail as an instrument to isolate the effects of inter-regional learning. Our differences-in-differences (DID) analysis reveals that the introduction of high-speed rail increased the industrial similarity of pairs of provinces connected by it. Also, industries in provinces that were connected by rail increased their productivity when they were connected to other provinces where that industry was already present. These findings suggest that inter-regional and inter-industry learning played a role in China's great economic expansion.
Article
Full-text available
As an important type of dynamics on complex networks, spreading is widely used to model many real processes such as the epidemic contagion and information propagation. One of the most significant research questions in spreading is to rank the spreading ability of nodes in the network. To this end, substantial effort has been made and a variety of effective methods have been proposed. These methods usually define the spreading ability of a node as the number of finally infected nodes given that the spreading is initialized from the node. However, in many real cases such as advertising and news propagation, the spreading only aims to cover a specific group of nodes. Therefore, it is necessary to study the spreading ability of nodes towards localized targets in complex networks. In this paper, we propose a reversed local path algorithm for this problem. Simulation results show that our method outperforms the existing methods in identifying the influential nodes with respect to these localized targets. Moreover, the influential spreaders identified by our method can effectively avoid infecting the non-target nodes in the spreading process.
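For context, a minimal sketch of the classical local path index, the quasi-local score family this abstract's reversed variant builds on (my reading, not the authors' exact algorithm), together with a naive way to aggregate scores toward a hypothetical target set:

```python
import numpy as np
import networkx as nx

def local_path_scores(G, eps=0.01):
    """Quasi-local similarity S = A^2 + eps * A^3, counting 2- and 3-step
    paths between node pairs (the classical local path index)."""
    A = nx.to_numpy_array(G)
    A2 = A @ A
    return A2 + eps * (A2 @ A)

G = nx.karate_club_graph()
S = local_path_scores(G)
targets = [32, 33]                                   # hypothetical localized targets
seed_scores = S[:, targets].sum(axis=1)              # naive: rank seeds by paths into targets
best_seed = int(np.argmax(seed_scores))
```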
Article
Full-text available
Understanding the behavior of users in online systems is of essential importance for sociology, system design, e-commerce, and beyond. Most existing models assume that individuals in diverse systems, ranging from social networks to e-commerce platforms, tend toward what is already popular. We propose a statistical time-aware framework to identify users who deviate from the usual behavior by being repeatedly and persistently among the first to collect items that later become hugely popular. Since these users effectively discover future hits, we refer to them as discoverers. We use the proposed framework to demonstrate that discoverers are present in a wide range of real systems. Once identified, discoverers can be used to predict the future success of new items. We finally introduce a simple network model that reproduces the discovery patterns observed in the real data. Our results open the door to the quantitative study of detailed temporal patterns in social systems.
Article
Full-text available
We analyse global export data within the Economic Complexity framework. We couple the new economic dimension Complexity, which captures how sophisticated products are, with an index called logPRODY, a measure of the income of the respective exporters. Products’ aggregate motion is treated as a 2-dimensional dynamical system in the Complexity-logPRODY plane. We find that this motion can be explained by a quantitative model involving the competition on the markets, that can be mapped as a scalar field on the Complexity-logPRODY plane and acts in a way akin to a potential. This explains the movement of products towards areas of the plane in which the competition is higher. We analyse market composition in more detail, finding that for most products it tends, over time, to a characteristic configuration, which depends on the Complexity of the products. This market configuration, which we called asymptotic, is characterized by higher levels of competition.
Article
Full-text available
Recommender systems are designed to effectively support individuals' decision-making on various web sites. Such a system can be naturally represented by a user-object bipartite network, where a link indicates that a user has collected an object. Recently, the information backbone has attracted researchers' interest: it is a sub-network with fewer nodes and links that nevertheless carries most of the relevant information. With the backbone, a system can generate satisfactory recommendations while saving considerable computing resources. In this paper, we propose an enhanced topology-aware method to extract the information backbone of a bipartite network, based mainly on information about neighboring users and objects. Our backbone extraction method enables recommender systems to achieve more than 90% of the accuracy of the top-L recommendation while consuming only 20% of the links. The experimental results show that our method outperforms alternative backbone extraction methods. Moreover, the structure of the information backbone is studied in detail. Finally, we highlight that the information backbone is one of the most important properties of a bipartite network, with which one can significantly improve the efficiency of a recommender system.
Article
Full-text available
The empirical validation of community detection methods is often based on available annotations on the nodes that serve as putative indicators of the large-scale network structure. Most often, the suitability of the annotations as topological descriptors itself is not assessed, and without this it is not possible to ultimately distinguish between actual shortcomings of the community detection algorithms, on one hand, and the incompleteness, inaccuracy, or structured nature of the data annotations themselves, on the other. In this work, we present a principled method to assess both aspects simultaneously. We construct a joint generative model for the data and metadata, and a nonparametric Bayesian framework to infer its parameters from annotated data sets. We assess the quality of the metadata not according to their direct alignment with the network communities, but rather in their capacity to predict the placement of edges in the network. We also show how this feature can be used to predict the connections to missing nodes when only the metadata are available, as well as predicting missing metadata. By investigating a wide range of data sets, we show that while there are seldom exact agreements between metadata tokens and the inferred data groups, the metadata are often informative of the network structure nevertheless, and can improve the prediction of missing nodes. This shows that the method uncovers meaningful patterns in both the data and metadata, without requiring or expecting a perfect agreement between the two.
Article
Full-text available
Citations between scientific papers and related bibliometric indices, such as the h-index for authors and the impact factor for journals, are being increasingly used -- often in controversial ways -- as quantitative tools for research evaluation. Yet a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the 449,935 papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics. We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that paper score is not biased by paper age, outperforms the others in identifying the Milestone Letters shortly after they are published. The lack of time bias in the new metric also makes it possible to compare papers of different age on the same scale. We find that network-based metrics generally identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example the World Wide Web and online social networks.
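A hedged sketch of the age-normalization idea described here: compute PageRank, then z-score each paper against papers of similar age using a moving window over the publication order. Window size and names are assumptions, not the authors' exact procedure.

```python
import numpy as np
import networkx as nx

def age_rescaled_pagerank(G, order, window=200):
    """Rescale PageRank against papers of similar age: z-score each paper's
    PageRank within a moving window of `window` papers published around the
    same time. `order` lists nodes oldest-first (a sketch, not the exact
    procedure from the paper)."""
    pr = nx.pagerank(G)
    scores = np.array([pr[n] for n in order])
    rescaled = np.empty_like(scores)
    half = window // 2
    for i in range(len(order)):
        lo, hi = max(0, i - half), min(len(order), i + half)
        w = scores[lo:hi]
        rescaled[i] = (scores[i] - w.mean()) / (w.std() + 1e-12)
    return dict(zip(order, rescaled))

G = nx.gnp_random_graph(1000, 0.01, directed=True, seed=3)
order = list(G)                      # pretend node id encodes publication time
ranking = age_rescaled_pagerank(G, order)
```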
Article
Full-text available
Economic complexity reflects the amount of knowledge that is embedded in the productive structure of an economy. By combining tools from network science and econometrics, a robust and stable relationship between a country’s productive structure and its economic growth has been established. Here we report that not only goods but also services are important for predicting the rate at which countries will grow. By adopting a terminology which classifies manufactured goods and delivered services as products, we investigate the influence of services on the country’s productive structure. In particular, we provide evidence that complexity indices for services are in general higher than those for goods, which is reflected in a general tendency to rank countries with developed service sector higher than countries with economy centred on manufacturing of goods. By focusing on country dynamics based on experimental data, we investigate the impact of services on the economic complexity of countries measured in the product space (consisting of both goods and services). Importantly, we show that diversification of service exports and its sophistication can provide an additional route for economic growth in both developing and developed countries.
Article
Full-text available
We introduce an intuitive model that describes both the emergence of community structure and the evolution of the internal structure of communities in growing social networks. The model comprises two complementary mechanisms: One mechanism accounts for the evolution of the internal link structure of a single community, and the second mechanism coordinates the growth of multiple overlapping communities. The first mechanism is based on the assumption that each node establishes links with its neighbors and introduces new nodes to the community at different rates. We demonstrate that this simple mechanism gives rise to an effective maximal degree within communities. This observation is related to the anthropological theory known as Dunbar's number, i.e., the empirical observation of a maximal number of ties which an average individual can sustain within its social groups. The second mechanism is based on a recently proposed generalization of preferential attachment to community structure, appropriately called structural preferential attachment (SPA). The combination of these two mechanisms into a single model (SPA+) allows us to reproduce a number of the global statistics of real networks: The distribution of community sizes, of node memberships, and of degrees. The SPA+ model also predicts (a) three qualitative regimes for the degree distribution within overlapping communities and (b) strong correlations between the number of communities to which a node belongs and its number of connections within each community. We present empirical evidence that support our findings in real complex networks.
Article
Full-text available
Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem. There are no universal protocols on the fundamental ingredients, like the definition of community itself, nor on other crucial issues, like the validation of algorithms and the comparison of their performances. This has generated a number of confusions and misconceptions, which undermine the progress in the field. We offer a guided tour through the main aspects of the problem. We also point out strengths and weaknesses of popular methods, and give directions to their use.
Article
Full-text available
Numerical analysis of data from international trade and ecological networks has shown that the non-linear fitness-complexity metric is the best candidate to rank nodes by importance in bipartite networks that exhibit a nested structure. Despite its relevance for real networks, the mathematical properties of the metric and its variants remain largely unexplored. Here, we perform an analytic and numeric study of the fitness-complexity metric and a new variant, called minimal extremal metric. We rigorously derive exact expressions for node scores for perfectly nested networks and show that these expressions explain the non-trivial convergence properties of the metrics. A comparison between the fitness-complexity metric and the minimal extremal metric on real data reveals that the latter can produce improved rankings if the input data are reliable.
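A minimal sketch of the standard non-linear fitness-complexity iteration on a binary country-product matrix, with unit-mean normalization at each step; convergence handling and the minimal extremal variant from the paper are omitted. It assumes the matrix has no empty rows or columns.

```python
import numpy as np

def fitness_complexity(M, n_iter=200):
    """Fitness-complexity iteration on a binary bipartite matrix M
    (rows: countries, cols: products); normalization to unit mean at each
    step keeps the map well-defined. Assumes no empty rows/columns."""
    F = np.ones(M.shape[0])          # country fitness
    Q = np.ones(M.shape[1])          # product complexity
    for _ in range(n_iter):
        F_new = M @ Q                       # sum of complexities of exported products
        Q_new = 1.0 / (M.T @ (1.0 / F))     # complexity penalized by low-fitness exporters
        F = F_new / F_new.mean()
        Q = Q_new / Q_new.mean()
    return F, Q

rng = np.random.default_rng(0)
M = (rng.random((50, 200)) < 0.2).astype(float)   # toy export matrix
F, Q = fitness_complexity(M)
```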
Article
Full-text available
All ecosystems are subjected to chronic disturbances, such as harvest, pollution, and climate change. The capacity to forecast how species respond to such press perturbations is limited by our imprecise knowledge of pairwise species interaction strengths and the many direct and indirect pathways along which perturbations can propagate between species. Network complexity (size and connectance) has thereby been seen to limit the predictability of ecological systems. Here we demonstrate a counteracting mechanism in which the influence of indirect effects declines with increasing network complexity when species interactions are governed by universal allometric constraints. With these constraints, network size and connectance interact to produce a skewed distribution of interaction strengths whose skew becomes more pronounced with increasing complexity. Together, the increased prevalence of weak interactions and the increased relative strength and rarity of strong interactions in complex networks limit disturbance propagation and preserve the qualitative predictability of net effects even when pairwise interaction strengths exhibit substantial variation or uncertainty.
Chapter
Advanced statistical modeling and knowledge representation techniques for a newly emerging area of machine learning and probabilistic reasoning; includes introductory material, tutorials for different proposed approaches, and applications. Handling inherent uncertainty and exploiting compositional structure are fundamental to understanding and designing large-scale systems. Statistical relational learning builds on ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases and programming languages to represent structure. In Introduction to Statistical Relational Learning, leading researchers in this emerging area of machine learning describe current formalisms, models, and algorithms that enable effective and robust reasoning about richly structured systems and data. The early chapters provide tutorials for material used in later chapters, offering introductions to representation, inference and learning in graphical models, and logic. The book then describes object-oriented approaches, including probabilistic relational models, relational Markov networks, and probabilistic entity-relationship models as well as logic-based formalisms including Bayesian logic programs, Markov logic, and stochastic logic programs. Later chapters discuss such topics as probabilistic models with unknown objects, relational dependency networks, reinforcement learning in relational domains, and information extraction. By presenting a variety of approaches, the book highlights commonalities and clarifies important differences among proposed approaches and, along the way, identifies important representational and algorithmic issues. Numerous applications are provided throughout.
Chapter
Papers from the 2006 flagship meeting on neural computation, with contributions from physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The annual Neural Information Processing Systems (NIPS) conference is the flagship meeting on neural computation and machine learning. It draws a diverse group of attendees—physicists, neuroscientists, mathematicians, statisticians, and computer scientists—interested in theoretical and applied aspects of modeling, simulating, and building neural-like or intelligent systems. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning, and applications. Only twenty-five percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains the papers presented at the December 2006 meeting, held in Vancouver.
Article
Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a content-analysis of over 100,000 messages containing a reference to either a political party or a politician. Our results show that Twitter is indeed used extensively for political deliberation. We find that the mere number of messages mentioning a party reflects the election result. Moreover, joint mentions of two parties are in line with real world political ties and coalitions. An analysis of the tweets’ political sentiment demonstrates close correspondence to the parties' and politicians’ political positions indicating that the content of Twitter messages plausibly reflects the offline political landscape. We discuss the use of microblogging message content as a valid indicator of political sentiment and derive suggestions for further research.
Article
A main characteristic of social media is that its diverse content, copiously generated by both standard outlets and general users, constantly competes for the scarce attention of large audiences. Out of this flood of information some topics manage to get enough attention to become the most popular ones and thus to be prominently displayed as trends. Equally important, some of these trends persist long enough so as to shape part of the social agenda. How this happens is the focus of this paper. By introducing a stochastic dynamical model that takes into account the user’s repeated involvement with given topics, we can predict the distribution of trend durations as well as the thresholds in popularity that lead to their emergence within social media. Detailed measurements of datasets from Twitter confirm the validity of the model and its predictions.
Article
Technological innovation seems to be dominated by chance. But a new mathematical analysis suggests we might be able to anticipate when seemingly useless technologies become keystones of more complex environments.
Article
Teams dominate the production of high-impact science and technology. Analyzing teamwork from more than 50 million papers, patents, and software products, 1954-2014, we demonstrate across this period that larger teams developed recent, popular ideas, while small teams disrupted the system by drawing on older and less prevalent ideas. Attention to work from large teams came immediately, while advances by small teams succeeded further into the future. Differences between small and large teams magnify with impact - small teams have become known for disruptive work and large teams for developing work. Differences in topic and research design account for part of the relationship between team size and disruption, but most of the effect occurs within people, controlling for detailed subject and article type. Our findings suggest the importance of supporting both small and large teams for the sustainable vitality of science and technology.
Article
In recent years, tagging systems have become a building block for summarizing the content of items for functions such as retrieval or personalized recommendation in various web applications. One nontrivial requirement is to precisely deliver a list of suitable items when users interact with the system by inputting a specific tag (i.e. a query term). Different from traditional recommender systems, we must deal with a collaborative retrieval (CR) problem, where characteristics of both retrieval and recommendation should be considered to model a ternary relationship over query × user × item. Recently, several works have studied the CR task from the users' perspective. However, they miss a significant challenge arising from the sparse content of items. In this work, we argue that items suffer from the sparsity problem more severely than users, since items are usually observed with fewer features to support a feature-based or content-based algorithm. To tackle this problem, we explore the sophisticated relationship of each query × user × item triple from the items' perspective. By integrating item-based collaborative information into this joint task, we present an alternative factorized model that can better evaluate the ranks of items with sparse information for a given query-user pair. In addition, we employ the recently proposed Bayesian personalized ranking (BPR) algorithm to optimize the latent collaborative retrieval problem from a pairwise learning perspective. Experimental results on two real-world datasets (Last.fm and Yelp) verify the efficiency and effectiveness of our proposed approach under the top-k ranking metric.
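For concreteness, a minimal BPR matrix-factorization sketch with stochastic gradient ascent on ln sigmoid(x_uij). This toy version omits the query dimension of collaborative retrieval, and all names and hyperparameters are assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 16
P = 0.1 * rng.standard_normal((n_users, k))      # user latent factors
Q = 0.1 * rng.standard_normal((n_items, k))      # item latent factors
observed = {(u, int(rng.integers(n_items))) for u in range(n_users)}  # toy data
obs = list(observed)

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD ascent step on ln sigmoid(x_uij): rank observed item i
    above unobserved item j for user u."""
    pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
    g = 1.0 / (1.0 + np.exp(pu @ (qi - qj)))     # sigmoid(-x_uij)
    P[u] += lr * (g * (qi - qj) - reg * pu)
    Q[i] += lr * (g * pu - reg * qi)
    Q[j] += lr * (-g * pu - reg * qj)

for _ in range(20000):
    u, i = obs[rng.integers(len(obs))]           # a positive (user, item) pair
    j = int(rng.integers(n_items))               # a sampled negative item
    if (u, j) not in observed:
        bpr_step(u, i, j)
```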
Article
Collective phenomena emerge from the interaction of natural or artificial units with a complex organization. The interplay between structural patterns and dynamics might induce functional clusters that, in general, are different from topological ones. In biological systems, like the human brain, the overall functionality is often favored by the interplay between connectivity and synchronization dynamics, with functional clusters that do not coincide with anatomical modules in most cases. In social, sociotechnical, and engineering systems, the quest for consensus favors the emergence of clusters. Despite the unquestionable evidence for mesoscale organization of many complex systems and the heterogeneity of their interconnectivity, a way to predict and identify the emergence of functional modules in collective phenomena continues to elude us. Here, we propose an approach based on random walk dynamics to define the diffusion distance between any pair of units in a networked system. Such a metric allows us to exploit the underlying diffusion geometry to provide a unifying framework for the intimate relationship between metastable synchronization, consensus, and random search dynamics in complex networks, pinpointing the functional mesoscale organization of synthetic and biological systems.
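A hedged sketch of the diffusion-distance construction on a simple graph: the distance between two nodes at scale t is the Euclidean distance between their t-step random-walk distributions. It assumes a connected graph with no zero-degree nodes.

```python
import numpy as np
import networkx as nx

def diffusion_distance(G, t=3):
    """Pairwise diffusion distances at scale t: Euclidean distance between
    the t-step random-walk distributions started at each node."""
    A = nx.to_numpy_array(G)
    P = A / A.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    Pt = np.linalg.matrix_power(P, t)
    # D[i, j] = || Pt[i] - Pt[j] ||_2, computed without an explicit double loop
    sq = (Pt ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * Pt @ Pt.T
    return np.sqrt(np.maximum(D2, 0.0))

G = nx.karate_club_graph()
D = diffusion_distance(G, t=3)   # small D[i, j]: i and j diffuse similarly
```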
Article
Financial time series are notoriously difficult to analyze and predict, given their non-stationary, highly oscillatory nature. In this study, we evaluate the effectiveness of the Ensemble Empirical Mode Decomposition (EEMD), the ensemble version of Empirical Mode Decomposition (EMD), at generating a representation for market indexes that improves trend prediction. Our results suggest that the promising results reported using EEMD on financial time series were obtained by inadvertently adding look-ahead bias to the testing protocol via pre-processing the entire series with EMD, which affects predictive results. In contrast to conclusions found in the literature, our results indicate that the application of EMD and EEMD with the objective of generating a better representation for financial time series is not sufficient to improve the accuracy or cumulative return obtained by the models used in this study.
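The methodological point generalizes: any decomposition must see only past data at prediction time. A leakage-free, walk-forward protocol might look like the sketch below, where `decompose` is a placeholder (a real run would call, e.g., an EEMD implementation on each window; the tooling is an assumption).

```python
import numpy as np

def decompose(window):
    """Stand-in for EMD/EEMD applied to one window of past data only;
    returns features derived solely from `window` (placeholder features)."""
    return np.array([window.mean(), window.std()])

def walk_forward_features(series, width=100):
    """Leakage-free protocol: at each step t, decompose only data up to t,
    never the full series, so no future information enters the features."""
    feats = []
    for t in range(width, len(series)):
        feats.append(decompose(series[t - width:t]))
    return np.array(feats)

prices = np.cumsum(np.random.default_rng(0).normal(size=1000))  # toy series
X = walk_forward_features(prices)   # safe inputs for a trend classifier
```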
Article
To understand quantitatively how scientists choose and shift their research focus over time is of high importance, because it affects the ways in which scientists are trained, science is funded, knowledge is organized and discovered, and excellence is recognized and rewarded [1-9]. Despite extensive investigation into various factors that influence a scientist's choice of research topics [8-21], quantitative assessments of the mechanisms that give rise to macroscopic patterns characterizing the research-interest evolution of individual scientists remain limited. Here we perform a large-scale analysis of publication records, and we show that changes in research interests follow a reproducible pattern characterized by an exponential distribution. We identify three fundamental features responsible for the observed exponential distribution, which arise from a subtle interplay between exploitation and exploration in research-interest evolution [5,22]. We develop a random-walk-based model that accurately reproduces the empirical observations. This work uncovers and quantitatively analyses the macroscopic patterns that govern changes in research interests, showing that there is a high degree of regularity underlying scientific research and individual careers.
Article
Collaboration is one of the key features of scientific research. With more and more knowledge accumulated in each discipline, an individual researcher can be an expert in only a few specific areas, and multi-author papers are therefore increasingly common. Many works in scientometrics have been devoted to analyzing the structure of collaboration networks; however, how collaboration affects an author's future career is much less studied. In this paper, we provide empirical evidence from American Physical Society data showing that collaboration with outstanding scientists (measured by their total citations) significantly improves young researchers' careers. Interestingly, this effect is strongly nonlinear, following a power function with an exponent < 1. Our work is also meaningful from a practical point of view, as it could be applied to identifying promising young researchers.
Article
The desire to predict discoveries—to have some idea, in advance, of what will be discovered, by whom, when, and where—pervades nearly all aspects of modern science, from individual scientists to publishers, from funding agencies to hiring committees. In this Essay, we survey the emerging and interdisciplinary field of the “science of science” and what it teaches us about the predictability of scientific discovery. We then discuss future opportunities for improving predictions derived from the science of science and its potential impact, positive and negative, on the scientific community.
Article
We have tried to predict the future since ancient times, when shamans looked for patterns in smoking entrails. As this special section explores, prediction is now a developing science. Essays probe such questions as how to allocate limited resources, whether a country will descend into conflict, and more.
Article
Historically, social scientists have sought out explanations of human and social phenomena that provide interpretable causal mechanisms, while often ignoring their predictive accuracy. We argue that the increasingly computational nature of social science is beginning to reverse this traditional bias against prediction; however, it has also highlighted three important issues that require resolution. First, current practices for evaluating predictions must be better standardized. Second, theoretical limits to predictive accuracy in complex social systems must be better characterized, thereby setting expectations for what can be predicted or explained. Third, predictive accuracy and interpretability must be recognized as complements, not substitutes, when evaluating explanations. Resolving these three issues will lead to better, more replicable, and more useful social science.
Article
The collective behaviors of community members in dynamic social networks are significant for understanding the evolution of communities. In this Letter, we empirically investigate the evolution properties of new community members in dynamic networks. First, we separate the data sets into different time slices and analyze the statistical properties of new members as well as the communities they join. We then introduce a parameter φ to describe community evolution between slices and investigate the dynamic community properties of new members. The empirical analyses of the Facebook, APS, Enron and Wiki data sets indicate that both the number of new members and the number of joined communities increase, while their ratio declines rapidly and then stabilizes over time, and that most new members join small-size communities. Furthermore, the proportion of new members in existing communities first decreases and then stabilizes at a relatively small value for these data sets. Our work may be helpful for deeply understanding the evolution properties of community members in social networks.
Article
A model is proposed for the evolution of network topology in social networks with overlapping community structure. Starting from an initial community structure defined in terms of group affiliations, the model postulates that the subsequent growth and loss of connections resemble Hebbian learning and unlearning in the brain, and are governed by two dominant factors: the strength and frequency of interaction between members, and the degree of overlap between different communities. The temporal evolution from an initial community structure to the current network topology can be described by these two parameters. It is possible to quantify the growth that has occurred so far and to predict the final stationary state to which the network is likely to evolve. Applications are envisaged in epidemiology and in the spread of email viruses in computer networks, including finding specific target nodes for control. The challenge of collecting and analyzing large-scale, time-resolved data on social groups and communities raises a basic question: how do communities evolve in time? This work addresses that issue by developing a mathematical model for the evolution of community networks and studying it through computer simulation.
Article
Quality of information is crucial for decision-makers to judge battlefield situations and design the best operation plans. However, real intelligence data are often incomplete and noisy, and missing-link prediction methods and spurious-link identification algorithms can be applied if the complex military organization is modeled as a complex network in which nodes represent functional units and edges denote communication links. Traditional link prediction methods usually work well on homogeneous networks, but few do on heterogeneous ones, and a military network is a typical heterogeneous network with different types of nodes and edges. In this paper, we propose a combined link prediction index that considers both node-type effects and structural similarities, and demonstrate that it is remarkably superior to all 25 existing similarity-based methods, both in predicting missing links and in identifying spurious links, on a real military network dataset. We also investigate the algorithms' robustness in noisy environments, and find that mistaken information is more misleading than incomplete information in military settings, unlike in recommender systems; our method maintains the best performance under small noise. Since real military network intelligence must first be carefully checked owing to its significance, and link prediction methods are then adopted to purify the network of the remaining latent noise, the method proposed here is applicable in real situations. Finally, since the FINC-E model used here to describe complex military organizations also suits many other social organizations, such as criminal networks and business organizations, our method has prospects in these areas for tasks like detecting hidden relationships between terrorists and predicting potential business markets for decision-makers.
Article
Detecting the evolution of online user preference diversity is significant for deeply understanding online collective behaviors. In this paper, we empirically explore the evolution patterns of online user rating preferences, where preference diversity is measured by the variation coefficient of a user's rating sequence. The statistical results for four real systems show that, for movies and reviews, user rating preferences first become diverse and then finally centralize. Building on the empirical variation coefficient, we present a model that reproduces the evolution properties of two online systems with respect to their stable variation coefficients. In addition, we investigate the evolution of the correlation between user ratings and object qualities, and find that this correlation keeps increasing as the user degree increases. This work could be helpful for understanding the anchoring bias and memory effects of online collective behaviors.
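The diversity measure itself is simple to state: the variation coefficient (standard deviation over mean) of a user's chronological rating sequence. A tiny illustrative sketch with made-up ratings:

```python
import numpy as np

def rating_diversity(ratings):
    """Preference diversity of one user: the variation coefficient
    (std / mean) of their chronological rating sequence."""
    r = np.asarray(ratings, dtype=float)
    return r.std() / r.mean()

# diversity trajectory as the user accumulates ratings (toy data)
ratings = [4, 5, 3, 5, 2, 4, 4, 5, 1, 3]
trajectory = [rating_diversity(ratings[:k]) for k in range(2, len(ratings) + 1)]
```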
Article
Scientific impact: that is the Q. Are there quantifiable patterns behind a successful scientific career? Sinatra et al. analyzed the publications of 2887 physicists, as well as data on scientists publishing in a variety of fields. When productivity (which is usually greatest early in a scientist's professional life) is accounted for, the paper with the greatest impact occurs at a random point in a scientist's career. However, the process of generating a high-impact paper is not an entirely random one. The authors developed a quantitative model of impact based on an element of randomness, productivity, and a factor Q that is particular to each scientist and remains constant during the scientist's career. Science, this issue, p. 596
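A minimal simulation of the model summarized above: a paper's impact is the product of the scientist's constant Q factor and a random "luck" term, so the highest-impact paper lands at a random position in the career. The distribution choice and parameter values here are illustrative assumptions, not the fitted model.

    import random

    def career(Q, n_papers, mu=0.0, sigma=1.0):
        """Impact of each paper = Q * random luck term (lognormal)."""
        return [Q * random.lognormvariate(mu, sigma) for _ in range(n_papers)]

    impacts = career(Q=2.5, n_papers=50)
    best = max(range(len(impacts)), key=impacts.__getitem__)
    print(f"highest-impact paper at random position {best} of {len(impacts)}")

Because Q multiplies every draw, it shifts the whole impact distribution of a career without changing where in the career the peak falls.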
Article
In this paper we experimentally investigate the feasibility of Rough Sets for building profitable trend prediction models for financial time series. To improve the decision process for long time series, a novel time-weighted rule voting method, which accounts for information aging, is proposed. The experiments were performed using market data from multiple stock market indices. The classification efficiency and financial performance of the proposed Rough Sets models were verified and compared with those of Support Vector Machine models and reference financial indices. The results show that the Rough Sets approach with time-weighted rule voting outperforms the classical Rough Sets and Support Vector Machine decision systems and is profitable compared with the buy-and-hold strategy. In addition, with the use of Variable Precision Rough Sets, the effectiveness of the generated trading signals was further improved.
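The paper's exact weighting scheme is not reproduced here; as a sketch of the time-weighted voting idea, each rule's vote can be discounted by the age of its supporting cases, for example with an exponential half-life decay. The data layout and the half-life value are hypothetical.

    import math

    def weighted_vote(rules, now, half_life=90.0):
        """rules: list of (decision, [timestamps of supporting cases])."""
        scores = {}
        for decision, times in rules:
            # older supporting cases contribute exponentially less weight
            w = sum(math.exp(-math.log(2) * (now - t) / half_life) for t in times)
            scores[decision] = scores.get(decision, 0.0) + w
        return max(scores, key=scores.get)

    rules = [("up", [10, 50, 90]), ("down", [95, 98])]  # days
    print(weighted_vote(rules, now=100))

In a plain majority vote the "up" rule would win on count alone; with aging, the two recent "down" cases can outweigh the three older "up" cases.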
Article
Stock trend prediction is regarded as one of the most challenging tasks in financial time series prediction. Conventional statistical modeling techniques are not adequate for stock trend forecasting because of the non-stationarity and non-linearity of the stock market. In this regard, many machine learning approaches have been used to improve prediction results; these approaches mainly focus on two aspects: regression of the stock price and prediction of the turning points of the stock price. In this paper, we concentrate on evaluating the current trend of the stock price and predicting the direction of future price changes. To this end, a new approach named the status box method is proposed. Unlike turning-point prediction, the status box method packages stock price points into three categories of boxes, which indicate different stock statuses. Machine learning techniques are then used to classify these boxes, both to measure whether the state of each box coincides with the stock price trend and to forecast the trend from the states of the boxes; these results support the design of buying and selling strategies. Compared with turning-point prediction, which considers only the features of a single day, each status box contains a number of points representing the stock price trend over a period of time, so the status box reflects more information about the stock market. To solve the classification problem for status boxes, a special feature construction approach is presented. Moreover, a new ensemble method integrating the AdaBoost algorithm, a probabilistic support vector machine (PSVM), and a genetic algorithm (GA) is constructed to perform the status box classification. To verify the applicability and superiority of the proposed methods, 20 shares from the Shenzhen Stock Exchange (SZSE) and 16 shares from the National Association of Securities Dealers Automated Quotations (NASDAQ) are used for stock trend prediction. The results show that the status box method not only achieves better classification accuracy but also effectively solves the imbalance problem of stock turning-point classification. In addition, the new ensemble classifier achieves preferable profitability in simulated stock investment and remarkably improves classification performance compared with approaches that use only the PSVM or a back-propagation artificial neural network (BPN).
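As an illustration of the box idea only (not the paper's exact construction), a window of prices can be labelled "up", "down", or "sideways" from its net relative change; the threshold below is an invented placeholder, and in the paper the labelled boxes then become the classifier's targets.

    def box_label(prices, threshold=0.02):
        """Package a window of prices into one of three status categories."""
        change = (prices[-1] - prices[0]) / prices[0]
        if change > threshold:
            return "up"
        if change < -threshold:
            return "down"
        return "sideways"

    window = [10.0, 10.1, 10.4, 10.5]
    print(box_label(window))  # 'up': +5% over the window

Because each label summarizes a whole window rather than a single day, the resulting training examples carry the multi-day trend information the abstract emphasizes.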
Article
Significance: We study the dynamic network of real-world person-to-person interactions between approximately 1,000 individuals at 5-min resolution across several months. There is currently no coherent theoretical framework for summarizing the tens of thousands of interactions per day in this complex network, but here we show that, at the right temporal resolution, social groups can be identified directly. We outline and validate a framework that enables us to study the statistical properties of individual social events as well as series of meetings across weeks and months. Representing the dynamic network as sequences of such meetings reduces the complexity of the system dramatically. We illustrate the usefulness of the framework by investigating the predictability of human social activity.
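One plausible reading of the meeting extraction, sketched under assumptions: within a single 5-minute snapshot, the connected components of the proximity graph can be read off as concurrent gatherings. The data format below is hypothetical, not the study's pipeline.

    import networkx as nx

    def meetings(snapshot_edges):
        """snapshot_edges: list of (u, v) proximity pairs in one 5-min window."""
        G = nx.Graph(snapshot_edges)
        # each connected component is one candidate gathering
        return [sorted(c) for c in nx.connected_components(G)]

    print(meetings([("a", "b"), ("b", "c"), ("d", "e")]))
    # two simultaneous meetings: {a, b, c} and {d, e}

Tracking how such components persist and recombine across consecutive snapshots would then yield the sequences of meetings the framework is built on.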
Article
As much effort has been made to accelerate the publication of research results, the number of papers per scientist is now much larger than before. In this context, how to identify the representative work of an individual researcher is an important yet difficult problem; addressing it will help policy makers better evaluate the achievements and potential of researchers. So far, the representative work of a researcher is usually taken to be his or her most highly cited paper or a paper published in a top journal. Here, we instead consider the representative work of a scientist to be an important paper in his or her area of expertise. Accordingly, we propose a self-avoiding preferential diffusion process that generates a personalized ranking of papers for each scientist and identifies their representative works. Citation data from the American Physical Society (APS) are used to validate our method. We find that the self-avoiding preferential diffusion method ranks the Nobel-prize-winning paper higher in each Nobel laureate's personal ranking list than the citation count and PageRank methods do, indicating the effectiveness of our method. Moreover, a robustness analysis shows that our method ranks the representative papers of scientists highly even when only partial citation data are available or spurious behaviors exist. The method is finally applied to reveal the research patterns (i.e., consistency-oriented or diversity-oriented) of different scientists, institutes, and countries.
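The paper's scoring rule is more elaborate than this, but a self-avoiding, degree-biased walk conveys the two ingredients named in the method: the walker never revisits a node (self-avoiding) and prefers high-degree neighbors (preferential). The karate-club graph below stands in for a citation network, and the bias exponent is an illustrative assumption.

    import random
    import networkx as nx

    def self_avoiding_walk(G, start, bias=1.0):
        """Walk from `start`, stepping to unvisited neighbors with
        probability proportional to degree**bias; stop when trapped."""
        visited, node = [start], start
        while True:
            nbrs = [n for n in G.neighbors(node) if n not in visited]
            if not nbrs:
                return visited
            w = [G.degree(n) ** bias for n in nbrs]  # preferential step
            node = random.choices(nbrs, weights=w)[0]
            visited.append(node)

    G = nx.karate_club_graph()  # stand-in for a citation network
    print(self_avoiding_walk(G, start=0)[:5])

Averaging visit statistics over many walks seeded at a scientist's own papers would give the personalized ranking flavor described in the abstract.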
Article
Real networks are heterogeneous in nature, with nodes playing far different roles in structure and function. Identifying vital nodes is thus highly significant, allowing us to control the outbreak of epidemics, to target advertisements for e-commerce products, to predict popular scientific publications, and so on. Vital node identification attracts increasing attention from both the computer science and physics communities, with algorithms ranging from simply counting immediate neighbors to complicated machine learning and message-passing approaches. In this review, we clarify the concepts and metrics, classify the problems and methods, review the important progress, and describe the state of the art. Furthermore, we provide extensive empirical analyses to compare well-known methods on disparate real networks, and highlight future directions. Despite the emphasis on physics-rooted approaches, the unification of language and comparison with cross-domain methods should trigger interdisciplinary solutions in the near future.
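As a toy illustration of the simplest end of the methodological spectrum mentioned above, the snippet below ranks nodes by degree (counting immediate neighbors) and by k-core index on a standard benchmark graph; the two rankings often disagree, which is exactly why the review's systematic comparison matters.

    import networkx as nx

    G = nx.karate_club_graph()
    degree = dict(G.degree())      # immediate-neighbor count
    core = nx.core_number(G)       # k-core (coreness) index

    top_by_degree = sorted(degree, key=degree.get, reverse=True)[:5]
    top_by_core = sorted(core, key=core.get, reverse=True)[:5]
    print("degree:", top_by_degree)
    print("k-core:", top_by_core)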
Article
The problem of reconstructing nonlinear and complex dynamical systems from measured data or time series is central to many scientific disciplines including physical, biological, computer, and social sciences, as well as engineering and economics. The classic approach to phase-space reconstruction through the methodology of delay-coordinate embedding has been practiced for more than three decades, but the paradigm is effective mostly for low-dimensional dynamical systems. Often, the methodology yields only a topological correspondence of the original system. There are situations in various fields of science and engineering where the systems of interest are complex and high dimensional with many interacting components. A complex system typically exhibits a rich variety of collective dynamics, and it is of great interest to be able to detect, classify, understand, predict, and control the dynamics using data that are becoming increasingly accessible due to the advances of modern information technology. To accomplish these tasks, especially prediction and control, an accurate reconstruction of the original system is required.
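A minimal sketch of the classic delay-coordinate embedding described above: a scalar series x(t) is unfolded into m-dimensional vectors (x(t), x(t+tau), ..., x(t+(m-1)tau)). The choices of m and tau below are illustrative; in practice they are selected with tools such as the false-nearest-neighbors criterion.

    import numpy as np

    def delay_embed(x, m=3, tau=2):
        """Map a scalar series to m-dimensional delay vectors with lag tau."""
        n = len(x) - (m - 1) * tau
        return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

    t = np.linspace(0, 20, 200)
    x = np.sin(t)
    print(delay_embed(x, m=3, tau=5).shape)  # (190, 3)

For a low-dimensional system such as this sine signal, the delay vectors trace out a closed loop topologically equivalent to the original cycle; the abstract's point is that this correspondence breaks down for high-dimensional complex systems with many interacting components.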