
Abstract

Background: We introduce a Knowledge-based Decision Support System (KDSS) to address the protein complex extraction problem. Using a Knowledge Base (KB) that encodes the expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools according to the features of the input dataset. Our system provides a navigable workflow for the current experiment and, furthermore, offers support in configuring and running every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems.

Results: We briefly present the KDSS architecture and the basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested on a subset of the Saccharomyces cerevisiae protein-protein interaction dataset. We used this subset because it has been well studied in the literature by several research groups in the field of complex extraction, so we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and can run suitable algorithms. The system's final results consist of a workflow of tasks, which can be reused for other experiments, and the specific numerical results for that particular trial.

Conclusions: The proposed approach, using the KDSS knowledge base, provides a novel workflow that gives the best results compared with the other workflows produced by the system. This workflow and its numerical results have been compared with other approaches to PPI network analysis found in the literature, yielding similar results.
... Any change that alters the connections between interacting complexes greatly affects the properties and functions of the cell. A collection of proteins that collectively performs a specific function, distinct from that of other complexes, is known as a protein complex [1]. The functions and biological processes of cells are performed in a coordinated way by protein complexes. ...
... The biological information contained within the amino acid sequence of a protein is used to evaluate the interacting property between two proteins in fulfilling a particular biological function. Furthermore, the physical, chemical and biological properties of complexes distinguish them from non-complexes [1]. Therefore, amino acid physical properties, i.e. Kidera factors, are computed as biological features. ...
... Therefore, in this paper we combine SVM and ECOC to identify proteins belonging to multiple complex classes. The physical, biological or chemical properties of protein clusters distinguish them from non-clusters [1]. Because the sequence specifies the structure, the sequence properties of proteins might be adequate to investigate the interacting property between proteins. ...
Article
Protein complexes play an important role in key functional processes in cells by forming protein-protein interaction (PPI) networks. Conventionally, they were determined through experimental approaches. To save time and reduce cost, many computational methods have been proposed. Few computational approaches take into account the significant biological information contained within protein amino acid sequences, and they identify dense subgraphs as complexes from the PPI network by considering density and degree statistics. Biological information evaluates the common features that two proteins share for performing a particular biological function. Moreover, linear, star and hybrid subgraph structures may be found in a PPI network, so other topological features of the graph are also important. In this article, a support vector machine (SVM) in combination with the error-correcting output coding (ECOC) algorithm is used to construct an automatic detector for mining multiple protein complexes from a PPI network, where amino acid physical properties, i.e. Kidera factors, and a variety of topological constraints are employed as feature vectors. The overall success rates of protein complex identification are 88.6% and 76.0% on the MIPS benchmark set when considering DIP and Gavin interactions, respectively. The support vector machine was an effective and solid approach for complex detection, with amino acid physical properties and complex topology as vector dimensions. ECOC is a powerful algorithm for mining multiple protein complexes of small as well as large sizes. The accuracy of complex identification based on amino acid physical and complex topological characteristics increases strikingly when ECOC is integrated with the SVM approach. Moreover, this paper implies that the ECOC algorithm may succeed over a wide range of applications in biological data mining.
... It assigned a score to each cluster based on the density and the number of members in a particular cluster (Bader and Hogue 2003). NetworkAnalyzer v2.7, another Cytoscape plug-in, estimated the network topological matrices to simplify the complex biological networks and provide us with intrinsic details regarding each gene (Fiannaca et al. 2013). ...
Article
The pathogenic Enterobacter cloacae subsp. cloacae str. ATCC 13047 has contemporarily emerged as a multi-drug resistant strain. To formulate an effective treatment option, alternative therapeutic methods need to be explored. The present study focused on Gene Interaction Network study of 46 antimicrobial resistance genes to reveal the densely interconnecting and functional hub genes in E. cloacae ATCC 13047. The AMR genes were subjected to clustering, topological and functional enrichment analysis, revealing rpsE (RpsE), acrA (AcrA) and arnT (ArnT) as novel therapeutic drug targets for hindering drug resistance in the pathogenic strain. Network topology further indicated translational protein RpsE to be exploited as a promising drug-target candidate for which the structure was predicted, optimized and validated through molecular dynamics simulations (MDS). Absorption, distribution, metabolism and excretion screening recognized ZINC5441082 (N-Isopentyladenosine) (Lead_1) and ZINC1319816 (cyclopentyl-aminopurinyl-hydroxymethyl-oxolanediol) (Lead_2) as orally bioavailable compounds against RpsE. Molecular docking and MDS confirmed the binding efficacy and protein−ligand complex stability. Furthermore, binding free energy (Gbind) calculations, principal component and free energy landscape analyses affirmed the predicted nucleoside analogues against RpsE protein to be comprehensively examined as a potential treatment strategy against E. cloacae ATCC 13047.
... In further analysis, this study also investigated the effects of gene expressions on immune cell infiltration. Molecular and cellular factors of immune cell infiltration play essential roles in cancer BPs and are particularly useful in predicting OS and guiding treatment for patients with breast cancer [86][87][88][89][90]. We also investigated the interaction of a potential network of genes from the NEK family with miRNA. ...
Article
Breast cancer remains the most common malignant cancer in women, with a staggering incidence of two million cases annually worldwide; therefore, it is crucial to explore novel biomarkers to assess the diagnosis and prognosis of breast cancer patients. The NIMA-related kinase (NEK) protein kinase family contains 11 members, named NEK1-NEK11, which were discovered from Aspergillus nidulans; however, the role of NEK family genes in tumor development remains unclear and requires additional study. In the present study, we investigate the prognostic relationships of NEK family genes with breast cancer development, as well as the gene expression signature, via a bioinformatics approach. The results of several integrative analyses revealed that most of the NEK family genes are overexpressed in breast cancer. Among these family genes, NEK2/6/8 overexpression had poor prognostic significance for distant metastasis-free survival (DMFS) in breast cancer patients. Meanwhile, NEK2/6 had the highest level of DNA methylation, and the functional enrichment analysis from MetaCore and Gene Set Enrichment Analysis (GSEA) suggested that NEK2 was associated with the cell cycle, G2M checkpoint, DNA repair, E2F, MYC, MTORC1, and interferon-related signaling. Moreover, Tumor Immune Estimation Resource (TIMER) results showed that the transcriptional levels of NEK2 were positively correlated with immune infiltration of B cells and CD4+ T cells. Collectively, the current study indicated that NEK family genes, especially NEK2, which is involved in immune infiltration, may serve as prognostic biomarkers for breast cancer progression.
... As a subject of intensive research for years, DSS have already been successfully used in various fields. Today's market offers many examples in common day-to-day usage to improve decision-making in biology (Suphavilai et al. 2018; Fiannaca et al. 2011), economics (Dutta et al. 2011), engineering (Simões-Marques 2016; Papathoma-Köhle et al. 2019; Coelho et al. 2021) or e-commerce (Leung et al. 2018; Masaro et al. 2020). According to Sutton et al. (2020) and Stark et al. (2019), DSS are also widely used in the medical domain for disease diagnosis (Kumar et al. 2011; Kihlgren et al. 2016) and patient monitoring (Kaczmarek et al. 2011; Billis et al. 2015; Hussain et al. 2015). ...
Article
Looking ahead to 2070, the elderly population will increase rapidly in the European Union and beyond. As society ages, it will be confronted with novel challenges related to concerns such as aging in place, which the majority of the elderly prefer. In this respect, the living space must be adapted to the requirements of people with a disability, in order to support their relatives or friends, who will become more and more important in the future due to a lack of professionals and to overstressed and expensive hospitals or nursing homes. Compounding this, those living space requirements are highly individual, depending on the disease. Our study focuses on a medical white-box decision support system that provides advice even to unknowledgeable users by evaluating the suitability of an elderly person's living environment in terms of their individual disease. In this paper, we propose tackling this issue with a decision support system linked to Building Information Modeling (BIM) and based on Artificial Intelligence using semantic technologies. The contribution of the proposed approach is a reliable process that uses up-to-date 3D point cloud data of the person’s living environment and predicts suitable, non-suitable and adaptable zones therein according to different pathologies using formalised knowledge. We are able to provide deep expert knowledge linked from different domains inside a knowledge base and thus produce an outcome through BIM that is understandable and helpful for two types of users: ordinary people concerned by the matter and building experts. We illustrate our methodology with a proof of concept concerning a person in a wheelchair.
... It can be used for calculating the network diameter, network heterogeneity, network density, average number of neighbours, average shortest path length, betweenness and closeness centrality, degree of distribution and clustering coefficient. This tool is useful for analysing directed as well as undirected networks, even to build a union or intersection between two networks [25]. ...
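As an illustration of three of the metrics listed above (network density, average number of neighbours and clustering coefficient), the following is a minimal Python sketch for an undirected network. It is not the tool's implementation: the function names and the toy node labels are assumptions made only for this example.

```python
from itertools import combinations


def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible node pairs that are connected: 2m / (n(n-1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nb) for nb in adj.values()) / 2  # each edge is stored twice
    return 2 * m / (n * (n - 1))


def average_neighbours(adj):
    """Mean degree over all nodes."""
    return sum(len(nb) for nb in adj.values()) / len(adj) if adj else 0.0


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nb = adj[node]
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
adj = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(density(adj), average_neighbours(adj), clustering_coefficient(adj, "C"))
```

In the toy network, node A sits inside the triangle and has a clustering coefficient of 1, while node C also touches the pendant node D, so only one of its three neighbour pairs is linked.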
Article
Salmonella enterica subsp. enterica serovar Typhi, a human enteric pathogen causing typhoid fever, developed resistance to multiple antibiotics over the years. The current study was dedicated to understand the multi-drug resistance (MDR) mechanism of S. enterica serovar Typhi CT18 and to identify potential drug targets that could be exploited for new drug discovery. We have employed gene interaction network analysis for 44 genes which had 275 interactions. Clustering analysis resulted in three highly interconnecting clusters (C1-C3). Functional enrichment analysis revealed the presence of drug target alteration and three different multi-drug efflux pumps in the bacteria that were associated with antibiotic resistance. We found seven genes (arnA,B,C,D,E,F,T) conferring resistance to Cationic Anti-Microbial Polypeptide (CAMP) molecules by membrane Lipopolysaccharide (LPS) modification, while macB was observed to be an essential controlling hub of the network and played a crucial role in MacAB-TolC efflux pump. Further, we identified five genes (mdtH, mdtM, mdtG, emrD and mdfA) which were involved in Major Facilitator Superfamily (MFS) efflux system and acrAB contributed towards AcrAB-TolC efflux pump. All three efflux pumps were seen to be highly dependent on tolC gene. The five genes, namely tolC, macB, acrA, acrB and mdfA which were involved in multiple resistance pathways, can act as potential drug targets for successful treatment strategies. Therefore, this study has provided profound insights into the MDR mechanism in S. Typhi CT18. Our results will be useful for experimental biologists to explore new leads for S. enterica.
... A protein complex is a group of proteins that interact with one another for specific biological activities (Fiannaca et al., 2013). Identification of protein complexes is important for predicting protein functions (Schwikowski et al., 2000;King et al., 2004;Winterhalter et al., 2014), disease genes (Lage et al., 2007;Yang et al., 2011), phenotypic effects of genetic mutations (Fraser and Plotkin, 2007), and drug-disease associations (Yu et al., 2015). ...
Preprint
Background: Discovering functional modules in protein-protein interaction networks through optimization remains a longstanding challenge in biology. Traditional algorithms simply consider strong protein complexes found in the original network by optimizing some metric, which may hinder the discovery of weak and hidden complexes that are overshadowed by strong complexes. Additionally, protein complexes have not only different densities but also various ranges of scales, making them extremely difficult to detect. We address these issues and propose a hierarchical hidden community detection approach to predict protein complexes of various strengths and scales accurately.

Results: We propose a meta-method called HirHide (Hierarchical Hidden Community Detection). It is the first combination of hierarchical structure with hidden structure, which provides a new perspective for finding protein complexes of various strengths and scales. We compare the performance of several standard community detection methods with their HirHide versions. Experimental results show that the HirHide versions achieve better performance and sometimes even significantly outperform the baselines.

Conclusions: HirHide can adopt any standard community detection method as the base algorithm and enable it to discover hidden hierarchical communities as well as boost the detection of strong hierarchical communities. Some biological networks are too complex for standard community detection algorithms to perform well. Most of the time, a better choice is to select an algorithm based on the characteristics of the specific biological network. Under these circumstances, HirHide has clear advantages because of its flexibility. At the same time, given the natural hierarchy of cells, organelles, intracellular compounds, etc., combining hierarchical structure with hidden structure is in line with the characteristics of the data itself, thus helping researchers to study biological interactions more deeply.
... The fuzzy knowledge-based systems are, otherwise, known as fuzzy rule-based system or fuzzy expert system. The fuzzy knowledge-based systems are applied in various areas such as biology [3,4], defence [5,6], finance [7,8] and economics [9,10]. ...
Article
The stock market is hit with a bigger pool of data every day, which complicates the process of decision making. Extracting relevant information from complex stock market data and interpreting trading decisions is an important issue. The application of fuzzy concepts to decision-making problems has increased recently. The objective of this paper is to develop a fuzzy knowledge-based system for making stock trading decisions. A fuzzy clustering method is employed to classify the preprocessed data into three classes representing buy, sell and hold. The data is divided into two parts for training and testing. A weighted fuzzy rule-based system is employed to develop the rule base using the training data. Trading recommendations for the testing period are predicted by new rule generation using the proposed method. The buy, sell and hold signals are recommended for a set of stocks using daily data. Experiments are performed using the shares of the New York Stock Exchange (NYSE) according to the decisions suggested by the proposed system. The results show that the proposed system outperforms the buy-and-hold strategy in terms of profit return and cumulative portfolio return.
... A group of proteins that interact with one another for specific biological activities is called a protein complex [1]. Predicting protein complexes is helpful for understanding the principles of cellular tissue [2,3], predicting protein functions [4], identifying disease genes [5] and discovering drug-disease associations [6]. ...
Article
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
Conference Paper
A protein complex (complex for short) is a set of proteins that interact with each other for specific biological activities. The core idea of traditional unsupervised clustering methods is to find dense subgraphs in the protein-protein interaction (PPI) network. In fact, some complexes are not dense in the network. Supervised clustering methods regard known complexes as positive cases and unknown complexes as negative cases, attempting to discover the sparse complexes hidden in the network. Unknown complex subgraphs contain many undetected complexes. Those undetected positive complexes are learned as negative cases, which seriously affects the performance of supervised learning. Therefore, supervised clustering methods face the PU (Positive Unlabeled) problem, in which only the positive cases are labeled. Complex prediction not only needs to consider the establishment of a PU learning model, but also involves how to cluster. On top of this, this paper considers 22 attributes of a complex, such as the density of subgraphs, topological coefficients, the weights of edges and so on. We propose an approach to complex prediction based on PU learning to mine complexes that cannot be found using traditional approaches. Experiments show that our method has a higher accuracy than the traditional approaches, e.g., CFinder, CMC, MCODE and AP.
Article
Most protein complex detection methods utilize unsupervised techniques to cluster densely connected nodes in a protein-protein interaction (PPI) network, in spite of the fact that many true complexes are not dense subgraphs. Supervised methods have been proposed recently, but they do not answer why a group of proteins are predicted as a complex, and they have not investigated how to detect new complexes of one species by training the model on the PPI data of another species. We propose a novel supervised method to address these issues. The key idea is to discover emerging patterns (EPs), a type of contrast pattern, which can clearly distinguish true complexes from random subgraphs in a PPI network. An integrative score of EPs is defined to measure how likely a subgraph of proteins can form a complex. New complexes thus can grow from our seed proteins by iteratively updating this score. The performance of our method is tested on eight benchmark PPI datasets and compared with seven unsupervised methods, two supervised and one semi-supervised methods under five standards to assess the quality of the predicted complexes. The results show that in most cases our method achieved a better performance, sometimes significantly.
Conference Paper
A novel technique to search for functional modules in a protein-protein interaction network is presented. The network is represented by the adjacency matrix associated with the undirected graph modelling it. The algorithm introduces the concept of quality of a sub-matrix of the adjacency matrix, and applies a greedy search technique for finding locally optimal solutions made of dense sub-matrices containing the maximum number of ones. An initial random solution, consisting of a single protein, is evolved to search for a locally optimal solution by adding or removing the connected proteins that contribute most to improving the quality function. Experimental evaluations carried out on Saccharomyces cerevisiae proteins show that the algorithm is able to efficiently isolate groups of biologically meaningful proteins corresponding to the most compact sets of interactions.
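The greedy add/remove search described in this abstract can be sketched roughly as follows. The abstract does not give the quality function, so this example swaps in an assumed stand-in (average internal degree of the selected proteins); the function names, the seed argument and the toy network are likewise hypothetical and are not taken from the paper.

```python
def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def quality(adj, members):
    """Assumed stand-in quality: average internal degree (2 * edges / nodes)."""
    if not members:
        return 0.0
    internal = sum(len(adj[u] & members) for u in members)  # edges counted twice
    return internal / len(members)


def greedy_module(adj, seed):
    """Grow a module from one seed protein by the best single add/remove move."""
    members = {seed}
    best = quality(adj, members)
    while True:
        # Proteins connected to the current module but not yet in it.
        frontier = set().union(*(adj[u] for u in members)) - members
        candidates = [members | {v} for v in sorted(frontier)]
        if len(members) > 1:
            candidates += [members - {v} for v in sorted(members)]
        if not candidates:
            return members
        top = max(candidates, key=lambda s: quality(adj, s))
        if quality(adj, top) <= best:
            return members  # no move improves the quality: local optimum
        members, best = top, quality(adj, top)


# Hypothetical toy network: a 4-clique A-B-C-D with a tail D-E-F.
adj = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
                   ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")])
print(sorted(greedy_module(adj, "A")))
print(sorted(greedy_module(adj, "F")))
```

Because every accepted move strictly increases the quality, the loop terminates at a local optimum. In the toy network, starting from the tail protein F first absorbs the clique and then drops the tail again, which shows why removal moves matter as well as additions.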
Book
For MIS specialists and nonspecialists alike, teacher and consultant Dan Power provides a readable, comprehensive, understandable guide to the concepts and applications of decision support systems. Power defines DSS broadly: interactive computer-based systems and subsystems that help people use computer communications, data, documents, knowledge, and models to solve problems and make decisions. This book covers an expanded framework for categorizing Decision Support Systems (DSS), a general managerial and technical perspective on building DSS, details and examples of the general types of DSS, and tools and issues associated with assessing proposals for DSS projects. A glossary and DSS readiness audit questions give special, ongoing value to all readers. Free eBook at https://scholarworks.uni.edu/facbook/67/
Conference Paper
Ontologies are formal knowledge representation models. Knowledge organization is a fundamental requirement in order to develop Knowledge-Based systems. In this paper we present Data-Problem-Solver (DPS) approach, a new ontological paradigm that allows the knowledge designer to model and represent a Knowledge Base (KB) for expert systems. Our approach clearly distinguishes among the knowledge about a problem to resolve (answering the “what to do” question), the solver method to resolve it (answering the “how to do” question) and the type of input data required (answering the “what I need” question). The main purpose of the proposed paradigm is to facilitate the generalization of the application domain and the modularity and the expandability of the represented knowledge. The proposed DPS ontological approach is applied to the modelling of the knowledge about a bioinformatics application scenario: the protein complex extraction from a protein-protein interaction network.
Article
Protein complexes are fundamental for understanding the principles of cellular organization. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using a topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no information other than the original PPI network is used, our approach would be very useful for protein complex prediction, especially for the prediction of novel protein complexes.
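The network modification step described in this abstract (weight direct and level-2 interactions, drop low-weight ones, introduce high-weight level-2 ones) can be sketched as below. The FS-Weight formula is not given here, so the example uses an assumed stand-in weight, the Jaccard overlap of the two proteins' closed neighbourhoods; the function names, the threshold and the protein labels are hypothetical.

```python
from itertools import combinations


def neighbours(edges):
    """Return an undirected adjacency map {protein: set of partners}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shared_partner_weight(adj, u, v):
    """Stand-in weight (NOT FS-Weight): Jaccard overlap of closed neighbourhoods."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def modify_network(edges, threshold):
    """Keep direct and level-2 pairs whose weight reaches the threshold."""
    adj = neighbours(edges)
    kept = set()
    for u, v in combinations(sorted(adj), 2):
        direct = v in adj[u]
        # Level-2 neighbours: no direct interaction, but a shared partner.
        level2 = not direct and bool(adj[u] & adj[v])
        if (direct or level2) and shared_partner_weight(adj, u, v) >= threshold:
            kept.add((u, v))
    return kept


# Hypothetical toy network: P3 and P4 do not interact but share P1 and P2;
# P5 hangs off P1 through a single, weakly supported interaction.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P2", "P4"), ("P1", "P5")]
modified = modify_network(edges, threshold=0.45)
print(sorted(modified))
```

With this stand-in weight and threshold, the level-2 pair (P3, P4) is introduced into the network while the weakly supported direct interaction (P1, P5) is removed. Any clustering algorithm could then be run on the returned edge set.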