Fig 10 - available from: BMC Bioinformatics
UML diagram of the Integrated Dataset (ID). The attributes in red are the identifiers of the nodes of a given type (i.e., the primary key in a relational database), while attributes in green refer to the identifier of nodes of other types (i.e., foreign keys in a relational database)

Source publication
Article
Background. The study of functional associations between ncRNAs and human diseases is a pivotal task of modern research to develop new and more effective therapeutic approaches. Nevertheless, it is not a trivial task since it involves entities of different types, such as microRNAs, lncRNAs or target genes whose expression also depends on endogenous...

Similar publications

Article
Interactions between genetic factors and environmental factors (EFs) play an important role in many diseases. Many diseases result from the interaction between genetics and EFs. The long non-coding RNA (lncRNA) is an important non-coding RNA that regulates life processes. The ability to predict the associations between lncRNAs and EFs is of importa...

Citations

... However, elucidating the role of miRNAs in specific diseases remains a significant challenge, worsened by the resource-intensive nature of experimental investigations. In this context, many computational methods have been proposed [1,3,7], providing cost-effective approaches to formulate biological hypotheses that can be subsequently validated through in-vitro experiments. ...
... As a result, constructing association networks from heterogeneous biological data has emerged as a primary goal in overcoming this obstacle [3]. LP-HCLUS [1] is an example of an approach that aims to reach this goal by solving a link prediction task on heterogeneous graphs to unveil previously unknown RNA-disease associations. It exploits validated relationships to predict novel associations between non-coding RNAs and diseases, demonstrating its potential in elucidating the functional role of miRNAs in disease onset or progression. ...
... ncRNAs represent approximately 60% of the transcriptional production of the human genome [15,16], and there is a close association between many diseases and ncRNA mutations or abnormal expression [17,18]. ncRNAs can be divided into two categories based on their length: small ncRNAs and lncRNAs [19][20][21]. ...
... There are increasing studies that have identified lncRNAs as a new class of regulatory molecules with the functions of scaffold, signal, and guide, and they are also involved in transcriptional interference [40][41][42]. ...
Article
Non-coding RNAs (ncRNAs) are transcribed from the genome and do not encode proteins. In recent years, ncRNAs have attracted increasing attention as critical participants in gene regulation and disease pathogenesis. Different categories of ncRNAs, which mainly include microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), are involved in the progression of pregnancy, while abnormal expression of placental ncRNAs impacts the onset and development of adverse pregnancy outcomes (APOs). Therefore, we reviewed the current status of research on placental ncRNAs and APOs to further understand the regulatory mechanisms of placental ncRNAs, which provides a new perspective for treating and preventing related diseases.
... Some studies have shown that pregnant women suffer worse than non-pregnant women (Fan et al. 2020). For predicting possibly unknown ncRNA-disease relationships used multi-type hierarchical clustering (Barracchia et al. 2020). Some papers aim to find possible treatments or drug/gene-disease association clustering (Loucera et al. 2020). ...
Article
Purpose: Based on medical reports, it is hard to find levels of different hospitalized symptomatic COVID-19 patients according to their features in a short time. Besides, there are common and special features for COVID-19 patients at different levels based on physicians’ knowledge that make diagnosis difficult. For this purpose, a hierarchical model is proposed in this paper based on experts’ knowledge, fuzzy C-mean (FCM) clustering, and adaptive neuro-fuzzy inference system (ANFIS) classifier.
Methods: Experts considered a special set of features for different groups of COVID-19 patients to find their treatment plans. Accordingly, the structure of the proposed hierarchical model is designed based on experts’ knowledge. In the proposed model, we applied clustering methods to patients’ data to determine some clusters. Then, we learn classifiers for each cluster in a hierarchical model. Regarding different common and special features of patients, FCM is considered for the clustering method. Besides, ANFIS had better performances than other classification methods. Therefore, FCM and ANFIS were considered to design the proposed hierarchical model. FCM finds the membership degree of each patient’s data based on common and special features of different clusters to reinforce the ANFIS classifier. Next, ANFIS identifies the need of hospitalized symptomatic COVID-19 patients to ICU and to find whether or not they are in the end-stage (mortality target class). Two real datasets about COVID-19 patients are analyzed in this paper using the proposed model. One of these datasets had only clinical features and another dataset had both clinical and image features. Therefore, some appropriate features are extracted using some image processing and deep learning methods.
Results: According to the results and statistical test, the proposed model has the best performance among other utilized classifiers. Its accuracies based on clinical features of the first and second datasets are 92% and 90% to find the ICU target class. Extracted features of image data increase the accuracy by 94%.
Conclusion: The accuracy of this model is even better for detecting the mortality target class among different classifiers in this paper and the literature review. Besides, this model is compatible with utilized datasets about COVID-19 patients based on clinical data and both clinical and image data, as well.
Highlights:
• A new hierarchical model is proposed using ANFIS classifiers and FCM clustering method in this paper. Its structure is designed based on experts’ knowledge and real medical process. FCM reinforces the ANFIS classification learning phase based on the features of COVID-19 patients.
• Two real datasets about COVID-19 patients are studied in this paper. One of these datasets has both clinical and image data. Therefore, appropriate features are extracted based on its image data and considered with available meaningful clinical data. Different levels of hospitalized symptomatic COVID-19 patients are considered in this paper including the need of patients to ICU and whether or not they are in end-stage.
• Well-known classification methods including case-based reasoning (CBR), decision tree, convolutional neural networks (CNN), K-nearest neighbors (KNN), learning vector quantization (LVQ), multi-layer perceptron (MLP), Naive Bayes (NB), radial basis function network (RBF), support vector machine (SVM), recurrent neural networks (RNN), fuzzy type-I inference system (FIS), and adaptive neuro-fuzzy inference system (ANFIS) are designed for these datasets and their results are analyzed for different random groups of the train and test data.
• According to unbalanced utilized datasets, different performances of classifiers including accuracy, sensitivity, specificity, precision, F-score, and G-mean are compared to find the best classifier. ANFIS classifiers have the best results for both datasets.
• To reduce the computational time, the effects of the Principal Component Analysis (PCA) feature reduction method are studied on the performances of the proposed model and classifiers. According to the results and statistical test, the proposed hierarchical model has the best performances among other utilized classifiers.
... Overlapping clustering approaches have also been applied to graph clustering. LP-HCLUS is one such method, generating hierarchical clusters with potential for overlap, allowing diseases and ncRNA in a heterogenous graph to be involved in multiple interaction subnetworks, better reflecting their true function (Barracchia et al., 2020). ...
Article
Chronic obstructive pulmonary disease (COPD) is one of the leading causes of death in the United States. COPD represents one of many areas of research where identifying complex pathways and networks of interacting biomarkers is an important avenue toward studying disease progression and potentially discovering cures. Recently, sparse multiple canonical correlation network analysis (SmCCNet) was developed to identify complex relationships between omics associated with a disease phenotype, such as lung function. SmCCNet uses two sets of omics datasets and an associated output phenotypes to generate a multi-omics graph, which can then be used to explore relationships between omics in the context of a disease. Detecting significant subgraphs within this multi-omics network, i.e., subgraphs which exhibit high correlation to a disease phenotype and high inter-connectivity, can help clinicians identify complex biological relationships involved in disease progression. The current approach to identifying significant subgraphs relies on hierarchical clustering, which can be used to inform clinicians about important pathways involved in the disease or phenotype of interest. The reliance on a hierarchical clustering approach can hinder subgraph quality by biasing toward finding more compact subgraphs and removing larger significant subgraphs. This study aims to introduce new significant subgraph detection techniques. In particular, we introduce two subgraph detection methods, dubbed Correlated PageRank and Correlated Louvain, by extending the Personalized PageRank Clustering and Louvain algorithms, as well as a hybrid approach combining the two proposed methods, and compare them to the hierarchical method currently in use. The proposed methods show significant improvement in the quality of the subgraphs produced when compared to the current state of the art.
... At the same time, there are also related studies based on heterogeneous clustering methods that predict the unknown relationships between lncRNAs and diseases based on the relationship network constructed by diseases, lncRNAs, microRNAs, and genes (Barracchia et al., 2018). LP-HCLUS uses multi-type hierarchical clustering methods to predict potential lncRNA-disease relationships (Barracchia et al., 2020). However, all these methods only discriminate disease-associated lncRNAs without relating the lncRNAs with specific cancer types. ...
Article
Motivation: Long non-coding RNAs (lncRNAs) play important roles in cancer development. Prediction of lncRNA–cancer association is necessary for efficiently discovering biomarkers and designing treatment for cancers. Currently, several methods have been developed to predict lncRNA–cancer associations. However, most of them do not consider the relationships between lncRNA with other molecules and with cancer prognosis, which has limited the accuracy of the prediction. Method: Here, we constructed relationship matrices between 1,679 lncRNAs, 2,759 miRNAs, and 16,410 genes and cancer prognosis on three types of cancers (breast, lung, and colorectal cancers) to predict lncRNA–cancer associations. The matrices were iteratively reconstructed by matrix factorization to optimize low-rank size. This method is called detecting lncRNA cancer association (DRACA). Results: Application of this method in the prediction of lncRNAs–breast cancer, lncRNA–lung cancer, and lncRNA–colorectal cancer associations achieved an area under curve (AUC) of 0.810, 0.796, and 0.795, respectively, by 10-fold cross-validations. The performances of DRACA in predicting associations between lncRNAs with three kinds of cancers were at least 6.6, 7.2, and 6.9% better than other methods, respectively. To our knowledge, this is the first method employing cancer prognosis in the prediction of lncRNA–cancer associations. When removing the relationships between cancer prognosis and genes, the AUCs were decreased 7.2, 0.6, and 5% for breast, lung, and colorectal cancers, respectively. Moreover, the predicted lncRNAs were found with greater numbers of somatic mutations than the lncRNAs not predicted as cancer-associated for three types of cancers. DRACA predicted many novel lncRNAs, whose expressions were found to be related to survival rates of patients. The method is available at https://github.com/Yanh35/DRACA.
... Recently, AI techniques are developed and applied to the medical and biological domain. Multitask learning is proposed for the prediction of cancer drug sensitivity [27], a machine learning method INLOCANDA, which is able to learn an inductive predictive model for gene network reconstruction, is developed [28], the system LP-HCLUS, which exploits a multitype hierarchical clustering method to predict possibly unknown ncRNA-disease relationships, is proposed [29], a generative adversarial network deep learning model is proposed for disease gene prediction [30], Parkinson's disease identification system is developed using support vector machine and random forest [31], and a novel deep neural network model RDense is constructed for protein-RNA binding prediction [32]. These AI approaches enhance the decision-making abilities of doctors by successfully completing machine learning tasks. ...
Article
Chronic obstructive pulmonary disease (COPD) is a global burden, which is estimated to be the third leading cause of death worldwide by 2030. The economic burden of COPD grows continuously because it is not a curable disease. These conditions make COPD an important research field of artificial intelligence (AI) techniques in medicine. In this study, an integrated approach of the statistical-based fuzzy cognitive maps (SBFCM) and artificial neural networks (ANN) is proposed for predicting length of hospital stay of patients with COPD who are admitted to the hospital with an acute exacerbation. The SBFCM method is developed to determine the input variables of the ANN model. The SBFCM conducts statistical analysis to prepare preliminary information for the experts and then collects expert opinions accordingly, to define a conceptual map of the system. The integration of SBFCM and ANN methods provides both statistical data and expert opinion in the prediction model. In the numerical application, the proposed approach outperformed the conventional approach and other machine learning algorithms with 79.95% accuracy, revealing the power of expert opinion involvement in medical decisions. A medical decision support framework is constructed for better prediction of length of hospital stay and more effective hospital management.
... These methods exploit predictive models to classify unknown gene regulations. On the other hand, reference [52] performed link prediction on heterogeneous networks with the aim to detect associations between ncRNAs and diseases. ...
Article
Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries’ search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched with considerable potential for improvement.
... Several techniques have been applied to interconnected knowledge bases for various problems in biology [1,37]. RL in particular has been used in problems such as predicting gene function [16], gene regulation [9] and QSAR-related problems [31]. ...
Chapter
The key to success in machine learning is the use of effective data representations. The success of deep neural networks (DNNs) is based on their ability to utilize multiple neural network layers, and big data, to learn how to convert simple input representations into richer internal representations that are effective for learning. However, these internal representations are sub-symbolic and difficult to explain. In many scientific problems explainable models are required, and the input data is semantically complex and unsuitable for DNNs. This is true in the fundamental problem of understanding the mechanism of cancer drugs, which requires complex background knowledge about the functions of genes/proteins, their cells, and the molecular structure of the drugs. This background knowledge cannot be compactly expressed propositionally, and requires at least the expressive power of Datalog. Here we demonstrate the use of relational learning to generate new data descriptors in such semantically complex background knowledge. These new descriptors are effective: adding them to standard propositional learning methods significantly improves prediction accuracy. They are also explainable, and add to our understanding of cancer. Our approach can readily be expanded to include other complex forms of background knowledge, and combines the generality of relational learning with the efficiency of standard propositional learning.
... A machine learning algorithm was applied to diagnose and treat Parkinson's disease in a different study [48]. Machine learning algorithms with biological and medical content were used in different studies [49,50]. ...
Article
Parkinson's disease is a neurological disorder that causes partial or complete loss of motor reflexes, speech, thinking, behavior, and other vital functions affecting the nervous system. Parkinson's disease causes impaired speech and motor abilities (writing, balance, etc.) in about 90% of patients and is often seen in older people. Some signs (deterioration of vocal cords) in medical voice recordings from Parkinson's patients are used to diagnose this disease. The database used in this study contains biomedical speech voice from 31 people of different age and sex related to this disease. The performance comparison of the machine learning algorithms k-Nearest Neighborhood (k-NN), Random Forest, Naive Bayes, Support Vector Machine classifiers was performed with the used database. Moreover, the best classifier was determined for the diagnosis of Parkinson's disease. Eleven different training and test data splits (45X55, 50X50, 55X45, 60X40, 65X35, 70X30, 75X25, 80X20, 85X15, 90X10, 95X5) were processed separately. The data obtained from these training and tests were compared with statistical measurements. The training results of the k-NN classification algorithm were generally 100% successful. The best test result was obtained from Random Forest classifier with 85.81%. All statistical results and measured values are given in detail in the experimental studies section.
... Similarly, various research works are in progress nowadays to develop new technologies for the detection of early-stage cancer [6][7][8][9][10][11][12][13][14], which motivates to design a simple, cost-effective, reliable system model. As a result, some significant research works take place like amino acid-based sensor models are developed for cancer detection [9,10,62,65,66]. ...
Article
An efficient and novel modeling approach is proposed in this paper for identifying proteins or genes involved in melanoma skin cancer. Two types of classifiers are modeled, based on the chemical structure and hydropathy property of amino acids. These classifiers are further implemented using NI LabVIEW–based hardware kit to observe the real-time response for proper diagnosis. The phase responses, pole-zero diagrams, and transient responses are examined to screen out the genes related to melanoma from healthy genes. The performance of the proposed classifier is measured using various performance measurement metrics in terms of accuracy, sensitivity, specificity, etc. The classifier is experimented along with a color code scheme on skin genes and illustrates the superiority in comparison with traditional methods by achieving 94% of classification accuracy with 96% of sensitivity.
Graphical abstract: An equivalent electrical model is developed for designing melanoma classifier. Initially, each amino acid is modeled using the RC passive circuit depending on their physicochemical structure and hydropathy nature, to form a gene structure model. The melanoma-related genes are detected by phase, transient, and color code analysis.