
Jiangning Song
Monash University (Australia) · Department of Biochemistry and Molecular Biology
Doctor of Philosophy
Accelerate biomedical knowledge discovery and develop novel diagnostics/treatments in the big data & digital health era
About
Publications: 348
Reads: 84,439
Citations: 6,454
Introduction
My research interests are at the frontier and interface of computer science, data science and computational biomedicine. I am highly motivated to design, develop and deploy cutting-edge, data-driven statistical and computational algorithms, models, web servers, tools, and pipelines to address a range of challenging problems in computational biomedicine. I am passionate about solving the statistical and analytical challenges posed by the scale and complexity of biomedical big data in the era of Data Science and Analytics.
Additional affiliations
January 2019 - present
Publications (348)
Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature repre...
With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures, and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have...
Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these e...
A depleted antimicrobial drug pipeline combined with an increasing prevalence of Gram-negative 'superbugs' has increased interest in nano therapies to treat antibiotic resistance. As cubosomes and polymyxins disrupt the outer membrane of Gram-negative bacteria via different mechanisms, we herein examine the antimicrobial activity of polymyxin-loade...
The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no on...
Nitric oxide (NO)-releasing nanoparticles are effective nanomedicines with diverse therapeutic advantages compared with small molecule-based NO donors. Here, we report a new class of furoxan-based NO-releasing nanoparticles using a simple, creative yet facile coassembly approach. This is the first time we demonstrated that the coassembled NO-releas...
An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model protein stability changes when such mutations occur. Here, we developed a new sequence-based predictor for protein stability (PROST) change (∆∆G) upon single-point missense mutation. PROST extracts multiple descriptors from the most...
Posttranslational modifications (PTMs) have vital roles in a myriad of biological processes, such as metabolism, DNA damage response, transcriptional regulation, protein-protein interactions, cell death, immune response, signaling pathways and aging. Identification of PTM sites is a crucial first step for biochemical, pathological and pharmaceutica...
Motivation: Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation, and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identif...
Motivation: The molecular subtyping of gastric cancer (adenocarcinoma) into four main subtypes based on integrated multiomics profiles, as proposed by The Cancer Genome Atlas (TCGA) initiative, represents an effective strategy for patient stratification. However, this approach requires the use of multiple technological platforms, and is quite expens...
The ubiquitin-mediated pathway has been comprehensively explored in the free-living nematode Caenorhabditis elegans, but very little is known about this pathway in parasitic nematodes. Here, we inferred the ubiquitination pathway for an economically significant and pathogenic nematode – Haemonchus contortus – using abundant resources available for...
Motivation: Characterization of protein subcellular localization has become an important and long-standing task in bioinformatics and computational biology, which provides valuable information for elucidating various cellular functions of proteins and guiding drug design.
Results: Here, we develop a novel bioimage-based computational approach, term...
Gastric cancer is one of the deadliest cancers worldwide. Accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming the m...
Among various types of protein post-translational modifications (PTMs), lysine PTMs play an important role in regulating a wide range of functions and biological processes. Due to the generation and accumulation of enormous amount of protein sequence data by ongoing whole-genome sequencing projects, systematic identification of different types of l...
Optimization of the fermentation process for recombinant protein production (RPP) is often resource-intensive. Machine learning (ML) approaches are useful for minimizing experimentation and are widely applied in RPP. However, these ML-based tools primarily focus on features derived from amino acid sequences, ruling out the influence of fe...
Gastric cancer is one of the deadliest cancers worldwide. An accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming th...
It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities, and interactions determine biological function, cellular states and phenotypes. Stable protein complexes - or macromolecular machines - are in turn the key functional entities mediating and modulating most biological processes. While the iden...
RNA binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotech...
Ke Han, Yan Liu, Jian Xu, [...], Dong-Jun Yu
Protein fold recognition is a critical step in protein structure and function prediction, and aims to ascertain the most likely fold type of the query protein. As a typical pattern recognition problem, designing a powerful feature extractor and metric function to extract relevant and representative fold-specific features from protein sequences is t...
Thyroid disease instances have been continuously increasing since the 1990s, and thyroid cancer has become the most rapidly rising disease among all the malignancies in recent years. Most existing studies focused on applying deep convolutional neural networks for detecting thyroid cancer. Despite their satisfactory performance on binary classificat...
Background: Accurate estimation of historical PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 μm) is essential for environmental health risk assessment.
Objectives: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework for improving the estimation of the daily ground-le...
Inhaled polymyxins are increasingly used to treat pulmonary infections caused by multidrug-resistant Gram-negative pathogens. We have previously shown that apoptotic pathways, autophagy and oxidative stress are involved in polymyxin-induced toxicity in human lung epithelial cells. In the present study, we employed human lung epithelial cells A549 t...
Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. The non-classical secretion pathway has recently received increasing attention among...
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a signifi...
Actuated by the growing attention to personal healthcare and the pandemic, the popularity of E-health is proliferating. Nowadays, enhancement on medical diagnosis via machine learning models has been highly effective in many aspects of e-health analytics. Nevertheless, in the classic cloud-based/centralized e-health paradigms, all the data will be...
Occluded person re-identification (ReID) aims at matching occluded person images to holistic ones across different camera views. Target Pedestrians (TP) are usually disturbed by Non-Pedestrian Occlusions (NPO) and Non-Target Pedestrians (NTP). Previous methods mainly focus on increasing the model's robustness against NPO while ignoring feature contamina...
Background: Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identif...
More than 6,000 human diseases have been recorded to be caused by non-synonymous single nucleotide polymorphisms (nsSNPs). Rapid and accurate prediction of pathogenic nsSNPs can improve our understanding of the principle and design of new drugs, which remains an unresolved challenge. In the present work, a new computational approach, termed MSRes-M...
Transmembrane proteins have critical biological functions and play a role in a multitude of cellular processes including cell signaling, transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered as drug targets. Missense mutations in such proteins can lead to many diverse diseases and disorders, su...
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, la...
Accurate identification of transcription factor binding sites is of great significance in understanding gene expression, biological development and drug design. Although a variety of methods based on deep-learning models and large-scale data have been developed to predict transcription factor binding sites in DNA sequences, there is room for furthe...
Bacterial type IV secretion systems (T4SSs) are versatile and membrane-spanning apparatuses, which mediate both genetic exchange and delivery of effector proteins to target eukaryotic cells. The secreted effectors (T4SEs) can affect gene expression and signal transduction of the host cells. As such, they often function as virulence factors and play...
DNA N6-methyladenine is an important type of DNA modification that plays important roles in multiple biological processes. Despite the recent progress in developing DNA 6mA site prediction methods, several challenges remain to be addressed. For example, although the hand-crafted features are interpretable, they contain redundant information which m...
Objectives: To identify the associations of temperature with non-COVID-19 mortality and all-cause mortality in the pandemic year 2020, in comparison with the non-COVID-19 period, in Italy.
Methods: The data on 3,189,790 all-cause deaths (including 3,134,137 non-COVID-19 deaths) and meteorological conditions in 107 Italian provinces between February 1st and...
Motivation: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structure...
Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. How...
Protein fold recognition is a critical step toward protein structure and function prediction, aiming at providing the most likely fold type of the query protein. In recent years, the development of deep learning (DL) technique has led to massive advances in this important field, and accordingly, the sensitivity of protein fold recognition has been...
Pseudouridine is a ubiquitous RNA modification type present in eukaryotes and prokaryotes, which plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, it remains a great challenge to identify pseudouridine sites via experimental approaches, requiring expensive and time-consuming expe...
Motivation: Tumor tile selection is a necessary prerequisite in patch-based cancer whole slide image analysis, which is labor-intensive and requires expertise. Whole slides are annotated as tumor or tumor free, but tiles within a tumor slide are not. As all tiles within a tumor free slide are tumor free, these can be used to capture tumor-free patt...
Motivation: Digital pathology supports analysis of histopathological images using deep learning methods at large scale. However, applications of deep learning in this area have been limited by the complexities of configuration of the computational environment and of hyperparameter optimization, which hinder deployment and reduce reproducibility....
Despite the availability of methods for analyzing protein complexes, systematic analysis of complexes under multiple conditions remains challenging. Approaches based on biochemical fractionation of intact, native complexes and correlation of protein profiles have shown promise. However, most approaches for interpreting cofractionation datasets to y...
Understanding how a mutation might affect protein stability is of significant importance to protein engineering and for elucidating the mechanisms of protein evolution and genetic diseases. While a number of computational tools have been developed to predict the effect of missense mutations on protein stability, they are known to exh...
Lysine crotonylation (Kcr) is a newly discovered type of protein post-translational modification (PTM) and has been reported to be involved in various pathophysiological processes. High-resolution mass spectrometry is the primary approach for the identification of Kcr sites. However, experimental approaches for identifying Kcr sites are often time-...
Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. In recent years, deep-learning-based methods for predicting DNA-protein binding from sequence data have achieved significant success. Nevertheless, the current state-of-the-art computational methods have s...
Both protease- and reactive oxygen species (ROS)-mediated proteolysis are thought to be key effectors of tissue remodeling. We have previously shown that comparison of amino acid composition can predict the differential susceptibilities of proteins to photo-oxidation. However, predicting protein susceptibility to endogenous proteases remains challe...
Glaucoma, a major cause of irreversible blindness worldwide, is associated with elevated intraocular pressure (IOP) and progressive loss of retinal ganglion cells (RGCs) that undergo apoptosis. One mechanism of RGC injury involves impairment of neurotrophic support, and exogenous supply of neurotrophic factors has been shown to be beneficial. Howeve...
It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities and interactions determine biological processes, cellular states and phenotypes. Quantitative abundance of transcripts, proteins and metabolites are now routinely measured with considerable depth via an array of "OMICS" technologies, and recen...
Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of A...
Background: Human dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models can predict whether the sequence contains a cleavage site. However, these models only consider each sequence independently and l...
As an essential task in protein structure and function prediction, protein fold recognition has attracted increasing attention. The majority of the existing machine learning-based protein fold recognition approaches strongly rely on hand-crafted features, which depict the characteristics of different protein folds; however, effective feature extrac...
Recent research in predicting protein secondary structure populations (SSP) based on Nuclear Magnetic Resonance (NMR) chemical shifts has helped quantitatively characterise the structural conformational properties of intrinsically disordered proteins and regions (IDP/IDR). Different from protein secondary structure (SS) prediction, the SSP predicti...
Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, their binding to human leukocyte antigen class I (HLA-I) molecules is a necessary first step. Most epitope prediction tools thus rely on the prediction of such bindi...
Gram-negative bacteria utilize secretion systems to export substrates into their surrounding environment or directly into target cells. These substrates are proteins that function to promote bacterial survival: by facilitating nutrient collec...
Background: Determining a suitable dose of intravenous colistimethate is challenging because of complicated pharmacokinetics, confusing terminology, and the potential for renal toxicity. Only recently have reliable pharmacokinetic/pharmacodynamic data and dosing recommendations for intravenous colistimethate become available.
Objective: The aim of...
Age, disease, and exposure to environmental factors can induce tissue remodelling and alterations in protein structure and abundance. In the case of human skin, ultraviolet radiation (UVR)-induced photo-ageing has a profound effect on dermal extracellular matrix (ECM) proteins. We have previously shown that ECM proteins rich in UV-chromophore amino...
Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently being evaluated in preclinical and clinical trials. Accurate identification of ACPs has received considerabl...
Beta-lactamases are enzymes localized in the periplasmic space of bacterial pathogens, where they confer resistance to beta-lactam antibiotics. Experimental identification of beta-lactamases is costly yet crucial to understanding beta-lactam resistance mechanisms. To address this issue, we present DeepBL, a deep learning-based approach by incorpora...