Jiangning Song

Monash University (Australia) · Department of Biochemistry and Molecular Biology

Doctor of Philosophy
Accelerate biomedical knowledge discovery and develop novel diagnostics/treatments in the big data & digital health era

About

348
Publications
84,439
Reads
6,454
Citations
Introduction
My research interests are at the frontier and interface of computer science, data science and computational biomedicine. I am highly motivated to design, develop and deploy cutting-edge, data-driven statistical and computational algorithms, models, web servers, tools, and pipelines to address a range of challenging problems in computational biomedicine. I am passionate about solving the statistical and analytical challenges posed by the scale and complexity of biomedical big data in the era of Data Science and Analytics.
Additional affiliations
January 2019 - present
Monash University (Australia)
Position
  • Group Leader

Publications (348)
Article
Full-text available
Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature repre...
Article
Full-text available
With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures, and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have...
Article
Full-text available
Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these e...
Article
Full-text available
A depleted antimicrobial drug pipeline combined with an increasing prevalence of Gram-negative 'superbugs' has increased interest in nano therapies to treat antibiotic resistance. As cubosomes and polymyxins disrupt the outer membrane of Gram-negative bacteria via different mechanisms, we herein examine the antimicrobial activity of polymyxin-loade...
Article
Full-text available
The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no on...
Article
Nitric oxide (NO)-releasing nanoparticles are effective nanomedicines with diverse therapeutic advantages compared with small molecule-based NO donors. Here, we report a new class of furoxan-based NO-releasing nanoparticles using a simple, creative yet facile coassembly approach. This is the first time we demonstrated that the coassembled NO-releas...
Article
An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model protein stability changes when such mutations occur. Here, we developed a new sequence-based predictor for protein stability (PROST) change (∆∆G) upon single-point missense mutation. PROST extracts multiple descriptors from the most...
Chapter
Posttranslational modifications (PTMs) have vital roles in a myriad of biological processes, such as metabolism, DNA damage response, transcriptional regulation, protein-protein interactions, cell death, immune response, signaling pathways and aging. Identification of PTM sites is a crucial first step for biochemical, pathological and pharmaceutica...
Article
Motivation: Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation, and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identif...
Article
Motivation: The molecular subtyping of gastric cancer (adenocarcinoma) into four main subtypes based on integrated multiomics profiles, as proposed by The Cancer Genome Atlas (TCGA) initiative, represents an effective strategy for patient stratification. However, this approach requires the use of multiple technological platforms, and is quite expens...
Article
The ubiquitin-mediated pathway has been comprehensively explored in the free-living nematode Caenorhabditis elegans, but very little is known about this pathway in parasitic nematodes. Here, we inferred the ubiquitination pathway for an economically significant and pathogenic nematode – Haemonchus contortus – using abundant resources available for...
Article
Motivation: Characterization of protein subcellular localization has become an important and long-standing task in bioinformatics and computational biology, which provides valuable information for elucidating various cellular functions of proteins and guiding drug design. Results: Here, we develop a novel bioimage-based computational approach, term...
Article
Full-text available
Gastric cancer is one of the deadliest cancers worldwide. Accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming the m...
Article
Among various types of protein post-translational modifications (PTMs), lysine PTMs play an important role in regulating a wide range of functions and biological processes. Due to the generation and accumulation of an enormous amount of protein sequence data by ongoing whole-genome sequencing projects, systematic identification of different types of l...
Article
Full-text available
Optimization of the fermentation process for recombinant protein production (RPP) is often resource-intensive. Machine learning (ML) approaches are useful in minimizing experimentation and find wide application in RPP. However, these ML-based tools primarily focus on features with respect to amino acid sequences, ruling out the influence of fe...
Article
Full-text available
Gastric cancer is one of the deadliest cancers worldwide. An accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming th...
Article
Full-text available
It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities, and interactions determine biological function, cellular states and phenotypes. Stable protein complexes - or macromolecular machines - are in turn the key functional entities mediating and modulating most biological processes. While the iden...
Article
RNA binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotech...
Article
Protein fold recognition is a critical step in protein structure and function prediction, and aims to ascertain the most likely fold type of the query protein. As a typical pattern recognition problem, designing a powerful feature extractor and metric function to extract relevant and representative fold-specific features from protein sequences is t...
Preprint
Thyroid disease instances have been continuously increasing since the 1990s, and thyroid cancer has become the most rapidly rising disease among all the malignancies in recent years. Most existing studies focused on applying deep convolutional neural networks for detecting thyroid cancer. Despite their satisfactory performance on binary classificat...
Article
Full-text available
Background: Accurate estimation of historical PM2.5 (particle matter with an aerodynamic diameter of less than 2.5 μm) is critical and essential for environmental health risk assessment. Objectives: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework for improving the estimation of the daily ground-le...
Article
Full-text available
Inhaled polymyxins are increasingly used to treat pulmonary infections caused by multidrug-resistant Gram-negative pathogens. We have previously shown that apoptotic pathways, autophagy and oxidative stress are involved in polymyxin-induced toxicity in human lung epithelial cells. In the present study, we employed human lung epithelial cells A549 t...
Article
Full-text available
Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. The non-classical secretion pathway has recently received increasing attention among...
Article
Full-text available
Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a signifi...
Article
Actuated by the growing attention to personal healthcare and the pandemic, the popularity of E-health is proliferating. Nowadays, enhancement on medical diagnosis via machine learning models has been highly effective in many aspects of e-health analytics. Nevertheless, in the classic cloud-based/centralized e-health paradigms, all the data will be...
Preprint
Actuated by the growing attention to personal healthcare and the pandemic, the popularity of E-health is proliferating. Nowadays, enhancement on medical diagnosis via machine learning models has been highly effective in many aspects of e-health analytics. Nevertheless, in the classic cloud-based/centralized e-health paradigms, all the data will be...
Preprint
Full-text available
Occluded person re-identification (ReID) aims at matching occluded person images to holistic ones across different camera views. Target Pedestrians (TP) are usually disturbed by Non-Pedestrian Occlusions (NPO) and Non-Target Pedestrians (NTP). Previous methods mainly focus on increasing the model's robustness against NPO while ignoring feature contamina...
Article
Full-text available
Background: Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identif...
Article
Full-text available
More than 6,000 human diseases have been recorded to be caused by non-synonymous single nucleotide polymorphisms (nsSNPs). Rapid and accurate prediction of pathogenic nsSNPs can improve our understanding of the principle and design of new drugs, which remains an unresolved challenge. In the present work, a new computational approach, termed MSRes-M...
Article
Full-text available
Transmembrane proteins have critical biological functions and play a role in a multitude of cellular processes including cell signaling, transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered as drug targets. Missense mutations in such proteins can lead to many diverse diseases and disorders, su...
Article
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, la...
Article
Full-text available
Accurate identification of transcription factor binding sites is of great significance in understanding gene expression, biological development and drug design. Although a variety of methods based on deep-learning models and large-scale data have been developed to predict transcription factor binding sites in DNA sequences, there is room for furthe...
Article
Full-text available
Bacterial type IV secretion systems (T4SSs) are versatile and membrane-spanning apparatuses, which mediate both genetic exchange and delivery of effector proteins to target eukaryotic cells. The secreted effectors (T4SEs) can affect gene expression and signal transduction of the host cells. As such, they often function as virulence factors and play...
Preprint
Full-text available
Gastric cancer is one of the deadliest cancers worldwide. Accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming the m...
Article
Full-text available
DNA N6-methyladenine is an important type of DNA modification that plays important roles in multiple biological processes. Despite the recent progress in developing DNA 6mA site prediction methods, several challenges remain to be addressed. For example, although the hand-crafted features are interpretable, they contain redundant information which m...
Article
Objectives: To identify the associations of temperature with non-COVID-19 mortality and all-cause mortality in the pandemic 2020 in comparison with the non-COVID-19 period in Italy. Methods: The data on 3,189,790 all-cause deaths (including 3,134,137 non-COVID-19 deaths) and meteorological conditions in 107 Italian provinces between February 1st and...
Article
Motivation: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structure...
Article
Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. How...
Article
Protein fold recognition is a critical step toward protein structure and function prediction, aiming at providing the most likely fold type of the query protein. In recent years, the development of deep learning (DL) technique has led to massive advances in this important field, and accordingly, the sensitivity of protein fold recognition has been...
Article
Pseudouridine is a ubiquitous RNA modification type present in eukaryotes and prokaryotes, which plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, it remains a great challenge to identify pseudouridine sites via experimental approaches, requiring expensive and time-consuming expe...
Article
Motivation: Tumor tile selection is a necessary prerequisite in patch-based cancer whole slide image analysis, which is labor-intensive and requires expertise. Whole slides are annotated as tumor or tumor free, but tiles within a tumor slide are not. As all tiles within a tumor free slide are tumor free, these can be used to capture tumor-free patt...
Article
Motivation: Digital pathology supports analysis of histopathological images using deep learning methods at a large-scale. However, applications of deep learning in this area have been limited by the complexities of configuration of the computational environment and of hyperparameter optimization, which hinder deployment and reduce reproducibility....
Article
Full-text available
Despite the availability of methods for analyzing protein complexes, systematic analysis of complexes under multiple conditions remains challenging. Approaches based on biochemical fractionation of intact, native complexes and correlation of protein profiles have shown promise. However, most approaches for interpreting cofractionation datasets to y...
Article
Understanding how a mutation might affect protein stability is of significant importance to protein engineering and for elucidating the mechanisms of protein evolution and genetic diseases. While a number of computational tools have been developed to predict the effect of missense mutations on protein stability, they are known to exh...
Article
Lysine crotonylation (Kcr) is a newly discovered type of protein post-translational modification (PTM) and has been reported to be involved in various pathophysiological processes. High-resolution mass spectrometry is the primary approach for the identification of Kcr sites. However, experimental approaches for identifying Kcr sites are often time-...
Article
Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. In recent years, deep-learning-based methods for predicting DNA-protein binding from sequence data have achieved significant success. Nevertheless, the current state-of-the-art computational methods have s...
Article
Full-text available
Both protease- and reactive oxygen species (ROS)-mediated proteolysis are thought to be key effectors of tissue remodeling. We have previously shown that comparison of amino acid composition can predict the differential susceptibilities of proteins to photo-oxidation. However, predicting protein susceptibility to endogenous proteases remains challe...
Article
Full-text available
Glaucoma, a major cause of irreversible blindness worldwide, is associated with elevated intraocular pressure (IOP) and progressive loss of retinal ganglion cells (RGCs) that undergo apoptosis. A mechanism for RGCs injury involves impairment of neurotrophic support and exogenous supply of neurotrophic factors has been shown to be beneficial. Howeve...
Preprint
Full-text available
It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities and interactions determine biological processes, cellular states and phenotypes. Quantitative abundance of transcripts, proteins and metabolites are now routinely measured with considerable depth via an array of "OMICS" technologies, and recen...
Article
Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of A...
Article
Full-text available
Background: Human dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models can predict whether the sequence contains a cleavage site. However, these models only consider each sequence independently and l...
Article
As an essential task in protein structure and function prediction, protein fold recognition has attracted increasing attention. The majority of the existing machine learning-based protein fold recognition approaches strongly rely on hand-crafted features, which depict the characteristics of different protein folds; however, effective feature extrac...
Article
Recent research in predicting protein secondary structure populations (SSP) based on Nuclear Magnetic Resonance (NMR) chemical shifts has helped quantitatively characterise the structural conformational properties of intrinsically disordered proteins and regions (IDP/IDR). Different from protein secondary structure (SS) prediction, the SSP predicti...
Article
Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, their binding to human leukocyte antigen class I (HLA-I) molecules is a necessary first step. Most epitope prediction tools thus rely on the prediction of such bindi...
Article
Full-text available
Gram-negative bacteria utilize secretion systems to export substrates into their surrounding environment or directly into target cells. These substrates are proteins that function to promote bacterial survival: by facilitating nutrient collec...
Article
Full-text available
Background: Determining a suitable dose of intravenous colistimethate is challenging because of complicated pharmacokinetics, confusing terminology, and the potential for renal toxicity. Only recently have reliable pharmacokinetic/pharmacodynamic data and dosing recommendations for intravenous colistimethate become available. Objective: The aim of...
Preprint
Full-text available
Age, disease, and exposure to environmental factors can induce tissue remodelling and alterations in protein structure and abundance. In the case of human skin, ultraviolet radiation (UVR)-induced photo-ageing has a profound effect on dermal extracellular matrix (ECM) proteins. We have previously shown that ECM proteins rich in UV-chromophore amino...
Article
Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently evaluated in the preclinical and clinical trials. Accurate identification of ACPs has received considerabl...
Article
Full-text available
Beta-lactamases are enzymes localized in the periplasmic space of bacterial pathogens, where they confer resistance to beta-lactam antibiotics. Experimental identification of beta-lactamases is costly yet crucial to understanding beta-lactam resistance mechanisms. To address this issue, we present DeepBL, a deep learning-based approach by incorpora...