Osama Mahmoud
University of Essex · Department of Mathematical Sciences
Data Science & Statistics
About
52
Publications
8,199
Reads
947
Citations
Introduction
Detailed descriptions of my ongoing work are reported on my website: http://osmahmoud.com
Additional affiliations
January 2019 - December 2022
January 2017 - April 2019
April 2015 - present
Education
January 2012 - June 2015
September 2003 - August 2008
September 1999 - May 2003
Publications (52)
Background: Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical meth...
Combining multiple classifiers, known as ensemble methods, can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for the classification task in two steps. Firstly, we choose classifiers based u...
Estimated genetic associations with prognosis, or conditional on a phenotype (e.g. disease incidence), may be affected by collider bias, whereby conditioning on the phenotype induces associations between causes of the phenotype and prognosis. We propose a method, ‘Slope-Hunter’, that uses model-based clustering to identify and utilise the class of...
Rationale:
Puberty may influence lung function, but the precise role of pubertal height growth in lung development is unclear.
Objectives:
To examine associations of timing of puberty and peak velocity of pubertal height growth with lung function in adolescence and early-adulthood.
Methods:
Longitudinal analyses of repeat height measurements f...
Rationale
Lung function in early adulthood is associated with subsequent adverse health outcomes.
Objectives
To ascertain whether stable and reproducible lung function trajectories can be derived in different populations and investigate their association with objective measures of cardiovascular structure and function.
Methods
Using latent profil...
Rationale
Early-life exposures may influence lung function at different stages of the life course. However, the relative importance of characteristics at different stages of infancy and childhood is unclear.
Objectives
To examine the associations and relative importance of early-life events on lung function at age 24-years.
Methods
We followed 7545...
Objectives: lung function in early adulthood is an important determinant of all-cause mortality and COPD. We aimed to model data from multiple cohorts from pre-school age to physiological peak in early adulthood to derive FEV1/FVC trajectories. We then investigated the association of the derived trajectories with early-life risk factors and markers...
Genome‐wide association studies have provided many genetic markers that can be used as instrumental variables to adjust for confounding in epidemiological studies. Recently, the principle has been applied to other forms of bias in observational studies, especially collider bias that arises when conditioning or stratifying on a variable that is asso...
Longitudinal epidemiological data are scarce on the relation between dietary intake of vitamin A and respiratory outcomes in childhood. We investigated whether a higher intake of preformed vitamin A or provitamin β-carotene in mid-childhood is associated with higher lung function and with asthma risk in adolescence.
In the Avon Longitudinal Study o...
Background
Residing in greener areas is increasingly linked to beneficial health outcomes, but little is known about its effect on respiratory health.
Objective
We examined associations between residential greenness and nearby green spaces with lung function up to 24 years in the UK Avon Longitudinal Study of Parents and Children (ALSPAC) birth co...
Background: Studying genetic associations with prognosis (e.g. survival, disability, subsequent disease events) is problematic due to selection bias, also termed index event bias or collider bias, whereby selection on disease status can induce associations between causes of incidence and prognosis. A current method for adjusting genetic associat...
Background:
Although physical activity has many known health benefits, its association with lung function in childhood/adolescence remains unclear. We examined the association of physical-activity trajectories between 11 and 15 years with lung function at 15 years in 2266 adolescents.
Methods:
A population-based cohort of 14 305 singleton births...
Objectives
To (1) determine the prevalence of nonperialveolar palatal fistula up to age 5 following repair of unilateral cleft lip and palate (UCLP) in the United Kingdom, (2) examine the association of palatoplasty techniques with fistula occurrence, and (3) describe the frequency of fistula repairs and their success.
Design
Cross-sectional study...
The predictive performance of a random forest ensemble is highly associated with the strength of individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if prediction accuracy is not compromised, will also reduce the computational burden. We investigate the idea of integrating trees that are accurate and diverse....
Rationale: Body composition changes throughout life may explain the inconsistent associations reported between body mass index and lung function in children.
Objectives: To assess the associations of body weight and composition trajectories from 7 to 15 years with lung function at 15 years and lung function growth between 8 and 15 years.
Methods: S...
Evidence on whether early puberty increases the risk of adult asthma is suggestive but inconclusive in women, and very scarce in men. To overcome the issue of residual confounding in observational studies and provide evidence on causal effects, we used Mendelian randomization (MR) with 332 SNPs as instrumental variables for age at menarche in women...
Background:
Latent class analysis (LCA) has been used extensively to identify (latent) phenotypes of childhood wheezing. However, the number and trajectory of discovered phenotypes differed substantially between studies.
Objective:
We sought to investigate sources of variability affecting the classification of phenotypes, identify key time point...
Introduction and objectives
Low lung function in adult life could be attributed to poor lung growth with low maximal lung function in early adulthood, rapid decline during adult life or a combination of these. The aim of this study is to identify maternal and early childhood determinants of lung function in young adults, incorporating preconception...
Background
Type 2 Diabetes (T2DM) is increasing in childhood especially among females and South‐Asians.
Objective
To report outcomes from a national cohort of children and adolescents with T2DM one year following diagnosis.
Subjects and Methods
Clinician reported, one‐year follow‐up of a cohort of children (<17years) diagnosed with T2DM reported t...
Background
Observational studies on pubertal timing and asthma, mainly performed in females, have provided conflicting results about a possible association of early puberty with higher risk of adult asthma, possibly due to residual confounding. To overcome issues of confounding, we used Mendelian randomisation (MR), i.e., genetic variants were used...
Results of main and secondary MR analyses of age at voice breaking in males. (XLSX)
Data for the MR analyses of age at voice breaking in males. (XLSX)
Data for the MR analyses of age at menarche in females. (XLSX)
Results of main and secondary MR analyses of age at menarche in females. (XLSX)
Objective
To report outcomes from a national cohort of children and young people with type 2 diabetes (T2DM), 1 year post diagnosis.
Research design and methods
One-year follow-up of a cohort of children (<17 years) with T2DM reported through the British Paediatric Surveillance Unit between April 2015 and April 2016. This established an overall UK inci...
Aims:
To estimate the incidence of Type 2 diabetes in children aged <17 years, compare this with similar data 10 years ago, and characterize clinical features at diagnosis in the UK and Republic of Ireland.
Methods:
Using the British Paediatric Surveillance Unit reporting framework, cases of Type 2 diabetes diagnosed in children aged <17 years b...
Objective:
To assess postoperative quality of life (QOL) and other patient-reported outcomes following surgery for vestibular schwannoma.
Study design:
Cross-sectional retrospective case review using postal questionnaires.
Setting:
Tertiary referral center.
Patients:
Five hundred consecutive patients undergoing surgery for vestibular schwann...
Objectives:
To explore centre-level variation in fluoride treatment and oral health outcomes and to examine the association of individual- and area-level risk factors with dental decay in Cleft Care UK (CCUK).
Setting:
Two hundred and sixty-eight 5-year-old British children with non-syndromic unilateral cleft lip and palate (UCLP).
Materials an...
Objectives:
Outline methods used to describe centre-level variation in treatment and outcome in children in the Cleft Care UK (CCUK) study. Report centre-level variation in dento-facial outcomes.
Setting and sample population:
Two hundred and sixty-eight five-year-old British children with non-syndromic unilateral cleft lip and palate (UCLP).
M...
Objectives:
To summarize and discuss centre-level variation across a range of treatment and outcome measures and examine individual and ecological determinants of outcome in children in Cleft Care UK (CCUK).
Setting and sample population:
Two hundred and sixty-eight 5-year-old British children with non-syndromic unilateral cleft lip and palate (...
Objectives:
To explore centre-level variation in otitis media with effusion (OME), hearing loss and treatments in children in Cleft Care UK (CCUK) and to examine the association between OME, hearing loss and developmental outcomes at 5 and 7 years.
Setting and sample population:
Two hundred and sixty-eight 5-year-old British children with non-sy...
Objectives:
The aims of this study were to describe child behavioural and psychosocial outcomes associated with appearance and speech in the Cleft Care UK (CCUK) study. We also wanted to explore centre-level variation in child outcomes and investigate individual predictors of such outcomes.
Setting and sample population:
Two hundred and sixty-ei...
Objectives:
To investigate centre-level variation in speech intervention and outcome and factors associated with a speech disorder in children in Cleft Care UK (CCUK).
Setting and sample population:
Two hundred and sixty-eight 5-year-old British children with non-syndromic unilateral cleft lip and palate recruited to CCUK.
Materials and methods...
Aims
To estimate the incidence of type 2 diabetes in children under 17 years of age in the UK and Republic of Ireland and characterise the clinical features and co-morbidities present at diagnosis. In 2005, the incidence in the UK of type 2 diabetes in children under 17 years of age was 0.53/100,000/year.
Methods
Using the British Paediatric Surve...
Aim:
To assess cholesterol screening of children with Type 1 diabetes by diabetes professionals using a survey of current practice, given that National Institute of Health and Care Excellence guidelines on childhood Type 1 diabetes do not recommend cholesterol screening, yet the National Paediatric Diabetes Audit has an annual cholesterol measure...
Background: The National Institute for Health and Care Excellence (NICE) guidelines on childhood type 1 diabetes (T1D) do not recommend cholesterol screening. However, the National Paediatric Diabetes Audit (NPDA) has an annual cholesterol measure (>12 years) as a key outcome indicator. This is confusing for professionals managing children with T1D...
Combining multiple classifiers can give substantial improvement in prediction performance of learning algorithms especially in the presence of non-informative features in the data sets. This technique can also be used for estimating class membership probabilities. We propose an ensemble of k-Nearest Neighbours (kNN) classifiers for class membership...
For many functional genomic experiments, identifying the most characterizing genes is a main challenge. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on a set of discriminative genes. Analyzing overlapping between gene expression of different classes is an effective c...
Machine learning methods can be used for estimating the class membership probability of an observation. We propose an ensemble of optimal trees in terms of their predictive performance. This ensemble is formed by selecting the best trees from a large initial set of trees grown by random forest. A proportion of trees is selected on the basis of thei...
Electronic kiosks in public locations such as the workplace can be used to provide health related information, and as automated occupational health tools to monitor employee health and well-being. These kiosks record physiological measures, e.g. body mass index (BMI), body composition, blood pressure (BP) and heart rate. For this pilot study, one W...
Functions for classification and group membership probability estimation are given. The issue of non-informative features in the data is addressed by utilizing the ensemble method. A few optimal models are selected in the ensemble from an initially large set of base k-nearest neighbours (KNN) models, generated on subset of features from the trainin...
Functions for creating ensembles of optimal trees for regression, classification and class membership probability estimation are given. A few trees are selected from an initial set of trees grown by random forest for the ensemble on the basis of their individual and collective performance. Trees are assessed on out-of-bag data and on an independent...
Statistical learning is concerned with understanding and modelling complex datasets. Given a set of training data, its main aim is to build a model that maps the relationship between a set of input features and a response of interest in a predictive way. Classification is the foremost task of such a learning process. It has applications enc...
Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. A statistical method for selecting genes...
In genomic microarray data, as well as in proteomics, the expressions of thousands of genes are observed in a much smaller number of patients. We aimed to use the gene expression data to identify genes that distinguish between different classes, such as patients who have invasive/non-invasive colorectal cancer. We proposed a novel gene selection me...