SVMRFE based approach for prediction of most discriminatory gene target for type II diabetes

Abstract

Type II diabetes is a chronic condition that affects the way the body metabolizes sugar, its most important source of fuel, and it is becoming increasingly common all over the world. It is therefore necessary to identify new potential drug targets that can not only control the disease but also treat it. Support vector machines (SVMs) are classifiers with the potential to separate discriminatory genes from non-discriminatory genes. SVMRFE, a modification of SVM, ranks the genes based on their discriminatory power and eliminates the genes that are not involved in causing the disease. A gene regulatory network was formed with the top-ranked coding genes to identify their role in causing diabetes. To further validate the results, a pathway study was performed to identify the involvement of the coding genes in type II diabetes. The genes obtained from this study showed a significant involvement in causing the disease and may be used as potential drug targets.
Atul Kumar a,*, D. Jeya Sundara Sharmila b, Sachidanand Singh a
a Department of Biotechnology and Health Sciences, Karunya University, Coimbatore, Tamil Nadu, India
b Department of Nanosciences and Technology, Tamil Nadu Agriculture University, Coimbatore, Tamil Nadu, India
Article info
Article history:
Received 29 November 2016
Received in revised form 7 February 2017
Accepted 15 February 2017
Available online 17 February 2017
© 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords:
Type II diabetes
SVMRFE
Microarray
Protein-protein interaction
t-test
1. Introduction
Support Vector Machine (SVM), a machine learning technique applied in the areas of time series prediction and classification [31,36], has been widely used in the life sciences, especially in bioinformatics. It can handle nonlinear classification tasks efficiently by mapping the samples into a higher-dimensional feature space using a nonlinear kernel function. Since the SVM approach is data-driven and model-free, it has important discriminating power for classification. This characteristic of SVM is evident in cases where the sample size is small and numerous variables are involved (high-dimensional space).
Expression profiles, which contain a large number of attributes (genes), fall into this category. This type of expression data is used to predict the type and occurrence of disease in a patient [39]. An important aspect of analyzing such expression data is feature selection, or dimensionality reduction. Most algorithms lose their potency when the genes are large in number with different time series data or dimensionality [7].
To accomplish the task of dimensionality reduction, a modified version of SVM known as SVMRFE (Support Vector Machine Recursive Feature Elimination) was used in this work. SVMRFE was used to identify the most discriminatory target genes in four different microarray datasets of type II diabetes. These datasets were taken from the Gene Expression Omnibus database (GEO) [13] and the Diabetes Genome Anatomy Project (DGAP) (http://www.diabetesgenome.org/). The idea was to build a model in which the least important features (genes) are eliminated at each iterative step based on the weight assigned to each gene by the SVM. The genes identified through this approach were then classified as essential or non-essential. The protein-protein interactions of the non-essential genes revealed vital information about the interacting proteins. Functional enrichment of these proteins shed light on their regulatory pathways associated with type II diabetes, which can be further explored and confirmed using experimental approaches.
2. Materials and methods
2.1. Collection of data sample
A total of 71 samples from pancreatic islets and skeletal muscle of Homo sapiens were collected from GEO and DGAP. Of these, 37 samples were from normal subjects and 34 from diabetic subjects. Table 1 gives a detailed description of each of the datasets used in this study.
The Fisher linear discriminant was applied to all of the above-mentioned datasets to rank the genes based on the Fisher score [21]. This was followed by a redundancy reduction step to reduce the redundant data in the microarray datasets [22]. The number of genes in each dataset was still high, so a t-test [3] with a significance level of 0.05 was applied to the datasets to filter out the genes that are not involved in causing type II diabetes. After this reduction step, the SVMRFE approach (with a linear kernel function and 6 subsets of the training data) [24] was applied to train on the data samples for 5 iterations. As a result, discriminatory genes based on the weighted ranking were obtained. The identified genes were classified as essential or non-essential using the Database of Essential Genes. A gene interaction and pathway analysis of the potential non-essential genes was performed to identify novel targets for type II diabetes (Fig. 1).

Genomics Data 12 (2017) 28-37
* Corresponding author. E-mail address: atulkumar@karunya.edu (A. Kumar).
http://dx.doi.org/10.1016/j.gdata.2017.02.008
2213-5960/© 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/gdata
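The Fisher-score ranking step described in Section 2.1 can be illustrated with a minimal sketch. The article does not state the formula it used, so the common two-class form (squared difference of the class means divided by the sum of the class variances) is assumed here; the function names and the toy expression values are hypothetical and are not taken from the study.

```python
from statistics import mean, pvariance


def fisher_score(normal, diabetic):
    """Two-class Fisher score (assumed form): (mean difference)^2 / (sum of variances)."""
    diff = mean(normal) - mean(diabetic)
    spread = pvariance(normal) + pvariance(diabetic)
    if spread == 0:
        # No within-class variation: perfectly separated if the means differ.
        return float("inf") if diff != 0 else 0.0
    return diff ** 2 / spread


def rank_genes(expression, labels):
    """Rank genes by Fisher score, highest first.

    expression: {gene: [value per sample]}; labels: 1 = normal, 0 = diabetic.
    """
    scores = {}
    for gene, values in expression.items():
        normal = [v for v, y in zip(values, labels) if y == 1]
        diabetic = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = fisher_score(normal, diabetic)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three normal and three diabetic samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [5.0, 5.2, 4.8, 9.0, 9.1, 8.9],  # clearly shifted between groups
    "gene_b": [5.0, 7.0, 6.0, 6.0, 5.0, 7.0],  # same mean in both groups
}
ranking = rank_genes(expression, labels)
print(ranking)
```

A gene with a large gap between the group means and little variation inside each group receives a high score, which is why such a ranking is a reasonable first filter before the t-test.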
Table 1
Microarray datasets used in this study.
| Source | Data | Normal samples | Diabetic samples | No. of genes | Country |
| --- | --- | --- | --- | --- | --- |
| GEO | Effect of insulin infusion on human skeletal muscle [33] | 6 | 6 | 22,215 | Sweden |
| DGAP | Human pancreatic islets from normal and Type 2 diabetic subjects (A) [18] | 7 | 5 | 22,191 | Caucasian and Asian |
| DGAP | Human pancreatic islets from normal and Type 2 diabetic subjects (B) [18] | 7 | 5 | 22,550 | |
| DGAP | Human skeletal muscle - type 2 diabetes [29] | 17 | 18 | 22,177 | Sweden |

Fig. 1. Flow chart of the analysis.

3. Result and discussion
3.1. t-test analysis
For each of the T2D datasets, a t-test analysis was performed with a significance level of 0.05. As a result, there was a large dimensionality reduction in each dataset (Table 2). The genes rejecting the null hypothesis were obtained for each of the data samples. Tables 3-6 show the corresponding p-values of all the genes that rejected the null hypothesis at a significance level of 0.05. Figs. 2-5 show graphically the p-values of all the genes in the four datasets under consideration. The p-value for most of the genes was above the significance level of 0.05. This indicates that these genes have almost the same expression value in the normal and diseased samples and may not be involved in causing the disease.

Table 2
Number of input and output genes from each dataset for t-test analysis.
| Name of dataset | No. of input genes | No. of genes rejecting the null hypothesis |
| --- | --- | --- |
| Effect of insulin infusion on human skeletal muscle | 1223 | 24 |
| Human pancreatic islets from normal and type II diabetic subjects (A) | 1210 | 17 |
| Human pancreatic islets from normal and type II diabetic subjects (B) | 803 | 21 |
| Human skeletal muscle-type II diabetes | 1238 | 28 |

Table 3
p-Value of genes following the alternative hypothesis for the dataset GSE7146.
| Probe id | Gene | p-Value |
| --- | --- | --- |
| 213524_s_at | G0/G1switch 2 | 0.00001 |
| 216599_x_at | Solute carrier family 22 (organic anion transporter), member 6 | 0.00005 |
| 207295_at | Sodium channel, non-voltage-gated 1, gamma | 0.0001 |
| 218409_s_at | DnaJ (Hsp40) homolog, subfamily C, member 1 | 0.0003 |
| 203221_at | Transducin-like enhancer of split 1 (E (sp1) homolog, Drosophila) | 0.0004 |
| 210452_x_at | Cytochrome P450, family 4, subfamily F, polypeptide 2 | 0.001 |
| 201630_s_at | Acid phosphatase 1, soluble | 0.001 |
| 207955_at | Chemokine (C-C motif) ligand 27 | 0.002 |
| 208507_at | Olfactory receptor, family 7, subfamily C, member 2 | 0.002 |
| 210889_s_at | Fc fragment of IgG, low affinity IIb, receptor (CD32) | 0.002 |
| 207732_s_at | Discs, large homolog 3 (neuroendocrine-dlg, Drosophila) | 0.002 |
| 220636_at | Dynein, axonemal, intermediate polypeptide 2 | 0.002 |
| 205863_at | S100 calcium binding protein A12 | 0.002 |
| 205603_s_at | Diaphanous homolog 2 (Drosophila) | 0.003 |
| 220979_s_at | ST6 (alpha-N-acetyl-neuraminyl-2, 3-beta-galactosyl-1, 3)-N-acetylgalactosaminide alpha-2, 6-sialyltransferase 5 | 0.003 |
| 206310_at | Serine peptidase inhibitor, Kazal Type II (acrosin-trypsin inhibitor) | 0.004 |
| 210442_at | Interleukin 1 receptor-like 1 | 0.004 |
| 201214_s_at | Protein phosphatase 1, regulatory subunit 7 | 0.004 |
| 220385_at | Junctophilin 2 | 0.004 |
| 205490_x_at | Gap junction protein, beta 3, 31 kDa (connexin 31) | 0.004 |
| 213772_s_at | Golgi-associated, gamma adaptin ear containing, ARF binding protein 2 | 0.004 |
| 213950_s_at | Protein phosphatase 3 (formerly 2B), catalytic subunit, gamma isoform (calcineurin A gamma) | 0.004 |
| 201681_s_at | Discs, large homolog 5 (Drosophila) | 0.004 |
| 220782_x_at | Kallikrein-related peptidase 12 | 0.004 |
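The t-test filter of Section 3.1 can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the article cites a regularized t-test [3], whereas the sketch uses the ordinary two-sample Student's t-test from SciPy, and the expression matrix is randomly generated toy data. The function name `ttest_filter` is hypothetical.

```python
import numpy as np
from scipy import stats


def ttest_filter(X, y, alpha=0.05):
    """Keep genes whose expression differs between the two groups.

    X: (samples, genes) expression matrix; y: 1 = normal, 0 = diabetic.
    Returns the indices of genes with p < alpha and the p-value of every gene.
    """
    normal = X[y == 1]
    diabetic = X[y == 0]
    # Two-sample t-test applied to each gene (column) independently.
    _, p_values = stats.ttest_ind(normal, diabetic, axis=0)
    keep = np.flatnonzero(p_values < alpha)
    return keep, p_values


# Hypothetical toy data: 6 normal and 6 diabetic samples, 5 genes.
rng = np.random.default_rng(0)
y = np.array([1] * 6 + [0] * 6)
X = rng.normal(loc=8.0, scale=1.0, size=(12, 5))
X[y == 0, 0] += 5.0  # gene 0 is strongly shifted in the diabetic group

keep, p_values = ttest_filter(X, y)
print(keep, p_values.round(4))
```

Genes whose p-value stays above `alpha` are treated as having similar expression in both groups and are dropped before the SVM step.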
3.2. Identification of best-ranked genes from SVMRFE
The subsets of genes selected on the basis of the p-value were given as input to the support vector machine. Recursive Feature Elimination (RFE) is an iterative procedure for the SVM classifier. The RFE algorithm of the support vector machine assigns a weight to each gene. The weight was calculated based on the expression values of the genes in the diseased and normal samples for all the datasets. The algorithm ranked the genes (with a classification accuracy of 83.9%) in descending order of weight. It then generated the list of genes found to be the most discriminatory between the normal and diseased samples (Tables 7-10). The outline of SVMRFE with a linear kernel is presented below:
Inputs:
Training samples: $X_0 = [x_1, x_2, \ldots, x_n]^T$
Class labels (1 for normal or 0 for diseased): $y = [y_1, y_2, \ldots, y_n]^T$
Initialize:
Surviving genes: $s = [1, 2, \ldots, n]$
Gene-ranking list: $r = [\,]$
Limit training samples to good genes: $X = X_0(:, s)$
Train the classifier: $\alpha = \text{SVM-train}(X, y)$
Compute the weight from each selected gene: $w = \sum_k \alpha_k y_k x_k$, where $k$ indicates the $k$th training pattern
Compute the ranking criterion for the $i$th gene: $R(i) = (w_i)^2$
Mark the gene with the lowest ranking: $g = \arg\min(R)$
Renew the gene-ranking list: $r = [s(g), r]$
Eliminate the gene with the lowest ranking: $s = s(1:g-1,\; g+1:\text{length}(s))$
Repeat until $s = [\,]$
Output:
A gene-ranking list $r$

Table 4
p-Value of genes following the alternative hypothesis for the dataset human pancreatic islets from normal and type II diabetic subjects (A).
| Probe id | Gene | p-Value |
| --- | --- | --- |
| 207406_at | Cytochrome P450, family 7, subfamily A, polypeptide 1 | 0.0003 |
| 214046_at | Fucosyltransferase 9 (alpha (1,3) fucosyltransferase) | 0.0004 |
| 213980_s_at | C-terminal binding protein 1 | 0.0005 |
| 202854_at | Hypoxanthine phosphoribosyltransferase 1 | 0.0005 |
| 215300_s_at | Flavin containing monooxygenase 5 | 0.0007 |
| 212894_at | Suppressor of var1, 3-like 1 (S. cerevisiae) | 0.0012 |
| 202605_at | Glucuronidase, beta | 0.0017 |
| 203196_at | ATP-binding cassette, sub-family C (CFTR/MRP), member 4 | 0.0021 |
| 205633_s_at | Aminolevulinate, delta-, synthase 1 | 0.0022 |
| 207673_at | Nephrosis 1, congenital, Finnish type (nephrin) | 0.0027 |
| 209759_s_at | Enoyl-CoA delta isomerase 1 | 0.003 |
| 208926_at | Sialidase 1 (lysosomal sialidase) | 0.003 |
| 205627_at | Cytidine deaminase | 0.004 |
| 210284_s_at | TGF-beta activated kinase 1/MAP3K7 binding protein 2 | 0.004 |
| 213931_at | Inhibitor of DNA binding 2, dominant negative helix-loop-helix protein | 0.0043 |
| 213426_s_at | Caveolin 2 | 0.0047 |
| 221572_s_at | Solute carrier family 26, member 6 | 0.0049 |

Table 5
p-Value of genes following the alternative hypothesis for the dataset human pancreatic islets from normal and type II diabetic subjects (B).
| Probe id | Gene | p-Value |
| --- | --- | --- |
| 227787_s_at | Thyroid hormone receptor-associated protein 6 | 0.0001 |
| 222478_at | Vacuolar protein sorting 36 (yeast) | 0.0002 |
| 230329_s_at | Nudix (nucleoside diphosphate linked moiety X)-type motif 6 | 0.0003 |
| 226424_at | Calcyphosine | 0.0003 |
| 225491_at | Solute carrier family 1 (glial high affinity glutamate transporter), member 2 | 0.0004 |
| 225016_at | Adenomatosis polyposis coli down-regulated 1 | 0.0005 |
| 243043_at | RAD50 interactor 1 | 0.0008 |
| 224573_at | Ribonuclease, RNase K | 0.0012 |
| 228133_s_at | Myosin, heavy polypeptide 11, smooth muscle | 0.0013 |
| 225108_at | Alkylglycerone phosphate synthase | 0.0013 |
| 224865_at | Male sterility domain containing 2 | 0.0024 |
| 231880_at | Family with sequence similarity 40, member B | 0.0026 |
| 241739_at | 2-oxoglutarate and iron-dependent oxygenase domain containing 1 | 0.003 |
| 228036_s_at | F-box protein 2 | 0.0031 |
| 223978_s_at | Cardiolipin synthase 1 | 0.0032 |
| 244706_at | Protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 1 | 0.0033 |
| 237718_at | Eukaryotic translation initiation factor 4E | 0.0033 |
| 222999_s_at | Cyclin L2 | 0.0038 |
| 230318_at | Serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 | 0.0039 |
| 222408_s_at | Yippee-like 5 (Drosophila) | 0.004 |
| 224954_at | Serine hydroxymethyltransferase 1 (soluble) | 0.0046 |

Table 6
p-Value of genes following the alternative hypothesis for the dataset human skeletal muscle-type II diabetes.
| Probe id | Gene | p-Value |
| --- | --- | --- |
| 219572_at | Ca++-dependent secretion activator 2 | 0.0002 |
| 204447_at | Leucine zipper, putative tumor suppressor family member 3 | 0.0002 |
| 221410_x_at | Protocadherin beta 3 | 0.0003 |
| 201764_at | Transmembrane protein 106C | 0.0005 |
| 201429_s_at | Ribosomal protein L37a | 0.0008 |
| 204761_at | USP6 N-terminal like | 0.001 |
| 219642_s_at | Peroxisomal biogenesis factor 5-like | 0.001 |
| 218592_s_at | Cat eye syndrome chromosome region, candidate 5 | 0.001 |
| 210835_s_at | C-terminal binding protein 2 | 0.001 |
| 216695_s_at | Tankyrase, TRF1-interacting ankyrin-related ADP-ribose polymerase | 0.001 |
| 208067_x_at | Ubiquitously transcribed tetratricopeptide repeat containing, Y-linked | 0.001 |
| 209400_at | Solute carrier family 12 (potassium/chloride transporters), member 4 | 0.001 |
| 201262_s_at | Biglycan | 0.001 |
| 203171_s_at | Ribosomal RNA processing 8, methyltransferase, homolog (yeast) | 0.002 |
| 207131_x_at | Gamma-glutamyltransferase 1 | 0.002 |
| 219464_at | Carbonic anhydrase XIV | 0.002 |
| 206345_s_at | Paraoxonase 1 | 0.002 |
| 210907_s_at | Programmed cell death 10 | 0.002 |
| 202641_at | ADP-ribosylation factor-like 3 | 0.002 |
| 204969_s_at | Radixin | 0.003 |
| 222289_at | Potassium voltage-gated channel, Shaw-related subfamily, member 2 | 0.003 |
| 210318_at | Retinol binding protein 3, interstitial | 0.003 |
| 219301_s_at | Contactin associated protein-like 2 | 0.004 |
| 203116_s_at | Ferrochelatase | 0.004 |
| 207242_s_at | Glutamate receptor, ionotropic, kainate 1 | 0.004 |
| 214005_at | Gamma-glutamyl carboxylase | 0.004 |
| 215529_x_at | DIP2 disco-interacting protein 2 homolog A (Drosophila) | 0.004 |
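The SVMRFE outline given in Section 3.2 can be written as a short Python sketch. It follows the listed steps one by one (train a linear SVM, square the weights, drop the gene with the smallest criterion, repeat), but several points are assumptions: scikit-learn's `SVC` stands in for "SVM-train", the regularization constant `C=1.0` is arbitrary, and the 6 training subsets and 5 iterations mentioned in the methods are not modelled. The toy matrix is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def svm_rfe(X0, y, C=1.0):
    """Rank genes by recursive feature elimination with a linear SVM.

    X0: (samples, genes) matrix; y: class labels (1 = normal, 0 = diseased).
    Returns gene indices ordered from most to least discriminatory.
    """
    s = list(range(X0.shape[1]))  # surviving genes
    r = []                        # gene-ranking list
    while s:
        X = X0[:, s]                               # limit samples to surviving genes
        clf = SVC(kernel="linear", C=C).fit(X, y)  # train the classifier
        w = clf.coef_.ravel()                      # w = sum_k alpha_k y_k x_k
        R = w ** 2                                 # ranking criterion R(i) = w_i^2
        g = int(np.argmin(R))                      # gene with the lowest ranking
        r.insert(0, s[g])                          # r = [s(g), r]
        del s[g]                                   # eliminate that gene
    return r


# Hypothetical toy data: gene 1 separates the classes, genes 0 and 2 do not.
y = np.array([1, 1, 1, 0, 0, 0])
X0 = np.array([
    [0.1, 2.0, 0.0],
    [-0.1, 2.1, 0.1],
    [0.0, 1.9, -0.1],
    [0.0, -2.0, 0.1],
    [0.1, -2.1, 0.0],
    [-0.1, -1.9, -0.1],
])
ranking = svm_rfe(X0, y)
print(ranking)
```

Because genes are prepended to `r` as they are eliminated, the last gene to survive ends up first in the list, which matches the "most discriminatory first" ordering of the outline.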
3.3. Identification of degree of essentiality and non-essentiality of genes
To identify significant and reliable targets, the work concentrated on non-essential genes. Essential genes were ruled out based on the hits obtained from the Database of Essential Genes (DEG 10.9) (http://tubic.tju.edu.cn/deg/) [46]. Essential genes sustain an organism, so using them as gene targets may cause drug side effects. Hence, it is important to identify only the non-essential genes, which may be used as potential drug targets. Tables 11-14 show the non-essential genes from the microarray datasets under study.
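Once the essential-gene hits are known, the split described in Section 3.3 reduces to a membership check. The sketch below assumes the database hits are already available as a set of gene symbols; how the hits were obtained from DEG is not described in enough detail to reproduce, and the gene names used here are placeholders, not results from the study.

```python
def split_by_essentiality(ranked_genes, essential_hits):
    """Split a ranked gene list into essential and non-essential genes.

    ranked_genes: gene symbols in ranking order.
    essential_hits: set of symbols that had a hit in an essential-gene database.
    The ranking order is preserved in both output lists.
    """
    essential = [g for g in ranked_genes if g in essential_hits]
    non_essential = [g for g in ranked_genes if g not in essential_hits]
    return essential, non_essential


# Placeholder symbols for illustration only.
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
hits = {"GENE_B", "GENE_D"}
essential, non_essential = split_by_essentiality(ranked, hits)
print(non_essential)
```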
3.4. Gene interaction studies
After the non-essential genes were obtained from the top-ranked coding genes for each of the datasets, a gene regulatory network was constructed using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database [40]. The study was done mainly to observe the interactions between the non-essential protein-coding genes and other proteins, which result from biochemical events and/or electrostatic forces [23]. The function and activity of a protein are often modulated by other proteins with which it interacts.
Fig. 2. p-Value corresponding to all the genes in the training set for dataset GSE7146.
Fig. 3. p-Value corresponding to all the genes in the training set for dataset human pancreatic islets from normal and type II diabetic subjects (A).

3.4.1. Gene regulatory network of dataset GSE7146
In this dataset, out of the ten best coding genes obtained through the SVMRFE approach, only 5 genes (ACP1, FCGR2B, SCNN1G, CCL27, and DLG3) showed interaction with other protein-coding genes (Fig. 6). ACP1 showed a direct interaction with EPHA2, which is reported to increase the chance of myocardial infarction and reduce the survival rate of hyperglycemic mice [12]. LYN showed an indirect interaction with ACP1 via EPHA2 and a direct interaction with FCGR2B. Modulation of its kinase activation has been reported to act as a novel insulin receptor-potentiating agent that produces a rapid-onset and durable blood glucose-lowering activity in diabetic animals [32]. FCGR2B also showed a direct interaction with PTPN6, which has been reported to negatively regulate insulin action on glucose homeostasis in the liver and muscle [44]. An analysis of DLG3 showed its direct interaction with GRIN2A and GRIN2B, both of which have been reported to play a potential role in diabetes [11,37,42]. UBC has been reported to play a major role in the diabetes pathway [8,16,26], and its direct interaction with SCNN1G suggests that SCNN1G may also play a role in the diabetes pathway. CCL27 interacts with CCL25, a protein whose expression was shown to decrease significantly in diabetes [30].
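The direct and indirect interactions described in Section 3.4.1 can be represented as a small undirected graph. The sketch below encodes only the interactions named in the text for dataset GSE7146; it does not query STRING, and the helper names `build_network` and `shortest_path` are illustrative. A direct interaction is an edge, and an indirect interaction is a path with an intermediate protein.

```python
from collections import deque

# Interactions named in the text for dataset GSE7146.
EDGES = [
    ("ACP1", "EPHA2"), ("EPHA2", "LYN"), ("LYN", "FCGR2B"),
    ("FCGR2B", "PTPN6"), ("DLG3", "GRIN2A"), ("DLG3", "GRIN2B"),
    ("SCNN1G", "UBC"), ("CCL27", "CCL25"),
]


def build_network(edges):
    """Build an undirected adjacency map {protein: set of direct neighbors}."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def shortest_path(network, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(network.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


network = build_network(EDGES)
print(sorted(network["DLG3"]))               # direct neighbors of DLG3
print(shortest_path(network, "ACP1", "LYN"))  # indirect link via EPHA2
```

Proteins in different subnetworks, such as ACP1 and DLG3, have no connecting path, so `shortest_path` returns `None` for them.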
3.4.2. Gene regulatory network of dataset human pancreatic islets from normal and type II diabetic subjects (A)
Except for ABCC4 and FMO5, the other four proteins showed significant and strong interactions with neighboring proteins (Fig. 7). Purine Nucleoside Phosphorylase (PNP) and Nucleoside Phosphate Kinase (NPK) have reportedly played a major role in diabetes through either positive or negative metabolic regulation [9]. These two molecules also showed interaction with HPRT1 and CDA. Caveolin has already been reported to mediate insulin signaling, thereby affecting glucose uptake [6]. In the other subgroup network, FUT3 has three direct neighbors, FUT1, FUT2, and B4GALT1, of which the B4GALT1 expression level has been shown to be affected by hyperglycemia [25].
3.4.3. Gene regulatory network of the dataset human pancreatic islets from normal and type II diabetic subjects (B)
Both protein-coding genes in this dataset (RNASEK and APCDD1) showed significant interactions with neighboring proteins (Fig. 8). The involvement of RNASEK in diabetes is still an open question, but the interaction of APCDD1 with its neighbors suggests that it may be involved in the pathophysiology of diabetes. LPAR6 (Lysophosphatidic Acid Receptor 6), which interacts directly with APCDD1, has shown activity with PPARγ, which is a potential target for diabetes [38]. Aranda et al. [2] also showed in 2012 that DM/HG (Diabetes mellitus/High Glucose) reprograms signaling pathways in RECs (Retinal Endothelial Cells) to induce a state of LPA (Lysophosphatidic Acid) resistance. In 2000, Figueroa et al. [14] showed that alterations in LRP5 expression may be responsible for diabetes susceptibility; LRP5 may therefore be a potential target for therapeutic intervention. It has been reported that Wnt/LRP5 (lipoprotein receptor-related protein 5) signaling contributes to glucose-induced insulin secretion in the islets [15].

Fig. 4. p-Value corresponding to all the genes in the training set for dataset human pancreatic islets from normal and type II diabetic subjects (B).
Fig. 5. p-Value corresponding to all the genes in the training set for dataset human skeletal muscle-type II diabetes.
3.4.4. Gene regulatory network of dataset human skeletal muscle-type II diabetes
The two prominent protein-coding genes identified by the SVMRFE analysis (USP6NL and ProSAPiP1) showed interaction with different sets of genes (Fig. 9). The selective network of ProSAPiP1 has not been reported so far in connection with diabetes. Three genes in the interaction network of USP6NL (SOS1, EGFR, and EGF) have shown significance in connection with diabetes. SOS1 has been associated with insulin action [4], and differential expression of EGFR has a major impact on diabetes and associated diseases [1,5,27,28,41,45]. Kasayama et al. [19] reported in 1989 that EGF deficiency occurs in diabetes mellitus; hence insulin may be important in maintaining the normal level of EGF in the submandibular gland and plasma.
3.5. Functional enrichment of significant genes implying pathway analysis
To further validate the involvement of the identified genes in type II diabetes, pathway enrichment was considered. This was done for all the proteins interacting with the identified significant protein(s). The study was carried out using Biointerpreter, a web-based biological interpretation tool for microarray data analysis (Genotypic Technology Pvt. Ltd., Bangalore, India). The pathway analysis showed that some of the interacting proteins were involved in pathways that are directly or indirectly associated with type II diabetes.
3.5.1. Pathway enrichment for the interacting proteins of the dataset effect of insulin infusion on human skeletal muscle
GRIN2A (Glutamate [NMDA] receptor subunit epsilon-1) and GRIN2B (Glutamate [NMDA] receptor subunit epsilon-2), the two proteins interacting mainly with the identified protein DLG3, were shown to be involved in three different pathways: neuroactive ligand-receptor interaction, circadian entrainment, and long-term potentiation (Fig. 10). The proteins in the neuroactive ligand-receptor interaction pathway have shown a significant role in the pathobiology of obesity and type II diabetes [10]. The second pathway, circadian entrainment, is the biological process that displays an endogenous oscillation of about 24 h. Studies show that exposure to light at night lowers glucose-stimulated insulin secretion due to a decrease in insulin secretory pulse mass. Potential mechanisms have been identified by which disturbances in circadian rhythms due to modern lifestyle can lead to islet failure in type II diabetes [35]. It has also been reported that impaired energy utilization resulting from insulin deficiency impairs long-term potentiation in diabetes [47].

Table 7
Best ranked genes for dataset GSE7146.
Gene name
G0/G1switch 2
Transducin-like enhancer of split 1 (E (sp1) homolog, Drosophila)
Acid phosphatase 1, soluble
DnaJ (Hsp40) homolog, subfamily C, member 1
Golgi-associated, gamma adaptin ear containing, ARF binding protein 2
Protein phosphatase 1, regulatory subunit 7
Interleukin 1 receptor-like 1
Discs, large homolog 5 (Drosophila)
Cytochrome P450, family 4, subfamily F, polypeptide 2
Protein phosphatase 3 (formerly 2B), catalytic subunit, gamma isoform (calcineurin A gamma)
Gap junction protein, beta 3, 31 kDa (connexin 31)
Diaphanous homolog 2 (Drosophila)
Olfactory receptor, family 7, subfamily C, member 2
Solute carrier family 22 (organic anion transporter), member 6
Serine peptidase inhibitor, Kazal Type II (acrosin-trypsin inhibitor)
Chemokine (C-C motif) ligand 27
Dynein, axonemal, intermediate chain 2
Junctophilin 2
Kallikrein-related peptidase 12
S100 calcium binding protein A12
Discs, large homolog 3 (neuroendocrine-dlg, Drosophila)
Sodium channel, non-voltage-gated 1, gamma subunit
ST6 (alpha-N-acetyl-neuraminyl-2, 3-beta-galactosyl-1, 3)-N-acetylgalactosaminide alpha-2, 6-sialyltransferase 5
Fc fragment of IgG, low affinity IIb, receptor (CD32)

Table 8
Best ranked genes for dataset human pancreatic islets from normal and type II diabetic subjects (A).
Gene name
Glucuronidase, beta
Enoyl-CoA delta isomerase 1
C-terminal binding protein 1
Inhibitor of DNA binding 2, dominant negative helix-loop-helix protein
Hypoxanthine phosphoribosyltransferase 1
Sialidase 1 (lysosomal sialidase)
ATP-binding cassette, sub-family C (CFTR/MRP), member 4
Aminolevulinate, delta-, synthase 1
Suppressor of var1, 3-like 1 (S. cerevisiae)
Flavin-containing monooxygenase 5
Solute carrier family 26, member 6
TGF-beta activated kinase 1/MAP3K7 binding protein 2
Caveolin 2
Nephrosis 1, congenital, Finnish type (nephrin)
Fucosyltransferase 9 (alpha (1,3) fucosyltransferase)
Cytidine deaminase
Cytochrome P450, family 7, subfamily A, polypeptide 1

Table 9
Best ranked genes for dataset human pancreatic islets from normal and type II diabetic subjects (B).
Gene name
Adenomatosis polyposis coli down-regulated 1
Ribonuclease, RNase K

Table 10
Best ranked genes for dataset human skeletal muscle-type II diabetes.
Gene name
Protocadherin beta 3
Leucine zipper, putative tumor suppressor family member 3
USP6 N-terminal like
Ubiquitously transcribed tetratricopeptide repeat containing, Y-linked

Table 11
Non-essential genes for dataset GSE7146.
| Gene symbol | Gene name |
| --- | --- |
| G0S2 | G0/G1switch 2 |
| ACP1 | Acid phosphatase 1, soluble |
| CCL27 | Chemokine (C-C motif) ligand 27 |
| JPH2 | Junctophilin 2 |
| KLK12 | Kallikrein-related peptidase 12 |
| S100A12 | S100 calcium binding protein A12 |
| DLG3 | Discs, large homolog 3 (neuroendocrine-dlg, Drosophila) |
| SCNN1G | Sodium channel, non-voltage-gated 1, gamma subunit |
| ST6GALNAC5 | ST6 (alpha-N-acetyl-neuraminyl-2, 3-beta-galactosyl-1, 3)-N-acetylgalactosaminide alpha-2, 6-sialyltransferase 5 |
| FCGR2B | Fc fragment of IgG, low-affinity IIb, receptor (CD32) |

Table 12
Non-essential genes for dataset human pancreatic islets from normal and type II diabetic subjects (A).
| Gene symbol | Gene name |
| --- | --- |
| HPRT1 | Hypoxanthine phosphoribosyltransferase 1 |
| ABCC4 | ATP-binding cassette, sub-family C (CFTR/MRP), member 4 |
| FMO5 | Flavin-containing monooxygenase 5 |
| CAV2 | Caveolin 2 |
| FUT3 | Fucosyltransferase 9 (alpha (1, 3) fucosyltransferase) |
| CDA | Cytidine deaminase |
Table 13
Non-essential genes for dataset human pancreatic islets from normal and type II diabetic subjects (B).
| Gene symbol | Gene name |
| --- | --- |
| APCDD1 | Adenomatosis polyposis coli down-regulated 1 |
| RNASEK | Ribonuclease, RNase K |

Table 14
Non-essential genes for dataset human skeletal muscle-type II diabetes.
| Gene symbol | Gene name |
| --- | --- |
| USP6NL | USP6 N-terminal like |
| PROSAPIP1 | Leucine zipper, putative tumor suppressor family member 3 |
Fig. 6. Gene regulatory network of dataset GSE7146.
Fig. 7. Gene regulatory network of dataset human pancreatic islets from normal and type II diabetic subjects (A).
3.5.2. Pathway enrichment for the interacting proteins of the dataset human pancreatic islets from normal and type II diabetic subjects (A)
The protein B4GALT1, which interacts with the identified protein FUT3, is involved in several metabolic pathways connected to type II diabetes (Fig. 11). B4GALT1 participates in both glycoconjugate and lactose biosynthesis. It has been shown to be a biomarker in hepatocellular carcinoma, which is mainly caused by the insulin resistance syndrome. The ailment finally manifests as obesity and later as diabetes [17].
3.5.3. Pathway enrichment for the interacting proteins of the dataset human pancreatic islets from normal and type II diabetic subjects (B)
The protein PNPT1, which interacts with RNASEK, is reported to be involved in pyrimidine and purine metabolism and in RNA degradation (Fig. 12). The effects of insulin regulation on purine and pyrimidine metabolism have been shown to cause some late complications of diabetic disease [34]. In 2009, Kocic et al. [20] reported that impaired dsRNA metabolism may lead to increased levels of different-sized RNAs in type II diabetic patients and may contribute to an ineffective response against different pathogens.
3.5.4. Pathway enrichment for the interacting proteins of dataset human skeletal muscle-type II diabetes
The EGFR protein, which interacts with the identified protein USP6NL, has already been reported by many researchers to be involved in diabetes [1,5,27,28,41,45]. The pathway studies showed that the main pathways in which EGFR is involved also lead directly or indirectly to diabetes (Fig. 13). Hypoxia-inducible factor 1 alpha (HIF-1α) is regulated precisely by hypoxia and hyperglycemia. It has also been shown that HIF-1α and glucose can sometimes influence each other [43]. It has been reported that the components of the MAPK/ERK pathway act as modifiers of cellular insulin responsiveness. Insulin resistance was due to downregulation of insulin-like receptor gene expression following persistent MAPK/ERK inhibition. The mechanism permits physiological adjustment of insulin sensitivity and the subsequent maintenance of circulating glucose at appropriate levels [48]. MAPK and GnRh-Glp-1 pathways in the ileum have also been reported to be involved in the improvement of the blood glucose level [45].

Fig. 8. Gene regulatory network of dataset human pancreatic islets from normal and type II diabetic subjects (B).
Fig. 9. Gene regulatory network of dataset human skeletal muscle-type II diabetes.
4. Conclusion
Analysis of type II diabetes expression data from two different tissues, skeletal muscle and pancreatic islets, has given insight into genes that may be involved in the pathophysiology of the disease. The most discriminatory genes obtained in each dataset after complete analysis were found to be associated with diabetes either directly or indirectly. However, the majority of the genes have not previously been reported in association with diabetes. The genes identified in the current study, viz. FCGR2B, DLG3, SCNN1G, FUT3, HPRT1, APCDD1, USP6NL, ProSAPiP1, and RNASEK, may act as potential drug targets. The significant pathways identified through the overall approach were neuroactive ligand-receptor interaction, circadian entrainment, long-term potentiation, pyrimidine and purine metabolism, dsRNA metabolism, the MAPK/ERK pathway, and GnRh-Glp-1. This study suggests focusing on these pathways and the above-reported proteins in pathway models or mouse models to evaluate them as drug targets or markers for type II diabetes.
Conflict of interest
The authors declare that there is no conflict of interest in the present work.
Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gdata.2017.02.008.
References
[1] A. Advani, K.J. Wiggins, A.J. Cox, Y. Zhang, R.E. Gilbert, D.J. Kelly, Inhibition of the
epidermal growth factor receptor preserves podocytes and attenuates albuminuria in
experimental diabetic nephropathy. Nephrology 16 (6) (2011 Aug 1) 573–581.
[2] J. Aranda, R. Motiejunaite, E. Im, A. Kazlauskas, Diabetes disrupts the response of
retinal endothelial cells to the angiomodulator lysophosphatidic acid. Diabetes 61 (5)
(2012 May 1) 1225–1233.
[3] P. Baldi, A.D. Long, A Bayesian framework for the analysis of microarray expression
data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17
(6) (2001 Jun 1) 509–519.
Fig. 10. Involvement of GRIN2A and GRIN2B in different pathways.
Fig. 11. Involvement of B4GALT1 in different pathways.
Fig. 12. Involvement of PNPT1 in different pathways.
Fig. 13. Involvement of EGFR in different pathways.
[4] I. Barroso, J.A. Luan, R.P. Middelberg, A.H. Harding, P.W. Franks, R.W. Jakes, D.
Clayton, A.J. Schafer, S. O'Rahilly, N.J. Wareham, Candidate gene association study
in type 2 diabetes indicates a role for genes involved in β-cell function as well as
insulin action. PLoS Biol. 1 (1) (2003 Oct 13), e20.
[5] S. Belmadani, D.I. Palen, R.A. Gonzalez-Villalobos, H.A. Boulares, K. Matrougui,
Elevated epidermal growth factor receptor phosphorylation induces resistance artery
dysfunction in diabetic db/db mice. Diabetes 57 (2008) 1629–1637.
[6] A.W. Cohen, T.P. Combs, P.E. Scherer, M.P. Lisanti, Role of caveolin and caveolae in
insulin signaling and diabetes. Am. J. Physiol. Endocrinol. Metab. 285 (6) (2003
Dec 1) E1151–E1160.
[7] F. Cordero, M. Botta, R.A. Calogero, Microarray data analysis and mining approaches.
Brief. Funct. Genomic. Proteomic. 6 (4) (2007 Dec 1) 265–281.
[8] S. Costes, C.J. Huang, T. Gurlo, M. Daval, A.V. Matveyenko, R.A. Rizza, A.E. Butler, P.C.
Butler, β-cell dysfunctional ERAD/ubiquitin/proteasome system in type 2 diabetes
mediated by islet amyloid polypeptide–induced UCH-L1 deficiency. Diabetes 60
(1) (2011 Jan 1) 227–238.
[9] P.A. Craven, R.K. Studer, F.R. DeRubertis, Impaired nitric oxide-dependent cyclic
guanosine monophosphate generation in glomeruli from diabetic rats. Evidence for
protein kinase C-mediated suppression of the cholinergic response. J. Clin. Investig. 93
(1) (1994 Jan) 311.
[10] U.N. Das, A.A. Rao, Gene expression profile in obesity and type 2 diabetes mellitus.
Lipids Health Dis. 6 (1) (2007 Dec 14) 1.
[11] A.M. Davalli, C. Perego, F.B. Folli, The potential role of glutamate in the current
diabetes epidemic. Acta Diabetol. 49 (3) (2012 Jun 1) 167–183.
[12] A. DuSablon, S. Kent, A. Coburn, J. Virag, EphA2-receptor deficiency exacerbates
myocardial infarction and reduces survival in hyperglycemic mice. Cardiovasc.
Diabetol. 13 (1) (2014 Aug 13) 1.
[13] R. Edgar, M. Domrachev, A.E. Lash, Gene Expression Omnibus: NCBI gene expression
and hybridization array data repository. Nucleic Acids Res. 30 (1) (2002 Jan 1)
207–210.
[14] D.J. Figueroa, J.F. Hess, B. Ky, S.D. Brown, V. Sandig, A. Hermanowski-Vosatka, R.C.
Twells, J.A. Todd, C.P. Austin, Expression of the type I diabetes-associated gene
LRP5 in macrophages, vitamin A system cells, and the Islets of Langerhans suggests
multiple potential roles in diabetes. J. Histochem. Cytochem. 48 (10) (2000 Oct 1)
1357–1368.
[15] T. Fujino, H. Asaba, M.J. Kang, Y. Ikeda, H. Sone, S. Takada, D.H. Kim, R.X. Ioka, M. Ono,
H. Tomoyori, M. Okubo, Low-density lipoprotein receptor-related protein 5 (LRP5)
is essential for normal cholesterol metabolism and glucose-induced insulin
secretion. Proc. Natl. Acad. Sci. 100 (1) (2003 Jan 7) 229–234.
[16] C. Gao, W. Huang, K. Kanasaki, Y. Xu, The role of ubiquitination and sumoylation in
diabetic nephropathy. Biomed. Res. Int. (2014 Jun 4) 2014.
[17] R. Guerrero-Preston, M. Kim, A. Blanco, C. LeBron, R. Santella, M. Berdasco, M. Fraga,
M. Esteller, D. Sidransky, B4GALT1 as a potential epigenetic marker of metabolic
disruptions associated with Non-Alcoholic Fatty Liver Disease. Cancer Res. 68 (9
Supplement) (2008 May 1) (3827-).
[18] J.E. Gunton, R.N. Kulkarni, S. Yim, T. Okada, W.J. Hawthorne, Y.H. Tseng, R.S.
Roberson, C. Ricordi, P.J. O'Connell, F.J. Gonzalez, C.R. Kahn, Loss of ARNT/HIF1β
mediates altered gene expression and pancreatic-islet dysfunction in human type 2
diabetes. Cell 122 (3) (2005 Aug 12) 337–349.
[19] S. Kasayama, Y. Ohba, T. Oka, Epidermal growth factor deficiency associated with
diabetes mellitus. Proc. Natl. Acad. Sci. 86 (19) (1989 Oct 1) 7644–7648.
[20] G.M. Kocic, R. Kocic, R. Pavlovic, T. Jevtovic-Stoimenov, D. Sokolovic, G. Nikolic, V.
Pavlovic, S. Stojanovic, J. Basic, A. Veljkovic, D. Pavlovic, Possible impact of impaired
double-stranded RNA degradation and nitrosative stress on immuno-inflammatory
cascade in type 2 diabetes. Exp. Clin. Endocrinol. Diabetes 117 (09) (2009 Oct)
480–485.
[21] A. Kumar, D.J. Sharmila, R. Kant, Selection of discriminatory gene set for type II
diabetes using Fisher linear discriminant. Int J Adv Comput Mathe Sci. 5 (2) (2014)
36–42.
[22] A. Kumar, D.J. Sharmila, Algorithmic approach for removing the redundancy in dia-
betic gene categories based on semantic similarity and gene expression data. Inter-
disciplinary Sciences: Computational Life Sciences 17 (2015 Mar) 17.
[23] J. Li, A. Mahajan, M.D. Tsai, Ankyrin repeat: a unique motif mediating
protein-protein interactions. Biochemistry 45 (51) (2006 Dec 26) 15168–15178.
[24] Y. Liang, F. Zhang, J. Wang, T. Joshi, Y. Wang, D. Xu, Prediction of drought-resistant
genes in Arabidopsis thaliana using SVM-RFE. PLoS One 6 (7) (2011 Jul 15), e21750.
[25] K. Liu, H.Y. Liu, W. Ye, J.H. Jiang, X. Xu, The initial investigation of the expression of
glycosyltransferases in the retina of streptomycin diabetic rats. [Zhonghua Yan Ke
Za Zhi] Chinese Journal of Ophthalmology 46 (7) (2010 Jul) 580–584.
[26] M.D. López-Avalos, V.F. Duvivier-Kali, G. Xu, S. Bonner-Weir, A. Sharma, G.C. Weir,
Evidence for a role of the ubiquitin-proteasome pathway in pancreatic islets.
Diabetes 55 (5) (2006 May 1) 1223–1231.
[27] K. Matrougui, Diabetes and microvascular pathophysiology: role of epidermal
growth factor receptor tyrosine kinase. Diabetes Metab. Res. Rev. 26 (1) (2010 Jan
1) 13–16.
[28] P.J. Miettinen, J. Ustinov, P. Ormio, R. Gao, J. Palgi, E. Hakonen, L. Juntti-Berggren, P.O.
Berggren, T. Otonkoski, Downregulation of EGF receptor signaling in pancreatic
islets causes diabetes due to impaired postnatal β-cell growth. Diabetes 55 (12)
(2006 Dec 1) 3299–3308.
[29] V.K. Mootha, C.M. Lindgren, K.F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P.
Puigserver, E. Carlsson, M. Ridderstråle, E. Laurila, N. Houstis, PGC-1α-responsive
genes involved in oxidative phosphorylation are coordinately downregulated in
human diabetes. Nat. Genet. 34 (3) (2003 Jul 1) 267–273.
[30] P.R. Nagib, J. Gameiro, L.G. Stivanin-Silva, M.S. de Arruda, D.M. Villa-Verde, W.
Savino, L. Verinaud, Thymic microenvironmental alterations in experimentally
induced diabetes. Immunobiology 215 (12) (2010 Dec 31) 971–979.
[31] K.L. Ng, S.K. Mishra, De novo SVM classification of precursor microRNAs from
genomic pseudo hairpins using global and intrinsic folding measures. Bioinformatics 23
(11) (2007 Jun 1) 1321–1330.
[32] A.R. Ochman, C.A. Lipinski, J.A. Handler, A.G. Reaume, M.S. Saporito, The Lyn kinase
activator MLR-1023 is a novel insulin receptor potentiator that elicits a rapid-onset
and durable improvement in glucose homeostasis in animal models of type 2
diabetes. J. Pharmacol. Exp. Ther. 342 (1) (2012 Jul 1) 23–32.
[33] H. Parikh, E. Carlsson, W.A. Chutkow, L.E. Johansson, H. Storgaard, P. Poulsen, R.
Saxena, C. Ladd, P.C. Schulze, M.J. Mazzini, C.B. Jensen, TXNIP regulates peripheral
glucose metabolism in humans. PLoS Med. 4 (5) (2007 May 1), e158.
[34] K. Pillwein, M.A. Reardon, H.N. Jayaram, Y. Natsumeda, W.L. Elliott, M.A. Faderan, N.
Prajda, W. Sperl, G. Weber, Insulin regulatory effects on purine- and pyrimidine
metabolism in alloxan diabetic rat liver. Padiatr. Padol. 23 (2) (1987 Dec) 135–144.
[35] J. Qian, G.D. Block, C.S. Colwell, A.V. Matveyenko, Consequences of exposure to light
at night on the pancreatic islet circadian clock and function in rats. Diabetes 62 (10)
(2013 Oct 1) 3469–3478.
[36] S.B. Rice, G. Nenadic, B.J. Stapley, Mining protein function from text using term-
based support vector machines. BMC Bioinformatics 6 (1) (2005 May 24) 1.
[37] A.R. Santiago, J.M. Gaspar, F.I. Baptista, A.J. Cristóvão, P.F. Santos, W. Kamphuis, A.F.
Ambrósio, Diabetes changes the levels of ionotropic glutamate receptors in the rat
retina. Mol. Vis. 15 (2009) 1620–1630.
[38] C.M. Stapleton, D.G. Mashek, S. Wang, C.A. Nagle, G.W. Cline, P. Thuillier, L.M.
Leesnitzer, L.O. Li, J.B. Stimmel, G.I. Shulman, R.A. Coleman, Lysophosphatidic acid
activates peroxisome proliferator activated receptor-γ in CHO cells that
over-express glycerol 3-phosphate acyltransferase-1. PLoS One 6 (4) (2011 Apr 20),
e18932.
[39] S. Van Dieren, J.W. Beulens, A.P. Kengne, L.M. Peelen, G.E. Rutten, M. Woodward, Y.T.
Van der Schouw, K.G. Moons, Prediction models for the risk of cardiovascular
disease in patients with type 2 diabetes: a systematic review. Heart 98 (5) (2012
Mar 1) 360–369.
[40] C. Von Mering, M. Huynen, D. Jaeggi, S. Schmidt, P. Bork, B. Snel, STRING: a database
of predicted functional associations between proteins. Nucleic Acids Res. 31 (1)
(2003 Jan 1) 258–261.
[41] L. Wassef, D.J. Kelly, R.E. Gilbert, Epidermal growth factor receptor inhibition
attenuates early kidney enlargement in experimental diabetes. Kidney Int. 66 (5) (2004
Nov 1) 1805–1814.
[42] C. Wilson, Diabetes: pathogenesis of diabetes mellitus: does glutamate have a role?
Nat. Rev. Endocrinol. 7 (5) (2011 May 1) 248.
[43] H. Xiao, Z. Gu, G. Wang, T. Zhao, The possible mechanisms underlying the
impairment of HIF-1α pathway signaling in hyperglycemia and the beneficial effects of
certain therapies. Int. J. Med. Sci. 10 (10) (2013 Jan 1) 1412–1421.
[44] E. Xu, A. Charbonneau, Y. Rolland, K. Bellmann, L. Pao, K.A. Siminovitch, B.G. Neel, N.
Beauchemin, A. Marette, Hepatocyte-specific Ptpn6 deletion protects from
obesity-linked hepatic insulin resistance. Diabetes 61 (8) (2012 Aug 1) 1949–1958.
[45] M.Z. Zhang, Y. Wang, P. Paueksakon, R.C. Harris, Epidermal growth factor receptor
inhibition slows progression of diabetic nephropathy in association with a decrease
in endoplasmic reticulum stress and an increase in autophagy. Diabetes 63 (6)
(2014 Jun 1) 2063–2072.
[46] R. Zhang, H.Y. Ou, C.T. Zhang, DEG: a database of essential genes. Nucleic Acids Res.
32 (Suppl. 1) (2004 Jan 1) D271–D272.
[47] Y. Izumi, K.A. Yamada, M. Matsukawa, C.F. Zorumski, Effects of insulin on long-term
potentiation in hippocampal slices from diabetic rats. Diabetologia 46 (7) (2003)
1007–1012.
[48] W. Zhang, B.J. Thompson, V. Hietakangas, S.M. Cohen, MAPK/ERK signaling regulates
insulin sensitivity to control glucose metabolism in Drosophila. PLoS Genetics 7 (12)
(2011), e1002429.
... DNNs have also been trained to successfully predict diabetic retinopathy based on retinal fundus images [24][25][26][27][28][29]. Furthermore, previous studies have established positive impacts from support vector machine-recursive feature elimination (SVM-RFE) as the feature selection algorithm on improving classification accuracy [30][31][32][33][34][35][36][37], especially for DNNs [38,39]. ...
... The algorithm calculates a rank score and eliminates the lowest-ranking features. Previous studies showed significant performance improvements by employing RFE, including predicting mental states (brain activity) [31,32], Parkinson [33], skin disease [34], autism [35], Alzheimer [36], and T2D [37]. They showed that SVM-RFE achieved superior performance than several comparison methods. ...
... To our best knowledge, only Kumar et al. [37] considered RFE for diabetes prediction. Kumar et al. used SVM-RFE to identify the most discriminatory gene target for T2D. ...
Article
Full-text available
Extracting information from individual risk factors provides an effective way to identify diabetes risk and associated complications, such as retinopathy, at an early stage. Deep learning and machine learning algorithms are being utilized to extract information from individual risk factors to improve early-stage diagnosis. This study proposes a deep neural network (DNN) combined with recursive feature elimination (RFE) to provide early prediction of diabetic retinopathy (DR) based on individual risk factors. The proposed model uses RFE to remove irrelevant features and DNN to classify the diseases. A publicly available dataset was utilized to predict DR during initial stages, for the proposed and several current best-practice models. The proposed model achieved 82.033% prediction accuracy, which was a significantly better performance than the current models. Thus, important risk factors for retinopathy can be successfully extracted using RFE. In addition, to evaluate the proposed prediction model robustness and generalization, we compared it with other machine learning models and datasets (nephropathy and hypertension–diabetes). The proposed prediction model will help improve early-stage retinopathy diagnosis based on individual risk factors.
... Reference [21] proposed an SVM-RFE model by modifying SVM. It ranks the genes of the data on the basis of discriminatory power, and the genes not participating are removed. ...
Article
Full-text available
At present, the prevalence of diabetes is increasing because the human body cannot metabolize the glucose level. Accurate prediction of diabetes patients is an important research area.Many researchers have proposed techniques to predict this disease through data mining and machine learning methods. In prediction, feature selection is a key concept in preprocessing. Thus, the features that are relevant to the disease are used for prediction. This condition improves the prediction accuracy. Selecting the right features in the whole feature set is a complicated process, and many researchers are concentrating on it to produce a predictive model with high accuracy. In this work, a wrapper-based feature selection method called recursive feature elimination is combined with ridge regression (L2) to form a hybrid L2 regulated feature selection algorithm for overcoming the overfitting problem of data set. Overfitting is a major problem in feature selection, where the new data are unfit to the model because the training data are small. Ridge regression is mainly used to overcome the overfitting problem. The features are selected by using the proposed feature selection method, and random forest classifier is used to classify the data on the basis of the selected features. This work uses the Pima Indians Diabetes data set, and the evaluated results are compared with the existing algorithms to prove the accuracy of the proposed algorithm. The accuracy of the proposed algorithm in predicting diabetes is 100%, and its area under the curve is 97%. The proposed algorithm outperforms existing algorithms.
... They have been used T2D-associated genes, ATAC sequences, T2D variants, mitochondrial DNA (mtDNA) sequences in their studies. In addition, in other studies in the literature, deep learning-based models [12][13][14][15], pathway analysis [16], CNN models, statistical analysis, Support Vector Machine Recursive Feature Elimination (SVM-RFE) approach [17] and some machine learning methods have been used to diagnosis T2D from genomic signals [19][20][21][22][23]. Moreover, Ensemble-based methods have been used for the prediction of diabetes [24][25][26][27][28]. Table 1 lists current studies for the detection of T2D-related genes. ...
Article
In Genome-Wide Association Studies (GWAS), detection of T2D-related variants in genome sequences and accurate modeling of the complex structure of the relevant gene are of great importance for the diagnosis of diabetes. For this purpose, this paper presents a novel strong algorithm to accurately and effectively identify Type 2 Diabetes (T2D) risk variants at high-performance rates. The proposed algorithm consists of five important phases. The first stage is to collect T2D-associated DNA sequences and to digitize them by the Entropy-based technique. The second stage is to transform these digitized DNA sequences into 224 9 224 pixels size spectrum images. The third is to extract a distinctive feature set from these spectrum images using the ResNet and VGG19 architectures. The fourth is to classify the effective feature set using SVM and k-NN methods. The last stage is to evaluate the system with k-fold cross-validation. As a result of the developed algorithm, the performances of the used Convolutional Neural Network (CNN) methods, the Entropy-based technique, and the classifiers were compared in relation. As a result of the study a combination model of the proposed Entropy-based technique, ResNet and Support Vector Machine (SVM) achieved the highest accuracy rate with 99.09%. With this study, the performance of the system in the extraction of epigenetic features and prediction of T2D from spectrogram images was investigated. The results show that the system will contribute to the identification of all genes in diabetes-related tissue and studies on new drug targets.
... They have been used T2D-associated genes, ATAC sequences, T2D variants, mitochondrial DNA (mtDNA) sequences in their studies. In addition, in other studies in the literature, deep learning-based models [12][13][14][15], pathway analysis [16], CNN models, statistical analysis, Support Vector Machine Recursive Feature Elimination (SVM-RFE) approach [17] and some machine learning methods have been used to diagnosis T2D from genomic signals [19][20][21][22][23]. Moreover, Ensemble-based methods have been used for the prediction of diabetes [24][25][26][27][28]. Table 1 lists current studies for the detection of T2D-related genes. ...
Article
Full-text available
In Genome-Wide Association Studies (GWAS), detection of T2D-related variants in genome sequences and accurate modeling of the complex structure of the relevant gene are of great importance for the diagnosis of diabetes. For this purpose, this paper presents a novel strong algorithm to accurately and effectively identify Type 2 Diabetes (T2D) risk variants at high-performance rates. The proposed algorithm consists of five important phases. The first stage is to collect T2D-associated DNA sequences and to digitize them by the Entropy-based technique. The second stage is to transform these digitized DNA sequences into 224 × 224 pixels size spectrum images. The third is to extract a distinctive feature set from these spectrum images using the ResNet and VGG19 architectures. The fourth is to classify the effective feature set using SVM and k-NN methods. The last stage is to evaluate the system with k-fold cross-validation. As a result of the developed algorithm, the performances of the used Convolutional Neural Network (CNN) methods, the Entropy-based technique, and the classifiers were compared in relation. As a result of the study a combination model of the proposed Entropy-based technique, ResNet and Support Vector Machine (SVM) achieved the highest accuracy rate with 99.09%. With this study, the performance of the system in the extraction of epigenetic features and prediction of T2D from spectrogram images was investigated. The results show that the system will contribute to the identification of all genes in diabetes-related tissue and studies on new drug targets.
... Kumar et al. [29] developed SVM-based technique to classify the most discriminatory gene target for diabetes mellitus. Barkana et al. [30] carried out analysis is related to the performance of descriptive statistical features to specify retinal vessel segmentation due to diabetes mellitus problems known as diabetic retinopathy. ...
... The proposed work obtained high accuracy than others.Paper[13] identified the insulin resistance using non invasive approaches of machine learning techniques. Experimented the work with CALERIE data set with 18 parameters such as age, gender,height etc., The selected attributes of feature selection is given as input to the classification algorithms such as logistic regression, CART, SVM,LDA,KNN etc., the analysis results shows high accuracy of 97% to identify the insulin resistance while using logistic regression and SVM.Paper[14] proposes an SVMRFE model by the modification of SVM. It just rank the genes of the data based on the discriminatory power and the gene not participated are removed. ...
Preprint
Full-text available
In day today life, diabetes illness is increasing in count due to the body not able to metabolize the glucose level. The prediction of the right diabetes patients is an important research area that many researchers are proposing the techniques to predict this disease through data mining and machine learning methods. In prediction, feature selection is one of the key concept in preprocessing so that the features that are relevant to the disease will be used for prediction. This will improve the prediction accuracy. Selecting right features among the whole feature set is a complicated process and many researchers are concentrating on it to produce the predictive model with high accuracy. In this proposed work, the wrapper based feature selection method called Recursive Feature Elimination (RFE) is combined with Ridge regression (L2) to form a hybrid L2 regulated feature selection algorithm to overcome the overfilling problem of the data set. Over fitting is the major problem in feature selection which means that the new data are not fit to the model since the training data is small. Ridge regression is mainly used to overcome the overfitting problem. Once the features are selected using the proposed feature selection method, random forest classifier is used to classify the data based on the selected features. The proposed work is experimented in PIDD data set and the evaluated results are compared with the existing algorithms to prove the accuracy effect of the proposed algorithm. From the results obtained by proposed algorithm, the accuracy of predicting the diabetes disease is high compared to other existing algorithms.
Article
It has been demonstrated that melatonin influences the developmental competence of both in vivo and in vitro matured oocytes. It modulates oocyte-specific gene expression patterns among mammalian species. Due to differences among study systems, the identification of the classifier orthologs—the homologous genes related among mammals that could universally categorize oocytes matured in environments with varied melatonin levels is still limitedly studied. To gain insight into such orthologs, cross-species transcription profiling meta-analysis of in vitro matured bovine oocytes and in vivo matured human oocytes in low and high melatonin environments was demonstrated in the current study. RNA-Seq data of bovine and human oocytes were retrieved from the Sequence Read Archive database and pre-processed. The used datasets of bovine oocytes obtained from culturing in the absence of melatonin and human oocytes from old patients were regarded as oocytes in the low melatonin environment (Low). Datasets from bovine oocytes cultured in 10–9 M melatonin and human oocytes from young patients were considered as oocytes in the high melatonin environment (High). Candidate orthologs differentially expressed between Low and High melatonin environments were selected by a linear model, and were further verified by Zero-inflated regression analysis. Support Vector Machine (SVM) was applied to determine the potentials of the verified orthologs as classifiers of melatonin environments. According to the acquired results, linear model analysis identified 284 candidate orthologs differentially expressed between Low and High melatonin environments. Among them, only 15 candidate orthologs were verified by Zero-inflated regression analysis (FDR ≤ 0.05). Utilization of the verified orthologs as classifiers in SVM resulted in the precise classification of oocyte learning datasets according to their melatonin environments (Misclassification rates < 0.18, area under curves > 0.9). 
In conclusion, the cross-species RNA-Seq meta-analysis to identify novel classifier orthologs of matured oocytes under different melatonin environments was successfully demonstrated in this study-delivering candidate orthologs for future studies at biological levels. Such verified orthologs might provide valuable evidence about melatonin sufficiency in target oocytes-by which, the decision on melatonin supplementation could be implied.
Article
Full-text available
Even after so much advancement in gene expression microarray technology, the main hindrance in analyzing microarray data is its limited number of samples as compared to a number of factors, which is a major impediment in revealing actual gene functionality and valuable information from the data. Analyzing gene expression data can indicate the factors which are differentially expressed in the diseased tissue. As most of these genes have no part to play in causing the disease of interest, thus, identification of disease-causing genes can reveal not just the case of the disease, but also its pathogenic mechanism. There are a lot of gene selection methods available which have the capacity to remove irrelevant genes, but most of them are not sufficient enough in removing redundancy in genes from microarray data, which increases the computational cost and decreases the classification accuracy. Combining the gene expression data with the gene ontology information can be helpful in determining the redundancy which can then be removed using the algorithm mentioned in the work. The gene list obtained after these sequential steps of the algorithm can be analyzed further to obtain the most deterministic genes responsible for type 2 diabetes. © 2015, International Association of Scientists in the Interdisciplinary Areas and Springer-Verlag Berlin Heidelberg.
Article
Full-text available
Background We have previously shown that EphrinA1/EphA expression profile changes in response to myocardial infarction (MI), exogenous EphrinA1-Fc administration following MI positively influences wound healing, and that deletion of the EphA2 Receptor (EphA2-R) exacerbates injury and remodeling. To determine whether or not ephrinA1-Fc would be of therapeutic value in the hyperglycemic infarcted heart, it is critical to evaluate how ephrinA1/EphA signaling changes in the hyperglycemic myocardium in response to MI. Methods Streptozotocin (STZ)-induced hyperglycemia in wild type (WT) and EphA2-receptor mutant (EphA2-R-M) mice was initiated by an intraperitoneal injection of STZ (150 mg/kg) 10 days before surgery. MI was induced by permanent ligation of the left anterior descending coronary artery and analyses were performed at 4 days post-MI. ANOVAs with Student-Newman Keuls multiple comparison post-hoc analysis illustrated which groups were significantly different, with significance of at least p < 0.05. Results Both WT and EphA2-R-M mice responded adversely to STZ, but only hyperglycemic EphA2-R-M mice had lower ejection fraction (EF) and fractional shortening (FS). At 4 days post-MI, we observed greater post-MI mortality in EphA2-R-M mice compared with WT and this was greater still in the EphA2-R-M hyperglycemic mice. Although infarct size was greater in hyperglycemic WT mice vs normoglycemic mice, there was no difference between hyperglycemic EphA2-R-M mice and normoglycemic EphA2-R-M mice. The hypertrophic response that normally occurs in viable myocardium remote to the infarct was noticeably absent in epicardial cardiomyocytes and cardiac dysfunction worsened in hyperglycemic EphA2-R-M hearts post-MI. The characteristic interstitial fibrotic response in the compensating myocardium remote to the infarct also did not occur in hyperglycemic EphA2-R-M mouse hearts to the same extent as that observed in the hyperglycemic WT mouse hearts. 
Differences in neutrophil and pan-leukocyte infiltration and serum cytokines implicate EphA2-R in modulation of injury and the differences in ephrinA1 and EphA6-R expression in governing this are discussed. Conclusions We conclude that EphA2-mutant mice are more prone to hyperglycemia-induced increased injury, decreased survival, and worsened LV remodeling due to impaired wound healing.
Article
Full-text available
Genes are also known to play a role in the occurrence of infectious diseases like tuberculosis and AIDS as well as some non-communicable diseases like cancer and diabetes. A discriminative gene can act as a target which is a molecular structure that will undergo a specific interaction with drugs because they are administered to treat or diagnose a disease. One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. The main problem of microarray data is its limited number of samples with respect to number of genes. Fisher's linear discriminant is a classification method that projects high-dimensional data onto a line and performs classification in this one-dimensional space. In this work the Fisher criteria is used for selection for discriminatory gene set.
Article
Full-text available
Diabetic nephropathy (DN) is a common and characteristic microvascular complication of diabetes; the mechanisms that cause DN have not been clarified, and the epigenetic mechanism was promised in the pathology of DN. Furthermore, ubiquitination and small ubiquitin-like modifier (SUMO) were involved in the progression of DN. MG132, as a ubiquitin proteasome, could improve renal injury by regulating several signaling pathways, such as NF- κ B, TGF- β , Nrf2-oxidative stress, and MAPK. In this review, we summarize how ubiquitination and sumoylation may contribute to the pathology of DN, which may be a potential treatment strategy of DN.
Article
Full-text available
Previous studies by us and others have reported renal epidermal growth factor receptors (EGFRs) are activated in models of diabetic nephropathy. In the present studies, we examined the effect of treatment with erlotinib, an inhibitor of EGFR tyrosine kinase activity, on progression of diabetic nephropathy in a type 1 diabetic mouse model. Inhibition of renal EGFR activation by erlotinib was confirmed by decreased phosphorylation of EGFR and extracellular signal-related kinase 1/2. Increased albumin/creatinine ratio in diabetic mice was markedly attenuated by erlotinib treatment. Erlotinib-treated animals had less histological glomerular injury as well as decreased renal expression of connective tissue growth factor and collagens I and IV. Autophagy plays an important role in the pathophysiology of diabetes mellitus, and impaired autophagy may lead to increased endoplasmic reticulum (ER) stress and subsequent tissue injury. In diabetic mice, erlotinib-treated mice had evidence of increased renal autophagy, as indicated by altered expression and activity of ATG12, beclin, p62, and LC3A II, hallmarks of autophagy, and had decreased ER stress, as indicated by decreased expression of C/EBP homologous protein, binding immunoglobulin protein, and protein kinase RNA-like ER kinase. The mammalian target of rapamycin (mTOR) pathway, a key factor in the development of diabetic nephropathy and an inhibitor of autophagy, is inhibited by AMP-activated protein kinase (AMPK) activation. Erlotinib-treated mice had activated AMPK and inhibition of the mTOR pathway, as evidenced by decreased phosphorylation of raptor and mTOR and the downstream targets S6 kinase and eukaryotic initiation factor 4B. Erlotinib also led to AMPK-dependent phosphorylation of Ulk1, an initiator of mammalian autophagy. 
These studies demonstrate that inhibition of EGFR with erlotinib attenuates the development of diabetic nephropathy in type 1 diabetes, which is mediated at least in part by inhibition of mTOR and activation of AMPK, with increased autophagy and inhibition of ER stress.
Article
Background: Smad7 is the main negative regulatory protein in the transforming growth factor-β (TGF-β) downstream signaling pathway, which plays an important role in diabetic nephropathy (DN) and may be related to the ubiquitin proteasome pathway (UPP). Aim: We investigated the role of UPP in regulating TGF-β/SMAD signaling and explored the therapeutic effect of the ubiquitin proteasome inhibitor MG132 on DN. Methods: Wistar rats were randomly divided into a diabetes group and a normal control group. Rats in the diabetes group were injected intraperitoneally with streptozotocin. Diabetic rats were then randomly divided into a diabetic nephropathy group (DN group), an MG132 high concentration (MH) group, and an MG132 low concentration (ML) group. After 8 weeks of treatment, 24-hour urinary microalbumin (UAlb), urinary protein/urinary creatinine (Up/Ucr) values, ALT, AST, Bcr, kidney damage, TGF-β, Smad7, fibronectin (FN), and Smurf2 were measured. Results: Body mass and Smad7 protein expression decreased in the DN group, whereas kidney weight, kidney weight index, UAlb, Up/Ucr, FN and Smurf2 mRNA expression, and TGF-β protein expression increased. These changes diminished following treatment with MG132, and the effect was more pronounced in the MH group than in the ML group. Conclusion: MG132 alleviates kidney damage by inhibiting Smad7 ubiquitin degradation and TGF-β activation in DN.
Article
Hypoxia-inducible factor 1 alpha (HIF-1α), an essential transcription factor that mediates the adaptation of cells to low oxygen tensions, is regulated precisely by hypoxia and hyperglycemia, which are major determinants of the chronic complications associated with diabetes. The process of HIF-1α stabilization by hypoxia is clear; however, the mechanisms underlying the potential deleterious effect of hyperglycemia on HIF-1α are still controversial, despite a variety of studies demonstrating the existence of this phenomenon. In fact, HIF-1α and glucose can sometimes influence each other: HIF-1α induces the expression of glycolytic enzymes, and glucose metabolism affects HIF-1α accumulation in some cells. Although hyperglycemia upregulates HIF-1α signaling in some specific cell types, we emphasize the inhibition of HIF-1α by high glucose in this review. With regard to the mechanisms of HIF-1α impairment, the role of methylglyoxal in impairment of HIF-1α stabilization and transactivation ability and the negative effect of reactive oxygen species (ROS) on HIF-1α are discussed. Other explanations for the inhibition of HIF-1α by high glucose exist: the increased sensitivity of HIF-1α to von Hippel-Lindau (VHL) machinery, the role of osmolarity and proteasome activity, and the participation of several molecules. This review aims to summarize several important developments regarding these mechanisms and to discuss potentially effective therapeutic techniques (the antioxidants eicosapentaenoic acid (EPA) and metallothioneins (MTs); the pharmaceuticals cobalt chloride (CoCl2), dimethyloxalylglycine (DMOG), and desferrioxamine (DFO); and gene transfer of constitutively active forms of HIF-1α) and their mechanisms of action for intervention in the chronic complications of diabetes.
Book
The Advances in Chemical Physics series provides the chemical physics field with a forum for critical, authoritative evaluations of advances in every area of the discipline. This special volume focuses on atoms and photons near meso- and nanobodies, an important area of nanotechnology. Nanoscale particles are those between 1 and 100 nm, and they obey neither the laws of quantum physics nor of classical physics due to an extensive delocalization of the valence electrons, which can vary depending on size. This means that different physical properties can be obtained from the same atoms or molecules existing in a nanoscale particle size due entirely to differing sizes and shapes. Nanostructured materials have unique optical, magnetic, and electronic properties depending on the size and shape of the nanomaterials. A great deal of interest has surfaced in this arena as of late due to the potential technological applications.
Article
The purpose of this paper is to design and describe the valuation of Asian options by radial basis function approximation. A one-state-variable partial differential equation that characterizes the price of European-type Asian options is discussed. The governing equation is discretized by the θ-method, and the option price is approximated by a radial basis function based finite difference method. Numerical experiments are performed with European and Asian options, and the results are compared with theoretical and numerical results available in the literature. We show numerically that the scheme is second-order accurate. Stability of the scheme is also discussed.
Article
Both obesity and type 2 diabetes mellitus are common. In a gene expression profile study, it was noted that genes concerned with carbohydrate, lipid and amino acid metabolism, and signal transduction pathways are upregulated, while genes involved in cell adhesion, cytokine-cytokine receptor interaction, insulin signaling, immune system pathways, and inflammatory pathways are differentially expressed in both obesity and type 2 diabetes.