SVMRFE based approach for prediction of most discriminatory gene
target for type II diabetes
A. Kumar ⁎, D. Jeya Sundara Sharmila, Sachidanand Singh
Department of Biotechnology and Health Sciences, Karunya University, Coimbatore, Tamil Nadu, India
Department of Nanosciences and Technology, Tamil Nadu Agriculture University, Coimbatore, Tamil Nadu, India
Received 29 November 2016
Received in revised form 7 February 2017
Accepted 15 February 2017
Available online 17 February 2017
Type II diabetes is a chronic condition that affects the way the body metabolizes sugar, its most important
source of fuel, and it is becoming increasingly common all over the world. It is therefore necessary to identify
new potential drug targets that not only control the disease but can also treat it. Support vector machines
(SVMs) are classifiers that can separate discriminatory genes from non-discriminatory genes. SVMRFE, a
modification of SVM, ranks genes by their discriminatory power and eliminates the genes that are not
involved in causing the disease. A gene regulatory network was built from the top-ranked coding genes to
identify their role in causing diabetes. To further validate the results, a pathway study was performed to
identify the involvement of the coding genes in type II diabetes. The genes obtained from this study showed
a significant involvement in causing the disease and may be used as potential drug targets.
© 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license
Type II diabetes
Support Vector Machine (SVM), a machine learning technique applied to time series prediction and
classification [31,36], has been widely used in the life sciences, especially in bioinformatics. It can
handle nonlinear classification tasks efficiently by mapping the samples into a higher-dimensional feature
space with a nonlinear kernel function. Since the SVM approach is data-driven and model-free, it has
considerable discriminating power for classification. This characteristic is most evident where the sample
sizes are small and numerous variables are involved (a high-dimensional space).
Expression profiles fall into this category, since they contain a large number of attributes (genes). This
type of expression data is used to predict the type and occurrence of a disease in a patient. An important
aspect of analyzing such expression data is feature selection, or dimensionality reduction. Most algorithms
lose their potency when the genes are large in number with different time series data or dimensionality.
To accomplish the task of dimensionality reduction, a modified version of SVM known as SVMRFE (Support
Vector Machine Recursive Feature Elimination) was used in this work. SVMRFE was used to identify the most
discriminatory target genes in four different microarray datasets of type II diabetes. These datasets were
taken from the Gene Expression Omnibus (GEO) database and the Diabetes Genome Anatomy Project (DGAP)
(http://www.diabetesgenome.org/). The idea was to build a model in which the least important features
(genes) are eliminated at each iterative step, based on the weight assigned to each gene by the SVM. The
genes identified through this approach were then classified as essential or non-essential. The
protein-protein interactions of the non-essential genes revealed vital information about the interacting
proteins. Functional enrichment of these proteins shed light on their regulatory pathways associated with
type II diabetes, which can be further explored and confirmed experimentally.
2. Materials and methods
2.1. Collection of data sample
A total of 71 samples from pancreatic islets and skeletal muscle of Homo sapiens were collected from GEO
and DGAP. Of these, 37 samples are from normal subjects and 34 are from diabetic subjects. Table 1 gives a
detailed description of each of the datasets undertaken for the study.
Fisher linear discriminant was applied to all the above-mentioned datasets to rank the genes by Fisher
score, followed by a redundancy reduction step to reduce the redundant data in the microarray dataset.
The number of genes in each dataset
Genomics Data 12 (2017) 28–37
E-mail address: email@example.com (A. Kumar).
2213-5960/© 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
was still high. A t-test with a significance level of 0.05 was applied to the datasets to filter out the
genes that are not involved in causing type II diabetes. After this reduction step, the SVMRFE approach
(with a linear kernel function and 6 subsets of the training data) was applied to train on the data samples
for 5 iterations. As a result, discriminatory genes based on the weighted ranking were obtained. These genes
were classified as essential or non-essential using the Database of Essential Genes. A gene interaction and
pathway analysis of the potential non-essential genes was then performed to identify novel targets for
type II diabetes (Fig. 1).
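As a rough illustration of the Fisher-score ranking step in this workflow, the sketch below scores each gene by the squared difference of its class means divided by the sum of its class variances, and then sorts the genes by that score. The function names, the choice of population variance, and the toy expression values are assumptions made for illustration only; they are not taken from the datasets or the software used in this study.

```python
from statistics import mean, pvariance


def fisher_score(normal, diabetic):
    """Fisher score of one gene: (mean difference)^2 / (sum of class variances)."""
    spread = pvariance(normal) + pvariance(diabetic)
    if spread == 0:
        return 0.0  # assumption: a gene with no variance is treated as uninformative
    return (mean(normal) - mean(diabetic)) ** 2 / spread


def rank_by_fisher(expr_normal, expr_diabetic):
    """Return gene names sorted from most to least discriminatory."""
    scores = {g: fisher_score(expr_normal[g], expr_diabetic[g]) for g in expr_normal}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up expression values (hypothetical gene names).
toy_normal = {
    "geneA": [5.0, 5.2, 4.8],
    "geneB": [3.0, 3.1, 2.9],
    "geneC": [7.0, 6.0, 8.0],
}
toy_diabetic = {
    "geneA": [5.1, 4.9, 5.0],
    "geneB": [6.0, 6.2, 5.8],
    "geneC": [7.5, 6.5, 8.5],
}
ranking = rank_by_fisher(toy_normal, toy_diabetic)
print(ranking)
```

In this toy example geneB separates the two classes most clearly, so it is ranked first; a real analysis would apply the same scoring to every probe on the array.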
3. Results and discussion
3.1. t-test analysis
For each of the T2D datasets, a t-test analysis was performed with a
signiﬁcance level of 0.05. As a result, there was a high dimensionality
Table 1. Microarray datasets undertaken for the study.
| Source | Data | No. of samples (normal) | No. of samples (diabetic) | No. of genes | Country |
|---|---|---|---|---|---|
| GEO | Effect of insulin infusion on human skeletal muscle | 6 | 6 | 22,215 | Sweden |
| DGAP | Human pancreatic islets from normal and Type 2 diabetic subjects (A) | 7 | 5 | 22,191 | Caucasian and Asian |
| DGAP | Human pancreatic islets from normal and Type 2 diabetic subjects (B) | 7 | 5 | 22,550 | |
| DGAP | Human skeletal muscle - type 2 diabetes | 17 | 18 | 22,177 | Sweden |
Fig. 1. Flow chart of the analysis.
Table 2. Number of input and output genes from each dataset for t-test analysis.
| Name of dataset | No. of input genes | No. of output genes |
|---|---|---|
| Effect of insulin infusion on human skeletal muscle | 1223 | 24 |
| Human pancreatic islets from normal and type II diabetic subjects (A) | | |
| Human pancreatic islets from normal and type II diabetic subjects (B) | | |
| Human skeletal muscle-type II diabetes | 1238 | 28 |
Table 3. p-Value of genes following the alternative hypothesis for the dataset “GSE7146”.
| Probe id | Gene | p-Value |
|---|---|---|
| 213524_s_at | G0/G1 switch 2 | 0.00001 |
| 216599_x_at | Solute carrier family 22 (organic anion transporter), member 6 | |
| 207295_at | Sodium channel, non-voltage-gated 1, gamma | 0.0001 |
| 218409_s_at | DnaJ (Hsp40) homolog, subfamily C, member 1 | 0.0003 |
| 203221_at | Transducin-like enhancer of split 1 (E (sp1) homolog, Drosophila) | |
| 210452_x_at | Cytochrome P450, family 4, subfamily F, polypeptide 2 | 0.001 |
| 201630_s_at | Acid phosphatase 1, soluble | 0.001 |
| 207955_at | Chemokine (C-C motif) ligand 27 | 0.002 |
| 208507_at | Olfactory receptor, family 7, subfamily C, member 2 | 0.002 |
| 210889_s_at | Fc fragment of IgG, low affinity IIb, receptor (CD32) | 0.002 |
| 207732_s_at | Discs, large homolog 3 (neuroendocrine-dlg, Drosophila) | 0.002 |
| 220636_at | Dynein, axonemal, intermediate polypeptide 2 | 0.002 |
| 205863_at | S100 calcium binding protein A12 | 0.002 |
| 205603_s_at | Diaphanous homolog 2 (Drosophila) | 0.003 |
| 220979_s_at | ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 5 | |
| 206310_at | Serine peptidase inhibitor, Kazal Type II (acrosin-trypsin inhibitor) | |
| 210442_at | Interleukin 1 receptor-like 1 | 0.004 |
| 201214_s_at | Protein phosphatase 1, regulatory subunit 7 | 0.004 |
| 220385_at | Junctophilin 2 | 0.004 |
| 205490_x_at | Gap junction protein, beta 3, 31 kDa (connexin 31) | 0.004 |
| 213772_s_at | Golgi-associated, gamma adaptin ear containing, ARF binding protein 2 | |
| 213950_s_at | Protein phosphatase 3 (formerly 2B), catalytic subunit, gamma isoform (calcineurin A gamma) | |
| 201681_s_at | Discs, large homolog 5 (Drosophila) | 0.004 |
| 220782_x_at | Kallikrein-related peptidase 12 | 0.004 |
reduction in each dataset (Table 2). The genes rejecting the null hypothesis were obtained for each of the
data samples. Tables 3–6 show the corresponding p-values of all the genes that rejected the null hypothesis
at a significance level of 0.05. Figs. 2–5 show graphically the p-values of all the genes in the four
datasets under consideration. The p-value for most of the genes was above the significance level of 0.05.
This indicates that these genes have almost the same expression value in the normal and diseased samples and
may not be involved in causing type II diabetes.
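The t-test filter described in this section can be sketched as follows. The text does not state which variant of the t-test was used, so the sketch assumes a two-sample Student's t-test with pooled variance; the p-value is obtained by numerically integrating the t density with Simpson's rule so that the example needs only the standard library. All function names and the toy expression values are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t statistic, two-sided p-value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        return 0.0, 1.0  # assumption: a gene with no variance is treated as untestable
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


def filter_genes(expr_normal, expr_diabetic, alpha=0.05):
    """Keep only the genes whose p-value is below the significance level alpha."""
    kept = {}
    for gene in expr_normal:
        _, p = student_t_test(expr_normal[gene], expr_diabetic[gene])
        if p < alpha:
            kept[gene] = p
    return kept


# Toy, made-up expression values (hypothetical gene names).
toy_normal = {"geneA": [5.0, 5.2, 4.8, 5.1], "geneB": [3.0, 3.1, 2.9, 3.2]}
toy_diabetic = {"geneA": [5.1, 4.9, 5.0, 5.2], "geneB": [6.0, 6.2, 5.8, 6.1]}
kept = filter_genes(toy_normal, toy_diabetic)
print(sorted(kept))
```

Only geneB survives the filter in this toy example, mirroring the way most genes in the study were discarded because their p-value exceeded 0.05.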
3.2. Identiﬁcation of best-ranked genes from SVMRFE
The subsets of genes selected on the basis of p-value were given as input to the support vector machine.
Recursive Feature Elimination (RFE) is an iterative procedure for the SVM classifier in which the algorithm
assigns a weight to each gene. The weight was calculated from the expression values of the genes in the
diseased and normal samples for all the datasets. The algorithm ranked the genes (with a classification
accuracy of 83.9%) in descending order of weight. It then generated the list of genes found to be the most
discriminatory between the normal and diseased samples (Tables 7–10). The outline of SVMRFE with a linear
kernel is as follows:
Class labels y (1 for normal or 0 for diseased)
Surviving genes:
    s = [1, 2, …, n]
Limit training samples X to good genes (those indexed by s)
Train the classifier:
    α = SVM-train(X, y)
Compute the weight for each selected gene:
    $w = \sum_k \alpha_k y_k x_k$, where k indicates the k-th training sample
Compute the ranking criterion for the i-th gene:
    $R_i = (w_i)^2$
Mark the gene with the lowest ranking:
    g = arg min(R)
Renew the gene-ranking list:
    r = [s(g), r]
Table 4. p-Value of genes following the alternative hypothesis for the dataset “human pancreatic islets from normal and type II diabetic subjects (A)”.
| Probe id | Gene | p-Value |
|---|---|---|
| 207406_at | Cytochrome P450, family 7, subfamily A, polypeptide 1 | 0.0003 |
| 214046_at | Fucosyltransferase 9 (alpha (1,3) fucosyltransferase) | 0.0004 |
| 213980_s_at | C-terminal binding protein 1 | 0.0005 |
| 202854_at | Hypoxanthine phosphoribosyltransferase 1 | 0.0005 |
| 215300_s_at | Flavin containing monooxygenase 5 | 0.0007 |
| 212894_at | Suppressor of var1, 3-like 1 (S. cerevisiae) | 0.0012 |
| 202605_at | Glucuronidase, beta | 0.0017 |
| 203196_at | ATP-binding cassette, sub-family C (CFTR/MRP), member 4 | 0.0021 |
| 205633_s_at | Aminolevulinate, delta-, synthase 1 | 0.0022 |
| 207673_at | Nephrosis 1, congenital, Finnish type (nephrin) | 0.0027 |
| 209759_s_at | Enoyl-CoA delta isomerase 1 | 0.003 |
| 208926_at | Sialidase 1 (lysosomal sialidase) | 0.003 |
| 205627_at | Cytidine deaminase | 0.004 |
| 210284_s_at | TGF-beta activated kinase 1/MAP3K7 binding protein 2 | 0.004 |
| 213931_at | Inhibitor of DNA binding 2, dominant negative helix-loop-helix protein | |
| 213426_s_at | Caveolin 2 | 0.0047 |
| 221572_s_at | Solute carrier family 26, member 6 | 0.0049 |
Table 5. p-Value of genes following the alternative hypothesis for the dataset “human pancreatic islets from normal and type II diabetic subjects (B)”.
| Probe id | Gene | p-Value |
|---|---|---|
| 227787_s_at | Thyroid hormone receptor-associated protein 6 | 0.0001 |
| 222478_at | Vacuolar protein sorting 36 (yeast) | 0.0002 |
| 230329_s_at | Nudix (nucleoside diphosphate linked moiety X) -type | |
| 226424_at | Calcyphosine | 0.0003 |
| 225491_at | Solute carrier family 1 (glial high affinity glutamate transporter), member 2 | |
| 225016_at | Adenomatosis polyposis coli down-regulated 1 | 0.0005 |
| 243043_at | RAD50 interactor 1 | 0.0008 |
| 224573_at | Ribonuclease, RNase K | 0.0012 |
| 228133_s_at | Myosin, heavy polypeptide 11, smooth muscle | 0.0013 |
| 225108_at | Alkylglycerone phosphate synthase | 0.0013 |
| 224865_at | Male sterility domain containing 2 | 0.0024 |
| 231880_at | Family with sequence similarity 40, member B | 0.0026 |
| 241739_at | 2-oxoglutarate and iron-dependent oxygenase domain | |
| 228036_s_at | F-box protein 2 | 0.0031 |
| 223978_s_at | Cardiolipin synthase 1 | 0.0032 |
| 244706_at | Protein-L-isoaspartate (D-aspartate) O-methyltransferase domain containing 1 | |
| 237718_at | Eukaryotic translation initiation factor 4E | 0.0033 |
| 222999_s_at | Cyclin L2 | 0.0038 |
| 230318_at | Serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 | |
| 222408_s_at | Yippee-like 5 (Drosophila) | 0.004 |
| 224954_at | Serine hydroxymethyltransferase 1 (soluble) | 0.0046 |
Table 6. p-Value of genes following the alternative hypothesis for the dataset “human skeletal muscle-type II diabetes”.
| Probe id | Gene | p-Value |
|---|---|---|
| 219572_at | Ca++-dependent secretion activator 2 | 0.0002 |
| 204447_at | Leucine zipper, putative tumor suppressor family member 3 | |
| 221410_x_at | Protocadherin beta 3 | 0.0003 |
| 201764_at | Transmembrane protein 106C | 0.0005 |
| 201429_s_at | Ribosomal protein L37a | 0.0008 |
| 204761_at | USP6 N-terminal like | 0.001 |
| 219642_s_at | Peroxisomal biogenesis factor 5-like | 0.001 |
| 218592_s_at | Cat eye syndrome chromosome region, candidate 5 | 0.001 |
| 210835_s_at | C-terminal binding protein 2 | 0.001 |
| 216695_s_at | Tankyrase, TRF1-interacting ankyrin-related ADP-ribose | |
| 208067_x_at | Ubiquitously transcribed tetratricopeptide repeat containing, Y-linked | |
| 209400_at | Solute carrier family 12 (potassium/chloride transporters), member 4 | |
| 201262_s_at | Biglycan | 0.001 |
| 203171_s_at | Ribosomal RNA processing 8, methyltransferase, homolog | |
| 207131_x_at | Gamma-glutamyltransferase 1 | 0.002 |
| 219464_at | Carbonic anhydrase XIV | 0.002 |
| 206345_s_at | Paraoxonase 1 | 0.002 |
| 210907_s_at | Programmed cell death 10 | 0.002 |
| 202641_at | ADP-ribosylation factor-like 3 | 0.002 |
| 204969_s_at | Radixin | 0.003 |
| 222289_at | Potassium voltage-gated channel, Shaw-related subfamily, member 2 | |
| 210318_at | Retinol binding protein 3, interstitial | 0.003 |
| 219301_s_at | Contactin associated protein-like 2 | 0.004 |
| 203116_s_at | Ferrochelatase | 0.004 |
| 207242_s_at | Glutamate receptor, ionotropic, kainate 1 | 0.004 |
| 214005_at | Gamma-glutamyl carboxylase | 0.004 |
| 215529_x_at | DIP2 disco-interacting protein 2 homolog A (Drosophila) | 0.004 |
Eliminate the gene with the lowest ranking:
    s = s(1:g−1, g+1:length(s))
Repeat until s = [ ]
Output: a gene-ranking list r
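The SVMRFE outline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the helper `train_linear_svm` is a hypothetical stand-in for SVM-train that fits a linear SVM in the primal by batch subgradient descent on the hinge loss, so it returns the weight vector w directly instead of the dual coefficients α. The regularization strength, learning rate, number of epochs, and the toy data are likewise assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM in the primal (hinge loss + L2 penalty) by subgradient descent.

    X: list of samples (lists of floats); y: labels in {-1, +1}.
    Returns the weight vector w, one weight per gene.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        hinge_w, hinge_b = [0.0] * d, 0.0
        for xk, yk in zip(X, y):
            margin = yk * (sum(wj * xj for wj, xj in zip(w, xk)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j in range(d):
                    hinge_w[j] += yk * xk[j]
                hinge_b += yk
        w = [wj - lr * (lam * wj - hj / n) for wj, hj in zip(w, hinge_w)]
        b += lr * hinge_b / n
    return w


def svm_rfe(X, labels):
    """Rank genes by recursive feature elimination; most discriminatory gene first."""
    y = [1 if label == 1 else -1 for label in labels]  # 1 = normal, 0 = diseased
    surviving = list(range(len(X[0])))                 # s = [1, 2, ..., n]
    ranking = []                                       # r = []
    while surviving:                                   # repeat until s is empty
        X_sub = [[row[j] for j in surviving] for row in X]
        w = train_linear_svm(X_sub, y)
        criteria = [wi * wi for wi in w]               # R_i = (w_i)^2
        g = criteria.index(min(criteria))              # g = arg min(R)
        ranking.insert(0, surviving[g])                # r = [s(g), r]
        del surviving[g]                               # drop gene g from s
    return ranking


# Toy data: three normal and three diseased samples, three hypothetical genes.
genes = ["strong_gene", "flat_gene", "weak_gene"]
X = [
    [2.0, 1.0, 0.5], [2.2, 1.0, 0.4], [1.8, 1.0, 0.6],        # normal
    [-2.0, 1.0, -0.5], [-2.2, 1.0, -0.4], [-1.8, 1.0, -0.6],  # diseased
]
labels = [1, 1, 1, 0, 0, 0]
order = svm_rfe(X, labels)
print([genes[i] for i in order])
```

One gene is removed per iteration here, as in the outline; for arrays with thousands of probes, eliminating several genes per step is a common way to reduce the number of SVM fits.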
3.3. Identiﬁcation of degree of essentiality and non-essentiality of genes
To identify significant and reliable targets, the work concentrated on non-essential genes. Essential genes
were ruled out based on the hits obtained from the Database of Essential Genes (DEG 10.9)
(http://tubic.tju.edu.cn/deg/). Essential genes sustain an organism, so targeting them may cause drug side
effects. Hence, it is important to identify only the non-essential genes that may be used as potential drug
targets. Tables 11–14 show the non-essential genes from the microarray datasets under study.
3.4. Gene interaction studies
After obtaining the non-essential genes from the top-ranked coding genes for each dataset, a gene regulatory
network was constructed using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins)
database. The study was mainly done to observe the interactions between the non-essential protein-coding
genes and other proteins, which result from biochemical events and/or electrostatic forces. The function and
activity of a protein are often modulated by other proteins with which it interacts.
3.4.1. Gene regulatory network of dataset “GSE7146”
In this dataset, out of the ten best coding genes obtained through the SVMRFE approach, only 5 genes (ACP1,
FCGR2B, SCNN1G, CCL27, and DLG3) showed interaction with other protein-coding genes (Fig. 6). ACP1 showed a
direct interaction with EPHA2, whose deficiency is reported to exacerbate myocardial infarction and reduce
the survival
Fig. 2. p-Value corresponding to all the genes in the training set for dataset “GSE7146”.
Fig. 3. p-Value corresponding to all the genes in the training set for dataset “human pancreatic islets from normal and type II diabetic subjects (A)”.
rate of hyperglycemic mice. LYN showed an indirect interaction with ACP1 via EPHA2 and a direct interaction
with FCGR2B. A modulator of its kinase activation has been reported to be a novel insulin
receptor-potentiating agent that produces a rapid-onset and durable blood glucose-lowering activity in
diabetic animals. FCGR2B also showed a direct interaction with PTPN6, which has been reported to negatively
regulate insulin action on glucose homeostasis in the liver and muscle. An analysis of DLG3 showed its direct
interaction with GRIN2A and GRIN2B. Both these genes have been reported to play a potential role in diabetes
[11,37,42]. UBC has been reported to play a major role in the diabetes pathway [8,16,26], and its direct
interaction with SCNN1G suggests that SCNN1G may also play a role in the diabetes pathway. CCL27 interacts
with CCL25, a protein whose expression was shown to decrease significantly in diabetes.
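The direct and indirect interactions described for this dataset can be represented as a small undirected graph, as in the sketch below. The edge list contains only the interactions stated in the text; the data structure, the function names, and the breadth-first search used to recover an indirect interaction path are illustrative assumptions and do not reflect how STRING itself computes or scores interactions.

```python
from collections import deque

# Direct interactions as described in the text for dataset "GSE7146".
EDGES = [
    ("ACP1", "EPHA2"), ("EPHA2", "LYN"), ("LYN", "FCGR2B"), ("FCGR2B", "PTPN6"),
    ("DLG3", "GRIN2A"), ("DLG3", "GRIN2B"), ("SCNN1G", "UBC"), ("CCL27", "CCL25"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of protein pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of interactions or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(EDGES)
print(shortest_path(graph, "ACP1", "LYN"))    # indirect interaction via EPHA2
print(sorted(graph["DLG3"]))                  # direct neighbors of DLG3
print(shortest_path(graph, "ACP1", "DLG3"))   # no connecting path in this edge list
```

A path of length two or more corresponds to what the text calls an indirect interaction, while the adjacency sets give the direct neighbors of each identified gene.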
3.4.2. Gene regulatory network of dataset “human pancreatic islets from
normal and type II diabetic subjects (A)”
Except for ABCC4 and FMO5, the other four proteins showed a significant and strong interaction with
neighboring proteins (Fig. 7). Purine Nucleoside Phosphorylase (PNP) and Nucleoside Phosphate Kinase (NPK)
have reportedly played a major role in diabetes through either positive or negative metabolic regulation.
These two molecules also showed interaction with HPRT1 and CDA. Caveolin has already been reported to mediate
insulin signaling, thereby affecting glucose uptake. In the other subgroup network, FUT3 has three direct
neighbors, FUT1, FUT2, and B4GALT1, of which B4GALT1 expression has been shown to be affected by
hyperglycemia.
3.4.3. Gene regulatory network of the dataset “human pancreatic islets from
normal and type II diabetic subjects (B)”
Both protein-coding genes in this dataset (RNASEK and APCDD1) showed a significant interaction with
neighboring proteins (Fig. 8). The involvement of RNASEK in diabetes is still an unanswered question, but the
interaction of APCDD1 with its neighbors suggests that it may be involved in the pathophysiology of diabetes.
LPAR6 (Lysophosphatidic Acid Receptor 6), which interacts directly with APCDD1, has shown activity with
PPARγ, which is a potential target for diabetes. Aranda et al. also showed in 2012 that DM/HG (Diabetes
mellitus/High Glucose)
Fig. 4. p-Value corresponding to all the genes in the training set for dataset “human pancreatic islets from normal and type II diabetic subjects (B)”.
Fig. 5. p-Value corresponding to all the genes in the training set for dataset “human skeletal muscle-type II diabetes”.
reprograms signaling pathways in RECs (Retinal Endothelial Cells) to induce a state of LPA (Lysophosphatidic
Acid) resistance. In 2000, Figueroa et al. showed that alterations in LRP5 expression may be responsible for
diabetes susceptibility; LRP5 may therefore be a potential target for therapeutic intervention. It has been
reported that Wnt/LRP5 (low-density lipoprotein receptor-related protein 5) signaling contributes to
glucose-induced insulin secretion in the islets.
3.4.4. Gene regulatory network of dataset “human skeletal muscle-type II diabetes”
The two prominent protein-coding genes (USP6NL and ProSAPiP1) identified by the SVMRFE analysis showed
interaction with different sets of genes (Fig. 9). The network of ProSAPiP1 has not been reported so far in
connection with diabetes. Three genes (SOS1, EGFR, and EGF) in the interaction network of USP6NL have shown
significance in connection with diabetes. SOS1 has been associated with insulin action, and differential
expression of EGFR has a major impact on diabetes and associated diseases [1,5,27,28,41,45]. As early as
1989, Kasayama et al. reported that EGF deficiency occurs in diabetes mellitus; hence insulin may be
important in maintaining the normal level of EGF in the submandibular gland and plasma.
3.5. Functional enrichment of signiﬁcant genes implying pathway analysis
To further validate the involvement of the identified genes in type II diabetes, pathway enrichment was
considered. This was performed only for the proteins interacting with the identified significant protein(s).
The study was carried out using Biointerpreter, a web-based biological interpretation tool for microarray
data analysis (Genotypic Technology Pvt. Ltd., Bangalore, India). The pathway analysis showed that some of
the interacting proteins were involved in pathways that are directly or indirectly associated with type II
diabetes.
3.5.1. Pathway enrichment for the interacting proteins of the dataset “effect
of insulin infusion on human skeletal muscle”
GRIN2A (Glutamate [NMDA] receptor subunit epsilon-1) and
GRIN2B (Glutamate [NMDA] receptor subunit epsilon-2), the two
Table 7. Best ranked genes for dataset “GSE7146”.
Transducin-like enhancer of split 1 (E (sp1) homolog, Drosophila)
Acid phosphatase 1, soluble
DnaJ (Hsp40) homolog, subfamily C, member 1
Golgi-associated, gamma adaptin ear containing, ARF binding protein 2
Protein phosphatase 1, regulatory subunit 7
Interleukin 1 receptor-like 1
Discs, large homolog 5 (Drosophila)
Cytochrome P450, family 4, subfamily F, polypeptide 2
Protein phosphatase 3 (formerly 2B), catalytic subunit, gamma isoform (calcineurin A gamma)
Gap junction protein, beta 3, 31 kDa (connexin 31)
Diaphanous homolog 2 (Drosophila)
Olfactory receptor, family 7, subfamily C, member 2
Solute carrier family 22 (organic anion transporter), member 6
Serine peptidase inhibitor, Kazal Type II (acrosin-trypsin inhibitor)
Chemokine (C-C motif) ligand 27
Dynein, axonemal, intermediate chain 2
Kallikrein-related peptidase 12
S100 calcium binding protein A12
Discs, large homolog 3 (neuroendocrine-dlg, Drosophila)
Sodium channel, non-voltage-gated 1, gamma subunit
ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 5
Fc fragment of IgG, low afﬁnity IIb, receptor (CD32)
Table 8. Best ranked genes for dataset “human pancreatic islets from normal and type II diabetic subjects (A)”.
Enoyl-CoA delta isomerase 1
C-terminal binding protein 1
Inhibitor of DNA binding 2, dominant negative helix-loop-helix protein
Hypoxanthine phosphoribosyltransferase 1
Sialidase 1 (lysosomal sialidase)
ATP-binding cassette, sub-family C (CFTR/MRP), member 4
Aminolevulinate, delta-, synthase 1
Suppressor of var1, 3-like 1 (S. cerevisiae)
Flavin-containing monooxygenase 5
Solute carrier family 26, member 6
TGF-beta activated kinase 1/MAP3K7 binding protein 2
Nephrosis 1, congenital, Finnish type (nephrin)
Fucosyltransferase 9 (alpha (1,3) fucosyltransferase)
Cytochrome P450, family 7, subfamily A, polypeptide 1
Table 9. Best ranked genes for dataset “human pancreatic islets from normal and type II diabetic subjects (B)”.
Adenomatosis polyposis coli down-regulated 1
Ribonuclease, RNase K
Table 10. Best ranked genes for dataset “human skeletal muscle-type II diabetes”.
Protocadherin beta 3
Leucine zipper, putative tumor suppressor family member 3
USP6 N-terminal like
Ubiquitously transcribed tetratricopeptide repeat containing, Y-linked
Table 11. Non-essential genes for dataset “GSE7146”.
| Gene symbol | Gene name |
|---|---|
| G0S2 | G0/G1 switch 2 |
| ACP1 | Acid phosphatase 1, soluble |
| CCL27 | Chemokine (C-C motif) ligand 27 |
| JPH2 | Junctophilin 2 |
| KLK12 | Kallikrein-related peptidase 12 |
| S100A12 | S100 calcium binding protein A12 |
| DLG3 | Discs, large homolog 3 (neuroendocrine-dlg, Drosophila) |
| SCNN1G | Sodium channel, non-voltage-gated 1, gamma subunit |
| ST6GALNAC5 | ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 5 |
| FCGR2B | Fc fragment of IgG, low-affinity IIb, receptor (CD32) |
Table 12. Non-essential genes for dataset “human pancreatic islets from normal and type II diabetic subjects (A)”.
| Gene symbol | Gene name |
|---|---|
| HPRT1 | Hypoxanthine phosphoribosyltransferase 1 |
| ABCC4 | ATP-binding cassette, sub-family C (CFTR/MRP), member 4 |
| FMO5 | Flavin-containing monooxygenase 5 |
| CAV2 | Caveolin 2 |
| FUT3 | Fucosyltransferase 9 (alpha (1,3) fucosyltransferase) |
| CDA | Cytidine deaminase |
proteins interacting mainly with the identified protein DLG3, have been shown to be involved in three
different pathways, viz. neuroactive ligand-receptor interaction, circadian entrainment, and long-term
potentiation (Fig. 10). The proteins in the neuroactive ligand-receptor interaction pathway have shown a
significant role in the pathobiology of obesity and type II diabetes. The second pathway, circadian
entrainment, is the biological process that displays an endogenous oscillation of about 24 h. Studies show
that exposure to light at night lowers glucose-stimulated insulin secretion due to a decrease in insulin
secretory pulse mass. Potential mechanisms have been identified by which disturbances in circadian rhythms
due to modern lifestyle can lead to islet failure in type II diabetes. It has also been reported that
impaired energy utilization resulting from insulin deficiency impairs long-term potentiation in diabetes.
Table 13. Non-essential genes for dataset “human pancreatic islets from normal and type II diabetic subjects (B)”.
| Gene symbol | Gene name |
|---|---|
| APCDD1 | Adenomatosis polyposis coli down-regulated 1 |
| RNASEK | Ribonuclease, RNase K |
Table 14. Non-essential genes for dataset “human skeletal muscle-type II diabetes”.
| Gene symbol | Gene name |
|---|---|
| USP6NL | USP6 N-terminal like |
| PROSAPIP1 | Leucine zipper, putative tumor suppressor family member 3 |
Fig. 6. Gene regulatory network of dataset “GSE7146”.
Fig. 7. Gene regulatory network of dataset “human pancreatic islets from normal and type II diabetic subjects (A)”.
3.5.2. Pathway enrichment for the interacting proteins of the dataset “human pancreatic islets from normal and type II diabetic subjects (A)”
The protein B4GALT1, which interacts with the identified protein FUT3, is involved in several metabolic
pathways connected to type II diabetes (Fig. 11). B4GALT1 participates in both glycoconjugate and lactose
biosynthesis. It has been shown to be a biomarker in hepatocellular carcinoma, mainly caused by the insulin
resistance syndrome. Finally, the ailment manifests as obesity and later as diabetes.
3.5.3. Pathway enrichment for the interacting proteins of the dataset “human pancreatic islets from normal and type II diabetic subjects (B)”
The protein PNPT1, which interacts with RNASEK, is reported to be involved in pyrimidine and purine
metabolism and in RNA degradation (Fig. 12). The effects of insulin regulation of purine and pyrimidine
metabolism have been shown to cause some late complications of diabetic disease. In 2009, Kocic et al.
reported that impaired dsRNA metabolism may lead to increased levels of different-sized RNAs in type II
diabetic patients and may contribute to an ineffective response against different pathogens.
3.5.4. Pathway enrichment for the interacting proteins of dataset “human skeletal muscle-type II diabetes”
The EGFR protein, which interacts with the identified protein USP6NL, has already been reported by many
researchers to be involved in diabetes [1,5,27,28,41,45]. The pathway studies identified that the main
pathways in which EGFR is involved also lead directly or indirectly to diabetes (Fig. 13). Hypoxia-inducible
factor 1 alpha (HIF-1α) is regulated precisely by hypoxia and hyperglycemia. It has also been
Fig. 8. Gene regulatory network of dataset “human pancreatic islets from normal and type II diabetic subjects (B)”.
Fig. 9. Gene regulatory network of dataset “human skeletal muscle-type II diabetes”.
shown that HIF-1α and glucose can sometimes influence each other. It has been reported that the components
of the MAPK/ERK pathway act as modifiers of cellular insulin responsiveness. Insulin resistance was due to
downregulation of insulin-like receptor gene expression following persistent MAPK/ERK inhibition. This
mechanism permits physiological adjustment of insulin sensitivity and the subsequent maintenance of
circulating glucose at appropriate levels. The MAPK and GnRh-Glp-1 pathways in the ileum have also been
reported to be involved in the improvement of the blood glucose level.
Analysis of type II diabetes expression data from two different tissue samples, i.e. skeletal muscle and
pancreatic islets, has given a deep insight into genes that may be involved in the pathophysiology of the
disease. The most discriminatory genes obtained in each dataset after complete analysis have been found to
be associated with diabetes either directly or indirectly. However, the majority of the genes have not been
previously reported in association with diabetes. The genes identified in the current study, viz. FCGR2B,
DLG3, SCNN1G, FUT3, HPRT1, APCDD1, USP6NL, ProSAPiP1 and RNASEK, may act as potential drug targets. The
significant pathways identified through the overall approach were neuroactive ligand-receptor interaction,
circadian entrainment, long-term potentiation, pyrimidine and purine metabolism, dsRNA metabolism, the
MAPK/ERK pathway, and GnRh-Glp-1. This study suggests focusing on these pathways and the above-reported
proteins in pathway models or mouse models to elucidate them as drug targets or markers for type II diabetes.
Conﬂict of interest
The authors declare that there is no conflict of interest in the present study.
Appendix A. Supplementary data
Supplementary data to this article can be found online at http://dx.
[1] A. Advani, K.J. Wiggins, A.J. Cox, Y. Zhang, R.E. Gilbert, D.J. Kelly, Inhibition of the epidermal growth factor receptor preserves podocytes and attenuates albuminuria in experimental diabetic nephropathy. Nephrology 16 (6) (2011 Aug 1) 573–581.
[2] J. Aranda, R. Motiejunaite, E. Im, A. Kazlauskas, Diabetes disrupts the response of retinal endothelial cells to the angiomodulator lysophosphatidic acid. Diabetes 61 (5) (2012 May 1) 1225–1233.
[3] P. Baldi, A.D. Long, A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17 (6) (2001 Jun 1) 509–519.
Fig. 10. Involvement of GRIN2A and GRIN2B in different pathways.
Fig. 11. Involvement of B4GALT1 in different pathways
Fig. 12. Involvement of PNPT1 in different pathways.
Fig. 13. Involvement of EGFR in different pathways.
 I. Barroso, J.A. Luan , R.P. Middelberg, A.H. Harding, P.W . Franks, R.W. Jakes, D.
Clayton, A.J. Schafer, S. O'Rahilly, N.J. Wareham, Candidate gene association study
in type 2 diabetes indicates a rolefor genes involved in β-cell function as well as in-
sulin action. PLoS Biol. 1 (1) (2003 Oct 13), e20.
 S. Belmadani, D.I. Palen, R.A. Gonzalez-Villalobos, H.A. Boulares, K. Matrougui, Ele-
vated epidermal growth factor receptor phosphorylation induces resistance artery
dysfunction in diabetic db/db mice. Diabetes 57 (2008) 1629–1637.
 A.W. Cohen, T.P. Combs, P.E. Scherer, M.P. Lisanti, Role of caveolin and caveolae in
insulin signaling and diabetes. Am. J. Physiol. Endocrinol. Metab. 285 (6) (2003
Dec 1) E1151–E1160.
 F. Cordero, M. Botta,R.A. Calogero, Microarray data analysis and mining approaches.
Brief. Funct. Genomic. Proteomic. 6 (4) (2007 Dec 1) 265–281.
 S. Costes, C.J. Huang, T. Gurlo, M. Daval, A.V. Matveyenko, R.A. Rizza, A.E. Butler, P.C.
Butler, β-cell dysfunctional ERAD/ubiquitin/proteasome system in type 2 diabetes
mediated by islet amyloid polypeptide–induced UCH-L1 deﬁciency. Diabetes 60
(1) (2011 Jan 1) 227–238.
 P.A. Craven, R.K. Studer, F.R. DeRubertis, Impaired nitricoxide-dependentcyclic gua-
nosine monophosphate generation in glomeruli from diabetic rats. Evidence for pro-
tein kinase C-mediated suppression of the cholinergic response. J. Clin. Investig. 93
(1) (1994 Jan) 311.
 U.N. Das, A.A. Rao, Gene expression proﬁle in obesity and type 2 diabetes mellitus.
Lipids Health Dis. 6 (1) (2007 Dec 14) 1.
 A.M. Davalli, C. Perego, F.B. Folli, The potential role of glutamate in the current dia-
betes epidemic. Acta Diabetol. 49 (3) (2012 Jun 1) 167–183.
 A. DuSablon, S. Kent, A. Coburn, J. Virag, EphA2-receptor deﬁciency exacerbates
myocardial inf arction and reduces survival in hyperglycem ic mice. Cardiovasc.
Diabetol. 13 (1) (2014 Aug 13) 1.
 R. Edgar, M. Domrachev, A.E. Lash, Gene Expression Omnibus: NCBI gene expression
and hybridization array data repository. Nucleic Acids Res. 30 (1) (2002 Jan 1)
 D.J. Figueroa, J.F. Hess, B. Ky, S.D. Brown, V. Sandig, A. Hermanowski-Vosatka, R.C.
Twells, J.A. Todd, C.P. Austin, Expression of the type I diabetes-associated gene
LRP5 in macrophages, vitamin A system cells, and the Islets of Langerhans suggests
multiple potential roles in diabetes. J. Histochem. Cytochem. 48 (10) (2000 Oct 1)
 T. Fujino, H. Asaba, M.J. Kang, Y. Ikeda, H. Sone, S. Takada, D.H. Kim, R.X. Ioka, M. Ono,
H. Tomoyori, M. Okubo, Low-density lipoprotein receptor-related protein 5 (LRP5)
is essential for normal cholesterol metabolism and glucose-induced insulin secre-
tion. Proc. Natl. Acad. Sci. 100 (1) (2003 Jan 7) 229–234.
 C. Gao, W. Huang, K. Kanasaki, Y. Xu, The role of ubiquitination and sumoylation in
diabetic nephropathy. Biomed. Res. Int. (2014 Jun 4) 2014.
 R. Guerrero-Preston, M. Kim, A. Blanco, C. LeBron, R. Santella, M. Berdasco, M. Fraga,
M. Esteller, D. Sidransky, B4GALT1 as a potential epigenetic marker of metabolic
disruptions associated with Non-Alcoholic Fatty Liver Disease. Cancer Res. 68 (9
Supplement) (2008 May 1) (3827-).
 J.E. Gunton, R.N. Kulkarni, S. Yim, T. Okada, W.J. Hawthorne, Y.H. Tseng, R.S.
Roberson, C. Ricordi, P.J. O'Connell, F.J. Gonzalez, C.R. Kahn, Loss of ARNT/HIF1β
mediates altered gene expression and pancreatic-islet dysfunction in human type 2
diabetes. Cell 122 (3) (2005 Aug 12) 337–349.
 S. Kasayama, Y. Ohba, T. Oka, Epidermal growth factor deficiency associated with
diabetes mellitus. Proc. Natl. Acad. Sci. 86 (19) (1989 Oct 1) 7644–7648.
 G.M. Kocic, R. Kocic, R. Pavlovic, T. Jevtovic-Stoimenov, D. Sokolovic, G. Nikolic, V.
Pavlovic, S. Stojanovic, J. Basic, A. Veljkovic, D. Pavlovic, Possible impact of impaired
double-stranded RNA degradation and nitrosative stress on immuno-inﬂammatory
cascade in type 2 diabetes. Exp. Clin. Endocrinol. Diabetes 117 (09) (2009 Oct)
 A. Kumar, D.J. Sharmila, R. Kant, Selection of discriminatory gene set for type II
diabetes using Fisher linear discriminant. Int J Adv Comput Mathe Sci. 5 (2) (2014)
 A. Kumar, D.J. Sharmila, Algorithmic approach for removing the redundancy in dia-
betic gene categories based on semantic similarity and gene expression data. Inter-
disciplinary Sciences: Computational Life Sciences 17 (2015 Mar) 1–7.
 J. Li, A. Mahajan, M.D. Tsai, Ankyrin repeat: a unique motif mediating protein-pro-
tein interactions. Biochemistry 45 (51) (2006 Dec 26) 15168–15178.
 Y. Liang, F. Zhang, J. Wang, T. Joshi, Y. Wang, D. Xu, Prediction of drought-resistant
genes in Arabidopsis thaliana using SVM-RFE. PLoS One 6 (7) (2011 Jul 15), e21750.
 K. Liu, H.Y. Liu, W. Ye, J.H. Jiang, X. Xu, The initial investigation of the expression of
glycosyltransferases in the retina of streptomycin diabetic rats. [Zhonghua Yan Ke
Za Zhi] Chinese Journal of Ophthalmology 46 (7) (2010 Jul) 580–584.
 M.D. López-Avalos, V.F. Duvivier-Kali, G. Xu, S. Bonner-Weir, A. Sharma, G.C. Weir,
Evidence for a role of the ubiquitin-proteasome pathway in pancreatic islets. Diabe-
tes 55 (5) (2006 May 1) 1223–1231.
 K. Matrougui, Diabetes and microvascular pathophysiology: role of epidermal
growth factor receptor tyrosine kinase. Diabetes Metab. Res. Rev. 26 (1) (2010 Jan
 P.J. Miettinen, J. Ustinov, P. Ormio, R. Gao, J. Palgi, E. Hakonen, L. Juntti-Berggren, P.O.
Berggren, T. Otonkoski, Downregulation of EGF receptor signaling in pancreatic
islets causes diabetes due to impaired postnatal β-cell growth. Diabetes 55 (12)
(2006 Dec 1) 3299–3308.
 V.K. Mootha, C.M. Lindgren, K.F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P.
Puigserver, E. Carlsson, M. Ridderstråle, E. Laurila, N. Houstis, PGC-1α-responsive
genes involved in oxidative phosphorylation are coordinately downregulated in
human diabetes. Nat. Genet. 34 (3) (2003 Jul 1) 267–273.
 P.R. Nagib, J. Gameiro, L.G. Stivanin-Silva, M.S. de Arruda, D.M. Villa-Verde, W.
Savino, L. Verinaud, Thymic microenvironmental alterations in experimentally in-
duced diabetes. Immunobiology 215 (12) (2010 Dec 31) 971–979.
 K.L. Ng, S.K. Mishra, De novo SVM classiﬁcation of precursor microRNAs from geno-
mic pseudo hairpins using global and intrinsic folding measures. Bioinformatics 23
(11) (2007 Jun 1) 1321–1330.
 A.R. Ochman, C.A. Lipinski, J.A. Handler, A.G. Reaume, M.S. Saporito, The Lyn kinase
activator MLR-1023 is a novel insulin receptor potentiator that elicits a rapid-onset
and durable improvement in glucose homeostasis in animal models of type 2 diabe-
tes. J. Pharmacol. Exp. Ther. 342 (1) (2012 Jul 1) 23–32.
 H. Parikh, E. Carlsson, W.A. Chutkow, L.E. Johansson, H. Storgaard, P. Poulsen, R.
Saxena, C. Ladd, P.C. Schulze, M.J. Mazzini, C.B. Jensen, TXNIP regulates peripheral
glucose metabolism in humans. PLoS Med. 4 (5) (2007 May 1), e158.
 K. Pillwein, M.A. Reardon, H.N. Jayaram, Y. Natsumeda, W.L. Elliott, M.A. Faderan, N.
Prajda, W. Sperl, G. Weber, Insulin regulatory effects on purine- and pyrimidine
metabolism in alloxan diabetic rat liver. Padiatr. Padol. 23 (2) (1987 Dec) 135–144.
 J. Qian, G.D. Block, C.S. Colwell, A.V. Matveyenko, Consequences of exposure to light
at night on the pancreatic islet circadian clock and function in rats. Diabetes 62 (10)
(2013 Oct 1) 3469–3478.
 S.B. Rice, G. Nenadic, B.J. Stapley, Mining protein function from text using term-
based support vector machines. BMC Bioinformatics 6 (1) (2005 May 24) 1.
 A.R. Santiago, J.M. Gaspar, F.I. Baptista, A.J. Cristóvão, P.F. Santos, W. Kamphuis, A.F.
Ambrósio, Diabetes changes the levels of ionotropic glutamate receptors in the rat
retina. Mol. Vis. 15 (2009) 1620–1630.
 C.M. Stapleton, D.G. Mashek, S. Wang, C.A. Nagle, G.W. Cline, P. Thuillier, L.M.
Leesnitzer, L.O. Li, J.B. Stimmel, G.I. Shulman, R.A. Coleman, Lysophosphatidic acid
activates peroxisome proliferator activated receptor-γ in CHO cells that
over-express glycerol 3-phosphate acyltransferase-1. PLoS One 6 (4) (2011 Apr 20),
 S. Van Dieren, J.W. Beulens, A.P. Kengne, L.M. Peelen, G.E. Rutten, M. Woodward, Y.T.
Van der Schouw, K.G. Moons, Prediction models for the risk of cardiovascular
disease in patients with type 2 diabetes: a systematic review. Heart 98 (5) (2012
Mar 1) 360–369.
 C. Von Mering, M. Huynen, D. Jaeggi, S. Schmidt, P. Bork, B. Snel, STRING: a database
of predicted functional associations between proteins. Nucleic Acids Res. 31 (1)
(2003 Jan 1) 258–261.
 L. Wassef, D.J. Kelly, R.E. Gilbert, Epidermal growth factor receptor inhibition atten-
uates early kidney enlargement in experimental diabetes. Kidney Int. 66 (5) (2004
Nov 1) 1805–1814.
 C. Wilson, Diabetes: pathogenesis of diabetes mellitus: does glutamate have a role?
Nat. Rev. Endocrinol. 7 (5) (2011 May 1) 248.
 H. Xiao, Z. Gu, G. Wang, T. Zhao, The possible mechanisms underlying the
impairment of HIF-1α pathway signaling in hyperglycemia and the beneficial effects of
certain therapies. Int. J. Med. Sci. 10 (10) (2013 Jan 1) 1412–1421.
 E. Xu, A. Charbonneau, Y. Rolland, K. Bellmann, L. Pao, K.A. Siminovitch, B.G. Neel, N.
Beauchemin, A. Marette, Hepatocyte-speciﬁc Ptpn6 deletion protects from obesity-
linked hepatic insulin resistance. Diabetes 61 (8) (2012 Aug 1) 1949–1958.
 M.Z. Zhang, Y. Wang, P. Paueksakon, R.C. Harris, Epidermal growth factor receptor
inhibition slows progression of diabetic nephropathy in association with a decrease
in endoplasmic reticulum stress and an increase in autophagy. Diabetes 63 (6)
(2014 Jun 1) 2063–2072.
 R. Zhang, H.Y. Ou, C.T. Zhang, DEG: a database of essential genes. Nucleic Acids Res.
32 (Suppl. 1) (2004 Jan 1) D271–D272.
 Y. Izumi, K.A. Yamada, M. Matsukawa, C.F. Zorumski, Effects of insulin on long-term
potentiation in hippocampal slices from diabetic rats. Diabetologia 46 (7) (2003)
 W. Zhang, B.J. Thompson, V. Hietakangas, S.M. Cohen, MAPK/ERK signaling regulates
insulin sensitivity to control glucose metabolism in Drosophila. PLoS Genetics 7 (12)