A Neural Network Model to Classify Liver Cancer Patients Using Data
Expansion and Compression
Ashkan Zeinalzadeh1, Tom Wenska, Gordon Okimoto
1Ashkan Zeinalzadeh was with the Cancer Research Center, University of Hawaii at Manoa, HI, USA. azeinalz@nd.edu
Abstract We develop a neural network model to classify
liver cancer patients into high-risk and low-risk groups using
genomic data. Our approach provides a novel technique to
classify big data sets using neural network models. We prepro-
cess the data before training the neural network models. We
first expand the data using wavelet analysis. We then compress
the wavelet coefficients by mapping them onto a new scaled
orthonormal coordinate system. Then the data is used to train a
neural network model that enables us to classify cancer patients
into two different classes of high-risk and low-risk patients. We
use the leave-one-out approach to build a neural network model.
This neural network model enables us to classify a patient using
genomic data as a high-risk or low-risk patient without any
information about the survival time of the patient. The results
from genomic data analysis are compared with survival time
analysis. It is shown that the expansion and compression of
data using wavelet analysis and singular value decomposition
(SVD) is essential to train the neural network model.
I. INTRODUCTION
The goal of this study is to build a neural network model to classify patients into high-risk and low-risk groups based on genomic data. To build this model we use the genomic data of 390 patients. The model enables us to determine the risk status of a new patient without any knowledge of the patient's survival time, although the classification results from the neural network must be comparable with those from survival-time analysis.
The liver cancer data are big data sets, and neural network models are not computationally efficient on data of this size. The first challenge in applying a neural network to the genomic data is therefore the size of the data. The genomic data include over 20,000 genes. We treat genes as the parameters of a desired signal. The large number of parameters increases the computational complexity of classifying the data with neural networks [1]-[4]. The authors have developed an algorithm to reduce the number of genes (parameters) to fewer than 40 [5]-[7]. We represent the data as a matrix in which rows are genes and columns are samples (patients).
Based on the survival-time analysis, less than 17 percent of the patients are low-risk and the rest are high-risk. Such data are called imbalanced data in the literature [2], in which one or a few subsets of the clustered data are significantly smaller than the remaining subsets. There are different techniques to analyze imbalanced data, e.g., regenerating the data by resampling [8]. Resampling techniques are data dependent.
The authors in [9] develop a method to resample from data based on the statistics of the data. We do not regenerate the data
using resampling. Building a neural network model to cluster imbalanced data is difficult because the small subsets of the data tend to be ignored by the model. Although the low-risk patients are a small subset of all patients, it is important not to misclassify a low-risk patient as a high-risk patient because of the consequences of the subsequent treatment.
The data are composed of two sets: censored data and uncensored data. The censored data correspond to patients whose survival time is unknown (living patients), and the uncensored data correspond to patients whose survival time is known (deceased patients).
We compute the one-dimensional continuous wavelet coef-
ficients for each patient separately. We perform the wavelet
analysis using the Mexican hat wavelet. We vectorize the
output of the wavelet and construct a new matrix by replacing
each column of the original data matrix with the vectorized
form of the wavelet coefficients. We map the data onto
orthonormal bases, which are the left singular vectors of the
new data matrix. The new data matrix is used to train a neural
network model. The parameters of the neural network models
are trained based on an iterative method. These parameters
are optimized to provide classification results comparable
with those from survival time analysis. It is shown that the expansion and compression of the data lead to a better neural network model for classifying the liver cancer patients.
The rest of the paper is structured as follows. In Section II-
A, the survival time analysis is used to classify the patients.
In Section II-B, wavelet analysis and singular value decomposition are applied to expand the data into time-frequency waveforms and to compress them, respectively. In Section II-C, a neural network model is trained to classify the patients.
We describe how the parameters of the neural network model
are optimized. In Section II-D, we show the numerical results
obtained through simulation evaluations. Finally, concluding
remarks are provided in Section III.
II. METHODS
A. Survival-Time Analysis
Patients with a survival time of more than 5 years are called low-risk patients. Deceased patients (uncensored patients) with a survival time of less than 5 years are called high-risk patients. The risk status of a living patient (censored patient) with a survival time ST of less than 5 years is determined based on the Kaplan-Meier estimator [10]. The advantage of the Kaplan-Meier estimator is that it takes the censored data into account. Of the 390 patients, 173 patients
(44%) are censored and 217 patients (56%) are uncensored.
Let ST be a random variable that denotes the survival time, and let P be the Kaplan-Meier cumulative distribution function of ST. For a given t less than 5, the conditional cumulative distribution function (CDF) is defined as follows:
$$P(ST \geq 5 \mid ST \geq t) = \frac{1 - P(ST \leq 5)}{1 - P(ST \leq t)}. \quad (1)$$
If the conditional CDF $P(ST \geq 5 \mid ST \geq t)$ is greater than or equal to 0.75, then the patient is considered low-risk; otherwise the patient is considered high-risk. Out of
390 patients, 67 patients (17%) are low-risk and 323 (83%)
are high-risk. It is observed that only a small number of patients are low-risk. This can complicate the algorithm for training the neural network model to classify the patients. In Figure 1, the Kaplan-Meier CDF for the 390 patients is plotted. The vertical axis is the probability of the survival time and the horizontal axis is the survival time in number of days. The red point in Figure 1 represents the five-year survival time. It is observed that P(ST ≤ 5) = 0.71.
Fig. 1. The Kaplan-Meier CDF for the survival time of 390 patients.
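To make the labeling rule concrete, the following Python sketch (not the authors' code; the function names kaplan_meier and risk_label and the use of NumPy are assumptions) estimates the Kaplan-Meier survival curve from (survival time, event) pairs and applies the five-year rule of equation (1), using the fact that (1 − P(ST ≤ 5))/(1 − P(ST ≤ t)) equals S(5)/S(t) for the survival function S.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).
    events[i] = 1 for an observed death, 0 for censoring."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    death_times = np.unique(times[events == 1])
    at_risk = np.array([(times >= t).sum() for t in death_times])
    deaths = np.array([((times == t) & (events == 1)).sum() for t in death_times])
    surv = np.cumprod(1.0 - deaths / at_risk)   # S(t) at each distinct death time

    def S(t):
        idx = np.searchsorted(death_times, t, side="right") - 1
        return 1.0 if idx < 0 else float(surv[idx])
    return S

def risk_label(t_years, event, S, horizon=5.0, threshold=0.75):
    """Five-year rule from Section II-A: survivors past 5 years are low-risk,
    deaths before 5 years are high-risk, and a censored patient with t < 5
    years is low-risk iff P(ST >= 5 | ST >= t) = S(5)/S(t) >= 0.75."""
    if t_years >= horizon:
        return "low-risk"
    if event == 1:
        return "high-risk"
    return "low-risk" if S(horizon) / S(t_years) >= threshold else "high-risk"
```

With P(ST ≤ 5) = 0.71 as read off Figure 1, S(5) ≈ 0.29, so this rule labels a censored patient low-risk only when S(t) ≤ 0.29/0.75 ≈ 0.39.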
B. Eigen Wavelet Features
Let X = [x_1 x_2 ··· x_n] be an m × n matrix. Each column of X contains the data for one patient. The rows of the matrix X correspond to genes. We use a continuous wavelet transform to analyze how the frequency content of a signal changes over a patient's genes. The wavelet transform compares the signal x with shifted and scaled copies of a basic wavelet. We use the Mexican hat wavelet as the mother wavelet. This wavelet is very advantageous for analyzing genetic signals because of its explicit mathematical expression, smoothness, symmetry, and rapid decay. Let Ψ(k) denote the Mexican hat function of width σ:
$$\Psi(k) = \frac{2}{\sqrt{3\sigma}\,\pi^{1/4}}\left(1 - \frac{k^2}{\sigma^2}\right)\exp\left(-\frac{k^2}{2\sigma^2}\right). \quad (2)$$
The larger the value of σ, the more the energy of Ψ(k) is spread out over the genes (horizontal axis). The continuous wavelet transform (CWT) of a signal x at a scale s > 0 and position τ ∈ R is expressed by the following integral:
$$W_i(s, \tau; x_i(t), \Psi(t)) = \frac{1}{\sqrt{|s|}} \int x_i(t)\, \Psi^{*}\!\left(\frac{t - \tau}{s}\right) dt, \quad (3)$$
where $*$ denotes the complex conjugate. The CWT coefficients are affected by the scale s, the position τ, and the mother wavelet function Ψ. In this analysis, the value of the scale s is fixed and the position τ ranges from 1 to T. An appropriate window size T is chosen for the time-frequency localization. The value of T is chosen by visually inspecting the wavelet coefficients. W_i is a matrix of size T × m. Let V_i be the vectorized form of the matrix W_i. We construct the matrix H = [V_1 ··· V_n]. The matrix of wavelet coefficients H has size Tm × n. The wavelet transform (w) expands the data from the space m × n to the space Tm × n:
$$X \xrightarrow{\ w\ } H. \quad (4)$$
The expansion using wavelet analysis maps the data onto a larger space. It also expands a signal (across the genes) into a waveform whose time and frequency properties are the same as those of the original signal. We center the data matrix H by taking the average of each row and subtracting it from each entry in that row.
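As a hedged illustration of this expansion step (not the authors' implementation; the use of PyWavelets, the helper name expand_patients, and the reading of T as a number of retained scales are assumptions), the matrix H could be built along these lines:

```python
import numpy as np
import pywt  # PyWavelets; "mexh" is its Mexican hat wavelet

def expand_patients(X, T=5):
    """Expand an m x n matrix X (genes x patients) into a (T*m) x n matrix H
    of vectorized Mexican-hat CWT coefficients, then center each row of H."""
    m, n = X.shape
    scales = np.arange(1, T + 1)                        # T wavelet scales per gene signal
    H = np.empty((T * m, n))
    for i in range(n):
        coefs, _ = pywt.cwt(X[:, i], scales, "mexh")    # coefs has shape (T, m)
        H[:, i] = coefs.ravel()                         # vectorize W_i into V_i
    H -= H.mean(axis=1, keepdims=True)                  # center each row of H
    return H
```

In the paper the scale is fixed and the position τ ranges over a window of size T; the sketch instead retains T scales per gene, which yields coefficient matrices of the same T × m shape.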
We then compress the wavelet coefficients by mapping them onto a number of bases. These bases are the left singular vectors of the waveform coefficients H. The singular value decomposition of the matrix H is given as
$$H = LSR^{T}. \quad (5)$$
We project the data onto the first k left singular vectors of the matrix H. Let l_i and r_i be the i-th columns of the orthonormal matrices L and R, respectively:
$$\hat{H} = [l_1 \cdots l_k]^{T} H = [\sigma_1 r_1 \cdots \sigma_k r_k]^{T}. \quad (6)$$
The mapping in (6) compresses the waveform coefficients onto a new scaled orthonormal coordinate system $[\sigma_1 r_1 \cdots \sigma_k r_k]$:
$$H \xrightarrow{\ [l_1 \cdots l_k]\ } \hat{H}. \quad (7)$$
Thus we shrink the size of the waveform coefficients from Tm × n to k × n and filter out unwanted signals and noise. Choosing the right value for the parameter k is crucial to classifying the patients correctly. The experimental results show that the value of k should be less than the number of genes m. We use an iterative method to choose among a range of values for k from 1 to m. We finally select a value of k that classifies the low-risk and high-risk patients efficiently. The choice of k depends on other parameters of the neural network model, as described in the next section.
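A minimal sketch of the compression in equations (5)-(7), assuming H is the centered Tm × n NumPy array from the previous step and k is the chosen number of components:

```python
import numpy as np

def compress(H, k):
    """Project H onto its first k left singular vectors, giving the k x n
    matrix H_hat = [l_1 ... l_k]^T H = [sigma_1 r_1 ... sigma_k r_k]^T."""
    L, sigma, RT = np.linalg.svd(H, full_matrices=False)  # H = L diag(sigma) RT
    H_hat = L[:, :k].T @ H                                 # shape (k, n)
    # Equivalent form using the scaled right singular vectors:
    # H_hat = sigma[:k, None] * RT[:k, :]
    return H_hat
```

Per the text above, k is kept smaller than the number of genes m and is tuned jointly with the neural network parameters.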
C. Neural Network Model
We first classify the patients into two groups of high-risk and low-risk patients using survival time analysis, as explained in Section II-A. We randomly choose P percent of the high-risk and low-risk patients as the training data and (1 − P) percent of the high-risk and low-risk patients as the validation data. The value of P is chosen based on an iterative method that will be explained in the next paragraph. The training and validation data are disjoint. The neural network models are trained n separate times, each time on all the patients' data except for one patient, and a prediction is then made for that patient. The output of the neural network is a number between zero and one. We therefore choose a threshold between zero and one. If the output of the neural network is larger than the threshold, then the patient is considered high-risk; otherwise the patient is considered low-risk. Choosing the right value for the threshold is crucial for the analysis, as described below.
The parameters for the expansion, compression, and training of the neural network are chosen based on an iterative method. These parameters are summarized as follows:
• Window size T for the expansion of the data using wavelet analysis.
• Number of singular vectors k for the compression of the data.
• Percentage of the data P that is used for the validation of the neural network model.
• Number of hidden layers h.
• Threshold Th for the output of the neural network.
After iterating the algorithm over a range of values, a set of values for these parameters is chosen that gives a high true positive rate for low-risk patients and a small false positive rate for the same group of patients. The number of hidden layers h must be smaller than or equal to the number of singular vectors k. The threshold Th is a number between zero and one.
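The sketch below shows one plausible realization of the leave-one-out training and thresholding described above; it is not the authors' code, and the choice of scikit-learn's MLPClassifier, the number of units per hidden layer, and the solver settings are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def leave_one_out_scores(H_hat, y, h=5, units=10):
    """H_hat: k x n compressed data; y: n labels (1 = high-risk, 0 = low-risk).
    Train n models, each leaving one patient out, and return that patient's score."""
    X = H_hat.T                                   # one row per patient
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        clf = MLPClassifier(hidden_layer_sizes=(units,) * h,
                            max_iter=2000, random_state=0)
        clf.fit(X[mask], y[mask])
        scores[i] = clf.predict_proba(X[i:i + 1])[0, 1]   # network output, P(high-risk)
    return scores

# A patient is called high-risk when the network output exceeds the threshold Th:
# y_pred = (scores > Th).astype(int)
```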
D. Validation Results
In this work, the False Positive Rate (FPR) is defined as the probability that a high-risk patient is recognized as a low-risk patient. The True Positive Rate (TPR) is defined as the probability that a low-risk patient is recognized as a low-risk patient. Our class of interest is the low-risk patients. We consider two sets of genes, one including 36 genes and another including 40 genes. These genes were found by the authors by analyzing larger genomic data sets, as described in [7]. The genes in these signatures carry signal information that classifies ovarian cancer patients by their response to standard chemotherapy. We perform the analysis for two separate groups of patients. The first group has 54 patients; out of these, 20 patients (37%) are low-risk and the rest are high-risk. The second group has 99 patients; out of these, 19 patients (19%) are low-risk and the rest are high-risk. In Figures 2-5, the receiver operating characteristics (ROCs) for the two sets of genes and two groups of patients are plotted. The vertical axis is the True Positive Rate and the horizontal axis is the False Positive Rate. An optimal threshold for the neural network model and a set of parameters for the data expansion and compression are chosen based on an iterative algorithm, as given in Table II. The true positive rate, false positive rate, and area under the ROC for the optimal parameters are given in Table I.
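Since the positive class is the low-risk group, the ROC curves and the TPR/FPR values could be computed from the leave-one-out scores along the following lines. This is a sketch, assuming scikit-learn and the labels y, scores scores, and chosen threshold Th from the previous snippet, with low-risk encoded here as 1.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# The network output is high for high-risk patients, so 1 - scores
# ranks patients by how low-risk they appear.
y_low = 1 - y                                  # 1 = low-risk (positive class)
low_scores = 1 - scores
fpr, tpr, thresholds = roc_curve(y_low, low_scores)
auc = roc_auc_score(y_low, low_scores)

# TPR/FPR at one operating point (output threshold Th on the raw scores):
pred_low = (scores <= Th).astype(int)          # output <= Th means low-risk
tpr_at_th = np.mean(pred_low[y_low == 1])      # low-risk correctly called low-risk
fpr_at_th = np.mean(pred_low[y_low == 0])      # high-risk wrongly called low-risk
```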
Second, we classify the patients into two groups of high-risk and low-risk patients using a regular neural network model. We draw the training data from the 390 patients and train a model to classify the 54 patients. Out of the 390 patients, 67 patients (17%) are low-risk and 323 (83%) are high-risk. We randomly select 164 of the high-risk patients and all of the low-risk patients (67 patients) as the training data. The rest of the high-risk patients are not considered in the analysis. We train the neural network model on these 231 patients, of which 29% are low-risk and 71% are high-risk. Predictions are then made for the 54 patients. In Figures 6-7, the receiver operating characteristics (ROCs) for the two sets of genes are plotted. It is observed that the result from the leave-one-out approach is similar to that of the regular neural network when the number of high-risk patients is reduced. As in the leave-one-out approach, P percent of the high-risk and low-risk patients are used as the training data and (1 − P) percent of the high-risk and low-risk patients as the validation data. The training and validation data are disjoint. Similarly, a threshold is chosen for the output of the neural network.
Fig. 2. ROC for 36 genes and 54 patients using the leave-one-out approach.
Fig. 3. ROC for 36 genes and 99 patients using the leave-one-out approach.
Fig. 4. ROC for 40 genes and 54 patients using the leave-one-out approach.
Fig. 5. ROC for 40 genes and 99 patients using the leave-one-out approach.
TABLE I
True Positive Rate (TPR), False Positive Rate (FPR), area under the ROC with expansion and compression (AREA1), and area under the ROC without expansion and compression (AREA2)

Genes  Patients  TPR   FPR   AREA1  AREA2
40     99        0.83  0.07  0.80   0.24
40     54        0.97  0.11  0.88   0.27
36     99        0.92  0.13  0.83   0.36
36     54        0.97  0.07  0.91   0.36
III. CONCLUSIONS
It is observed that expansion and compression of the data enable the neural network model to classify patients significantly better. The results from the leave-one-out approach are comparable with the results from the regular neural network. Choosing the right parameters for the compression, expansion, and training of the model is crucial for the analysis. Reducing the number of high-risk patients in the training data helps to train a neural network model that classifies high-risk and low-risk patients.
REFERENCES
[1] M. A. Jarrahi, H. Samet, H. Raayatpisheh, A. Jafari, M. Rakhshan, An ANFIS-based fault classification approach in double-circuit transmission line using current samples, International Work-Conference on Artificial Neural Networks, pp. 225–236, 2015.
[2] M. Rakhshan, F. Shabani-nia, M. ShaSadeghi, ANFIS approach for tracking control of MEMS triaxial gyroscope, Modeling and Simulation in Electrical and Electronics Engineering, 1 (1), pp. 35–40, 2015.
[3] M. Rakhshan, E. Moula, F. Shabani-nia, B. Safarinejadian, S. Khorshidi, Active noise control using wavelet function and network approach, Journal of Low Frequency Noise, Vibration and Active Control, 35 (1), pp. 4–16, 2016.
[4] M. Rakhshan, S. Khorshidi, B. Safarinejadian, Active noise control in presence of disturbance using adaptive neuro fuzzy inference system, Journal of Computational Intelligence and Electronic Systems, 3 (2), pp. 99–105, 2014.
[5] A. Zeinalzadeh, T. Wenska, G. Okimoto, Integrated analysis of multiple high-dimensional data sets by joint rank-1 matrix approximations, 2015 54th IEEE Conference on Decision and Control (CDC), pp. 3852–3857, 2015.
[6] A. Zeinalzadeh, An iterated version of the generalized singular value decomposition for the joint analysis of two high-dimensional data sets, University of Hawaii at Manoa, 2013.
[7] G. Okimoto, A. Zeinalzadeh, T. Wenska, M. Loomis, J. B. Nation, T. Fabre, M. Tiirikainen, B. Hernandez, O. Chan, L. Wong, S. Kwee, Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer, BioData Mining, 9 (1), 24, 2016.
[8] R. Longadge, S. S. Dongre, L. Malik, Class imbalance problem in data mining, International Journal of Computer Science and Network, Vol. 2, Issue 1, pp. 83–87, 2013.
[9] R. T. Hadke, P. Khobragade, An approach for class imbalance using oversampling technique, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 3, Issue 11, pp. 11451–11455, 2015.
[10] E. L. Kaplan and P. Meier, Nonparametric estimation from incomplete observations, Journal of the American Statistical Association, Vol. 53, No. 282, pp. 457–481, 1958.

TABLE II
The parameters of the expansion, compression, and neural network

Genes  Patients  T  k  P    h  Th
40     99        5  7  0.7  5  0.84
40     54        5  7  0.8  6  0.83
36     99        5  7  0.7  6  0.84
36     54        5  7  0.8  3  0.83

Fig. 6. ROC for 36 genes and 54 patients using the regular neural network approach.
Fig. 7. ROC for 40 genes and 54 patients using the regular neural network approach.