ABSTRACT: High accuracy in cancer prediction is important to improve the quality of treatment and the survival rate of patients. As data volumes in healthcare research grow rapidly, the analytical challenge grows with them. The use of an effective sampling technique in classification algorithms consistently yields good prediction accuracy. The SEER public use cancer database provides various prominent class labels for prognosis prediction. The main objective of this paper is to study the effect of sampling techniques on classifying the prognosis variable and to propose an ideal sampling method based on the outcome of the experiments. In the first phase of this work the traditional random sampling and stratified sampling techniques have been used. At the next level, balanced stratified sampling with variations according to the choice of prognosis class labels has been tested. Much of the initial effort was devoted to pre-processing the SEER data set. The classification model for experimentation was built using the breast cancer, respiratory cancer and mixed cancer data sets with three traditional classifiers, namely Decision Tree, Naive Bayes and K-Nearest Neighbor. The three prognosis factors survival, stage and metastasis were used as class labels for experimental comparisons. The results show a steady increase in the prediction accuracy of the balanced stratified model as the sample size increases, but the traditional approach fluctuates before the optimum
ABSTRACT: Time series analysis is the process of building a model, using statistical techniques, to represent the characteristics of time series data. Processing and forecasting huge volumes of time series data is a challenging task. This paper presents Approximation and Prediction of Stock Time-series data (APST), a two-step approach to predict the direction of change of stock price indices. First, it performs data approximation using a technique called Multilevel Segment Mean (MSM). In the second phase, prediction is performed on the approximated data using Euclidean distance and the Nearest-Neighbour technique. The computational cost of the data approximation is O(n ni) and that of the prediction task is O(m |NN|). Thus, in both accuracy and prediction time, the proposed method compares favourably with the existing Label Based Forecasting (LBF) method.
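A sketch of the two APST phases, under the assumption that MSM simply replaces fixed-length segments by their means (applied repeatedly, one level per pass) and that the direction of change is read off the nearest historical window; the segment length and horizon below are illustrative, not the paper's settings:

```python
def segment_means(series, seg_len):
    """One approximation level: replace each window of seg_len points
    by its mean, shrinking the series by a factor of seg_len."""
    return [sum(series[i:i + seg_len]) / len(series[i:i + seg_len])
            for i in range(0, len(series), seg_len)]

def msm(series, levels, seg_len=2):
    """Multilevel Segment Mean: apply the segment-mean reduction
    `levels` times to obtain an approximated series."""
    for _ in range(levels):
        series = segment_means(series, seg_len)
    return series

def nearest_neighbour_direction(query, history, horizon=1):
    """Predict direction of change by finding the historical window
    closest (Euclidean distance) to the query window, then reading
    off what happened `horizon` steps after that window."""
    best_i, best_d = None, float("inf")
    w = len(query)
    for i in range(len(history) - w - horizon + 1):
        window = history[i:i + w]
        d = sum((a - b) ** 2 for a, b in zip(query, window)) ** 0.5
        if d < best_d:
            best_i, best_d = i, d
    nxt = history[best_i + w + horizon - 1]
    return "up" if nxt > history[best_i + w - 1] else "down"
```

On the approximated series, each nearest-neighbour scan touches far fewer points than the raw data, which is where the claimed efficiency over LBF comes from.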
ABSTRACT: Email has proved to be a convenient and powerful communication tool, but it has given rise to unwanted mail. Spam mails lead to wastage of server storage space, consumption of network bandwidth and heavy financial losses to organizations, making spam a serious research issue. Filtering mails is one of the popular approaches used to block spam. In this work, we propose the RePID-OK (Repetitive Preprocessing technique using Imbalanced Data set by selecting Optimal number of Keywords) model for spam detection. Using the Ling-Spam data set, we show that the proposed model is more powerful and effective than existing schemes. The performance of the proposed RePID-OK has been checked against the identified parameters and evaluated against other existing models, demonstrating the efficiency of the proposed technique over other models in this area of research.
ABSTRACT: Rapid changes in communication technology, coupled with the problem of congestion, have led to increased research into interoperable networks and traffic splitting in recent years. This paper addresses the problem of traffic splitting on a “hybrid” node with both WiMAX and WiFi enabled for communication simultaneously. The main objective of this simulation is to design a hybrid node that performs traffic splitting over two radio channels and to identify the cases in which traffic splitting is efficient for transmitting data. Data traffic is routed using the AODV routing protocol, slightly modified for our network. In order to perform traffic splitting, we define a traffic splitting coefficient that denotes the percentage of data traffic split. Our simulations show that traffic splitting peaks at certain values of the split coefficient and that splitting over two channels allows for higher data rates than using only one radio channel.
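The paper does not specify how the split coefficient is realised inside the simulator; one plausible deficit-counter sketch that apportions a fraction `alpha` of packets to the WiMAX radio and the rest to WiFi is:

```python
def split_traffic(packets, alpha):
    """Split a packet stream between two radios: a fraction `alpha`
    (the traffic splitting coefficient) goes over WiMAX, the rest
    over WiFi. Uses a running credit so the split is deterministic
    and evenly interleaved rather than bursty."""
    wimax, wifi = [], []
    credit = 0.0
    for p in packets:
        credit += alpha
        if credit >= 1.0:      # enough credit accrued: send over WiMAX
            credit -= 1.0
            wimax.append(p)
        else:                  # otherwise the packet takes the WiFi path
            wifi.append(p)
    return wimax, wifi
```

Sweeping `alpha` from 0 to 1 in such a model is how one would look for the peak throughput values the abstract reports.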
ABSTRACT: Opinions are highly valuable for decision making and popular among internet users. People with malicious intentions tend to give fake reviews to promote or degrade products. Reviewing movies is gaining popularity among web users, but such reviews cannot always be trusted. In this work, we propose Sentiment Classification of Movie Reviews using Efficient Repetitive Pre-processing (SentReP), a model based on tested parameters and a focused pre-processing technique to classify opinions. Working on the Cornell movie review data set, this work demonstrates the accuracy and effectiveness of SentReP across different volumes of data and in comparison with other prevailing approaches. Overall, this approach is very efficient at analyzing the sentiment of movie reviews.
TENCON 2013 - 2013 IEEE Region 10 Conference (31194); 01/2013
ABSTRACT: Information extraction is a very challenging task because remote sensing images are complex and influenced by many factors. The information we can derive from a remote sensing image depends largely on the image segmentation results. Image segmentation is an important processing step in most image, video and computer vision applications, and extensive research has produced many different approaches and algorithms for it. Labeling different parts of an image remains a challenging aspect of image processing, and various algorithms for automating the segmentation process have been proposed, tested and evaluated to find the most suitable algorithm for different types of images. In this paper we explore segmentation of satellite images using two algorithms: Quick Shift and Level Set. Quick Shift is a mode-seeking algorithm that, instead of iteratively shifting each point towards a local mean, forms a tree of links to the nearest neighbour that increases the density. Level Set is a curve propagation algorithm based on the Partial Differential Equation (PDE) method that provides a direct way to estimate the geometric properties of the evolving structure. The aim of this paper is to analyse the two approaches and determine which is more efficient for segmentation in remote sensing applications.
India Conference (INDICON), 2012 Annual IEEE; 01/2012
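For the Quick Shift side, a toy 1-D version of the mode-seeking tree described above can make the "link to the nearest neighbour that increases the density" rule concrete; the Gaussian kernel density and the `tau` cut-off are assumptions, and real implementations operate on image pixels in a joint colour-spatial space rather than on scalar points:

```python
import math

def quick_shift_tree(points, sigma=1.0, tau=3.0):
    """Toy 1-D Quick Shift: estimate a kernel density at each point,
    then link every point to its nearest neighbour with HIGHER density
    within distance tau. Roots of the resulting forest are the modes."""
    def density(x):
        return sum(math.exp(-((x - p) ** 2) / (2 * sigma ** 2))
                   for p in points)

    d = [density(p) for p in points]
    parent = list(range(len(points)))
    for i, p in enumerate(points):
        best, best_dist = i, float("inf")
        for j, q in enumerate(points):
            dist = abs(p - q)
            # only neighbours that strictly increase the density qualify
            if d[j] > d[i] and dist < best_dist and dist <= tau:
                best, best_dist = j, dist
        parent[i] = best        # a point with no such neighbour is a mode
    return parent

pts = [0.0, 0.2, 0.4, 5.0, 5.1]
tree = quick_shift_tree(pts)
```

With the two clusters above, the points near 0 link towards the densest point among them, the points near 5 do the same, and the two densest points remain their own parents, i.e. the modes.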
ABSTRACT: A Wireless Sensor Network (WSN) is a set of sensor nodes that collect information from the environment and send it to a base station (header node or central node). WSNs are application specific, hence the design considerations differ for each application. This paper highlights the use of animals as mobile biological sensors, building on existing animal tracking systems used for zoological studies and on space-bound/earth-bound observational techniques. The abnormal behaviour of animals prior to an earthquake in a seismically active region can be used to predict the earthquake, because animals are relatively more capable than humans of perceiving certain kinds of geophysical stimuli that may precede an earthquake. The space-bound and earth-bound observational techniques can be used to detect early warning signals from places where stress builds up deep in the earth's crust and may lead to a catastrophic earthquake. Middleware for WSN is a software infrastructure with heterogeneous features that binds together the different applications, network hardware, operating systems, and network stacks. The task of the middleware is to act as an interface between the WSN and the end-user system; it processes the data taken from sensor devices and bio-sensors.
Advanced Computing (ICoAC), 2012 Fourth International Conference on; 01/2012
ABSTRACT: IEEE 802.16 (WiMAX) and IEEE 802.11 (WiFi) are different wireless access technologies, the former used in Metropolitan Area Networks and the latter in Local Area Networks. With the increasing popularity of WiMAX and WiFi networks, mobile devices are expected to be equipped with both radios for data access in future. Such a scenario presents an opportunity to utilize both radios to optimize throughput. In this paper, we present an analysis of traffic splitting over an abstract multi-radio, multi-hop wireless mesh network for improvements in throughput, where each device is equipped with WiMAX and WiFi radios. We attempt to set up a mathematical model that deals with the following issues: what constitutes a hybrid network node; how traffic splitting decisions should be made; what parameters a suitable routing protocol should consider when making routing decisions in such a network; and, finally, how traffic splitting improves network performance. The analysis shows that traffic splitting improves throughput and reduces the end-to-end delay of a flow.
Parallel Distributed and Grid Computing (PDGC), 2012 2nd IEEE International Conference on; 01/2012
ABSTRACT: This work proposes two main contributions to the statistical steganalysis of Yet Another Steganographic Scheme (YASS) in JPEG images. First, it presents a reliable blind steganalysis technique to detect YASS, one of the most recent and least statistically detectable embedding schemes, using only five features: four Huffman length statistics (H) and the ratio of file size to resolution (FR Index). Second, these features are shown to be unique, accurate and monotonic over a wide range of YASS settings and several supervised classifiers, with prediction accuracy superior to most blind steganalyzers in vogue. Overall, the proposed model, with Huffman length statistics as its linchpin, detects YASS with an average accuracy of over 94 percent.
International Journal of Hybrid Information Technology. 08/2011; 4.
ABSTRACT: With the rapid advancement of information and communication technology, crimes are becoming technically intensive. When crimes are committed using digital devices, forensic examiners have to adopt practical frameworks and methods to recover data for analysis that can serve as evidence. Data generation, data warehousing and data mining are the three essential features of the investigation process. This paper proposes a unique way of generating, storing and analyzing data retrieved from digital devices that serve as evidence in forensic analysis. A statistical approach is used to validate the reliability of the pre-processed data. This work proposes a practical framework for digital forensics on flash drives.
International Journal of Web Engineering and Technology 07/2011; 2(3):313-319.
ABSTRACT: Our future living environments are expected to rely on information provided by various types of devices connected over different types of networks. An effective middleware should make applications from various platforms available to the user; this, however, leaves the security of a MANET vulnerable. In this paper, we propose a novel approach to restrict the entry of selfish or malicious nodes into the MANET. Initially, every ad hoc node is deployed with a node address, a group Id, a random session renewal period called the threshold T, its performance P, and a minimum performance limit PMin. T depends on the value of P: when T expires, the service examines P; if P > PMin, the time stamp is renewed, as the node has been well known since its initial deployment. If P ≤ PMin, the node is discarded from communication irrespective of the value of T. We have also considered three middleware services, namely heterogeneity, scalability and topology, and present the resulting network performance through simulation.
Proceedings of the 2011 International Conference on Communication, Computing & Security, ICCCS 2011, Odisha, India, February 12-14, 2011; 01/2011
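The session-renewal rule described above can be sketched as follows; the field names, the PMin value and the clock handling are hypothetical, chosen only to make the T/P/PMin decision explicit:

```python
import time

PMIN = 0.4        # minimum performance limit (hypothetical value)

def review_node(node, now=None):
    """Session-renewal check: while the threshold period T has not
    expired the node stays active; once T expires, the time stamp is
    renewed only if performance P exceeds PMIN, otherwise the node
    is discarded from communication."""
    now = now if now is not None else time.time()
    if now < node["expires_at"]:
        return "active"                        # T has not expired yet
    if node["P"] > PMIN:
        node["expires_at"] = now + node["T"]   # renew the session stamp
        return "renewed"
    return "discarded"                         # P <= PMIN: evict the node
```

A well-behaved node is thus re-admitted indefinitely, while a selfish node whose performance has decayed below PMin is dropped at its next renewal point.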
ABSTRACT: Automated security is one of the major concerns of modern times, and secure, reliable authentication systems are in great demand. A biometric trait like the Finger Knuckle Print (FKP) of a person is unique and secure. In this paper, we propose a human authentication system based on a person's FKP image. We apply a Gabor wavelet to the pre-processed FKP image and identify the peak points in the Gabor wavelet graph. The successive distances between those points are calculated and stored in a vector. The elements of the distance vector stored in the database and those of the input image are then compared; a match is considered a success if the difference between two such elements is less than the threshold value. The probability of success is then computed, and the person is authenticated based on its value. The proposed system has a FAR of about 1.24% and a FRR of 1.11%.
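The peak-distance matching step can be sketched as below; detecting peaks on a 1-D profile is a simplification of the 2-D Gabor wavelet graph, and the threshold value is hypothetical:

```python
def find_peaks(signal):
    """Indices of local maxima in a 1-D feature profile."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]

def successive_distances(peaks):
    """Distances between consecutive peak positions."""
    return [b - a for a, b in zip(peaks, peaks[1:])]

def match_probability(stored, probe, threshold=1):
    """Compare the stored distance vector element-wise against the
    probe's; an element matches when the difference is below the
    threshold, and the probability of success is the match fraction."""
    pairs = list(zip(stored, probe))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if abs(a - b) < threshold)
    return hits / len(pairs)
```

Authentication then reduces to checking whether the computed probability exceeds a decision level tuned for the reported FAR/FRR trade-off.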
ABSTRACT: Automated security is one of the major concerns of modern times, and secure, reliable authentication systems are in great demand. A biometric trait like the Finger Knuckle Print (FKP) of a person is unique and secure. In this paper, we propose a human authentication system based on a person's FKP image. Depending on the security level required by the organization implementing the proposed system, we provide two modes of security: a basic mode and an advanced mode. The Radon transform is applied to the pre-processed FKP image and eigenvalues are computed. For the basic mode, we compute the correlation coefficient between the set of eigenvalues stored in the database and that of the input image to authenticate a person. For the advanced level of security, we identify the peak points in the Radon graph. The successive distances between those points are calculated and stored in a vector. The elements of the distance vector stored in the database and those of the input image are then compared; a match is considered a success if the difference between two such elements is less than the threshold value. The probability of success is then computed. To authenticate a person in advanced mode, we use both the correlation coefficient between eigenvalues and this probability. For real-time implementation, a suitable GUI can be developed. The basic mode of the security system is found to have a FAR of 6.79% and a FRR of 0.0517%. The advanced system has a FAR of about 1.55% and a FRR of 1.02%.
Proceedings of the 4th Bangalore Annual Compute Conference, Compute 2011, Bangalore, India, March 25-26, 2011; 01/2011
ABSTRACT: Automated security is one of the major concerns of modern times, and secure, reliable authentication systems are in great demand. A biometric trait like the electrocardiogram (ECG) of a person is unique and secure. In this paper, we propose an authentication technique based on the Radon transform. The ECG wave is treated as an image and the Radon transform is applied to it. The standardized Euclidean distance is applied to the Radon image to obtain a feature vector, and the correlation coefficient between two such feature vectors is computed to authenticate a person. The False Acceptance Ratio of the proposed system is found to be 2.19% and the False Rejection Ratio 0.128%. As groundwork, we developed two further approaches based on statistical features of an ECG wave. The result of the proposed technique is compared with these two approaches and also with other state-of-the-art
Signal Image and Video Processing 01/2011; 5:485-493. · 0.41 Impact Factor
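The correlation-coefficient decision used in this line of work can be sketched as follows (the acceptance threshold is hypothetical; the feature vectors stand in for the Radon-derived vectors described above):

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two feature vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def authenticate(enrolled, probe, threshold=0.9):
    """Accept the probe identity only when its feature vector is
    sufficiently correlated with the enrolled one."""
    return pearson(enrolled, probe) >= threshold
```

Raising the threshold lowers the False Acceptance Ratio at the cost of a higher False Rejection Ratio, which is the trade-off behind the reported 2.19% / 0.128% figures.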
ABSTRACT: Spam mails are one of the greatest challenges faced by internet service providers, organizations and internet users alike. Spam mails may be targeted, with malicious intent, or simply be a commercial marketing activity; either way they are unwanted by everyone except the dispatcher. Spam filters continuously evolve as spammers grow techno-savvy and creative. Machine learning algorithms have been popularly used for classifying and predicting mails as spam or ham (the good emails). This work presents a spam filter, BeaKS, with a focused preprocessing phase that weaves together the content of the email and two behavioural characteristics extracted from it, to predict the category a mail belongs to: spam or ham. The accuracy of the proposed prediction model, using Random Forests as the classifier, is shown to be superior to other recent techniques. The approach is simple, easy to implement and reliable.
TENCON 2011 - 2011 IEEE Region 10 Conference; 01/2011
ABSTRACT: Healthcare organizations aim at deriving valuable insights by employing data mining and soft computing techniques on the vast data stores accumulated over the years. This data, however, may contain missing, incorrect and, most of the time, incomplete instances that can have a detrimental effect on the predictive analytics of healthcare data. Preprocessing of this data, specifically the imputation of missing values, is a challenge for reliable modeling. This work presents a novel preprocessing phase with missing value imputation for both numerical and categorical data. A hybrid combination of Classification and Regression Trees (CART) and Genetic Algorithms is adapted to impute missing continuous values, and Self Organizing Feature Maps (SOFM) to impute categorical values. Further, Artificial Neural Networks (ANN) are used to validate the improved prediction accuracy after imputation. To evaluate this model, we use the PIMA Indians Diabetes Data set (PIDD) and the Mammographic Mass Data (MMD). The accuracy of the proposed model, which emphasizes a preprocessing phase, is shown to be superior to existing techniques. The approach is simple, easy to implement and practically reliable.
Advances in Computing and Communications - First International Conference, ACC 2011, Kochi, India, July 22-24, 2011, Proceedings, Part III; 01/2011
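As a stand-in for the CART + Genetic Algorithm and SOFM machinery, the shape of the two-track numeric/categorical imputation pass can be sketched with simple column mean and mode; this is not the paper's method, only an illustration of handling the two feature types separately:

```python
from collections import Counter

def impute(rows, numeric_cols, categorical_cols, missing=None):
    """Two-track imputation sketch: fill numeric features with the
    column mean and categorical features with the column mode.
    (The paper uses CART + GA for continuous values and SOFM for
    categorical ones; this only shows the numeric-vs-categorical
    split of a preprocessing pass.)"""
    for col in numeric_cols:
        vals = [r[col] for r in rows if r[col] is not missing]
        mean = sum(vals) / len(vals)
        for r in rows:
            if r[col] is missing:
                r[col] = mean
    for col in categorical_cols:
        vals = [r[col] for r in rows if r[col] is not missing]
        mode = Counter(vals).most_common(1)[0][0]
        for r in rows:
            if r[col] is missing:
                r[col] = mode
    return rows
```

Validation then mirrors the paper's design: train a downstream classifier on the imputed table and compare its accuracy against training on the raw, incomplete one.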
ABSTRACT: Automated security is one of the major concerns of modern times, and secure, reliable authentication of a person is in great demand. A biometric trait like the electrocardiogram (ECG) of a person is unique and secure. In this paper we propose an authentication system based on the ECG, using statistical features such as the mean and variance of ECG waves. Statistical tests like the Z-test, t-test and χ²-test are used to check the authenticity of an individual. A confusion matrix is then generated to find the False Acceptance Ratio (FAR) and False Rejection Ratio (FRR). This authentication methodology is tested on a data set of 200 waves prepared from ECG samples of 40 individuals taken from the Physionet QT Database. The proposed authentication system is found to have a FAR of about 2.56% and a FRR of about 0.13%. The overall accuracy of the system is found to be 99.81%.
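A sketch of the Z-test acceptance rule and of deriving FAR/FRR from a confusion matrix; the critical value and the feature being tested are assumptions, not the paper's calibrated settings:

```python
import math

def z_test(sample_mean, pop_mean, pop_std, n, z_crit=1.96):
    """Two-sided Z-test: accept the claimed identity when the probe's
    mean feature lies within z_crit standard errors of the enrolled
    population mean."""
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    return abs(z) <= z_crit

def far_frr(fp, tn, fn, tp):
    """False Acceptance Ratio and False Rejection Ratio from the
    confusion-matrix counts: FAR = FP / (FP + TN) is the fraction of
    impostors accepted, FRR = FN / (FN + TP) the fraction of genuine
    users rejected."""
    return fp / (fp + tn), fn / (fn + tp)
```

The t-test and χ²-test variants follow the same accept/reject pattern, substituting the appropriate test statistic for `z`.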
ABSTRACT: Targeted steganalysis aims at detecting hidden data embedded by a particular algorithm without any knowledge of the ‘cover’ image. In this paper we propose a novel approach for detecting Perturbed Quantization steganography (PQ) with the HFS (Huffman FR index Steganalysis) algorithm, using a combination of Huffman Bit Code Length (HBCL) statistics and the file size to resolution ratio (FR Index), which has not yet been explored by steganalysts. JPEG images spanning a wide range of sizes, resolutions, textures and quality levels are used to test the performance of the model. In this work we evaluate the model against several classifiers, such as Artificial Neural Networks (ANN), k-Nearest Neighbors (k-NN), Random Forests (RF) and Support Vector Machines (SVM), for steganalysis. The experiments conducted prove that the proposed HFS algorithm can detect PQ at several embedding rates with better accuracy than the existing attacks.
Keywords: Steganography; classifiers; Huffman coding; perturbed quantization
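The two feature families behind HFS can be sketched as follows; treating the FR Index as bytes per pixel and the Huffman length statistics as simple summaries of the code lengths found in a JPEG's DHT segments is an assumption, as the abstracts do not define them this precisely:

```python
def fr_index(file_size_bytes, width, height):
    """File-size-to-Resolution index: bytes per pixel of the JPEG.
    Embedding schemes that perturb quantization shift this ratio."""
    return file_size_bytes / (width * height)

def huffman_length_stats(code_lengths):
    """Summary statistics over the bit lengths of the Huffman codes
    extracted from a JPEG (hypothetical four-feature set: mean, min,
    max and range of code length)."""
    n = len(code_lengths)
    return {"mean": sum(code_lengths) / n,
            "min": min(code_lengths),
            "max": max(code_lengths),
            "range": max(code_lengths) - min(code_lengths)}
```

The resulting five numbers (four length statistics plus the FR Index) would then form the feature vector fed to the ANN, k-NN, RF or SVM classifiers named above.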