ABSTRACT: The emergence of cloud computing has made on-demand dynamic provisioning of elastic
capacity to applications possible. Cloud data centers contain thousands of
physical servers hosting orders of magnitude more virtual machines that can be
allocated on demand to users in a pay-as-you-go model. However, not all systems
are able to scale up by just adding more virtual machines. Therefore, it is
essential, even for scalable systems, to project workloads in advance rather
than using a purely reactive approach. Given the scale of modern cloud
infrastructures generating real time monitoring information, along with all the
information generated by operating systems and applications, this data poses
the issues of volume, velocity, and variety that are addressed by Big Data
approaches. In this paper, we investigate how utilization of Big Data analytics
helps in enhancing the operation of cloud computing environments. We discuss
diverse applications of Big Data analytics in clouds, open issues for enhancing
cloud operations via Big Data analytics, and an architecture for anomaly detection
and prevention in clouds along with future research directions.
ABSTRACT: White matter lesions (WMLs) are small groups of dead cells that clump together in the white matter of the brain. In this paper, we propose a reliable method to automatically segment WMLs. Our method uses a novel filter to enhance the intensity of WMLs. Then a feature set containing enhanced intensity, anatomical and spatial information is used to train a random forest classifier for the initial segmentation of WMLs. Following that, a reliable and robust edge potential function based Markov Random Field (MRF) is proposed to obtain the final segmentation by removing false positive WMLs. Quantitative evaluation of the proposed method is performed on 24 subjects of the ENVISion study. The segmentation results are validated against the manual segmentation, performed under the supervision of an expert neuroradiologist. The results show a Dice similarity index of 0.76 for severe lesion load, 0.73 for moderate lesion load and 0.61 for mild lesion load. In addition, we have compared our method with three state-of-the-art methods on 20 subjects of the Medical Image Computing and Computer Aided Intervention Society's (MICCAI's) MS lesion challenge dataset, where our method shows better segmentation accuracy compared to the state-of-the-art methods. These results indicate that the proposed method can assist neuroradiologists in assessing WMLs in clinical practice.
Computerized medical imaging and graphics: the official journal of the Computerized Medical Imaging Society 08/2015; 45. DOI:10.1016/j.compmedimag.2015.08.005 · 1.22 Impact Factor
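The Dice similarity index reported in the entry above measures the overlap between an automatic and a manual segmentation. A minimal Python sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), is given here for orientation; the function name, the flattened 0/1 mask representation, and the toy masks are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def dice_index(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Dice similarity index between two binary masks of equal length."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same length")
    # Voxels marked as lesion in both masks.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    # Total lesion voxels across the two masks.
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy flattened masks (hypothetical values, 1 = lesion voxel).
auto_mask = [0, 1, 1, 1, 0, 0]
manual_mask = [0, 1, 1, 0, 1, 0]
score = dice_index(auto_mask, manual_mask)
print(round(score, 3))
```

A value of 1 means the two masks coincide and 0 means they share no lesion voxels, which is why smaller lesion loads, where a few misplaced voxels matter more, tend to score lower.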
ABSTRACT: Monitoring and predicting traffic conditions are of utmost importance in reacting to emergency events in time and for computing the real-time shortest travel-time path. Mobile sensors, such as GPS devices and smartphones, are useful for monitoring urban traffic due to their large coverage area and ease of deployment. Many researchers have employed such sensed data to model and predict traffic conditions. To do so, we first have to address the problem of associating GPS trajectories with the road network in a robust manner. Existing methods rely on point-by-point matching to map individual GPS points to a road segment. However, GPS data is imprecise due to noise in GPS signals. GPS coordinates can have errors of several meters and, therefore, direct mapping of individual points is error prone. Acknowledging that every GPS point is potentially noisy, we propose a radically different approach to overcome inaccuracy in GPS data. Instead of focusing on a point-by-point approach, our proposed method considers the set of relevant GPS points in a trajectory that can be mapped together to a road segment. This clustering approach gives us a macroscopic view of the GPS trajectories even under very noisy conditions. Our method clusters points based on the direction of movement as a spatial-linear cluster, ranks the possible route segments in the graph for each group, and searches for the best combination of segments as the overall path for the given set of GPS points. Through extensive experiments on both synthetic and real datasets, we demonstrate that, even with highly noisy GPS measurements, our proposed algorithm outperforms state-of-the-art methods in terms of both accuracy and computational cost.
International Journal of Geographical Information Science 07/2015; 29(12):1-29. DOI:10.1080/13658816.2015.1072202 · 1.66 Impact Factor
ABSTRACT: In this paper, a reliable and robust method is presented for the quantification of Focal Arteriolar Narrowing (FAN), a precursor for hypertension, stroke and other cardiovascular diseases. Our contribution is a novel edge-based retinal blood vessel segmentation technique that is very effective in low contrast retinal images. In addition, we developed a robust and reliable measurement technique to quantify FAN. For initial quantitative evaluation, we apply the proposed method to a dataset of 53 manually graded vessel segments. The experimental results indicate a strong correlation between the computed FAN values and the expert grading (Spearman coefficient of 0.76, P < 0.0001). The results also show that the system can detect healthy and FAN affected cases with an accuracy of 93%. This demonstrates the reliability of the proposed method for automatic focal narrowing assessment. The quantitative measurements provided by the system may help to establish a more reliable link between FAN and known systemic and eye diseases, which will be investigated further.
ABSTRACT: Recent studies show that cerebral White Matter Lesion (WML) is related to cerebrovascular diseases, cardiovascular diseases, dementia and psychiatric disorders. Manual segmentation of WML is not appropriate for long term longitudinal studies because it is time consuming and it shows high intra- and inter-rater variability. In this paper, a fully automated segmentation method is utilized to segment WML from brain Magnetic Resonance Imaging (MRI). The segmentation method uses a combination of global neighbourhood given contrast feature-based Random Forest (RF) classifier and Markov Random Field (MRF) to segment WML. To remove false positive lesions we use a rule based morphological post-processing operation. Quantitative evaluation of the proposed method was performed on 24 subjects of the ENVISion study. The segmentation results were validated against the manual segmentation, performed by an experienced radiologist, and were compared to a recently published WML segmentation method. The results show a Dice similarity index of 0.75 for high lesion load, 0.71 for medium lesion load and 0.60 for low lesion load.
ABSTRACT: Graphs are a powerful representation of relational data, such as social and biological networks. Often, these entities form groups and are organised according to a latent structure. However, these groupings and structures are generally unknown and it can be difficult to identify them. Graph clustering is an important type of approach used to discover these vertex groups and the latent structure within graphs. One type of approach for graph clustering is non-negative matrix factorisation. However, the formulations of existing factorisation approaches can be overly relaxed; as a result, their groupings can be difficult to interpret, they may fail to discover the true latent structure and groupings, and they may converge to extreme solutions. In this paper, we propose a new formulation of the graph clustering problem that results in clusterings that are easy to interpret. Combined with a novel algorithm, the clusterings are also more accurate than state-of-the-art algorithms for both synthetic and real datasets.
ABSTRACT: Scientific workflows are used to model applications of high throughput computation and complex large scale data analysis. In recent years, Cloud computing has been fast evolving as the target platform for such applications among researchers. Furthermore, new pricing models have been pioneered by Cloud providers that allow users to provision resources and to use them in an efficient manner with significant cost reductions. In this paper, we propose a scheduling algorithm that schedules tasks on Cloud resources using two different pricing models (spot and on-demand instances) to reduce the cost of execution whilst meeting the workflow deadline. The proposed algorithm is fault tolerant against the premature termination of spot instances and also robust against performance variations of Cloud resources. Experimental results demonstrate that our heuristic reduces execution cost by up to 70% compared with using only on-demand instances.
ABSTRACT: Altered expression profiles of microRNAs (miRNAs) are linked to many diseases including lung cancer. miRNA expression profiling is reproducible and miRNAs are very stable. These characteristics of miRNAs make them ideal biomarker candidates.
This work aims to detect 2- and 3-miRNA groups, together with specific expression ranges of these miRNAs, to form simple linear discriminant rules for biomarker identification and biological interpretation. Our method is based on a novel committee of decision trees to derive 2- and 3-miRNA 100%-frequency rules. This method is applied to a data set of lung miRNA expression profiles of 61 squamous cell carcinoma (SCC) samples and 10 normal tissue samples. A distance separation technique is used to select the most reliable rules, which are then evaluated on a large independent data set.
We obtained four 2-miRNA and three 3-miRNA top-ranked rules. One important rule is: if the expression level of miR-98 is above 7.356 and the expression level of miR-205 is below 9.601 (log2 quantile normalized MirVan miRNA Bioarray signals), then the sample is normal rather than cancerous, with specificity and sensitivity both 100%. The classification performance of our best miRNA rules remarkably outperformed that of randomly selected miRNA rules. Our data analysis also showed that miR-98 and miR-205 have two common predicted target genes, FZD3 and RPS6KA3, which are genes associated with carcinoma according to the Online Mendelian Inheritance in Man (OMIM) database. We also found that most of the chromosomal loci of these miRNAs have a high frequency of genomic alteration in lung cancer. On the independent data set (with balanced controls), the three miRNAs miR-126, miR-205 and miR-182 from our best rule can separate the two classes of samples with an accuracy of 84.49%, sensitivity of 91.40% and specificity of 77.14%.
Our results indicate that rule discovery followed by distance separation is a powerful computational method to identify reliable miRNA biomarkers. The visualization of the rules and the clear separation between the normal and cancer samples by our rules will help biology experts in their analysis and biological interpretation.
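The 2-miRNA rule quoted in the entry above is simple enough to write down directly. The following Python sketch encodes the two thresholds from the abstract and shows how sensitivity and specificity could be computed for such a rule. The function names, the dictionary layout, the choice of cancer as the positive class, and the four toy samples are all assumptions made for illustration; none of them come from the paper's data or code.

```python
from typing import Dict, List, Tuple

# Thresholds quoted in the abstract (log2 quantile normalized signals).
MIR98_MIN = 7.356
MIR205_MAX = 9.601


def is_normal(sample: Dict[str, float]) -> bool:
    """Apply the 2-miRNA rule: miR-98 above and miR-205 below its threshold."""
    return sample["miR-98"] > MIR98_MIN and sample["miR-205"] < MIR205_MAX


def sensitivity_specificity(
    labelled: List[Tuple[Dict[str, float], bool]]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) with cancer as the positive class.

    Assumes the input holds at least one cancer and one normal sample.
    """
    tp = fn = tn = fp = 0
    for sample, is_cancer in labelled:
        predicted_cancer = not is_normal(sample)
        if is_cancer and predicted_cancer:
            tp += 1
        elif is_cancer:
            fn += 1
        elif predicted_cancer:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical samples: (expression levels, True if cancerous).
toy_samples = [
    ({"miR-98": 8.1, "miR-205": 7.2}, False),
    ({"miR-98": 6.0, "miR-205": 12.4}, True),
    ({"miR-98": 7.9, "miR-205": 11.0}, True),
    ({"miR-98": 7.0, "miR-205": 8.0}, False),
]
sens, spec = sensitivity_specificity(toy_samples)
print(sens, spec)
```

On these made-up samples the last normal sample falls below the miR-98 threshold and is flagged as cancerous, so specificity drops while sensitivity stays at 1.0; this is the kind of trade-off the independent-set figures in the abstract summarize.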
ABSTRACT: It is well known that processing big graph data can be costly on Cloud. Processing big graph data introduces complex and multiple iterations that raise challenges such as parallel memory bottlenecks, deadlocks, and inefficiency. To tackle the challenges, we propose a novel technique for effectively processing big graph data on Cloud. Specifically, the big data will be compressed with its spatiotemporal features on Cloud. By exploring spatial data correlation, we partition a graph data set into clusters. In a cluster, the workload can be shared by the inference based on time series similarity. By exploiting temporal correlation, in each time series or a single graph edge, temporal data compression is conducted. A novel data driven scheduling is also developed for data processing optimization. The experimental results demonstrate that the spatiotemporal compression and scheduling achieve significant performance gains in terms of data size and data fidelity loss.
Journal of Computer and System Sciences 12/2014; 80(8). DOI:10.1016/j.jcss.2014.04.022 · 1.14 Impact Factor
ABSTRACT: Mining GPS trajectories of moving vehicles has led to many research
directions, such as traffic modeling and driving prediction. An
important challenge is how to map GPS traces to a road network
accurately under noisy conditions. However, to the best of our
knowledge, there is no existing work that first simplifies a trajectory
to improve map matching. In this paper we propose three trajectory
simplification algorithms that can deal with both offline and online
trajectory data. We use weighting functions to incorporate spatial
knowledge, such as segment lengths and turning angles, into our
simplification algorithms. In addition, we measure the noise degree of a
GPS point based on its spatio-temporal relationship to its neighbors.
The effectiveness of our algorithms is comprehensively evaluated on real
trajectory datasets with varying noise levels and sampling rates.
Our evaluation shows that under highly noisy conditions, our proposed
algorithms considerably improve map matching accuracy and reduce
computational costs compared to the state-of-the-art methods.
Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems; 11/2014
ABSTRACT: Retinal arteriovenous (AV) nicking is a precursor for hypertension, stroke and other cardiovascular diseases. In this paper, an effective method is proposed for the analysis of retinal venular widths to automatically classify the severity level of AV nicking. We use a combination of intensity and edge information of the vein to compute its widths. The widths at various sections of the vein near the crossover point are then utilized to train a random forest classifier to classify the severity of AV nicking. We analyzed 47 color retinal images obtained from two population based studies for quantitative evaluation of the proposed method. We compare the detection accuracy of our method with a recently published four class AV nicking classification method. Our proposed method shows 64.51% classification accuracy, in contrast to the reported classification accuracy of 49.46% by the state-of-the-art method.
ABSTRACT: Dynamic resource provisioning and the notion of seemingly unlimited resources are attracting scientific workflows rapidly into Cloud computing. Existing works on workflow scheduling in the context of Clouds focus on either deadline or cost optimization, ignoring the necessity for robustness. Robust scheduling that handles performance variations of Cloud resources and failures in the environment is essential in the context of Clouds. In this paper, we present a robust scheduling algorithm with resource allocation policies that schedule workflow tasks on heterogeneous Cloud resources while trying to minimize the total elapsed time (makespan) and the cost. Our results show that the proposed resource allocation policies provide robust and fault-tolerant schedules while minimizing makespan. The results also show that with the increase in budget, our policies increase the robustness of the schedule.
2014 IEEE 28th International Conference on Advanced Information Networking and Applications (AINA); 05/2014
ABSTRACT: Retinal imaging can facilitate the measurement and quantification of subtle variations and abnormalities in retinal vasculature. Retinal vascular imaging may thus offer potential as a noninvasive research tool to probe the role and pathophysiology of the microvasculature, and as a cardiovascular risk prediction tool. To do this, an accurate method must be provided that is statistically sound and repeatable. This paper presents the methodology of such a system that assists physicians in measuring vessel caliber (i.e., diameter or width) from digitized fundus photographs. The system uses texture and edge information to measure and quantify vessel caliber. Graphical user interfaces are developed to allow retinal image graders to select an individual vessel area, for which the vessel calibers are automatically returned for noisy images. The accuracy of the method is validated using the measured caliber from graders and an existing method. The system provides very high accuracy vessel caliber measurement which is also reproducible with high consistency.
Computers in Biology and Medicine 01/2014; 44(1):1–9. DOI:10.1016/j.compbiomed.2013.07.018 · 1.24 Impact Factor
ABSTRACT: Infrastructure-as-a-Service cloud providers offer diverse purchasing options and pricing plans, namely on-demand, reservation, and spot market plans. This allows them to efficiently target a variety of customer groups with distinct preferences and to generate more revenue accordingly. An important consequence of this diversification, however, is that it introduces a nontrivial optimization problem related to the allocation of the provider’s available data center capacity to each pricing plan. The complexity of the problem follows from the different levels of revenue generated per unit of capacity sold, and the different commitments consumers and providers make when resources are allocated under a given plan. In this work, we address a novel problem of maximizing revenue through an optimization of capacity allocation to each pricing plan by means of admission control for reservation contracts, in a setting where the aforementioned plans are jointly offered to customers. We devise both an optimal algorithm based on a stochastic dynamic programming formulation and two heuristics that trade off optimality and computational complexity. Our evaluation, which relies on an adaptation of a large-scale real-world workload trace from Google, shows that our algorithms can significantly increase revenue compared to an allocation without capacity control, given that sufficient resource contention is present in the system. In addition, we show that our heuristics effectively allow for online decision making and quantify the revenue loss caused by the assumptions made to render the optimization problem tractable.
IEEE Transactions on Cloud Computing 01/2014; 3(3):1-1. DOI:10.1109/TCC.2014.2382119
ABSTRACT: We discuss a new formulation of a fuzzy validity index that generalizes the Newman-Girvan (NG) modularity function. The NG function serves as a cluster validity functional in community detection studies. The input data is an undirected weighted graph that represents, e.g., a social network. Clusters correspond to socially similar substructures in the network. We compare our fuzzy modularity with two existing modularity functions using the well-studied Karate Club and American College Football datasets.
IEEE Transactions on Fuzzy Systems 12/2013; 21(6). DOI:10.1109/TFUZZ.2013.2245135 · 8.75 Impact Factor
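For readers unfamiliar with the Newman-Girvan function mentioned in the entry above, the sketch below computes the ordinary (crisp) modularity Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j) for a hard partition of an undirected weighted graph. It is not the fuzzy generalization the paper proposes, which the abstract does not specify; the function name, the dense adjacency-matrix input, and the toy graph are assumptions for illustration.

```python
from typing import List


def modularity(adjacency: List[List[float]], labels: List[int]) -> float:
    """Crisp Newman-Girvan modularity of a hard partition.

    adjacency is a symmetric weight matrix; labels[i] is the cluster of node i.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]  # weighted degree k_i
    two_m = sum(degrees)                       # twice the total edge weight
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected at random.
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])
print(round(q_split, 4), round(q_single, 4))
```

Splitting the toy graph into its two triangles gives a positive score, while putting every node in one cluster gives zero, which is the sense in which modularity acts as a cluster validity functional.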
ABSTRACT: Blockmodelling is an important technique in social network analysis for discovering the latent structure in graphs. A blockmodel partitions the set of vertices in a graph into groups, where there are either many edges or few edges between any two groups. For example, in the reply graph of a question and answer forum, blockmodelling can identify the group of experts by their many replies to questioners, and the group of questioners by their lack of replies among themselves but many replies from experts. Non-negative matrix factorisation has been successfully applied to many problems, including blockmodelling. However, these existing approaches can fail to discover the true latent structure when the graphs have strong background noise or are sparse, which is typical of most real graphs. In this paper, we propose a new non-negative matrix factorisation approach that can discover blockmodels in sparse and noisy graphs. We use synthetic and real datasets to show that our approaches have much higher accuracy and comparable running times.
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management; 10/2013
ABSTRACT: Retinal vascular landmark points such as branching points and crossovers are important features for automatic retinal image matching and vascular abnormality detection. These landmark points can enable automatic screening of large datasets through the detection of vascular network abnormalities (e.g., arteriovenous nicking, retinal vein occlusion), which are important for hypertension and cardiovascular disease prediction. Existing methods for crossover point detection use only local information at each image pixel without considering vascular features to detect crossover positions. This leads to the misclassification of very acute crossovers, which are represented by two bifurcation points in the skeleton image. In this article, we propose a robust method that utilizes both local information and vascular geometrical features at the crossing to distinguish crossover from non-crossover points in a retinal image. The proposed method was validated on fifteen high resolution retinal images and the results show that our method achieves higher accuracy than existing methods. In particular, the proposed method can discover more than 74% (recall) of crossovers with a detection accuracy (fraction of detected crossover points that are correct) of 83% (precision). The detected crossovers provide essential results for the automatic detection of vascular network abnormalities, such as arteriovenous nicking, neovascularization, and retinal vein occlusion.
2013 20th IEEE International Conference on Image Processing (ICIP); 09/2013
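The recall and precision figures in the entry above follow the usual definitions: recall is the fraction of true crossovers that are found, and precision is the fraction of detected crossovers that are correct. A small Python sketch of those two ratios is shown here. It assumes detections and ground truth are given as exact pixel coordinates and matched by equality, which is a simplification (a real evaluation would likely allow a distance tolerance); the function name and coordinates are hypothetical.

```python
from typing import Set, Tuple

Point = Tuple[int, int]


def precision_recall(detected: Set[Point], truth: Set[Point]) -> Tuple[float, float]:
    """Return (precision, recall) for exact-match point detections."""
    if not detected or not truth:
        raise ValueError("both point sets must be non-empty")
    correct = len(detected & truth)  # detections that are true crossovers
    return correct / len(detected), correct / len(truth)


# Hypothetical crossover coordinates (row, column).
detected_points = {(10, 12), (40, 41), (70, 15), (90, 90)}
true_points = {(10, 12), (40, 41), (55, 60), (70, 15), (20, 80)}
precision, recall = precision_recall(detected_points, true_points)
print(precision, recall)
```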
ABSTRACT: In this paper, a fully automated segmentation method is proposed to identify Multiple Sclerosis (MS) related white matter lesions from brain magnetic resonance imaging (MRI) data. The main contribution of this paper is a new texture feature set for MS lesion segmentation that combines local and global neighbourhood information. The proposed method adopts a robust intensity normalization technique and a lesion contrast enhancement filter for enhancing the region of interest. We use a Support Vector Machine (SVM) to classify lesion pixels, and level set based active contour and morphological filtering to achieve higher accuracy on lesion pixel identification. Quantitative evaluation of the proposed method is carried out on the real MRI data set provided by MS Lesion Challenge 2008. The results obtained from our method indicate significant improvement in performance compared to three state-of-the-art methods, which shows the proposed method's high suitability for assisting the neurologist to detect MS in clinical practice.
2013 20th IEEE International Conference on Image Processing (ICIP); 09/2013
ABSTRACT: In this paper, an automated method is presented for the detection of Focal Arteriolar Narrowing (FAN), a precursor for hypertension, stroke and other cardiovascular diseases. Our contribution is a novel, fully automated retinal blood vessel tracing and vessel width measurement algorithm. We developed a novel method to detect FAN affected vessel segments by analysing their width distribution pattern. For initial quantitative evaluation, we apply our method to 30 color retinal images randomly selected by an experienced grader from the SiMES dataset. We achieved a sensitivity of 75% and a specificity of 98% in detecting FAN affected vessel segments, and a sensitivity of 80% and a specificity of 86% in identifying healthy and FAN affected images. These results show the potential suitability of the proposed method for assisting the ophthalmologist to detect FAN in clinical practice.
Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 07/2013; 2013:7376-7379. DOI:10.1109/EMBC.2013.6611262