J. JaJa

University of Maryland, College Park, College Park, MD, USA

Publications (36) · 16.76 Total impact

  • Conference Proceeding: Component-based Data Layout for Efficient Slicing of Very Large Multidimensional Volumetric Data
    Jusub Kim, J. JaJa
    ABSTRACT: In this paper, we introduce a new efficient data layout scheme to efficiently handle out-of-core axis-aligned slicing queries of very large multidimensional volumetric data. Slicing is a very useful dimension reduction tool that removes or reduces occlusion problems in visualizing 3D/4D volumetric data sets and that enables fast visual exploration of such data sets. We show that the data layouts based on typical space-filling curves are not optimal for the out-of-core slicing queries and present a novel component-based data layout scheme for a specialized problem domain, in which it is only required to provide fast slicing at every k-th value, for any k > 1. Our component-based data layout scheme provides much faster processing time for any axis-aligned slicing direction at every k-th value, k > 1, requiring less cache memory size and without any replication of data. In addition, the data layout can be generalized to any high dimension.
    Scientific and Statistical Database Management, 2007. SSBDM '07. 19th International Conference on; 08/2007
  • Conference Proceeding: Information-Aware 2^n-Tree for Efficient Out-of-Core Indexing of Very Large Multidimensional Volumetric Data
    Jusub Kim, J. JaJa
    ABSTRACT: We discuss a new efficient out-of-core multidimensional indexing structure, information-aware 2^n-tree, for indexing very large multidimensional volumetric data. Building a series of (n-1)-dimensional indexing structures on n-dimensional data causes a scalability problem in the situation of continually growing resolution in every dimension. However, building a single n-dimensional indexing structure can cause an indexing effectiveness problem compared to the former case. The information-aware 2^n-tree is an effort to maximize the indexing structure efficiency by ensuring that the subdivision of space has as similar coherence as possible along each dimension. It is particularly useful when data distribution along each dimension constantly shows a different degree of coherence from each other dimension. Our preliminary results show that our new tree can achieve higher indexing structure efficiency than previous methods.
    Scientific and Statistical Database Management, 2007. SSBDM '07. 19th International Conference on; 08/2007
  • Conference Proceeding: An efficient and scalable parallel algorithm for out-of-core isosurface extraction and rendering
    Qin Wang, J. JaJa, A. Varshney
    ABSTRACT: We consider the problem of isosurface extraction and rendering for large scale time varying data. Such datasets have been appearing at an increasing rate especially from physics-based simulations, and can range in size from hundreds of gigabytes to tens of terabytes. We develop a new simple indexing scheme, which makes use of the concepts of the interval tree and the span space data structures. The new scheme enables isosurface extraction and rendering in I/O optimal time, using more compact indexing structure and more effective bulk data movement than the previous schemes. Moreover, our indexing scheme can be easily extended to a multiprocessor environment in which each processor has access to its own local disk. The resulting parallel algorithm is provably efficient and scalable. That is, it achieves load balancing across the processors independent of the isovalue, with almost no overhead in the total amount of work relative to the sequential algorithm. We conduct a large number of experimental tests on the University of Maryland Visualization Cluster using the Richtmyer-Meshkov instability dataset, and obtain results that consistently validate the efficiency and the scalability of our algorithm.
    Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International; 05/2006
  • Conference Proceeding: Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments
    ABSTRACT: Emerging technologies in high speed NAS, hierarchical storage management systems, and networked systems that virtualize interconnected storage over IP and fiber-channel networks, promise to consolidate distributed data stores onto large-scale professionally managed enterprise storage environments. We describe the software architecture of the PAWN (Producer - Archive Workflow Network) environment that enables scalable, reliable marshalling and organization of distributed data into such enterprise storage environments. PAWN was initially developed to capture the core elements required for long term preservation of digital objects as identified by researchers in the digital library and archiving communities. In this paper, we show how PAWN can be extended to enable multiple clients at a number of distributed sites to prepare, organize, and bulk transfer large scale data onto clusters of servers that securely verify the integrity of the data, register the metadata, and store the data into an enterprise storage environment. PAWN allows detailed description, auditing, and organization of the data, and hence will allow for efficient management, access, and disaster recovery. The basic software components are based on open standards and web technologies, and hence are platform independent.
    Mass Storage Systems and Technologies, 2005. Proceedings. 22nd IEEE / 13th NASA Goddard Conference on; 05/2005
  • Conference Proceeding: Temporal range exploration of large scale multidimensional time series data
    J. JaJa, J. Kim, Qin Wang
    ABSTRACT: We consider the problem of querying large scale multidimensional time series data to discover events of interest, test and validate hypotheses, or to associate temporal patterns with specific events. Large amounts of multidimensional time series data are currently available, and this type of data is growing at a fast rate due to the current trends in collecting time series of business, scientific, demographic, and simulation data. The ability to explore such collections interactively, even at a coarse level, will be critical in discovering the information and knowledge embedded in such collections. We develop indexing techniques and search algorithms to efficiently handle temporal range value querying of multidimensional time series data. Our indexing uses linear space data structures that enable the handling of queries very efficiently, invoking in the worst case a logarithmic number of queries to single time slices. We also show that our algorithm is ideally suited for parallel implementation on clusters of processors achieving a linear speedup in the number of available processors. A particularly simple data structure with provably good bounds is also presented for the case when the number of multidimensional objects is relatively small. These techniques improve significantly over previous techniques for either the serial or the parallel case, and are evaluated by extensive experimental results that confirm their superior performance.
    Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on; 07/2004
  • Article: A perspective on Quicksort
    J. JaJa
    ABSTRACT: This article introduces the basic Quicksort algorithm and gives a flavor of the richness of its complexity analysis. The author also provides a glimpse of some of its generalizations to parallel algorithms and computational geometry
    Computing in Science and Engineering 02/2000; · 1.42 Impact Factor
  • Conference Proceeding: Prefix computations on symmetric multiprocessors
    D.R. Helman, J. JaJa
    ABSTRACT: We introduce a new optimal prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Moreover, whereas Reid-Miller and Blelloch (1996) targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor architecture (SMP). These symmetric multiprocessors dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Our prefix computation algorithm was implemented in C using POSIX threads and run on four symmetric multiprocessors: the IBM SP-2 (High Node), the HP-Convex Exemplar (S-Class), the DEC AlphaServer, and the Silicon Graphics Power Challenge. We ran our code using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. For some problems, our algorithm actually matched or exceeded the performance of the standard sequential solution using only a single thread. Moreover, in spite of the fact that the processors must compete for access to main memory, our algorithm still achieved scalable performance with up to 16 processors, which was the largest platform available to us.
    Parallel and Distributed Processing, 1999. 13th International and 10th Symposium on Parallel and Distributed Processing, 1999. 1999 IPPS/SPDP. Proceedings; 05/1999
  • Conference Proceeding: A hierarchical data archiving and processing system to generate custom tailored products from AVHRR data
    ABSTRACT: A novel indexing scheme is described to catalogue satellite data on a pixel basis. The objective of this research is to develop an efficient methodology to archive, retrieve and process satellite data, so that data products can be generated to meet the specific needs of individual scientists. When requesting data, users can specify the spatial and temporal resolution, geographic projection, choice of atmospheric correction, and the data selection methodology. The data processing is done in two stages. In the initial processing, satellite data is calibrated and navigated, and quality flags are appended. This processed data is then indexed and stored. Secondary processing, such as atmospheric correction and projection, is done after a user requests the data, to create custom-made products. Dividing the processing into two stages saves time, since the basic processing tasks such as navigation and calibration, which are common to all requests, are not repeated when different users request satellite data. The indexing scheme described can be extended to allow fusion of data sets from different sensors.
    Geoscience and Remote Sensing Symposium, 1999. IGARSS '99 Proceedings. IEEE 1999 International; 02/1999
  • Conference Proceeding: Developing the next generation of Earth science data systems: the Global Land Cover Facility
    ABSTRACT: A recent initiative by NASA has resulted in the formation of a federation of Earth science data partners. These Earth Science Information Partners (ESIPs) have been tasked with creating novel Earth science data products and services as well as distributing new and existing data sets to the Earth science community and the general public. The University of Maryland established its ESIP activities with the creation of the Global Land Cover Facility (GLCF). This joint effort of the Institute for Advanced Computer Studies (UMIACS) and the Department of Geography has developed an operational data archiving and distribution system aimed at advancing current land cover research efforts. The success of the GLCF is tied closely to assessing user needs as well as the timely delivery of data products to the research community. This paper discusses the development and implementation of a web-based interface that allows users to query the authors' data holdings and perform user requested processing tasks on demand. The GLCF takes advantage of a scalable, high performance computing architecture for the manipulation of very large remote sensing data sets and the rapid spatial indexing of multiple format data types. The user interface has been developed with the cooperation of the Human-Computer Interaction Laboratory (HCIL) and demonstrates advances in spatial and temporal querying tools as well as the ability to overlay multiple raster and vector data sets. Their work provides one perspective concerning how critical earth science data may be handled in the near future by a coalition of distributed data centers.
    Geoscience and Remote Sensing Symposium, 1999. IGARSS '99 Proceedings. IEEE 1999 International; 02/1999
  • Article: Models and high-performance algorithms for global BRDF retrieval
    ABSTRACT: The authors describe three models for retrieving information related to the scattering of light on the Earth's surface. Using these models, they've developed algorithms for the IBM SP2 that efficiently retrieve this information
    IEEE Computational Science and Engineering 11/1998;
  • Conference Proceeding: Retrieval of bidirectional reflectance distribution function (BRDF) at continental scales from AVHRR data using high performance computing
    ABSTRACT: The authors have used high performance computing techniques to implement three different algorithms to model the Bidirectional Reflectance Distribution Function (BRDF) over land from AVHRR data. AVHRR data from the Pathfinder project has a spatial resolution of 8 km, and four years of data (1983-1986) were used in this study. Two of the models are statistical models, where the coefficients are derived from a set of directional reflectances for each solar zenith angle by curve fitting using a least squares routine. The third model is semi-empirical, and the coefficients are derived by model inversion and numerical iteration. The semi-empirical model is computationally more expensive than the other two. One of the statistical models describes surface BRDF as a continuous temporal function using Fourier techniques. Analysis of the standard errors between observed and modeled reflectances from the temporal model shows that the errors were larger in higher latitudes, probably due to interannual variations in surface conditions caused by changing snow cover in these areas. Results from the other two models are similar. The results from this study are expected to provide valuable inputs into BRDF retrieval algorithms proposed for future Earth Observation System (EOS) instruments.
    Geoscience and Remote Sensing, 1997. IGARSS '97. Remote Sensing - A Scientific Vision for Sustainable Development., 1997 IEEE International; 09/1997
  • Article: An operational atmospheric correction algorithm for Landsat Thematic Mapper imagery over the land
    ABSTRACT: An operational atmospheric correction algorithm for Thematic Mapper (TM) imagery has been developed for both sequential and parallel computer environments considering both aerosol and molecular scattering and absorption. The aerosol optical depth is estimated from the image itself using the dark object approach on a moving-window basis, and the surface reflectance is then retrieved by searching lookup tables that are created using a numerical radiative transfer code. The dark object pixels are identified and their surface reflectance estimated using TM channel 7 (2.1 μm). A variety of techniques are employed to improve computational efficiency. This method is validated by measured aerosol optical depth and extensive visual evaluations accompanied by statistical analysis. Results indicate that the approach is highly stable and useful for both qualitative imagery interpretation (haze removal) and quantitative analysis. Future research activities are also highlighted. The computer codes are available to the general scientific community.
    Journal Of Geophysical Research-Atmospheres. 01/1997; 102(D14):17173-17186.
  • Conference Proceeding: Efficient algorithms for estimating atmospheric parameters for surface reflectance retrieval
    ABSTRACT: The objective of atmospheric correction is to retrieve the surface reflectance from remotely sensed imagery by removing the atmospheric effects. We introduce an efficient algorithm to estimate the optical characteristics of the TM imagery and to remove the atmospheric effects from it. Our algorithm introduces a set of techniques to significantly improve the quality of the retrieved images. We pay particular attention to the computational efficiency of the algorithm, thereby allowing us to correct large TM images quite fast. We also provide a parallel implementation of our algorithm and show its portability and scalability on several parallel machines.
    Parallel Processing, 1996., Proceedings of the 1996 International Conference on; 09/1996
  • Conference Proceeding: On combining technology and theory in search of a parallel computation model
    J. JaJa
    ABSTRACT: A fundamental problem in parallel computing is to design high-level, architecture independent, algorithms that execute efficiently on general purpose parallel machines. The aim is to be able to achieve portability and high performance simultaneously. A key to accomplishing this is the existence of a computation model that can bridge the gap between the high level programming models and the underlying hardware models. There are currently two factors that make this fundamental problem more tractable. The first is the emergence of a dominant parallel architecture consisting of a number of powerful microprocessors interconnected by either a proprietary interconnect, or a standard off-the-shelf interconnect (such as an ATM switch). The second factor is the emergence of standards, such as the message passing standard MPI, for which efficient implementations are either available or about to appear on most machines. Our recent work has exploited these two developments by developing a methodology based on (1) a simple computation model for the current MIMD platforms that incorporates communication cost into the complexity of the algorithms, and (2) a SPMD programming model that makes effective use of communication primitives. We describe our approach for validating the computation model based on extensive experimentation and the development of benchmarks, and discuss its extension to the emerging clusters of Symmetric Multiprocessors (SMPs) architecture
    Parallel Processing, 1996. Proceedings of the 1996 ICPP Workshop on Challenges for; 09/1996
  • Conference Proceeding: Parallel algorithms for image enhancement and segmentation by region growing with an experimental study
    ABSTRACT: Presents efficient and portable implementations of a useful image enhancement process, the symmetric neighborhood filter (SNF), and an image segmentation technique which makes use of the SNF and a variant of the conventional connected components algorithm which we call δ-connected components. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient connected components algorithm based on a novel approach for parallel merging. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results are consistent with the theoretical analysis (and provide the best known execution times for segmentation, even when compared with machine-specific implementations). Our test data include difficult images from the Landsat Thematic Mapper (TM) satellite data
    Parallel Processing Symposium, 1996., Proceedings of IPPS '96, The 10th International; 05/1996
  • Conference Proceeding: Practical parallel algorithms for dynamic data redistribution, median finding, and selection
    D.A. Bader, J. JaJa
    ABSTRACT: A common statistical problem is that of finding the median element in a set of data. This paper presents a fast and portable parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank i, for an arbitrarily given integer i. Practical algorithms needed by our selection algorithm for the dynamic redistribution of data are also discussed. Our general framework is a distributed memory programming model enhanced by a set of communication primitives. We use efficient techniques for distributing, coalescing, and load balancing data as well as efficient combinations of task and data parallelism. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results illustrate the scalability and efficiency of our algorithms across different platforms and improve upon all the related experimental results known to the authors.
    Parallel Processing Symposium, 1996., Proceedings of IPPS '96, The 10th International; 05/1996
  • Article: Fast algorithms for removing atmospheric effects from satellite images
    ABSTRACT: The varied features of the earth's surface each reflect sunlight and other wavelengths of solar radiation in a highly specific way. This principle provides the foundation for the science of satellite based remote sensing. A vexing problem confronting remote sensing researchers, however, is that the reflected radiation observed from remote locations is significantly contaminated by atmospheric particles. These aerosols and molecules scatter and absorb the solar photons reflected by the surface in such a way that only part of the surface radiation can be detected by a sensor. The article discusses the removal of atmospheric effects due to scattering and absorption, i.e., atmospheric correction. Atmospheric correction algorithms basically consist of two major steps. First, the optical characteristics of the atmosphere are estimated. Various quantities related to the atmospheric correction can then be computed by radiative transfer algorithms, given the atmospheric optical properties. Second, the remotely sensed imagery is corrected by inversion procedures that derive the surface reflectance. We focus on the second step, describing our work on improving the computational efficiency of the existing atmospheric correction algorithms. We discuss a known atmospheric correction algorithm and then introduce a substantially more efficient version which we have devised. We have also developed a parallel implementation of our algorithm.
    IEEE Computational Science and Engineering 02/1996;
  • Conference Proceeding: Land cover dynamics investigation using parallel computers
    ABSTRACT: A comprehensive and highly interdisciplinary research program is being carried out to investigate global land cover dynamics in heterogeneous parallel computing environments. The problems addressed include atmospheric correction, mixture modeling, image classification by Markovian random fields and by segmentation, global image/map databases, object oriented parallel programming, and parallel I/O. During the initial two years of the project, significant progress has been made in all of these areas.
    Geoscience and Remote Sensing Symposium, 1995. IGARSS '95. 'Quantitative Remote Sensing for Science and Applications', International; 08/1995
  • Article: Efficient image processing algorithms on the scan line array processor
    D. Helman, J. JaJa
    ABSTRACT: Develops efficient algorithms for low and intermediate level image processing on the scan line array processor, a SIMD machine consisting of a linear array of cells that processes images in a scan line fashion. For low level processing, the authors present algorithms for block DFT, block DCT, convolution, template matching, shrinking, and expanding which run in real-time. By real-time, the authors mean that, if the required processing is based on neighborhoods of size m×m, then the output lines are generated at a rate of O(m) operations per line and a latency of O(m) scan lines, which is the best that can be achieved on this model. The authors also develop an algorithm for median filtering which runs in almost real-time at a cost of O(m log m) time per scan line and a latency of [m/2] scan lines. For intermediate level processing, the authors present optimal algorithms for translation, histogram computation, scaling, and rotation. The authors also develop efficient algorithms for labelling the connected components and determining the convex hulls of multiple figures which run in O(n log n) and O(n log^2 n) time, respectively. The latter algorithms are significantly simpler and easier to implement than those already reported in the literature for linear arrays.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 02/1995; · 4.91 Impact Factor
  • Article: Scalable data parallel algorithms for texture synthesis using Gibbs random fields.
    ABSTRACT: This article introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov random field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. The use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented enables machine-independent scalable algorithms for a number of problems in image processing and analysis.
    IEEE Transactions on Image Processing 02/1995; 4(10):1456-60. · 3.04 Impact Factor
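
Illustrative note: the median-finding entry above (Bader and JaJa) defines the general selection problem as finding the element of rank i in a set. The following is only a minimal sequential sketch of that problem in Python, using a randomized Quicksort-style partition. It is not the parallel Split-C algorithm from the paper, and the function names `select` and `median` are hypothetical names chosen for this illustration.

```python
import random


def select(items, i):
    """Return the element of rank i (1-based: i = 1 is the smallest).

    Sequential randomized selection; an illustrative assumption, not the
    distributed-memory algorithm described in the listed paper.
    """
    if not 1 <= i <= len(items):
        raise ValueError("rank out of range")
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if i <= len(smaller):
            # The answer lies among the elements below the pivot.
            pool = smaller
        elif i <= len(smaller) + len(equal):
            # The rank falls inside the block of pivot-valued elements.
            return pivot
        else:
            # Discard everything up to and including the pivot block.
            i -= len(smaller) + len(equal)
            pool = [x for x in pool if x > pivot]


def median(items):
    """Lower median: the element of rank ceil(n / 2)."""
    return select(items, (len(items) + 1) // 2)
```

Each round discards at least the pivot, so the loop always terminates; with a random pivot the expected running time is linear in the number of elements.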