January 2017 · 272 Reads · 24 Citations
February 2012 · 12 Reads · 1 Citation
The FG programming framework allows users to develop external-memory programs that mitigate high-latency operations, such as disk accesses and interprocessor communication. Previous research culminated in FG 1.4, which structures programs as software pipelines that operate on large buffers, with each stage of the pipeline running in its own thread. FG was originally designed around linear pipeline structures, but later versions, up to 1.4, expanded the original concept to include various nonlinear structures. When we reviewed the codebase for FG 1.4, we found that as FG had picked up features, the code had become increasingly convoluted. Consequently, we decided to redesign and reimplement FG from scratch, producing FG 2.0. The underlying paradigm changes from a pipeline to a network of stages. With the network paradigm, several constructs that required special support in FG 1.4 come for free. Early experience with FG 2.0 indicates that programs that use it are significantly smaller than their FG 1.4 counterparts and that they run faster.
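To make the paradigm shift concrete, the following is a minimal sketch of the network-of-stages idea using Python threads and queues; the names (run_stage, DONE) are ours, not FG's API, and real FG handles the buffer management and scheduling that this toy omits. Each stage runs in its own thread and is wired to others through queues, so a diamond-shaped network, which needed special support under the linear-pipeline paradigm, uses the same machinery as a straight chain.

import threading
import queue

DONE = object()  # end-of-stream marker passed through the network

def run_stage(fn, inqs, outqs):
    # One stage = one thread: pull one buffer from each input queue,
    # apply fn, and push the result to every output queue.
    def loop():
        while True:
            bufs = [q.get() for q in inqs]
            if any(b is DONE for b in bufs):
                for q in outqs:
                    q.put(DONE)
                return
            result = fn(*bufs)
            for q in outqs:
                q.put(result)
    t = threading.Thread(target=loop)
    t.start()
    return t

# Wire a diamond: a source fans out to two stages whose outputs
# are joined by a third stage downstream.
src_l, src_r = queue.Queue(), queue.Queue()
mid_l, mid_r, sink = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    run_stage(lambda b: sorted(b), [src_l], [mid_l]),
    run_stage(lambda b: max(b), [src_r], [mid_r]),
    run_stage(lambda x, y: (x, y), [mid_l, mid_r], [sink]),
]
for buf in ([3, 1, 2], [9, 7, 8]):
    src_l.put(buf)
    src_r.put(buf)
src_l.put(DONE)
src_r.put(DONE)
for t in stages:
    t.join()
while True:
    item = sink.get()
    if item is DONE:
        break
    print(item)  # ([1, 2, 3], 3) then ([7, 8, 9], 9)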
May 2010 · 23 Reads · 2 Citations
We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk I/O and interprocessor communication by overlapping such high-latency operations with other operations. It does so by constructing and executing a coarse-grained software pipeline on each node of the cluster, where each stage of the pipeline runs in its own thread. The sorting program distributes data among the nodes to create sorted runs, and then it merges the sorted runs on each node. When distributing data, the rates at which a node sends and receives data will differ. When merging sorted runs, each node will consume data from each of its sorted runs at varying rates. Under these conditions, a single pipeline running on each node is unwieldy to program and not necessarily efficient. We describe how we have extended FG to support multiple pipelines on each node, in two forms. When a node might send and receive data at different rates during interprocessor communication, we use disjoint pipelines on each node: one pipeline to send and one pipeline to receive. When a node consumes and produces data from different streams on the node, we use multiple pipelines that intersect at a particular stage. Experimental results show that by using multiple pipelines, an out-of-core, distribution-based sorting program outperforms an out-of-core sorting program based on columnsort, taking approximately 75%-85% of the time, despite the advantages that the columnsort-based program holds.
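As a rough sketch of the disjoint-pipelines construct (our own toy names, not FG's API; a bounded queue stands in for the interconnect and a list stands in for disk), the send side and the receive side of a node run as independent thread chains, so neither side's rate constrains the other:

import threading
import queue

net = queue.Queue(maxsize=4)  # stands in for the interconnect
DONE = object()

def send_pipeline(buffers):
    # Read -> partition/sort -> send, independent of the receive side.
    for buf in buffers:
        net.put(sorted(buf))  # "send" a sorted run
    net.put(DONE)

def receive_pipeline(disk):
    # Receive -> write, at whatever rate data arrives.
    while True:
        buf = net.get()
        if buf is DONE:
            return
        disk.append(buf)  # "write" the received run

disk = []
s = threading.Thread(target=send_pipeline, args=([[9, 4], [7, 2, 5]],))
r = threading.Thread(target=receive_pipeline, args=(disk,))
s.start(); r.start(); s.join(); r.join()
print(disk)  # [[4, 9], [2, 5, 7]]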
April 2009 · 128 Reads
January 2009 · 20 Reads · 2 Citations
January 2009 · 4,517 Reads · 1,224 Citations
Aimed at any serious programmer or computer science student, the new second edition of _Introduction to Algorithms_ builds on the tradition of the original with a truly magisterial guide to the world of algorithms. Clearly presented, mathematically rigorous, and yet approachable even for the maths-averse, this title sets a high standard for a textbook and reference to the best algorithms for solving a wide range of computing problems.

With sample problems and mathematical proofs demonstrating the correctness of each algorithm, this book is ideal as a textbook for classroom study, but its reach doesn't end there. The authors do a fine job of explaining each algorithm. (Reference sections on basic mathematical notation will help readers bridge the gap, but it will help to have some maths background to appreciate the full achievement of this handsome hardcover volume.) Every algorithm is presented in pseudo-code, which can be implemented in any computer language, including C/C++ and Java. This ecumenical approach is one of the book's strengths.

When it comes to sorting and common data structures, from basic linked lists to trees (including binary trees, red-black trees, and B-trees), this title really shines, with clear diagrams that show algorithms in operation. Even if you glance over the mathematical notation here, you can definitely benefit from this text in other ways. The book moves forward with more advanced algorithms that implement strategies for solving more complicated problems (including dynamic programming techniques, greedy algorithms, and amortised analysis). Algorithms for graph problems (used in such real-world business problems as optimising flight schedules or flow through pipelines) come next. In each case, the authors provide the best from current research in each topic, along with sample solutions.

This text closes with a grab bag of useful algorithms, including matrix operations and linear programming, evaluating polynomials, and the well-known Fast Fourier Transform (FFT), useful in signal processing and engineering. Final sections on "NP-complete" problems, like the well-known travelling salesman problem, show that while not all problems have a demonstrably final and best answer, algorithms that generate acceptable approximate solutions can still be used to generate useful, real-world answers.

Throughout this text, the authors anchor their discussion of algorithms with current examples drawn from molecular biology (like the Human Genome Project), business, and engineering. Each section ends with short discussions of related historical material, often discussing original research in each area of algorithms. In all, they argue successfully that algorithms are a "technology", just like hardware and software, that can be used to write better software that does more with better performance. Along with classic books on algorithms (like Donald Knuth's three-volume set, _The Art of Computer Programming_), this title sets a new standard for compiling the best research in algorithms. For any experienced developer, regardless of their chosen language, this text deserves a close look for extending the range and performance of real-world software. _--Richard Dragan_
January 2009 · 444 Reads · 3,361 Citations
July 2006 · 33 Reads · 6 Citations
Algorithmica
Our goal is to develop a robust out-of-core sorting program for a distributed-memory cluster. The literature contains two dominant paradigms for out-of-core sorting algorithms: merging-based and partitioning-based. We explore a third paradigm, that of oblivious algorithms. Unlike the two dominant paradigms, oblivious algorithms do not depend on the input keys and therefore lead to predetermined I/O and communication patterns in an out-of-core setting. Predetermined I/O and communication patterns facilitate overlapping I/O, communication, and computation for efficient implementation. We have developed several out-of-core sorting programs using the paradigm of oblivious algorithms. Our baseline implementation, 3-pass columnsort, was based on Leighton's columnsort algorithm. Though efficient in terms of I/O and communication, 3-pass columnsort has a restriction on the maximum problem size. As our first effort toward relaxing this restriction, we developed two implementations: subblock columnsort and M-columnsort. Both of these implementations incur substantial performance costs: subblock columnsort performs additional disk I/O, and M-columnsort needs substantial amounts of extra communication and computation. In this paper we present slabpose columnsort, a new oblivious algorithm that we have designed explicitly for the out-of-core setting. Slabpose columnsort relaxes the problem-size restriction at no extra I/O or communication cost. Experimental evidence on a Beowulf cluster shows that unlike subblock columnsort and M-columnsort, slabpose columnsort runs almost as fast as 3-pass columnsort. To the best of our knowledge, our implementations are the first out-of-core multiprocessor sorting algorithms that make no assumptions about the keys and produce output that is perfectly load balanced and in the striped order assumed by the Parallel Disk Model.
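For reference, here is a compact in-memory sketch of Leighton's eight-step columnsort, which underlies the 3-pass implementation. This NumPy version is our reconstruction; it illustrates only the steps and their fixed, data-independent structure, not the out-of-core I/O and communication scheduling the paper is about. The function name and the use of ±inf sentinels in the shift step are our choices. The fixed step structure is exactly what makes the algorithm oblivious: the same sorts and permutations run regardless of the keys.

import numpy as np

def columnsort(a, r, s):
    # Leighton's 8-step columnsort on n = r*s keys, viewed as an
    # r-by-s matrix stored in column-major order. Requires r % s == 0
    # and r >= 2*(s-1)**2; returns the keys fully sorted.
    assert a.size == r * s and r % s == 0 and r >= 2 * (s - 1) ** 2
    m = np.asarray(a, dtype=float).reshape(s, r).T  # column-major -> r x s

    m = np.sort(m, axis=0)        # step 1: sort each column
    m = m.T.reshape(r, s)         # step 2: "transpose": read column-major,
                                  #         rewrite row-major
    m = np.sort(m, axis=0)        # step 3: sort each column
    m = m.reshape(s, r).T         # step 4: untranspose (inverse of step 2)
    m = np.sort(m, axis=0)        # step 5: sort each column

    half = r // 2                 # step 6: shift each column down by r/2,
    flat = m.T.reshape(r * s)     #         padding with -inf/+inf sentinels
    padded = np.concatenate(
        [np.full(half, -np.inf), flat, np.full(half, np.inf)])
    m = padded.reshape(s + 1, r).T  # now r x (s+1)
    m = np.sort(m, axis=0)          # step 7: sort each column
    flat = m.T.reshape(r * (s + 1)) # step 8: unshift, dropping the sentinels
    return flat[half:half + r * s]

print(columnsort(np.array([8, 7, 6, 5, 4, 3, 2, 1]), r=4, s=2))
# [1. 2. 3. 4. 5. 6. 7. 8.]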
October 2005 · 28 Reads · 3 Citations
Lecture Notes in Computer Science
We compare two algorithms for sorting out-of-core data on a distributed-memory cluster. One algorithm, Csort, is a 3-pass oblivious algorithm. The other, Dsort, makes three passes over the data and is based on the paradigm of distribution-based algorithms. In the context of out-of-core sorting, this study is the first comparison between the paradigms of distribution-based and oblivious algorithms. Dsort avoids two of the four steps of a typical distribution-based algorithm by making simplifying assumptions about the distribution of the input keys. Csort makes no assumptions about the keys. Despite the simplifying assumptions, the I/O and communication patterns of Dsort depend heavily on the exact sequence of input keys. Csort, on the other hand, takes advantage of predetermined I/O and communication patterns, governed entirely by the input size, in order to overlap computation, communication, and I/O. Experimental evidence shows that, even on inputs that followed Dsort's simplifying assumptions, Csort fared well. The running time of Dsort showed great variation across five input cases, whereas Csort sorted all of them in approximately the same amount of time. In fact, Dsort ran significantly faster than Csort in just one of the five input cases: the one that was the most unrealistically skewed in favor of Dsort. A more robust implementation of Dsort, one without the simplifying assumptions, would run even slower.
May 2005 · 24 Reads · 9 Citations
We describe new features of FG that are designed to improve performance and extend the range of computations that fit into its framework. FG (short for Framework Generator) is a programming environment for parallel programs running on clusters. It was originally designed to mitigate latency in accessing data by running a program as a series of asynchronous stages that operate on buffers in a linear pipeline. To improve performance, FG now allows stages to be replicated, either statically by the programmer or dynamically by FG itself. FG also now alters thread priorities to use resources more efficiently; again, this action may be initiated by either the programmer or FG. To extend the range of computations that fit into its framework, FG now incorporates fork-join and DAG structures. Not only do fork-join and DAG structures allow for more programs to be designed for FG, but they also can enable significant performance improvements over linear pipeline structures.
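The stage-replication feature can be sketched in the same toy style (hypothetical names, not FG's API; FG performs this replication itself, statically or dynamically): several copies of a bottleneck stage pull from a shared input queue, trading output order for throughput.

import threading
import queue

DONE = object()
inq, outq = queue.Queue(), queue.Queue()

def slow_stage():
    # One replica of the bottleneck stage; additional replicas run
    # the identical loop against the same queues.
    while True:
        buf = inq.get()
        if buf is DONE:
            inq.put(DONE)      # re-post so sibling replicas also terminate
            return
        outq.put(sorted(buf))  # the stage's actual work on a buffer

replicas = [threading.Thread(target=slow_stage) for _ in range(3)]
for t in replicas:
    t.start()
for buf in ([5, 3], [2, 8, 1], [7, 6]):
    inq.put(buf)
inq.put(DONE)
for t in replicas:
    t.join()
while not outq.empty():
    print(outq.get())  # sorted buffers, not necessarily in input order

Because replicas race on the input queue, buffers can leave the stage out of order; a framework that replicates stages on the programmer's behalf must either tolerate or repair that reordering.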
... In order to provide a comprehensive analysis of OE, a list of n=18 CFIAs was formed (Table 1) [20]. As listed in Table 1, the CFIA OE are ambiguous in significance, so the question arises of forming a representative expert group to determine the appropriate SA or, according to the provisions of the theory of informatics, a tuple [10,19,[21][22][23][24]. Individual SAs (ISAs) are usually built by the method of pairwise comparison and normative determination of each alternative's share of the total significance. ...
January 2017
... Opaque uses SGX-based ColumnSort for private-data analytics [78]. Unfortunately, any system based on ColumnSort has a maximum problem size that is induced by the per-machine private-memory limit [14]. Thus, while ColumnSort's overhead is only 8× the dataset size, it can at most sort 118 million 318-byte records. Sorting is a brute-force way to shuffle: instead of producing any unpredictable data permutation, it picks exactly one unpredictable permutation and then sorts the data accordingly. ...
January 2003
... In order to inherently balance the SM voltages for noncoprime cases, dual-circulant modulation should be applied, with the additional set alternating with the basic one. According to (3), the associated polynomial of the additional set is given in equation (8). Using the extended Euclidean algorithm [27], the two polynomials can be represented as ...
January 2001
... An elegant approach to automating the generation of crossover paths is the use of a convex hull algorithm. The convex hull of a set of 2D planar points is defined as the convex polygon of shortest perimeter that contains or encloses all the points in the set [32]. The set can then be defined so that it contains a discretized set of all possible points the fiber can take. ...
January 2001
... It is stated in the textbook that "this improvement works if we need only the length of an LCS; if we need to reconstruct the elements of an LCS, the smaller table does not keep enough information to retrace our steps in O(m+n) time" [6]. After modifying the algorithm, I would say that it is possible to retrace the table, even after eliminating its first row and first column to make it smaller, in O(m+n) time. ...
Reference:
A Survey on Longest Common Subsequence
... As a result, we can only run a small number of alternatives. Even when we select the best of them, this best-of-a-few is often still much worse than the actually optimal plan, which we cannot find, since for a large number of components n, the number 2^n of possible combinations of open/closed components becomes larger than the number of particles in the Universe; see, e.g., [1,2,8]. ...
January 1990
... Data-intensive computing is increasingly relying on data centers that host systems with combined processor counts ranging from thousands to millions and main memory sizes up to 10 petabytes. The demand for these shared resources has far exceeded their capacity, significantly increasing costs [17] [7]. This disparity has created a divide between those with access to these resources and those without access. ...
January 2001
... While these metrics do not exactly optimize P1 (because the throughput of nodes and edges changes depending on the graph topology), they serve as useful benchmarks for comparing the performance of different schemes. Moreover, graphs generated by A* always have spanning-tree structures [54], ensuring that the graph solutions remain within the feasible domain of P1. ...
January 2001
... Rather than requiring the application to explicitly supply a list of future reads, a prefetching system can automatically generate the list, either from application source code, using static analysis [1, 4, 25, 26], or from the running application, using speculative execution [3, 7]. Static analysis can generate file read lists, but data dependence and analytic imprecision may limit these methods to simple constructs that do not involve abstractions over I/O. ...
January 1994
... In this paper, we demonstrate that our programming environment, called ABCDEFG (FG for short) [9], reduces source code size for out-of-core implementations of bitmatrix-multiply/complement (BMMC) permutations, fast Fourier transform (FFT), and columnsort. Replacing each of these C and C* programs by a comparable program written with FG saves 468, 1322, and 2004 lines of source code, respectively. ...