How can I sort a huge file without using a lot of memory?
I need C# code or an algorithm for sorting a file that contains student records. I don't want to use all of the memory to sort this file, and I still want every record sorted. Do you know of code or an algorithm that does this?
If you are worried about memory consumption, and can subdivide your data a priori, then you may want to write an algorithm that first partitions your larger file into smaller files (which can be done in a relatively memory-efficient way using a StreamReader), and then sorts the individual files.
A second option is to use an external sort algorithm. External sort algorithms keep part of the data to be sorted on an external medium - such as a hard disk - and only part of it in memory. There are numerous implementations of such algorithms in C# out there. Here's one based on merge sort: http://www.splinter.com.au/sorting-enormous-files-using-a-c-external-mer/
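For illustration, here is a minimal sketch of the split phase, under a couple of assumptions that are mine and not from the linked article: one student record per line, a fixed number of lines per chunk that fits comfortably in memory, and placeholder names like `data_part.000`:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ChunkSplitter
{
    // Reads the big file with a StreamReader, sorts fixed-size chunks in memory,
    // and writes each sorted chunk to its own part file.
    public static List<string> SplitIntoSortedChunks(string inputPath, int chunkSize = 100_000)
    {
        var chunkFiles = new List<string>();
        var buffer = new List<string>(chunkSize);

        using var reader = new StreamReader(inputPath);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            buffer.Add(line);
            if (buffer.Count == chunkSize)
            {
                chunkFiles.Add(WriteSortedChunk(buffer, chunkFiles.Count));
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            chunkFiles.Add(WriteSortedChunk(buffer, chunkFiles.Count));

        return chunkFiles;
    }

    private static string WriteSortedChunk(List<string> records, int index)
    {
        records.Sort(StringComparer.Ordinal);      // in-memory sort of one small chunk
        string path = $"data_part.{index:D3}";     // placeholder naming, e.g. data_part.000
        File.WriteAllLines(path, records);
        return path;
    }
}
```

The sorted chunk files can then be merged back together as described in the other answers.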
I think the best strategy would be to divide the data into chunks. For example, let's say that you only want to spend 2MB of memory. You would read records until that budget is filled, sort them, and write them to a data_part.xxx file. When you are done, you only need to open all the .xxx files, read one record at a time from each file, and write them in order to the final file. That was how it was done in Cobol, at least.
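A rough sketch of that merge step (the split step mirrors the sketch in the previous answer), assuming one record per line and plain ordinal comparison as the sort key - both of which are my assumptions, so adjust for real student records:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ChunkMerger
{
    public static void MergeSortedChunks(IEnumerable<string> chunkPaths, string outputPath)
    {
        var readers = chunkPaths.Select(p => new StreamReader(p)).ToList();
        var current = readers.Select(r => r.ReadLine()).ToList();   // one record per open file

        using (var writer = new StreamWriter(outputPath))
        {
            while (true)
            {
                // Pick the file whose current record sorts first.
                int smallest = -1;
                for (int i = 0; i < current.Count; i++)
                {
                    if (current[i] == null) continue;
                    if (smallest == -1 || string.CompareOrdinal(current[i], current[smallest]) < 0)
                        smallest = i;
                }
                if (smallest == -1) break;                          // every file is exhausted

                writer.WriteLine(current[smallest]);
                current[smallest] = readers[smallest].ReadLine();   // advance only that file
            }
        }

        foreach (var r in readers) r.Dispose();
    }
}
```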
Are you in a Linux environment? This can be done with a one-line command in an Ubuntu terminal and it will not eat up your memory, however large the file is. Simply use the cat/grep/sort Unix utilities; GNU sort in particular already performs an external merge sort through temporary files, so something like `sort students.txt -o sorted.txt` copes with files larger than memory.
The general method is to use a routine that uses drive space: break the large file into subfiles, each small enough to sort in memory (i.e., fill an internal array with n records, using as much memory as you please for the array, sort it, then write the sorted array out). Then use an interleaving routine to recombine them (that is, open all the files, read one record from each, write the record that sorts first to the output and advance that input file, and repeat until all files are exhausted).
Total disk usage is then triple your initial file size if you keep all the original files until the process is done, or double if you delete the initial file after the subfiles are written.
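For what it's worth, here is one possible shape of that interleaving routine using .NET 6's PriorityQueue, which avoids scanning every open file for each output record when there are many subfiles. One record per line and ordinal ordering are assumptions, not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Interleaver
{
    public static void Interleave(IReadOnlyList<string> subfilePaths, string outputPath)
    {
        var readers = new List<StreamReader>();
        // Each entry is a subfile index, prioritised by that subfile's current record.
        var heap = new PriorityQueue<int, string>(StringComparer.Ordinal);

        foreach (var path in subfilePaths)
        {
            var reader = new StreamReader(path);
            readers.Add(reader);
            string first = reader.ReadLine();
            if (first != null)
                heap.Enqueue(readers.Count - 1, first);
        }

        using (var writer = new StreamWriter(outputPath))
        {
            while (heap.TryDequeue(out int fileIndex, out string record))
            {
                writer.WriteLine(record);                   // smallest current record overall
                string next = readers[fileIndex].ReadLine();
                if (next != null)
                    heap.Enqueue(fileIndex, next);          // refill from the same subfile
            }
        }

        foreach (var r in readers) r.Dispose();
    }
}
```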
I would try to use an external index file if the goal is only to read the content in an ordered way. In any case, using the index you can simply copy the records into a new file in order and then delete the unordered one.
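A minimal sketch of that idea, under assumptions of my own: records are newline-terminated text, and the sort key is the first tab-separated field of each line. Only the small (key, offset, length) index is held and sorted in memory, never the full records:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class IndexSort
{
    public static void SortViaIndex(string inputPath, string outputPath)
    {
        var index = new List<(string Key, long Offset, int Length)>();

        // Pass 1: scan the file once, remembering each record's key, byte offset and length.
        using (var input = File.OpenRead(inputPath))
        {
            long start = 0, pos = 0;
            var lineText = new StringBuilder();
            int b;
            while ((b = input.ReadByte()) != -1)
            {
                pos++;
                if (b == '\n')
                {
                    index.Add((KeyOf(lineText), start, (int)(pos - start)));
                    lineText.Clear();
                    start = pos;
                }
                else
                {
                    lineText.Append((char)b);
                }
            }
            if (lineText.Length > 0)
                index.Add((KeyOf(lineText), start, (int)(pos - start)));
        }

        // Sort only the small index, not the records themselves.
        index.Sort((x, y) => string.CompareOrdinal(x.Key, y.Key));

        // Pass 2: copy the records into the new file in key order; the unordered
        // original can then be deleted.
        using var source = File.OpenRead(inputPath);
        using var output = File.Create(outputPath);
        var buffer = new byte[8192];
        foreach (var (_, offset, length) in index)
        {
            source.Seek(offset, SeekOrigin.Begin);
            int remaining = length;
            while (remaining > 0)
            {
                int read = source.Read(buffer, 0, Math.Min(buffer.Length, remaining));
                if (read <= 0) break;
                output.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }

    // Assumed key: the first tab-separated field of each record.
    private static string KeyOf(StringBuilder line)
    {
        string text = line.ToString().TrimEnd('\r');
        int tab = text.IndexOf('\t');
        return tab < 0 ? text : text.Substring(0, tab);
    }
}
```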
The merge sort answer is a good solution for this problem, but the most recommended approach for this kind of problem is a B-tree data structure. Why? With merge sort you can sort the elements, but only that: if you later need to add or delete some records, maintaining the order becomes complex and far from optimal. With a B-tree you can maintain the order as you add and delete data.
This is what many database systems use to store, maintain, and order data.
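To illustrate just the ordering idea (this is not a real disk-backed B-tree): a tree-based map keeps records in key order while you insert and delete, so no separate re-sort pass is needed. SortedDictionary below is an in-memory balanced tree standing in for the B-tree, and the student-id key is an assumption; for a file too big for memory you would use a B-tree library or an embedded database instead.

```csharp
using System;
using System.Collections.Generic;

class OrderedStudentStore
{
    // Keyed by an assumed student id; values are the raw record text.
    private readonly SortedDictionary<int, string> _records = new SortedDictionary<int, string>();

    // Insertions and deletions keep the structure ordered; no re-sort pass is needed.
    public void AddOrUpdate(int studentId, string record) => _records[studentId] = record;

    public void Remove(int studentId) => _records.Remove(studentId);

    // Enumerates records already in key order.
    public IEnumerable<string> InOrder() => _records.Values;
}
```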