
# Rank Modulation for Flash Memories


## Abstract

We explore a novel data representation scheme for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. We present unrestricted Gray codes spanning all possible n-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. We also investigate optimal rewriting schemes for translating arbitrary input alphabet into n-cell states which minimize the number of programming operations.
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 6, JUNE 2009 2659
Anxiao (Andrew) Jiang, Member, IEEE, Robert Mateescu, Member, IEEE, Moshe Schwartz, Member, IEEE, and
Jehoshua Bruck, Fellow, IEEE
Abstract: We explore a novel data representation scheme for multilevel flash memory cells, in which a set of $n$ cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation, which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells.
We present unrestricted Gray codes spanning all possible $n$-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. One important application of the Gray codes is the realization of logic multilevel cells, which is useful in conventional storage solutions. We also investigate rewriting schemes for random data modification. We present both an optimal scheme for the worst case rewrite performance and an approximation scheme for the average-case rewrite performance.
Index Terms: Asymmetric channel, flash memory, Gray codes, permutations, rank modulation.
I. INTRODUCTION
Flash memory is a nonvolatile memory both electrically programmable and electrically erasable. Its reliability, high storage density, and relatively low cost have made it a dominant nonvolatile memory technology and a prominent candidate to replace the well-established magnetic recording technology in the near future.
The most conspicuous property of flash storage is its inherent asymmetry between cell programming (charge placement) and cell erasing (charge removal). While adding charge to a single cell is a fast and simple operation, removing charge from a single cell is very difficult. In fact, today, most (if not all) flash memory technologies do not allow a single cell to be erased but rather only a large block of cells. Thus, a single-cell erase operation requires the cumbersome process of copying an entire block to a temporary location, erasing it, and then programming all the cells in the block.
Manuscript received September 18, 2008; revised January 28, 2009. Current version published May 20, 2009. This work was supported in part by the Caltech Lee Center for Advanced Networking, by the National Science Foundation (NSF) under Grant ECCS-0802107 and the NSF CAREER Award 0747415, by the GIF under Grant 2179-1785.10/2007, by the NSF-NRI, and by a gift from Ross Brown. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Toronto, ON, Canada, July 2008.
A. Jiang is with the Department of Computer Science, Texas A&M University, College Station, TX 77843-3112 USA (e-mail: ajiang@cs.tamu.edu).
R. Mateescu and J. Bruck are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA (e-mail: ma-
M. Schwartz is with the Department of Electrical and Computer Engineering, Ben-Gurion University, Beer-Sheva 84105, Israel (e-mail: schwartz@ee.bgu.ac.il).
Communicated by T. Etzion, Associate Editor for Coding Theory.
Digital Object Identifier 10.1109/TIT.2009.2018336
To keep up with the ever-growing demand for denser storage, the multilevel flash cell concept is used to increase the number of stored bits in a cell [8]. Instead of the usual single-bit flash memories, where each cell is in one of two states (erased/programmed), each multilevel flash cell stores one of $q$ levels and can be regarded as a symbol over a discrete alphabet of size $q$. This is done by designing an appropriate set of threshold levels which are used to quantize the charge level readings to symbols from the discrete alphabet.
Fast and accurate programming schemes for multilevel flash memories are a topic of significant research and design efforts [2], [14], [31]. All these and other works share the attempt to iteratively program a cell to an exact prescribed charge level in a minimal number of programming cycles. As mentioned above, flash memory technology does not support charge removal from individual cells. As a result, the programming cycle sequence is designed to cautiously approach the target charge level from below so as to avoid undesired global erases in case of overshoots. Consequently, these attempts still require many programming cycles, and they work only up to a moderate number of levels per cell.
In addition to the need for accurate programming, the move to multilevel flash cells also aggravates reliability. The same reliability aspects that have been successfully handled in single-level flash memories may become more pronounced and translate into higher error rates in stored data. One such relevant example is errors that originate from low memory endurance [5], by which a drift of threshold levels in aging devices may cause programming and read errors.
We therefore propose the rank-modulation scheme, whose aim is to eliminate both the problem of overshooting while programming cells, and the problem of memory endurance in aging devices. In this scheme, an ordered set of $n$ cells stores the information in the permutation induced by the charge levels of the cells. In this way, no discrete levels are needed (i.e., no need for threshold levels) and only a basic charge-comparing operation (which is easy to implement) is required to read the permutation. If we further assume that the only programming operation allowed is raising the charge level of one of the cells above the current highest one (push-to-the-top), then the overshoot problem is no longer relevant. Additionally, the technology may allow in the near future the decrease of all the charge levels in a block of cells by a constant amount smaller than the lowest charge level (block deflation), which would maintain their relative values, and thus leave the information unchanged. This can eliminate a designated erase step, by deflating the entire block whenever the memory is not in use.
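The three operations described above (reading the permutation by comparing charges, push-to-the-top, and block deflation) can be illustrated with a minimal Python sketch. The function names, the use of floating-point charge values, and the `margin` parameter are assumptions made for illustration; they are not part of the paper.

```python
def read_permutation(charges):
    """Return the cell names (1-based) in descending order of charge.

    Only comparisons between cells are used; no threshold levels.
    """
    order = sorted(range(len(charges)), key=lambda i: charges[i], reverse=True)
    return [i + 1 for i in order]


def push_to_top(charges, cell, margin=1.0):
    """Raise `cell` (1-based) above the current highest charge.

    `margin` is a hypothetical positive amount; overshooting it is harmless
    because only the relative order of the cells carries information.
    """
    new = list(charges)
    new[cell - 1] = max(charges) + margin
    return new


def deflate(charges, amount):
    """Lower every cell by the same amount, keeping all charges positive."""
    if amount >= min(charges):
        raise ValueError("amount must be smaller than the lowest charge level")
    return [c - amount for c in charges]


charges = [0.3, 1.2, 0.7]              # cell 2 is highest, cell 1 is lowest
print(read_permutation(charges))       # [2, 3, 1]
charges = push_to_top(charges, 1)      # cell 1 becomes the top-charged cell
print(read_permutation(charges))       # [1, 2, 3]
charges = deflate(charges, 0.25)       # relative order is unchanged
print(read_permutation(charges))       # [1, 2, 3]
```

The sketch assumes that all charge levels are distinct, as the scheme requires.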
Once a new data representation is defined, several tools are required to make it useful. In this paper, we present Gray codes that bring to bear the full representational power of rank modulation, and data rewriting schemes.

The Gray code [13] is an ordered list of the $2^n$ distinct length-$n$ binary vectors such that every two adjacent words (in the list) differ by exactly one bit flip. Gray codes have since been generalized in countless ways and may now be defined as an ordered set of distinct states for which every state $s_i$ is followed by a state $s_{i+1}$ such that $s_{i+1} = t(s_i)$, where $t \in T$ is a transition function from a predetermined set $T$ defining the Gray code. In the original code, $T$ is the set of all possible single bit flips. Usually, the set $T$ consists of transitions that are minimal with respect to some cost function, thus creating a traversal of the state space that is minimal in total cost. For a comprehensive survey of combinatorial Gray codes, the reader is referred to [33].

One application of the Gray codes is the realization of logic multilevel cells with rank modulation. The traversal of states by the Gray code is mapped to the increase of the cell level in a classic multilevel flash cell. In this way, rank modulation can be naturally combined with current multilevel storage solutions.

Some of the Gray code constructions we describe also induce a simple algorithm for generating the list of permutations. Efficient generation of permutations has been the subject of much research as described in the general survey [33], and the more specific [34] (and references therein). In [34], the transitions we use in this paper are called "nested cycling," and the algorithms cited there produce lists that are not Gray codes since some of the permutations repeat, which makes the algorithms inefficient.

We also investigate efficient rewriting schemes for rank modulation.
Since it is costly to erase and reprogram cells, we try to maximize the number of times data can be rewritten between two erase operations [4], [21], [22]. For rank modulation, the key is to minimize the highest charge level of cells. We present two rewriting schemes that are, respectively, optimized for the worst case and the average-case performance.

Rank modulation is a new storage scheme and differs from existing data storage techniques. There has been some recent work on coding for flash memories. Examples include floating codes [22], [23], which jointly record and rewrite multiple variables, and buffer codes [4], [37], that keep a log of the recent modifications of data. Both floating codes and buffer codes use the flash cells in a conventional way, namely, the fixed discrete cell levels. Floating codes are an extension of the write-once memory (WOM) codes [6], [10], [11], [17], [32], [36], which are codes for effective rewriting of a single variable stored in cells that have irreversible state transitions. The study in this area also includes defective memories [16], [18], where defects (such as "stuck-at faults") randomly happen to memory cells and how to store the maximum amount of information is considered. In all the above codes, unlike rank modulation, the states of different cells do not relate to each other. Also related is the work on permutation codes [3], [35], used for data transmission or signal quantization.

The paper is organized as follows: Section II describes a Gray code that is cyclic and complete (i.e., it spans the entire symmetric group of permutations); Section III introduces a Gray code that is cyclic, complete and balanced, optimizing the transition step and also making it suitable for block deflation; Section IV shows a rewriting scheme that is optimal for the worst case rewrite cost; Section V presents a code optimized for the average rewrite cost with small approximation ratios; Section VI concludes this paper.

II. DEFINITIONS AND BASIC CONSTRUCTION

Let $S$ be a state space, and let $T$ be a set of transition functions, where every $t \in T$ is a function $t : S \to S$. A Gray code is an ordered list $s_1, s_2, \ldots, s_m$ of $m$ distinct elements from $S$ such that for every $1 \le i \le m-1$, $s_{i+1} = t(s_i)$ for some $t \in T$. If $s_1 = t(s_m)$ for some $t \in T$, then the code is cyclic. If the code spans the entire space $S$ we call it complete.

Let $[n]$ denote the set of integers $\{1, 2, \ldots, n\}$. An ordered set of $n$ flash memory cells named $1, 2, \ldots, n$, each containing a distinct charge level, induces a permutation of $[n]$ by writing the cell names in descending charge level $[a_1, a_2, \ldots, a_n]$, i.e., the cell $a_1$ has the highest charge level while $a_n$ has the lowest. The state space for the rank modulation scheme is therefore the set of all permutations over $[n]$, denoted by $S_n$.

As described in the previous section, the basic minimal-cost operation on a given state is a "push-to-the-top" operation by which a single cell has its charge level increased so as to be the highest of the set. Thus, for our basic construction, the set $T$ of minimal-cost transitions between states consists of $n-1$ functions pushing the $i$th element of the permutation, $2 \le i \le n$, to the front:

$$t_i([a_1, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_n]) = [a_i, a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n].$$

Throughout this work, our state space $S$ will be the set of permutations over $[n]$, and our set of transition functions will be the set $T$ of "push-to-the-top" functions. We call such a code a length-$n$ rank modulation Gray code ($n$-RMGC).

Example 1: An example of a 3-RMGC is the following:

$$\begin{array}{cccccc} 1 & 2 & 3 & 1 & 3 & 2 \\ 2 & 1 & 2 & 3 & 1 & 3 \\ 3 & 3 & 1 & 2 & 2 & 1 \end{array}$$

where the permutations are the columns being read from left to right. The sequence of operations creating this cyclic code is: $t_2, t_3, t_3, t_2, t_3, t_3$. This sequence will obviously create a Gray code regardless of the choice of the first column.

One important application of the Gray codes is the realization of logic multilevel cells. The traversal of states by the Gray code is mapped to the increase of the cell level in a classic multilevel flash cell. As an $n$-RMGC has $n!$ states, it can simulate a cell of up to $n!$ discrete levels. Current data storage schemes (e.g., floating codes [22]) can therefore use the Gray codes as logic cells, as illustrated in Fig. 1, and get the benefits of rank modulation.

Fig. 1. Two multilevel flash-memory cells with six levels, currently storing the value “ .” (a) The first is realized using a single multilevel cell with absolute thresholds. The possible transitions between states are shown to its right. (b) The second is realized by combining three flash cells with no thresholds and by using a rank-modulation scheme. The possible transitions between states are given by the 3-RMGC of Example 1.

We will now show a basic recursive construction for $n$-RMGCs. The resulting codes are cyclic and complete, in the sense that they span the entire state space. Our recursion basis is the simple 2-RMGC: $[1,2]$, $[2,1]$.

Now let us assume we have a cyclic and complete $(n-1)$-RMGC, which we call $C_{n-1}$, defined by the sequence of transitions $t_{i_1}, t_{i_2}, \ldots, t_{i_{(n-1)!}}$ and where $t_{i_{(n-1)!}} = t_2$, i.e., a "push-to-the-top" operation on the second element in the permutation.¹ We further assume that the transition $t_2$ appears at least twice. We will now show how to construct $C_n$, a cyclic and complete $n$-RMGC with the same property.

We set the first permutation of the code to be $[1, 2, \ldots, n]$, and then use the transitions $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}$ to get a list of $(n-1)!$ permutations which we call the first block of the construction. By our assumption, the permutations in this list are all distinct, and they all share the property that their last element is $n$ (since all the transitions use just the first $n-1$ elements). Furthermore, since $t_{i_{(n-1)!}} = t_2$, we know that the last permutation generated so far is $[2, 1, 3, \ldots, n-1, n]$. We now use $t_n$ to create the first permutation of the second block of the construction, and then use $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}$ again to create the entire second block. We repeat this process $n-1$ times, i.e., use the sequence of transitions $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}, t_n$ a total of $n-1$ times to construct $n-1$ blocks, each containing $(n-1)!$ permutations.

The following two simple lemmas extend the intuition given above.
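Before turning to the lemmas, the push-to-the-top transition and the block construction can be checked with a short Python sketch. It assumes that the transition sequence of Example 1 is $t_2, t_3, t_3, t_2, t_3, t_3$ (our reading of the example), and the helper names `push_to_top` and `run` are hypothetical.

```python
from itertools import permutations


def push_to_top(perm, i):
    """Apply t_i: move the i-th element (1-based) of the tuple to the front."""
    return (perm[i - 1],) + perm[:i - 1] + perm[i:]


def run(start, transitions):
    """Return all states visited, including the state after the last transition."""
    states = [tuple(start)]
    for i in transitions:
        states.append(push_to_top(states[-1], i))
    return states


# Assumed reading of Example 1: t2, t3, t3, t2, t3, t3 starting from [1, 2, 3].
example1 = [2, 3, 3, 2, 3, 3]
cycle3 = run((1, 2, 3), example1)
print(cycle3[-1] == cycle3[0])                                # cyclic
print(set(cycle3[:-1]) == set(permutations((1, 2, 3))))       # complete

# First n-1 = 3 blocks for n = 4: rotate the sequence so that it ends with t2,
# drop that final t2, and close each block with t4 instead.
rotated = example1[1:] + example1[:1]        # t3, t3, t2, t3, t3, t2
block = rotated[:-1] + [4]
partial = run((1, 2, 3, 4), block * 3)
print(len(set(partial[:-1])))                # number of distinct states visited
print(sorted({p[-1] for p in partial}))      # last elements that occur
```

Under these assumptions the three blocks visit 18 distinct permutations and return to the starting one, and no visited permutation ends with the element 2, which is the gap the remainder of the construction has to fill.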
¹ This last requirement merely restricts us to have $t_2$ used somewhere since we can always rotate the set of transitions to make $t_2$ be the last one used.

Lemma 2: The second element in the first permutation in every block is 2. The first element in the last permutation in every block is also 2.

Proof: During the construction process, in each block we use the transitions $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}$ in order. If we were to use the transition $t_{i_{(n-1)!}} = t_2$ next, we would return to the first permutation of the block since $t_{i_1}, \ldots, t_{i_{(n-1)!}}$ are the transitions of a cyclic $(n-1)$-RMGC. Since the element 2 is second in the initial permutation of the block, it follows that it is the first element in the last permutation of the block. By the construction, we now use $t_n$, thus making the element 2 second in the first permutation of the second block. By repeating the above arguments for each block we prove the lemma.

Lemma 3: In any block, the last element of all the permutations is constant. The sequence of last elements in the $n-1$ blocks constructed is $n, n-1, \ldots, 3, 1$. The element 2 is never a last element.

Proof: The first claim is easily proved by noting that the transitions creating a block, $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}$, only operate on the first $n-1$ positions of the permutations. Also, by the same logic used in the proof of the previous lemma, if the first permutation of a block is $[a_1, 2, a_3, \ldots, a_{n-1}, a_n]$, then the last permutation in a block is $[2, a_1, a_3, \ldots, a_{n-1}, a_n]$, and thus the first permutation of the next block is $[a_n, 2, a_1, a_3, \ldots, a_{n-1}]$. It follows that if we examine the sequence containing just the first permutation in each block, the element 2 remains fixed, and the rest just rotate by one position each time. By the previous lemma, the fixed element is 2, and therefore, the sequence of last elements is as claimed.

Combining the two lemmas above, the $n-1$ blocks constructed so far form a cyclic (but not complete) $n$-RMGC, that we call $C'$, which may be schematically described as

$$\begin{array}{l} [1, 2, 3, \ldots, n] \rightsquigarrow [2, 1, 3, \ldots, n] \xrightarrow{\,t_n\,} \\ {[n, 2, 1, 3, \ldots, n-1]} \rightsquigarrow [2, n, 1, 3, \ldots, n-1] \xrightarrow{\,t_n\,} \\ \qquad \vdots \\ {[3, 2, 4, \ldots, n, 1]} \rightsquigarrow [2, 3, 4, \ldots, n, 1] \xrightarrow{\,t_n\,} [1, 2, 3, \ldots, n] \end{array}$$

(where each row represents a single block, and $\rightsquigarrow$ denotes the sequence of transitions $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}$).

It is now obvious that $C'$ is not complete because it is missing exactly the $(n-1)!$ permutations containing 2 as their last element. We build a block $C''$ containing these permutations in the following way: we start by rotating the list of transitions $t_{i_1}, \ldots, t_{i_{(n-1)!}}$ such that its last transition is $t_{n-1}$.² For convenience, we denote the rotated sequence by $\tau_1, \ldots, \tau_{(n-1)!}$, where $\tau_{(n-1)!} = t_{n-1}$. Assume the first permutation in the block is $[a_1, a_2, \ldots, a_{n-1}, 2]$. We set the following permutations of the block to be the ones formed by the sequence of transitions $\tau_1, \ldots, \tau_{(n-1)!-1}$. Thus, the last permutation in $C''$ is $[a_2, \ldots, a_{n-1}, a_1, 2]$.

² The transition $t_{n-1}$ must be present somewhere in the sequence or else the last element would remain constant, thus contradicting the assumption that the sequence generates a cyclic and complete $(n-1)$-RMGC.

In $C'$, we look for a transition of the following form:

$$[a_2, \ldots, a_{n-1}, 2, a_1] \xrightarrow{\,t_{n-1}\,} [2, a_2, \ldots, a_{n-1}, a_1].$$

We contend that such a transition must surely exist: $C'$ does not contain permutations in which 2 is last, while it does contain permutations in which 2 is next to last, and some where 2 is the first element. Since $C'$ is cyclic, there must be at least one transition pushing an element from a next-to-last position to the first position. At this transition we split $C'$ and insert $C''$ as follows:

$$\cdots \to [a_2, \ldots, a_{n-1}, 2, a_1] \xrightarrow{\,t_n\,} \underbrace{[a_1, a_2, \ldots, a_{n-1}, 2] \rightsquigarrow [a_2, \ldots, a_{n-1}, a_1, 2]}_{C''} \xrightarrow{\,t_n\,} [2, a_2, \ldots, a_{n-1}, a_1] \to \cdots$$

where it is easy to see all transitions are valid. Thus, we have created $C_n$ and to complete the recursion we have to make sure $t_2$ appears at least twice, but that is obvious since the sequence $t_{i_1}, \ldots, t_{i_{(n-1)!-1}}$ contains at least one occurrence of $t_2$, and is replicated $n-1$ times, $n \ge 3$. We therefore reach the following conclusion.

Theorem 4: For every integer $n \ge 2$ there exists a cyclic and complete $n$-RMGC.

Example 5: We construct a 4-RMGC by recursively using the 3-RMGC shown in Example 1, to illustrate the construction process. The sequence of transitions for the 3-RMGC in Example 1 is $t_2, t_3, t_3, t_2, t_3, t_3$.
As described in the construction, in order to use this code as a basis for the 4-RMGC construction, we need to have $t_2$ as the last transition. We therefore rotate the sequence of transitions to be $t_3, t_3, t_2, t_3, t_3, t_2$. The resulting first three blocks, denoted $C'$, are

$$\begin{array}{cccccc|cccccc|cccccc} 1&3&2&3&1&2&4&1&2&1&4&2&3&4&2&4&3&2 \\ 2&1&3&2&3&1&2&4&1&2&1&4&2&3&4&2&4&3 \\ 3&2&1&1&2&3&1&2&4&4&2&1&4&2&3&3&2&4 \\ 4&4&4&4&4&4&3&3&3&3&3&3&1&1&1&1&1&1 \end{array}$$

To create the missing fourth block, $C''$, the construction requires a transition sequence ending with $t_3$, so we use the original sequence $t_2, t_3, t_3, t_2, t_3, t_3$ shown in Example 1. To decide the starting permutation of the block, we search for a transition of the form $[a_2, a_3, 2, a_1] \to [2, a_2, a_3, a_1]$ in $C'$. Several such transitions exist, and we arbitrarily choose $[1, 3, 2, 4] \to [2, 1, 3, 4]$ seen in the fifth and sixth columns of $C'$. The resulting missing block, $C''$, is

$$\begin{array}{cccccc} 4&1&3&4&3&1 \\ 1&4&1&3&4&3 \\ 3&3&4&1&1&4 \\ 2&2&2&2&2&2 \end{array}$$

Inserting $C''$ between the fifth and sixth columns of $C'$ results in the following 4-RMGC:

$$\begin{array}{cccccccccccccccccccccccc} 1&3&2&3&1&4&1&3&4&3&1&2&4&1&2&1&4&2&3&4&2&4&3&2 \\ 2&1&3&2&3&1&4&1&3&4&3&1&2&4&1&2&1&4&2&3&4&2&4&3 \\ 3&2&1&1&2&3&3&4&1&1&4&3&1&2&4&4&2&1&4&2&3&3&2&4 \\ 4&4&4&4&4&2&2&2&2&2&2&4&3&3&3&3&3&3&1&1&1&1&1&1 \end{array}$$

III. BALANCED $n$-RMGCS

While the construction for $n$-RMGCs given in the previous section is mathematically pleasing, it suffers from a practical drawback: while the top $n-1$ charged cells are changed (having their charge level increased while going through the permutations of a single block), the bottom cell remains untouched and a large gap in charge levels develops between the least charged and most charged cells. When the least charged cell eventually gets "pushed to the top," the charging of the cell, in order to acquire the target charge level, may take a long time or involve large jumps in charge level (which are prone to cause write-disturbs in neighboring cells). The balanced $n$-RMGC described in this section solves this problem.

A. Definition and Construction

In the current models of flash memory, it is sometimes the case that due to precision constraints in the charge placement mechanism, the actual possible charge levels are discrete. The rank-modulation scheme is not governed by such constraints, since it only needs to order cell levels unambiguously by means of comparisons, rather than compare the cell levels against predefined threshold values. However, in order to describe the following results, we will assume abstract discrete levels, that can be understood as counting the number of push-to-the-top operations executed up to the current state. In other words, each push-to-the-top increases the maximum charge level by one.

Thus, we define the function $c_i : \mathbb{N} \to \mathbb{N}$, where $c_i(p)$ is the charge level of the $i$th cell after the $p$th programming cycle. It follows that if we use transition $t_j$ in the $p$th programming cycle and the $i$th cell is, at the time, $j$th from the top, then $c_i(p) > \max_k c_k(p-1)$, and for $k \ne i$, $c_k(p) = c_k(p-1)$. In an optimal setting with no overshoots, $c_i(p) = \max_k c_k(p-1) + 1$.

The jump in the $p$th round is defined as $c_i(p) - c_i(p-1)$, assuming the $i$th cell was the affected one. It is desirable, when programming cells, to make the jumps as small as possible. We define the jump cost of an $n$-RMGC as the maximum jump during the transitions dictated by the code. We say an $n$-RMGC is nondegenerate if it raises each of its cells at least once. A nondegenerate $n$-RMGC is said to be optimal if its jump cost is not larger than that of any other nondegenerate $n$-RMGC.

Lemma 6: For any optimal nondegenerate $n$-RMGC, $n \ge 3$, the jump cost is at least $n+1$.

Proof: In an optimal $n$-RMGC, $n \ge 3$, we must raise the lowest cell to the top charge level at least $n$ times. Such a jump must be at least of magnitude $n$. We cannot, however, do these $n$ jumps consecutively, or else we return to the first permutation after just $n$ steps. It follows that there must be at least one other transition $t_i$, $i \ne n$, and so the first $t_n$ to be used after it jumps by at least a magnitude of $n+1$.

We call an $n$-RMGC with a jump cost of $n+1$ a balanced $n$-RMGC. We now show a construction that turns any $(n-1)$-RMGC (balanced or not) into a balanced $n$-RMGC. The original $(n-1)$-RMGC is not required to be cyclic or complete, but if it is cyclic (complete) the resulting $n$-RMGC will turn out to be also cyclic (complete).
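The jump cost defined above can be computed by simulating the abstract discrete levels. The following Python sketch assumes that the cells start at consecutive levels $n, n-1, \ldots, 1$ (an assumption about the initial state) and that every push-to-the-top lands exactly one level above the current maximum. The function name `jump_cost` is hypothetical, and the transition sequence $t_2, t_3, t_3, t_2, t_3, t_3$ is our reading of Example 1.

```python
def jump_cost(start, transitions):
    """Maximum jump over a sequence of push-to-the-top transitions.

    `start` lists the cell names from highest to lowest charge.
    Assumption: initial levels are consecutive (top cell at n, bottom at 1).
    """
    n = len(start)
    level = {cell: n - pos for pos, cell in enumerate(start)}
    perm = list(start)
    worst = 0
    for i in transitions:
        cell = perm.pop(i - 1)                  # the i-th cell from the top
        new_level = max(level.values()) + 1     # no overshoot
        worst = max(worst, new_level - level[cell])
        level[cell] = new_level
        perm.insert(0, cell)
    return worst


# Repeating t3 only cycles through three states, with every jump equal to n = 3.
print(jump_cost((1, 2, 3), [3, 3, 3]))            # 3
# The assumed Example 1 sequence reaches the lower bound n + 1 = 4 of Lemma 6.
print(jump_cost((1, 2, 3), [2, 3, 3, 2, 3, 3]))   # 4
```

Under these assumptions the 3-RMGC of Example 1 already has jump cost $n+1 = 4$, while the first call shows why $n$ consecutive $t_n$ transitions alone cannot form a complete code.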
The intuitive idea is to base the construction on cyclic shifts that push the bottom to the top, and use them as often as possible. This is desirable because $t_n$ does not introduce gaps between the charge levels, so it does not aggravate the jump cost of the cycle. Moreover, $t_n$ partitions the set of permutations into $(n-1)!$ orbits of length $n$. Theorem 7 gives a construction where these orbits are traversed consecutively, based on the order given by the supporting $(n-1)$-RMGC.

Theorem 7: Given a cyclic and complete $(n-1)$-RMGC, $C_{n-1}$, defined by the transitions $t_{i_1}, \ldots, t_{i_{(n-1)!}}$, then the following transitions define an $n$-RMGC, denoted by $C_n$, that is cyclic, complete and balanced:

$$t_{j_k} = \begin{cases} t_{\,n - i_{\lceil k/n \rceil} + 1}, & k \equiv 1 \pmod{n} \\ t_n, & \text{otherwise} \end{cases}$$

for all $k \in \{1, \ldots, n!\}$.

Proof: Let us define the abstract transition $\downarrow t_i$, $2 \le i \le n$, that pushes to the bottom the $i$th element from the bottom:

$$\downarrow t_i([a_1, \ldots, a_{n-i}, a_{n-i+1}, a_{n-i+2}, \ldots, a_n]) = [a_1, \ldots, a_{n-i}, a_{n-i+2}, \ldots, a_n, a_{n-i+1}].$$

Because $C_{n-1}$ is cyclic and complete, using $\downarrow t_{i_1}, \ldots, \downarrow t_{i_{(n-1)!}}$ starting with a permutation of $[n-1]$ produces a complete cycle through all the permutations of $[n-1]$, and using them starting with a permutation of $[n]$ creates a cycle through all the $(n-1)!$ permutations of $[n]$ with the respective first element fixed, because they operate only on the last $n-1$ elements. Because of the first element being fixed, those $(n-1)!$ permutations of $[n]$ produced by $\downarrow t_{i_1}, \ldots, \downarrow t_{i_{(n-1)!}}$ also have the property of being cyclically distinct. Thus, they are representatives of the $(n-1)!$ distinct orbits of the permutations of $[n]$ under the operation $t_n$, since $t_n$ represents a simple cyclic shift when operated on a permutation of $[n]$.

Taking a permutation of $[n]$, then using the transition $t_{n-i+1}$ once, $2 \le i \le n-1$, followed by $n-1$ times using $t_n$, is equivalent to using $\downarrow t_i$. Every transition of the form $t_{n-i+1}$, $2 \le i \le n-1$, moves us to a different orbit of $t_n$, while the $n-1$ consecutive executions of $t_n$ generate all the elements of the orbit. It follows that the resulting permutations are distinct.

Schematically, the construction of $C_n$ based on $C_{n-1}$ is

$$C_{n-1}:\; t_{i_1},\, t_{i_2},\, \ldots,\, t_{i_{(n-1)!}} \quad\Longrightarrow\quad C_n:\; t_{n-i_1+1}, \underbrace{t_n, \ldots, t_n}_{n-1},\; t_{n-i_2+1}, \underbrace{t_n, \ldots, t_n}_{n-1},\; \ldots,\; t_{n-i_{(n-1)!}+1}, \underbrace{t_n, \ldots, t_n}_{n-1}.$$

The code $C_n$ is balanced, because in every block of $n$ transitions starting with a $t_{n-i_k+1}$, we have: the transition $t_{n-i_k+1}$ has a jump of $n-i_k+1$; the following $i_k-1$ transitions $t_n$ have a jump of $n+1$, and the rest a jump of $n$. In addition, because $C_{n-1}$ is cyclic and complete, it follows that $C_n$ is also cyclic and complete.

Theorem 8: For any $n \ge 2$, there exists a cyclic, complete, and balanced $n$-RMGC.

Proof: We can use Theorem 7 to recursively construct all the supporting $n'$-RMGCs, $n' \in \{n-1, \ldots, 2\}$, with the basis of the recursion being the complete cyclic 2-RMGC: $t_2$, $t_2$.

A similar construction, but using a more involved second-order recursion, was later suggested by Etzion [9].

Example 9: Fig. 2 shows the transitions of a recursive, balanced $n$-RMGC for $n = 4$. The permutations are represented in a $6 \times 4$ matrix, where each row is an orbit generated by $t_4$. The transitions between rows occur when 4 is the top element. Note how these permutations (the exit points of the orbits), after dropping the 4 at the top and turning them upside down, form a 3-RMGC. This code is equivalent to the code from Example 1, up to a rotation of the transition sequence and the choice of first permutation. Fig. 3 shows the charge levels of the cells for each programming cycle, for the resulting balanced 4-RMGC.

B. Successor Function

The balanced $n$-RMGC can be used to implement a logic cell with $n!$ levels. This can also be understood as a counter that increments its value by one unit at a time. The successor function takes as input the current permutation, and determines the transition to the next permutation in the balanced recursive $n$-RMGC. If $n = 2$, the next transition is always $t_2$ (line 2). Otherwise, if the top element is not $n$, then the current permutation is not at the exit point of its orbit, therefore the next transition is $t_n$ (line 5). However, if $n$ is the top element, then the transition is defined by the supporting cycle. The function is called recursively, on the reflected permutation of $[n-1]$ (line 7).
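The successor rule described above can be written as a short recursive function. The Python sketch below is our reading of that description, not the paper's listing: the names `successor_transition` and `balanced_cycle` are hypothetical, the index mapping $t_i \mapsto t_{n-i+1}$ is taken from Theorem 7, and the starting permutation $[1, 4, 2, 3]$ is chosen for illustration.

```python
def push_to_top(perm, i):
    """Apply t_i: move the i-th element (1-based) of the tuple to the front."""
    return (perm[i - 1],) + perm[:i - 1] + perm[i:]


def successor_transition(perm):
    """Index i of the transition t_i leading to the next permutation."""
    n = len(perm)
    if n == 2:
        return 2
    if perm[0] != n:
        return n                     # not at the exit point of the orbit
    # n is on top: ask the supporting code, using the remaining elements
    # read bottom-up (the reflected permutation of [n-1]).
    reflected = perm[:0:-1]
    return n - successor_transition(reflected) + 1


def balanced_cycle(start):
    """Follow successors from `start` until the cycle closes."""
    states = [tuple(start)]
    while True:
        nxt = push_to_top(states[-1], successor_transition(states[-1]))
        if nxt == states[0]:
            return states
        states.append(nxt)


cycle = balanced_cycle((1, 4, 2, 3))
print(len(cycle), len(set(cycle)))           # 24 24
exits = [p for p in cycle if p[0] == 4]      # exit points of the six orbits
print([p[:0:-1] for p in exits])             # the supporting 3-element states
```

For $n = 4$ the cycle visits all 24 permutations exactly once. The reflected exit points come out as $[1,3,2], [2,1,3], [3,2,1], [2,3,1], [1,2,3], [3,1,2]$, each obtained from the previous one by a push-to-the-top on three elements.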
An important practical aspect is the average number of steps required to decide which transition generates the next permutation from the current one. A step is defined as a single query of the form "what is the $i$th highest charged cell?" namely the comparison in line 4.

Fig. 2. Balanced 4-RMGC.

Fig. 3. Charge level growth for the balanced 4-RMGC.

The successor function is asymptotically optimal with respect to this measure:

Theorem 10: In the successor function, the asymptotic average number of steps to create the successor of a given permutation is one.

Proof: A fraction of $\frac{n-1}{n}$ of the transitions are $t_n$, and these occur whenever the cell $n$ is not the highest charged one, and they are determined in just one step. Of the cases where $n$ is highest charged, by recursion, a fraction $\frac{n-2}{n-1}$ of the transitions are determined by just one more step, and so on. At the basis of the recursion, permutations over two elements require zero steps. Equivalently, the query "is $a_1$ equal to $n$" is performed for every permutation, therefore $n!$ times; the query "is $a_n$ equal to $n-1$" is performed only for $(n-1)!$ permutations, therefore $(n-1)!$ times, and so on. Thus, the total number of queries is $\sum_{i=3}^{n} i!$. Since $\lim_{n \to \infty} \sum_{i=3}^{n} i! \,/\, n! = 1$, the asymptotic average number of steps to generate the next permutation is as stated.

C. Ranking Permutations

In order to complete the design of the logic cell, we need to define the correspondence between a permutation and its rank in the balanced $n$-RMGC. This problem is similar to that of ranking permutations in lexicographic order. We will first review the factoradic numbering system, and then present a new numbering system that we call b-factoradic, induced by the balanced $n$-RMGC construction.

1) Review of the Factoradic Numbering System: The factoradic is a mixed radix numbering system. The earliest reference appears in [26]. Lehmer [27] describes algorithms that make the correspondence between permutations and factoradic. Any integer number $m \in \{0, 1, \ldots, n!-1\}$ can be represented in the factoradic system by the digits $d_{n-1} d_{n-2} \ldots d_1 d_0$, where $d_i \in \{0, \ldots, i\}$ for $i \in \{0, \ldots, n-1\}$, and the weight of $d_i$ is $i!$ (with the convention that $0! = 1$). The digit $d_0$ is always 0, and is sometimes omitted:

$$m = \sum_{i=0}^{n-1} d_i \cdot i!.$$

Any permutation $[a_1, \ldots, a_n] \in S_n$ has a unique factoradic representation that gives its position in the lexicographic ordering. The digits $d_i$ are in this case the number of elements smaller than $a_{n-i}$ that are to the right of $a_{n-i}$. They are therefore inversion counts, and the factoradic representation is an inversion table (or vector) [15].

There is a large literature devoted to the study of ranking permutations from a complexity perspective. Translating between factoradic and decimal representation can be done in $O(n)$ arithmetic operations. The bottleneck is how to translate efficiently between permutations and factoradic. A naive approach similar to the simple algorithms described in [27] requires $O(n^2)$. This can be improved to $O(n \log n)$ by using merge sort counting, or a binary search tree, or modular arithmetic, all techniques described in [25]. This can be further improved to $O(n \log n / \log \log n)$ [30], by using the special data structure of Dietz [7]. In [30] linear time complexity is also achieved by departing from lexicographic ordering. A linear time complexity is finally achieved in [28], by using the fact that the word size has to be $O(n \log n)$ in order to represent numbers up to $n!$, and by hiding rich data structures in integers of this size.

2) B-Factoradic: A New Numbering System: We will now describe how to index permutations of the balanced recursive $n$-RMGC with numbers from $\{0, \ldots, n!-1\}$, such that consecutive permutations in the cycle have consecutive ranks modulo $n!$.
The permutation that gets index is a special permutation that starts a new orbit generated by , and also starts a new orbit in any of the recursive supporting -RMGCs, . The rank of a permutation is determined by its position in the orbit of , and by the rank of the orbit, as given by the rank of the supporting permutation of . The position of a permutation inside an orbit of is given by the position of . If the current permutation is and for , then the position in the current orbit of is (because the orbit starts with in posi- tion ). The index of the current orbit is given by the rank of the supporting permutation of , namely, the rank of (notice that the permu- tation of is reﬂected). Therefore, if , then (1) The above formula can be used recursively to determine the rank of the permutations from the supporting balanced -RMGCs, for . It now becomes clear what permutation should take rank . The highest element in every supporting RMGC should be in the second position, therefore, , , , , and so on, and . Therefore, gets the rank . See Example 9 for the construction of the recursive and balanced 4-RMGC where the permutation has rank . Equation (1) induces a new numbering system that we call b-factoradic (backwards factoradic). A number can be represented by the digits , where and the weight of is . In this case is always and can be omitted. It is easy to verify that this is a valid numbering system, therefore, any has a unique b-factoradic representation such that The weights of the b-factoradic are sometimes called “falling factorials,” and can be represented succinctly by the Pochhammer symbol. Example 11: Let and be the current permutation. We can ﬁnd its b-factoradic representation as follows. We start from the least signiﬁcant digit , which is given by the position of minus modulo ,so (here we keep the elements of the permutation indexed from to ). 
We now recurse on the residual permutation of five elements, (notice the reflected reading of this permutation, from towards the left). Now is given by the position of ; . The residual permutation is , therefore, . For the next step, and . Finally, and . As always . The b-factoradic representation is therefore , where the subscript indicates the position of the digit. Going from a b-factoradic representation to a permutation of the balanced -RMGC can follow a similar reversed procedure.

The procedure of Example 11 can be formalized algorithmically; however, its time complexity is , similar to the naive algorithms specific to translations between permutations in lexicographic order and factoradic. We can in principle use all the available results for factoradic, described previously, to achieve time complexity of or lower. However, we are not going to repeat all those methods here, but rather describe a linear time procedure that takes a permutation and its factoradic as input and outputs the b-factoradic. We can thus leverage directly all the results available for factoradic, and use them to determine the current symbol of a logic cell.

The procedure exploits the fact that the inversion counts are already given by the factoradic representation, and they can be used to compute directly the digits of the b-factoradic. A b-factoradic digit is a count of the elements smaller than that lie between and when the permutation is viewed as a cycle. The direction of the count alternates for even and odd values of . The inverse of the input permutation can be computed in time (line 1). The position of every element of the permutation can then be computed in constant time (lines 2 and 5). The test in line 6 decides if we count towards the right or left starting from the position that holds element , until we reach position that holds element . By working out the cases when and we obtain the formulas in lines 7 and 9.
Since this computation takes a constant number of arithmetic operations, the entire algorithm takes $O(n)$ time.

Unranking, namely, going from a number in $\{0, \ldots, n!-1\}$ to a permutation in balanced order, is likely to never be necessary in practice, since the logic cell is designed to be a counter. However, for completeness, we describe the simplest procedure, which takes a b-factoradic as input and produces the corresponding permutation. The procedure uses a variable to simulate the cyclic counting of elements smaller than the current one. The direction of the counting alternates, based on the test in line 4.

IV. REWRITING WITH RANK-MODULATION CODES

In Gray codes, the states transit along a well-designed path. What if we want to use the rank-modulation scheme to store data, and allow the data to be modified in arbitrary ways? Consider an information symbol $i \in [\ell] = \{1, 2, \ldots, \ell\}$ that is stored using $n$ cells. In general, $\ell$ might be smaller than $n!$, so we might end up having permutations that are not used. On the other hand, we can map several distinct permutations to the same symbol $i$ in order to reduce the rewrite cost. We let $W_n \subseteq S_n$ denote the set of states (i.e., the set of permutations) that are used to represent information symbols. We define two functions, an interpretation function, $\phi$, and an update function, $\mu$.

Definition 12: The interpretation function $\phi : W_n \to [\ell]$ maps every state $s \in W_n$ to a value $\phi(s)$ in $[\ell]$. Given an "old state" $s \in W_n$ and a "new information symbol" $i \in [\ell]$, the update function $\mu : W_n \times [\ell] \to W_n$ produces a state $\mu(s, i)$ such that $\phi(\mu(s, i)) = i$.

When we use $n$ cells to store an information symbol, the permutation induced by the charge levels of the cells represents the information through the interpretation function. We can start the process by programming some arbitrary initial permutation in the flash cells.
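The relation between charge levels, the induced permutation, and the interpretation function can be illustrated with a minimal sketch. It assumes distinct charge levels stored as a list indexed by cell (cell 1 first) and a permutation written from the highest-charged cell to the lowest; the names `induced_permutation`, `push_to_top`, `interpret`, the `margin` parameter, and the lookup table `phi` are hypothetical and serve only as illustration.

```python
def induced_permutation(charges):
    """Return the cell indices (1-based) ordered from highest to lowest charge."""
    cells = range(1, len(charges) + 1)
    return tuple(sorted(cells, key=lambda c: charges[c - 1], reverse=True))


def push_to_top(charges, cell, margin=1.0):
    """Raise `cell` above the current maximum charge; other cells are untouched.

    `margin` is an assumed safety gap, not a value taken from the text."""
    updated = list(charges)
    updated[cell - 1] = max(charges) + margin
    return updated


def interpret(charges, phi):
    """Look up the stored symbol in an interpretation table `phi`
    that maps permutations (tuples) to information symbols."""
    return phi[induced_permutation(charges)]
```

Because `push_to_top` only ever adds charge, every operation moves the highest charge level closer to the physical limit, which is why the number of such operations per rewrite is the quantity worth minimizing.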
Whenever we want to change the stored information symbol, the permutation is changed using the "push-to-the-top" operations based on the update function $\mu$. We can keep changing the stored information as long as we do not reach the maximal charge level possible in any of the cells. Therefore, the number of "push-to-the-top" operations in each rewrite operation determines not only the rewriting delay but also how much closer the highest cell-charge level is to the system limit (and therefore how much closer the cell block is to the next costly erase operation). Thus, the objective of the coding scheme is to minimize the number of "push-to-the-top" operations.

Definition 13: Given two states $s_1, s_2 \in W_n$, the cost of changing $s_1$ into $s_2$, denoted $\alpha(s_1 \to s_2)$, is defined as the minimum number of "push-to-the-top" operations needed to change $s_1$ into $s_2$. For example, $\alpha([1,2,3] \to [2,1,3]) = 1$ and $\alpha([1,2,3] \to [3,2,1]) = 2$.

We define two important measures: the worst case rewrite cost and the average rewrite cost.

Definition 14: The worst case rewrite cost is defined as $\max_{s \in W_n,\, i \in [\ell]} \alpha(s \to \mu(s, i))$. Assume input symbols are independent and identically distributed (i.i.d.) random variables having value $i \in [\ell]$ with probability $p_i$. Given a fixed $s \in W_n$, the average rewrite cost given $s$ is defined as $A(s) = \sum_{i \in [\ell]} p_i \, \alpha(s \to \mu(s, i))$. If we further assume some stationary probability distribution over the states $W_n$, where we denote the probability of state $s$ as $q_s$, then the average rewrite cost of the code is defined as $\bar{A} = \sum_{s \in W_n} q_s A(s)$. (Note that for all $i \in [\ell]$, $p_i = \sum_{s \in W_n : \phi(s) = i} q_s$.)

In this section, we present a code that minimizes the worst case rewrite cost. In Section V, we focus on codes with good average rewrite cost.

A. Lower Bound

We start by presenting a lower bound on the worst case rewrite cost. Define the transition graph $G_n = (V_n, E_n)$ as a directed graph with $V_n = S_n$, that is, with $n!$ vertices representing the permutations in $S_n$. For any $u, v \in V_n$, there is a directed edge from $u$ to $v$ iff $\alpha(u \to v) = 1$. $G_n$ is a regular digraph, because every vertex has $n-1$ incoming edges and $n-1$ outgoing edges. The diameter of $G_n$ is $\max_{u, v \in V_n} \alpha(u \to v) = n - 1$.
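The cost of Definition 13 and the transition graph can be explored with a small sketch. It assumes states are tuples listing the cells from highest to lowest charge. The closed form in `cost` relies on the observation that cells which are never pushed keep their relative order, so the longest suffix of the target whose cells already appear in that order in the source can stay untouched; `bfs_costs` is a brute-force check by breadth-first search over push-to-the-top moves. All function names are hypothetical.

```python
from collections import deque


def push_to_top(state, cell):
    """Return the state obtained by making `cell` the top-ranked cell."""
    return (cell,) + tuple(c for c in state if c != cell)


def cost(src, dst):
    """Minimum number of push-to-the-top operations that turn src into dst."""
    pos = {cell: i for i, cell in enumerate(src)}
    n = len(dst)
    keep = 1  # the last cell of dst never needs to be pushed
    # Extend the untouched suffix while it keeps the relative order it has in src.
    while keep < n and pos[dst[n - keep - 1]] < pos[dst[n - keep]]:
        keep += 1
    return n - keep


def bfs_costs(src):
    """Shortest push-to-the-top distance from src to every reachable state."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for cell in u[1:]:  # pushing the top cell would leave the state unchanged
            v = push_to_top(u, cell)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

For small $n$, comparing `cost(src, v)` against `bfs_costs(src)[v]` for every state `v`, and taking the maximum distance, gives a quick sanity check of the regularity and diameter statements for the transition graph.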
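Given a vertex $u \in V_n$ and an integer $r \in \{0, 1, \ldots, n-1\}$, define the ball centered at $u$ with radius $r$ as $B_r^n(u) = \{v \in V_n \mid \alpha(u \to v) \le r\}$, and define the sphere centered at $u$ with radius $r$ as $S_r^n(u) = \{v \in V_n \mid \alpha(u \to v) = r\}$. Clearly, $B_r^n(u) = \bigcup_{0 \le i \le r} S_i^n(u)$. By a simple relabeling argument, both $|B_r^n(u)|$ and $|S_r^n(u)|$ are independent of $u$, and so will be denoted by $|B_r^n|$ and $|S_r^n|$, respectively.

Lemma 15: For any $0 \le r \le n-1$, $|B_r^n| = \frac{n!}{(n-r)!}$, and $|S_r^n| = 1$ if $r = 0$, while $|S_r^n| = \frac{n!}{(n-r)!} - \frac{n!}{(n-r+1)!}$ otherwise.

Proof: Fix a permutation $u \in V_n$. Let $Q_u$ be the set of permutations having the following property: for each permutation $v \in Q_u$, the elements appearing in its last $n-r$ positions appear in the same relative order in $u$. For example, if $n = 5$, $r = 2$, $u = [1,2,3,4,5]$, and $v = [5,2,1,3,4]$, the last three elements of $v$ (namely, $1, 3, 4$) have the same relative order in $u$. It is easy to see that given $u$, when the elements occupying the first $r$ positions in $v \in Q_u$ are chosen, the last $n-r$ positions become fixed. There are $n(n-1)\cdots(n-r+1)$ choices for occupying the first $r$ positions of $v$, hence, $|Q_u| = \frac{n!}{(n-r)!}$. We will show that a vertex $v$ is in $Q_u$ if and only if $v \in B_r^n(u)$.

Suppose $v \in B_r^n(u)$. It follows that $v$ can be obtained from $u$ with at most $r$ "push-to-the-top" operations. Those elements pushed to the top appear in the first $r$ positions of $v$, so the last $n-r$ positions of $v$ contain elements which have the same relative order in $u$, thus, $v \in Q_u$.

Now suppose $v \in Q_u$. For $i \in [n]$, let $v_i$ denote the element in the $i$th position of $v$. One can transform $u$ into $v$ by sequentially pushing $v_r, v_{r-1}, \ldots, v_1$ to the top. Hence, $v \in B_r^n(u)$.

We conclude that $|B_r^n(u)| = |Q_u| = \frac{n!}{(n-r)!}$. Since $|S_r^n| = |B_r^n| - |B_{r-1}^n|$ for $r \ge 1$, the second claim follows.

The following lemma presents a lower bound on the worst case rewrite cost.

Lemma 16: Fix integers $n$ and $\ell$, and define $\rho(n, \ell)$ to be the smallest integer such that $|B_{\rho(n,\ell)}^n| \ge \ell$. For any code $W_n$ and any state $s \in W_n$, there exists $i \in [\ell]$ such that $\alpha(s \to \mu(s, i)) \ge \rho(n, \ell)$, i.e., the worst case rewrite cost of any code is at least $\rho(n, \ell)$.

Proof: By the definition of $\rho(n, \ell)$, $|B_{\rho(n,\ell)-1}^n| < \ell$. Hence, we can choose $i \in [\ell] \setminus \{\phi(s') \mid s' \in B_{\rho(n,\ell)-1}^n(s)\}$. Clearly, by our choice, $\alpha(s \to \mu(s, i)) \ge \rho(n, \ell)$.

B. Optimal Code

We now present a code construction. It will be shown that the code has optimal worst case performance. First, let us define the following notation.

Definition 17: A prefix sequence $a = [a^{(1)}, a^{(2)}, \ldots, a^{(m)}]$ is a sequence of $m \le n$ distinct symbols from $[n]$.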
The prefix set $P_n(a) \subseteq S_n$ is defined as all the permutations in $S_n$ which start with the sequence $a$. We are now in a position to construct the code.

Construction 18: Arbitrarily choose $\ell$ distinct prefix sequences, $a_1, \ldots, a_\ell$, each of length $\rho(n, \ell)$. Let us define $W_n = \bigcup_{i \in [\ell]} P_n(a_i)$ and map the states of $P_n(a_i)$ to $i$, i.e., for each $i \in [\ell]$ and $s \in P_n(a_i)$, set $\phi(s) = i$. Finally, to construct the update function $\mu$, given $s \in W_n$ and some $i \in [\ell]$, we do the following: let $[a_i^{(1)}, a_i^{(2)}, \ldots, a_i^{(\rho(n,\ell))}]$ be the first $\rho(n, \ell)$ elements which appear in all the permutations in $P_n(a_i)$. Apply push-to-the-top on the elements $a_i^{(\rho(n,\ell))}, \ldots, a_i^{(2)}, a_i^{(1)}$ in $s$ to get a permutation $s' \in P_n(a_i)$ for which, clearly, $\phi(s') = i$. Set $\mu(s, i) = s'$.

Theorem 19: The code in Construction 18 is optimal in terms of minimizing the worst case rewrite cost.

Proof: First, the number of length $m$ prefix sequences is $\frac{n!}{(n-m)!} = |B_m^n|$. By definition, the number of prefix sequences of length $\rho(n, \ell)$ is at least $\ell$, which allows the first step of the construction. To complete the proof, it is obvious from the description of $\mu$ that the worst case rewrite cost of the construction is at most $\rho(n, \ell)$. By Lemma 16 this is also the best we can hope for.

Example 20: Let $n = 3$, $\ell = 3$. Since $|B_1^3| = 3$, it follows that $\rho(n, \ell) = 1$. We partition the $n! = 6$ states into $3$ sets, which induce the mapping

$$P_3([1]) = \{[1,2,3], [1,3,2]\} \mapsto 1, \quad P_3([2]) = \{[2,1,3], [2,3,1]\} \mapsto 2, \quad P_3([3]) = \{[3,1,2], [3,2,1]\} \mapsto 3.$$

The cost of any rewrite operation is at most $1$.

V. OPTIMIZING AVERAGE REWRITE COST

In this section, we study codes that minimize the average rewrite cost. We first present a prefix-free code that is optimal in terms of its own design objective. Then, we show that this prefix-free code minimizes the average rewrite cost with an approximation ratio $3$ if , and when , the approximation ratio is further reduced to $2$.

A. Prefix-Free Code

The prefix-free code we propose consists of prefix sets (induced by prefix sequences ) which we will map to the input symbols: for every and , we set . Unlike in the previous section, the prefix sequences are no longer necessarily of the same length. We do, however, require that no prefix sequence be the prefix of another.

A prefix-free code can be represented by a tree. First, let us define a full permutation tree as follows.
The vertices in are placed in layers, where the root is in layer and the leaves are in layer . Edges only exist between adjacent layers. For , a vertex in layer has children. The edges are labeled in such a way that every leaf corresponds to a permutation from which may be constructed from the labels on the edges from the root to the leaf. An example is given in Fig. 4(a). A prefix-free code corresponds to a subtree of (see Fig. 4(b) for an example). Every leaf is mapped to a prefix sequence which equals the string of labels as read on the path from the root to the leaf.

For , let denote the prefix sequence representing , and let denote its length. For example, the prefix sequences in Fig. 4(b) have minimum length and maximum length . The average codeword length is defined as Here, the probabilities are as defined before, that is, information symbols are i.i.d. random variables having value with probability . We can see that with the prefix-free code, for every rewrite operation (namely, regardless of the old permutation before the rewriting), the expected rewrite cost is upper-bounded by . Our objective is to design a prefix-free code that minimizes its average codeword length.

Fig. 4. Prefix-free rank-modulation code for and . (a) The full permutation tree . (b) A prefix-free code represented by a subtree of . The leaves represent the prefix sequences, which are displayed beside the leaves.

Example 21: Let and , and let the prefix-free code be as shown in Fig. 4(b). We can map the information symbols to the prefix sequences as follows: Then, the mapping from the permutations to the information symbols is Assume that the current state of the cells is , representing the information symbol .
If we want to rewrite the information symbol as , we can shift cells 3, 4 to the top to change the state to . This rewrite cost is $2$, which does not exceed . In general, given any current state, considering all the possible rewrites, the expected rewrite cost is always less than , the average codeword length.

The optimal prefix-free code cannot be constructed with a greedy algorithm like the Huffman code [19], because the internal nodes in different layers of the full permutation tree have different degrees, making the distribution of the vertex degrees in the code tree initially unknown. The Huffman code is a well-known variable-length prefix-free code, and many variations of it have been studied. In [20], the Huffman code construction was generalized, assuming that the vertex-degree distribution in the code tree is given. In [1], prefix-free codes for infinite alphabets and nonlinear costs were presented. When the letters of the encoding alphabet have unequal lengths, only exponential-time algorithms are known, and it is not known yet whether this problem is NP-hard [12]. To construct prefix-free codes for our problem, which minimize the average codeword length, we present a dynamic-programming algorithm of time complexity . Note that without loss of generality, we can assume the length of any prefix sequence to be at most .

The algorithm computes a set of functions , for , , and . We interpret the meaning of as follows. We take a subtree of that contains the root. The subtree has exactly leaves in the layers . It also has at most vertices in the layer . We let the leaves represent the letters from the alphabet with the lowest probabilities : the further the leaf is from the root, the lower the corresponding probability is. Those leaves also form prefix sequences, and we call their weighted average length (where the probabilities are weights) the value of the subtree. The minimum value of such a subtree (among all such subtrees) is defined to be .
In other words, is the minimum average prefix-sequence length when we assign a subset of prefix sequences to a subtree of (in the way described above). Clearly, the minimum average codeword length of a prefix-free code equals .

Without loss of generality, let us assume that . It can be seen that the following recursions hold.
• When and
• When and
• When
• When and

Fig. 5. Three cases for computing in Example 22. The solid-line edges are in the subtree. The dotted-line edges are the remaining edges in the full permutation tree . The leaves in the subtree are shown as black vertices. (a) No leaf in layer 2. (b) One leaf in layer 2. (c) Two leaves in layer 2.

The last recursion holds because a subtree with leaves in layers and at most vertices in layer can have leaves in layer . The algorithm first computes , then , and so on, until it finally computes , by using the above recursions. Given these values, it is straightforward to determine, in the optimal code, how many prefix sequences are in each layer, and therefore determine the optimal code itself. It is easy to see that the algorithm returns an optimal code in time .

Example 22: Let and , and let us assume that . As an example, let us consider how to compute . By definition, corresponds to a subtree of with a total of four leaves in layer 2 and layer 3, and with at most three vertices in layer 2. Thus, there are four cases to consider: either there are zero, one, two, or three leaves in layer 2. The corresponding subtrees in the first three cases are as shown in Fig. 5(a)–(c), respectively. The fourth case is actually impossible, because it leaves no place for the fourth leaf to exist in the subtree. If layer 2 has leaves , then layer 3 has leaves and there can be at most vertices in layer 3 of the subtree.
To assign to the four leaves and minimize the weighted average distance of the leaves to the root (which is defined as ), among the four cases mentioned above, we choose the case that minimizes that weighted average distance. Therefore Now assume that after computing all the 's, we find that That means that in the optimal code tree, there are two leaves in layer 1. If we further assume that we can determine that there are five leaves in layer 2, and the optimal code tree will be as shown in Fig. 4(b).

We can use the prefix-free code for rewriting in the following way: to change the information symbol to , push at most cells to the top so that the top-ranked cells are the same as the codeword .

B. Performance Analysis

We now analyze the average rewrite cost of the prefix-free code. We obviously have . When , the code design becomes trivial: each permutation is assigned a distinct input symbol. In this subsection, we prove that the prefix-free code has good approximation ratios under mild conditions: when , the average rewrite cost of a prefix-free code (that was built to minimize its average codeword length) is at most three times the average rewrite cost of an optimal code (i.e., a code that minimizes the average rewrite cost), and when , the approximation ratio is further reduced to $2$.

Loosely speaking, our strategy for proving this approximation ratio involves an initial simple bound on the rewrite cost of any code when considering a rewrite operation starting with a stored symbol . We then proceed to define a prefix-free code which locally optimizes (up to the approximation ratio) rewrite operations starting with stored symbol .
Finally, we introduce the globally optimal prefix-free code of the previous section, which optimizes the average rewrite cost, and show that it is still within the correct approximation ratio.

We start by bounding from below the average rewrite cost of any code, depending on the currently stored information symbol. Suppose we are using some code with an interpretation function and an update function . Furthermore, let us assume the currently stored information symbol is in some state , i.e., . We want to consider rewrite operations which are meant to store the value instead of , for all . Without loss of generality, assume that the probabilities of information symbols are monotonically decreasing Let us denote by the closest permutations to ordered by increasing distance, i.e., and denote for every . We note that are independent of the choice of , and furthermore, that while .

The average rewrite cost of a stored symbol using a code is the weighted sum This sum is minimized when are assigned the closest permutations to with higher probability information symbols mapped to closer permutations. For convenience, let us define the functions . Thus, the average rewrite cost of a stored symbol , under any code, is lower-bounded by

We continue by considering a specific intermediary prefix-free code that we denote by . Let it be induced by the prefix sequences . We require the following two properties:
P.1 For every , , we require .
P.2 .
We also note that is not necessarily a prefix-free code with minimal average codeword length.

Finally, let be a prefix-free code that minimizes its average codeword length. Let be induced by the prefix sequences , and let be any state such that . Denote by the average rewrite cost of a rewrite operation under starting from state .
By the definition of and we have Since it follows that Since the same argument works for every , we can say that (2)

It is evident that the success of this proof strategy hinges on the existence of for every , which we now turn to consider. The following lemma is an application of the well-known Kraft–McMillan inequality [29].

Lemma 23: Let be nonnegative integers. There exists a set of prefix sequences with exactly prefix sequences of length , for (i.e., there are leaves in layer of the code tree ), if and only if

Let us define the following sequence of integers: , , . We first contend that they are all nonnegative. We only need to check and indeed It is also clear that In fact, in the following analysis, represent a partition of the alphabet letters.

Lemma 24: When , there exists a set of prefix sequences that contains exactly prefix sequences of length , for .

Proof: Let us denote (3) When respectively. Thus, for all . We now show that when , monotonically decreases in . Substituting into (3) we get After some tedious rearrangement, for any integer Hence, monotonically decreases for all which immediately gives us for all . By Lemma 23, the proof is complete.

We are now in a position to show the existence of , , for . By Lemma 24, let be a list of prefix sequences, where exactly of the sequences are of length . Without loss of generality, assume Remember we also assume We now define to be the prefix-free code induced by the prefix sequences that is, for all , . Note that for all , the prefix sequence represents the information symbol , which is associated with the probability in rewriting.

Lemma 25: The properties P.1 and P.2 hold for , .

Proof: Property P.2 holds by definition, since whose length is set to .
To prove property P.1 holds, we first note that when , for all there are exactly indices for which . On the other hand, when , among the prefix sequences we have of them of length when , and the rest of them are of length . Intuitively speaking, we can map the indices for which to distinct prefix sequences of length , the indices for which to distinct prefix sequences of length , and so on. Since the prefix sequences are arranged in ascending length order it follows that for every , Hence, property P.1 holds.

We can now state the main theorem.

Theorem 26: Fix some and let be a prefix-free code over which minimizes its average codeword length. For any rewrite operation with initial stored information symbol i.e., the average cost of rewriting under is at most three times the lower bound.

Proof: Define and consider the input alphabet with input symbols being i.i.d. random variables where symbol appears with probability . We set . Let be a prefix-free code over which minimizes its average codeword length. A crucial observation is the following: , the lower bound on the average rewrite cost of symbol , does depend on the probability distribution of the input symbols. Let us therefore distinguish between over , and over . However, by definition, and by our choice of probability distribution over for every . Since is a more restricted version of , it obviously follows that for every . By applying inequality (2), and since by Lemma 25, the code exists over , we get that for all .

Corollary 27: When , the average rewrite cost of a prefix-free code minimizing its average codeword length is at most three times that of an optimal code.

Proof: Since the approximation ratio of $3$ holds for every rewrite operation (regardless of the initial state and its interpretation), it also holds for any average case.

With a similar analysis, we can prove the following result:
Theorem 28: Fix some , , and let be a prefix-free code over which minimizes its average codeword length. For any rewrite operation with initial stored information symbol i.e., the average cost of rewriting under is at most twice the lower bound.

Proof: See the Appendix.

Corollary 29: When , , the average rewrite cost of a prefix-free code minimizing its average codeword length is at most twice that of an optimal code.

VI. CONCLUSION

In this paper, we present a new data storage scheme, rank modulation, for flash memories. We show several Gray code constructions for rank modulation, as well as data rewriting schemes. One important application of the Gray codes is the realization of logic multilevel cells. For data rewriting, an optimal code for the worst case performance is presented. It is also shown that to optimize the average rewrite cost, a prefix-free code can be constructed in polynomial time that approximates an optimal solution well under mild conditions. There are many open problems concerning rank modulation, such as the construction of error-correcting rank-modulation codes and codes for rewriting that are robust to uncertainties in the information symbol's probability distribution. Some of these problems have been addressed in recent work [24].

APPENDIX

In this appendix, we prove Theorem 28. The general approach is similar to the proof of Theorem 26, so we only specify the details that differ in an important way. We define the following sequence of numbers: , , . As before, we contend that they are all nonnegative. We only need to check and indeed, for We now prove the equivalent of Lemma 24.

Lemma 30: When , , there exists a set of prefix sequences that contains exactly prefix sequences of length , for .

Proof: Let us denote (4) When respectively. Thus, for all . We now show that when , monotonically decreases in .
Substituting into (4) we get After some tedious rearrangement, for any integer Hence, monotonically decreases for all which immediately gives us for all . By Lemma 23, the proof is complete.

The remaining lemmas comprising the rest of the proof procedure are similar to those of Section V-B.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers, whose comments helped improve the presentation of the paper.

REFERENCES

[1] M. B. Baer, "Optimal prefix codes for infinite alphabets with nonlinear costs," IEEE Trans. Inf. Theory, vol. 54, no. 3, pp. 1273–1286, Mar. 2008.
[2] A. Bandyopadhyay, G. Serrano, and P. Hasler, "Programming analog computational memory elements to 0.2% accuracy over 3.5 decades using a predictive method," in Proc. IEEE Int. Symp. Circuits and Systems, Kobe, Japan, May 2005, pp. 2148–2151.
[3] T. Berger, F. Jelinek, and J. K. Wolf, "Permutation codes for sources," IEEE Trans. Inf. Theory, vol. IT-18, no. 1, pp. 160–169, Jan. 1972.
[4] V. Bohossian, A. Jiang, and J. Bruck, "Buffer coding for asymmetric multi-level memory," in Proc. IEEE Int. Symp. Information Theory (ISIT2007), Nice, France, Jun. 2007, pp. 1186–1190.
[5] P. Cappelletti and A. Modelli, "Flash memory reliability," in Flash Memories, P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni, Eds. Amsterdam, The Netherlands: Kluwer, 1999, pp. 399–441.
[6] G. D. Cohen, P. Godlewski, and F. Merkx, "Linear binary code for write-once memories," IEEE Trans. Inf. Theory, vol. IT-32, no. 5, pp. 697–700, Sep. 1986.
[7] P. Dietz, Optimal Algorithms for List Indexing and Subset Rank. London, U.K.: Springer-Verlag, 1989.
[8] B. Eitan and A. Roy, "Binary and multilevel flash cells," in Flash Memories, P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni, Eds. Amsterdam, The Netherlands: Kluwer, 1999, pp. 91–152.
[9] T. Etzion, Oct. 2007, personal communication.
[10] A. Fiat and A. Shamir, "Generalized write-once memories," IEEE Trans. Inf. Theory, vol. IT-30, no. 3, pp. 470–480, May 1984.
[11] F.-W. Fu and A. J. H. Vinck, "On the capacity of generalized write-once memory with state transitions described by an arbitrary directed acyclic graph," IEEE Trans. Inf. Theory, vol. 45, no. 1, pp. 308–313, Jan. 1999.
[12] M. J. Golin and G. Rote, "A dynamic programming algorithm for constructing optimal prefix-free codes with unequal letter costs," IEEE Trans. Inf. Theory, vol. 44, no. 5, pp. 1770–1781, Sep. 1998.
[13] F. Gray, "Pulse Code Communication," U.S. Patent 2632058, Mar. 1953.
[14] M. Grossi, M. Lanzoni, and B. Ricco, "Program schemes for multilevel flash memories," Proc. IEEE, vol. 91, no. 4, pp. 594–601, Apr. 2003.
[15] M. Hall, Jr. and D. E. Knuth, "Combinatorial analysis and computers," Amer. Math. Monthly, vol. 72, no. 2, pp. 21–28, 1965.
[16] A. J. H. Vinck and A. V. Kuznetsov, "On the general defective channel with informed encoder and capacities of some constrained memories," IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 1866–1871, Nov. 1994.
[17] C. D. Heegard, "On the capacity of permanent memory," IEEE Trans. Inf. Theory, vol. IT-31, no. 1, pp. 34–42, Jan. 1985.
[18] C. D. Heegard and A. A. El Gamal, "On the capacity of computer memory with defects," IEEE Trans. Inf. Theory, vol. IT-29, no. 5, pp. 731–739, Sep. 1983.
[19] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, no. 9, pp. 1098–1101, Sep. 1952.
[20] F. K. Hwang, "Generalized Huffman trees," SIAM J. Appl. Math., vol. 37, no. 1, pp. 124–127, 1979.
[21] A. Jiang, "On the generalization of error-correcting WOM codes," in Proc. IEEE Int. Symp. Information Theory (ISIT2007), Nice, France, Jun. 2007, pp. 1391–1395.
[22] A. Jiang, V. Bohossian, and J.
Bruck, "Floating codes for joint information storage in write asymmetric memories," in Proc. IEEE Int. Symp. Information Theory (ISIT2007), Nice, France, Jun. 2007, pp. 1166–1170.
[23] A. Jiang and J. Bruck, "Joint coding for flash memory storage," in Proc. IEEE Int. Symp. Information Theory (ISIT2008), Toronto, ON, Canada, Jul. 2008, pp. 1741–1745.
[24] A. Jiang, M. Schwartz, and J. Bruck, "Error-correcting codes for rank modulation," in Proc. IEEE Int. Symp. Information Theory (ISIT2008), Toronto, ON, Canada, Jul. 2008, pp. 1736–1740.
[25] D. E. Knuth, The Art of Computer Programming Volume 3: Sorting and Searching, 2nd ed. Reading, MA: Addison-Wesley, 1998.
[26] C. A. Laisant, "Sur la numération factorielle, application aux permutations," Bull. Société Mathématique de France, vol. 16, pp. 176–183.
[27] D. H. Lehmer, "Teaching combinatorial tricks to a computer," in Proc. Symp. Applied Mathematics and Combinatorial Analysis, 1960, vol. 10, pp. 179–193.
[28] M. Mares and M. Straka, "Linear-time ranking of permutations," Algorithms-ESA, pp. 187–193, 2007.
[29] B. McMillan, "Two inequalities implied by unique decipherability," IEEE Trans. Inf. Theory, vol. 2, no. 4, pp. 115–116, 1956.
[30] W. Myrvold and F. Ruskey, "Ranking and unranking permutations in linear time," Inf. Process. Lett., vol. 79, no. 6, pp. 281–284, 2001.
[31] H. Nobukata, S. Takagi, K. Hiraga, T. Ohgishi, M. Miyashita, K. Kamimura, S. Hiramatsu, K. Sakai, T. Ishida, H. Arakawa, M. Itoh, I. Naiki, and M. Noda, "A 144-Mb, eight-level NAND flash memory with optimized pulsewidth programming," IEEE J. Solid-State Circuits, vol. 35, no. 5, pp. 682–690, May 2000.
[32] R. L. Rivest and A. Shamir, "How to reuse a write-once memory," Inf. Contr., vol. 55, pp. 1–19, 1982.
[33] C. D. Savage, "A survey of combinatorial Gray codes," SIAM Rev., vol. 39, no. 4, pp. 605–629, Dec. 1997.
[34] R. Sedgewick, "Permutation generation methods," Comput. Surv., vol. 9, no. 2, pp. 137–164, Jun. 1977.
[35] D.
Slepian, "Permutation modulation," Proc. IEEE, vol. 53, no. 3, pp. 228–236, Mar. 1965.
[36] J. K. Wolf, A. D. Wyner, J. Ziv, and J. Körner, "Coding for a write-once memory," AT&T Bell Labs. Tech. J., vol. 63, no. 6, pp. 1089–1112, 1984.
[37] E. Yaakobi, P. H. Siegel, and J. K. Wolf, "Buffer codes for multi-level flash memory," presented at the Poster Session of the IEEE Int. Symp. Information Theory, Toronto, ON, Canada, Jul. 2008.

Anxiao (Andrew) Jiang (S'00–M'04) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1999 and the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, in 2000 and 2004, respectively. He is currently an Assistant Professor in the Computer Science and Engineering Department at Texas A&M University, College Station. His research interests include information theory, data storage, networks, and algorithm design. Prof. Jiang is a recipient of the NSF CAREER Award in 2008 for his research on information theory for flash memories.

Robert Mateescu (M'08) received the B.S. degree in computer science and mathematics from the University of Bucharest, Bucharest, Romania, in 1997 and the M.S. and Ph.D. degrees in information and computer science from the University of California, Irvine, in 2003 and 2007, respectively. He is currently a Postdoctoral Scholar in Electrical Engineering at the California Institute of Technology, Pasadena. His research interests include algorithms for the representation and inference of information.

Moshe Schwartz (M'03) was born in Israel in 1975. He received the B.A., M.Sc., and Ph.D. degrees from the Technion–Israel Institute of Technology, Haifa, Israel, in 1997, 1998, and 2004, respectively, all from the Computer Science Department.
He was a Fulbright Postdoctoral Researcher in the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, and a Postdoctoral Researcher in the Department of Electrical Engineering, California Institute of Technology, Pasadena. He now holds a position with the Department of Electrical and Computer Engineering, Ben-Gurion University, Beer-Sheva, Israel. His research interests include algebraic coding, combinatorial structures, and digital sequences.
Jehoshua Bruck (S’86–M’89–SM’93–F’01) received the B.Sc. and M.Sc. degrees in electrical engineering from the Technion–Israel Institute of Technology, Haifa, Israel, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1989.
He is the Gordon and Betty Moore Professor of Computation and Neural Systems and Electrical Engineering at the California Institute of Technology (Caltech), Pasadena. His research focuses on information theory and systems and the theory of biological networks. He has extensive industrial experience. He worked at IBM Research, where he participated in the design and implementation of the first IBM parallel computer. He was a cofounder and Chairman of Rainfinity, a spinoff company from Caltech that focused on software products for management of network information storage systems.
Dr. Bruck received the National Science Foundation Young Investigator award, the Sloan fellowship, and the 2005 S. A. Schelkunoff Transactions prize paper award from the IEEE Antennas and Propagation society.
... Rank modulation was proposed as a solution to the challenges posed by flash memory storage [8].
In the rank modulation framework, codes are permutation codes, where by a permutation code (PC) of length n we simply mean a nonempty subset C of $S_n$, the set of all permutations of [n] := {1, 2, . . . ... ... For two permutations $\rho, \pi \in S_n$, the Kendall τ-distance between ρ and π, $d_K(\rho, \pi)$, is defined as the minimum number of adjacent transpositions needed to transform ρ into π. Under the Kendall τ-distance, a PC of length n with minimum distance d can correct up to $\frac{d-1}{2}$ errors caused by charge-constrained errors [8]. ... ... The maximum size of a PC of length n and minimum Kendall τ-distance d is denoted by P(n, d). Several researchers have presented bounds on P(n, d) (see [1,2,8,10,11,12]); some of these results are shown in Table 1. It is known that P(n, 1) = n! and P(n, 2) = n!/2. ...
Preprint
We study $P(n,3)$, the size of the largest subset of the set of all permutations $S_n$ with minimum Kendall $\tau$-distance $3$. Using a combination of group theory and integer programming, we reduced the upper bound of $P(p,3)$ from $(p-1)!-1$ to $(p-1)!-\lceil\frac{p}{3}\rceil+2\leq (p-1)!-2$ for all primes $p\geq 11$. In special cases where $n$ is equal to $6,7,11,13,14,15$ and $17$ we reduced the upper bound of $P(n,3)$ by $3,3,9,11,1,1$ and $4$, respectively.
... (ii) Up to now, various coding techniques have been applied to alleviate the detection in case of channel mismatch, such as rank modulation [32], balanced codes [33][34][35][36][37], and composition check codes [38]. ... ... In rank modulation [32], data is carried by the relative charge levels of many cells and not by the charge level in a single cell. Assume a sequence of the charge levels in 5 cells is (6,1,3,2,10). ... ... Other approaches are error-correcting techniques. Up to now, various coding techniques have been applied to alleviate the detection in case of channel mismatch, specifically rank modulation [32], balanced codes [34], and composition check codes [38].
These methods are often considered too expensive in terms of redundancy and complexity. ...
... Flash memory is a non-volatile storage medium that is both electrically programmable and erasable. The rank modulation scheme for flash memories has been proposed in [7]. In this scheme, a permutation corresponds to a relative ranking of all the flash memory cells' levels. ... ... Thus, in these cases, we still have the representation in (10). By (7) and (10), we can obtain the expression of $S_n^K(i)$ in the above lemma. ... ... Example 5 Let n = 11 and π = [3,2,1,4,5,6,7,8,9,10,11]. Considering $T^{(5)}_{11,(3,3)}(\pi)$, we obtain the two kinds of permutations in $T^{(5)}_{11,(3,3)}(\pi)$. By applying an adjacent transposition to the first 2 elements of π, we obtain the first kind of permutation σ = [2,3,1,4,5,6,7,8,9,10,11]. ...
Article
In the rank modulation scheme for flash memories, permutation codes have been studied. In this paper, we study perfect permutation codes in $S_n$, the set of all permutations on n elements, under the Kendall τ-metric. We answer one open problem proposed by Buzaglo and Etzion. That is, proving the nonexistence of perfect codes in $S_n$, under the Kendall τ-metric, for more values of n. Specifically, we present the polynomial representation of the size of a ball in $S_n$ under the Kendall τ-metric for some radius r, and obtain some sufficient conditions of the nonexistence of perfect permutation codes. Further, we prove that there does not exist a perfect t-error-correcting code in $S_n$ under the Kendall τ-metric for some n and $t = 2, 3, 4, 5$, or $\frac{5}{8}\binom{n}{2} < 2t+1 \leq \binom{n}{2}$.
... In RM scheme for flash memory, information is stored in the form of rankings of cell charges [23], [24]. The translocation error, which is an extension of another well-studied error, the adjacent transposition error [1], [5], [7], [24], [31], is caused by moving the rankings of one cell below a certain number of closest ranked cells. ... ...
The data may be vulnerable to noises caused by potential cell over-injection, charge leakage, and read/write disturbance. The translocation errors were defined to characterize the noises [13], [23], [24]. For a permutation π = ($x_1$, . . . ...
Preprint
Permutation codes were extensively studied in order to correct different types of errors for the applications on power line communication and rank modulation for flash memory. In this paper, we introduce the neural network decoders for permutation codes to correct these errors with one-shot decoding, which treat the decoding as $n$ classification tasks for non-binary symbols for a code of length $n$. These are actually the first general decoders introduced to deal with any error type for these two applications. The performance of the decoders is evaluated by simulations with different error models.
... Another suggestion put forth by [16], and later studied by [21], was to employ the rank-modulation scheme over the profile vectors. Rank modulation has a long history, starting with [4], [6], [22] for vector digitization and signal detection, through communication over power lines [25], and more recently, for information storage in non-volatile memories [14]. In our context, instead of storing the information in the profile vector, whose integer entries count the number of occurrences of each ℓ-gram from $\Sigma^\ell$, the information is stored in the permutation over $\Sigma^\ell$ which is the ranking (by frequency of appearance) of the entries of the profile vector. ... ... The identity of the set A depends on the specifics of the applications. As examples we bring [6] dealing with signal detection with impulsive noise, [25] for powerline communications, and [14] for coding in flash memories. ...
Preprint
We study permutations over the set of $\ell$-grams, that are feasible in the sense that there is a sequence whose $\ell$-gram frequency has the same ranking as the permutation.
Codes, which are sets of feasible permutations, protect information stored in DNA molecules using the rank-modulation scheme, and read using the shotgun sequencing technique. We construct systematic codes with an efficient encoding algorithm, and show that they are optimal in size. The length of the DNA sequences that correspond to the codewords is shown to be polynomial in the code parameters. Non-systematic codes with larger size are also constructed.
... Researchers have also proposed some innovative data representation schemes with different requirements in terms of read thresholds. For example, rank modulation [17], [18], [19], [20], [21] stores information in the relative voltages between the cells instead of using pre-defined voltage levels. The strategy of writing data represented by rank modulation in parallel to flash memories is studied in [22]. ...
Preprint
A primary source of increased read time on NAND flash comes from the fact that in the presence of noise, the flash medium must be read several times using different read threshold voltages for the decoder to succeed. This paper proposes an algorithm that uses a limited number of re-reads to characterize the noise distribution and recover the stored information. Both hard and soft decoding are considered. For hard decoding, the paper attempts to find a read threshold minimizing bit-error-rate (BER) and derives an expression for the resulting codeword-error-rate. For soft decoding, it shows that minimizing BER and minimizing codeword-error-rate are competing objectives in the presence of a limited number of allowed re-reads, and proposes a trade-off between the two. The proposed method does not require any prior knowledge about the noise distribution, but can take advantage of such information when it is available. Each read threshold is chosen based on the results of previous reads, following an optimal policy derived through a dynamic programming backward recursion.
The method and results are studied from the perspective of an SLC Flash memory with Gaussian noise for each level, but the paper explains how the method could be extended to other scenarios.
... Flash memory is a non-volatile storage medium that is both electrically programmable and erasable. The rank modulation scheme for flash memories has been proposed in [2]. In this scheme, one permutation corresponds to a relative ranking of all the flash memory cells' levels. ...
Preprint
In the rank modulation scheme for flash memories, permutation codes have been studied. In this paper, we study perfect permutation codes in $S_n$, the set of all permutations on $n$ elements, under the Kendall $\tau$-metric. We answer one open problem proposed by Buzaglo and Etzion. That is, proving the nonexistence of perfect codes in $S_n$, under the Kendall $\tau$-metric, for more values of $n$. Specifically, we present the recursive formulas for the size of a ball with radius $r$ in $S_n$ under the Kendall $\tau$-metric. Further, we prove that there are no perfect $t$-error-correcting codes in $S_n$ under the Kendall $\tau$-metric for some $n$ and $t = 2, 3, 4,$ or $5$.
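The two notions that recur in the excerpts above, the permutation induced by cell charge levels and the Kendall τ-distance between permutations, can be illustrated with a minimal Python sketch. The function names `ranking_from_charges` and `kendall_tau_distance` are hypothetical, and listing cells from highest to lowest charge with 1-based cell indices is an assumed convention, not one fixed by the excerpts.

```python
from itertools import combinations


def ranking_from_charges(charges):
    """Return 1-based cell indices ordered from highest to lowest charge.

    Assumed convention: the first entry is the top-charged cell.
    """
    order = sorted(range(len(charges)), key=lambda i: charges[i], reverse=True)
    return [i + 1 for i in order]


def kendall_tau_distance(rho, pi):
    """Minimum number of adjacent transpositions turning rho into pi.

    This equals the number of symbol pairs whose relative order differs
    between the two permutations (an inversion count).
    """
    position_in_pi = {symbol: idx for idx, symbol in enumerate(pi)}
    mapped = [position_in_pi[symbol] for symbol in rho]
    return sum(1 for a, b in combinations(mapped, 2) if a > b)


# Charge levels quoted in one of the excerpts above.
print(ranking_from_charges((6, 1, 3, 2, 10)))

# The permutations pi and sigma of the quoted Example 5 differ by a
# single adjacent transposition of their first two elements.
pi = [3, 2, 1, 4, 5, 6, 7, 8, 9, 10, 11]
sigma = [2, 3, 1, 4, 5, 6, 7, 8, 9, 10, 11]
print(kendall_tau_distance(pi, sigma))
```

The pairwise inversion count runs in quadratic time, which is adequate for illustration; a merge-sort based count would be the usual choice for long permutations.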
Preprint
Permutation matrices form an important computational building block frequently used in various fields including e.g., communications, information security and data processing. Optical implementation of permutation operators with relatively large number of input-output interconnections based on power-efficient, fast, and compact platforms is highly desirable. Here, we present diffractive optical networks engineered through deep learning to all-optically perform permutation operations that can scale to hundreds of thousands of interconnections between an input and an output field-of-view using passive transmissive layers that are individually structured at the wavelength scale. Our findings indicate that the capacity of the diffractive optical network in approximating a given permutation operation increases proportional to the number of diffractive layers and trainable transmission elements in the system. Such deeper diffractive network designs can pose practical challenges in terms of physical alignment and output diffraction efficiency of the system. We addressed these challenges by designing misalignment tolerant diffractive designs that can all-optically perform arbitrarily-selected permutation operations, and experimentally demonstrated, for the first time, a diffractive permutation network that operates at THz part of the spectrum. Diffractive permutation networks might find various applications in e.g., security, image encryption and data processing, along with telecommunications; especially with the carrier frequencies in wireless communications approaching THz-bands, the presented diffractive permutation networks can potentially serve as channel routing and interconnection panels in wireless networks.
Chapter
The selection of a Flash cell approach is a reflection of the market and product features that a company decides to pursue. There are two major markets for Flash memories: one is the traditional embedded memory, and the other is the new emerging market of mass storage.
Article
Storage media such as digital optical disks, PROMS, or paper tape consist of a number of “write-once” bit positions (wits); each wit initially contains a “0” that may later be irreversibly overwritten with a “1”. It is demonstrated that such “write-once memories” (woms) can be “rewritten” to a surprising degree. For example, only 3 wits suffice to represent any 2-bit value in a way that can later be updated to represent any other 2-bit value. For large k, 1.29...·k wits suffice to represent a k-bit value in a way that can be similarly updated. Most surprising, allowing t writes of a k-bit value requires only t+o(t) wits, for any fixed k. For fixed t, approximately k·t/log(t) wits are required as k→∞. An n-wit WOM is shown to have a “capacity” (i.e., k·t when writing a k-bit value t times) of up to n·log(n) bits.
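The claim that 3 wits suffice for writing a 2-bit value twice can be checked with a short Python sketch. The abstract does not list the codewords; the table below is the commonly quoted two-generation assignment and should be read as an assumption, as should the helper names `decode` and `rewrite`.

```python
# First-generation codewords (assumed table, weight at most 1).
FIRST = {0b00: (0, 0, 0), 0b01: (1, 0, 0), 0b10: (0, 1, 0), 0b11: (0, 0, 1)}
# Second-generation codewords are the bitwise complements of the first.
SECOND = {value: tuple(1 - w for w in wits) for value, wits in FIRST.items()}


def decode(wits):
    """Read the 2-bit value stored in a 3-wit pattern (a, b, c)."""
    a, b, c = wits
    return ((b ^ c) << 1) | (a ^ c)


def rewrite(wits, value):
    """Return a new pattern storing `value`, only ever changing 0 to 1."""
    if decode(wits) == value:
        return wits  # nothing to write
    for table in (FIRST, SECOND):
        target = table[value]
        # A target is reachable only if no wit would go from 1 back to 0.
        if all(t >= w for t, w in zip(target, wits)):
            return target
    raise ValueError("no further write possible without erasing")


# Write 01, then overwrite it with 10, never clearing a wit.
state = rewrite((0, 0, 0), 0b01)
state = rewrite(state, 0b10)
print(state, decode(state))
```

The sketch covers only this smallest case; the asymptotic results in the abstract rely on constructions that are not reproduced here.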
Article
We consider the problem of constructing prefix-free codes of minimum cost when the encoding alphabet contains letters of unequal length. The complexity of this problem has been unclear for thirty years with the only algorithm known for its solution involving a transformation to integer linear programming. We introduce a new dynamic programming solution to the problem. It optimally encodes n words in $O(n^{C+2})$ time, if the costs of the letters are integers between 1 and C. While still leaving open the question of whether the general problem is solvable in polynomial time, our algorithm seems to be the first one that runs in polynomial time for fixed letter costs.
Article
A write-once memory (WOM) is a binary storage medium in which the individual bit positions can be changed from the 0 state to the 1 state only once. Examples of WOMs are paper tapes, punched cards, and, most importantly, optical disks. For the latter storage medium, the 1's are marked by a laser that burns away a portion of the disk. In a recent paper, Rivest and Shamir showed that it is possible to update or rewrite a WOM to a surprising degree, and that the total amount of information which can be stored in an N-position WOM in many write/read "generations" or "stages" can be much larger than N. In this paper we extend their results in several directions. Let C(T, N) be the total number of bits of information that can be stored in an N-position WOM using T write/read generations. We consider the four cases that result when the writer (encoder) and/or reader (decoder) know the state of the memory at the previous generation. For three of these cases, when either the encoder and/or decoder knows the previous state, we show that C(T, N) ∼ N log(T + 1), with T held fixed, as N→∞. For the remaining case, when neither the encoder nor the decoder knows the previous state, we show that $C(T, N) < N\pi^2/(6 \ln 2) \approx 2.37N$ and that this bound can be approached arbitrarily closely with T, N sufficiently large.
Article
The Huffman construction for t-ary trees is generalized to the case where the collection of outdegrees for every internal node is given but the outdegrees are not necessarily a constant. The output trees from this construction are called generalized Huffman trees. Optimality properties for such trees are proved. A criterion is also given to compare the costs of two generalized Huffman trees with the same collection of outdegrees.
Conference Paper
A lexicographic ranking function for the set of all permutations of n ordered symbols translates permutations to their ranks in the lexicographic order of all permutations. This is frequently used for indexing data structures by permutations. We present algorithms for computing both the ranking function and its inverse using O(n) arithmetic operations.
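To make the ranking function and its inverse concrete, here is a minimal Python sketch based on the factorial number system. It is not the O(n) algorithm the abstract refers to: this straightforward version uses quadratic list operations, and the names `lex_rank` and `lex_unrank`, as well as the choice of symbols 0..n-1, are assumptions made for illustration.

```python
from math import factorial


def lex_rank(perm):
    """Rank of `perm` (a permutation of 0..n-1) in lexicographic order."""
    n = len(perm)
    remaining = sorted(perm)
    rank = 0
    for i, symbol in enumerate(perm):
        # Count unused symbols smaller than the current one.
        idx = remaining.index(symbol)
        rank += idx * factorial(n - 1 - i)
        remaining.pop(idx)
    return rank


def lex_unrank(rank, n):
    """Inverse of lex_rank: the permutation of 0..n-1 with the given rank."""
    remaining = list(range(n))
    perm = []
    for i in range(n):
        # Each factorial-base digit selects one of the unused symbols.
        idx, rank = divmod(rank, factorial(n - 1 - i))
        perm.append(remaining.pop(idx))
    return perm


print(lex_rank([1, 2, 0]))
print(lex_unrank(5, 3))
```

A mapping of this kind is what lets a permutation of cell ranks stand for an integer data value and back.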