Question
Asked 9 February 2012

Is there an algorithm faster than k-d trees for nearest-neighbor search in the domain of bit arrays of length ''n''?

Let ''I'' be a set of bit arrays of length ''n''. Given a bit array ''t'', the goal is to find the array ''i'' in ''I'' that is closest to ''t'' in terms of Hamming distance [1] ''h''. This distance is 0 if ''t'' is in ''I'', but most likely ''h > 0''.
Clearly this is an instance of the multidimensional nearest-neighbor problem, but if ''n'' is large (say, ''n = 256''), traditional nearest-neighbor algorithms such as k-d trees become impractical. However, the domain should allow for some optimizations: after all, each dimension of a data point can take only two values (0 and 1).
Does anyone know of a nearest-neighbor algorithm that is optimized for this case?
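For reference, the baseline that any specialized method would have to beat is a plain linear scan over ''I''. Below is a minimal Python sketch of it, assuming each bit array is packed into a Python integer; the names `hamming` and `nearest` are illustrative and not taken from any library.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two bit arrays disagree.
    return bin(a ^ b).count("1")


def nearest(I, t):
    """Return (i, h): the element of I closest to t and its Hamming distance.

    Linear scan: one XOR and one bit count per element of I.
    Ties are resolved in favor of the element that appears first in I.
    """
    best = min(I, key=lambda i: hamming(i, t))
    return best, hamming(best, t)


if __name__ == "__main__":
    I = [0b00000000, 0b11110000, 0b10101010]
    t = 0b10101011
    print(nearest(I, t))
```

The scan costs one distance computation per element of ''I'', regardless of ''n'' being 8 or 256, so the question is really whether an index can avoid touching most of ''I''.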
Thomas P A Debray
University Medical Center Utrecht
Is there some kind of structure in ''I''? If so, you can adapt your search algorithm and considerably reduce the search space. If ''I'' is totally random, it becomes much more difficult. However, one potential optimization is to explore the branches with the highest expectations first (i.e. branches that already have a good overlap with ''t''). As long as you don't find an identical match, you'll have to keep searching, though.
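One way to read the "most promising branch first" idea is as a branch-and-bound search over a binary trie. The sketch below is only an interpretation: the trie layout and the names `build_trie` and `nearest_in_trie` are assumptions, not something specified in this thread. It stores each bit array as a Python integer, follows the branch that agrees with ''t'' first, and abandons any branch that has already accumulated at least as many mismatches as the best match found so far.

```python
def build_trie(I, n):
    """Binary trie over n-bit integers, most significant bit first."""
    root = {}
    for x in I:
        node = root
        for pos in range(n - 1, -1, -1):
            node = node.setdefault((x >> pos) & 1, {})
        node["value"] = x  # leaf at depth n stores the original array
    return root


def nearest_in_trie(root, t, n):
    """Return (i, h) for the stored array closest to t in Hamming distance."""
    best = [None, n + 1]  # [closest value so far, its distance]

    def visit(node, pos, mismatches):
        if mismatches >= best[1]:
            return  # this branch cannot beat the current best: prune it
        if pos < 0:
            best[0], best[1] = node["value"], mismatches
            return
        bit = (t >> pos) & 1
        # Try the branch that agrees with t first, then the other one.
        for b in (bit, 1 - bit):
            child = node.get(b)
            if child is not None:
                visit(child, pos - 1, mismatches + (b != bit))

    visit(root, n - 1, 0)
    return best[0], best[1]


if __name__ == "__main__":
    I = [0b00000000, 0b11110000, 0b10101010]
    trie = build_trie(I, 8)
    print(nearest_in_trie(trie, 0b10101011, 8))
```

An exact match is found by a single root-to-leaf walk, after which every other branch is pruned. When ''I'' has no structure and the closest array is far from ''t'', the pruning helps much less and the search may still visit a large part of the trie.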
Therefore, it may be more useful to sort ''I'' first; this will cost less than searching the whole tree. There are plenty of fast sorting algorithms. Once ''I'' is sorted, finding the closest match is a piece of cake.