The sequence segmentation problem asks for a partition of the sequence into $k$ non-overlapping segments that cover all data points such that each segment is as homogeneous as possible. This problem can be solved optimally using dynamic programming in $O(n^2 k)$ time, where $n$ is the length of the sequence. Because sequences in practice are often very long, a quadratic algorithm is not fast enough. Here, we present an alternative constant-factor approximation algorithm with running time $O(n^{4/3} k^{5/3})$. We call this algorithm the DNS algorithm. We also consider the recursive application of the DNS algorithm, which yields a faster algorithm ($O(n \log \log n)$ running time) with an $O(\log n)$ approximation factor, and we study the accuracy/efficiency tradeoff. Extensive experimental results show that these algorithms outperform other widely used heuristics. The same algorithms can speed up solutions for other variants of the basic segmentation problem while keeping their approximation factors constant. Our techniques can also be used in a streaming setting, with sublinear memory requirements.
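To make the dynamic-programming baseline concrete, the following is a minimal sketch of an optimal $k$-segmentation in $O(n^2 k)$ time. It rests on assumptions not stated above: the sequence is one-dimensional, each segment is represented by its mean, and homogeneity is measured by the sum of squared errors. The function name `optimal_segmentation` and its return format are hypothetical, chosen only for illustration; this is not the DNS algorithm itself.

```python
from typing import List, Sequence, Tuple


def optimal_segmentation(seq: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Return (total squared error, segment start indices) of a best k-segmentation.

    Assumption: each segment is summarized by its mean and scored by squared error.
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(seq)")

    # Prefix sums let us evaluate the squared error of any segment in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        prefix[i + 1] = prefix[i] + x
        prefix_sq[i + 1] = prefix_sq[i] + x * x

    def seg_error(i: int, j: int) -> float:
        """Squared error of seq[i:j] around its mean (requires j > i)."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[m][j]: minimum error of splitting seq[:j] into exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last segment in that best split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0

    # Three nested loops over (m, j, i) give the O(n^2 k) running time.
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + seg_error(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover where each segment starts.
    starts: List[int] = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return cost[k][n], starts
```

For example, `optimal_segmentation([1, 1, 1, 5, 5, 5, 9, 9], 3)` returns an error of `0.0` with segment starts `[0, 3, 6]`. The doubly nested loop over segment boundaries is what makes this baseline quadratic in $n$, which is the cost the approximation algorithms aim to avoid.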