Prefix normal words, binary jumbled pattern matching, and bubble languages
A binary word is prefix normal if no substring has more ones than the prefix of the same length. For example, 1001010 is not prefix normal because 101 has more ones than 100. These words were introduced in [Fici & Lipt‡k, On prefix normal words, DLT 2011], where it was shown that all binary words w have a canonical prefix normal form of the same length.
Prefix normal words are important in the context of binary jumbled pattern matching (BJPM). In BJPM, we are given a text w and want to quickly answer queries of the form "Does w have a substring with x ones and y zeros?" This problem has attracted much interest in the string community in recent years. As was demonstrated in [DLT2011], the prefix normal forms of w can be used as an index for this problem.
Bubble languages are an interesting new class of binary languages defined by the following property: L is a bubble language if, for every word w in L, replacing the first occurrence of 01 (if any) by 10, results in another word in L [Ruskey, Sawada, & Williams, JCombTh Series A, 2012; Sawada & Williams, ElJComb, 2012]. This property can be equivalently stated in terms of a closure property in the computation tree of a certain generation algorithm for binary words, leading to Gray codes for each of these languages. Moreover, many bubble languages can be generated efficiently using the bubble property, in some cases even with CAT algorithms (constant amortized time). Many important languages are bubble languages, e.g. fixed-density necklaces, Lyndon words, k-ary Dyck words.
It is easy to see that prefix normal words form a bubble language. This simple observation has led to many new results on prefix normal words, among these an algorithm for generating all prefix normal words of a fixed length in amortized O(n) time per word (conjectured O(log n) time). This constitutes a very substantial improvement over the previous algorithm, which ran in O(2^n n^2) time. In fact, we conjecture that the speedup tends to infinity faster than any polynomial in n. The new algorithm allowed us to gain many new insights into the number and nature of prefix normal words.
This is joint work with PŽter Burcsi, Gabriele Fici, Frank Ruskey, and Joe Sawada.