Zsuzsanna Lipt\'ak, Universit\`a di Verona, Italy
Jumbled String Matching: Motivations, Variants, Algorithms
In the classical exact string matching problem, we are given two strings s (the text) and t (the pattern), and we want to find all occurrences of t as substrings of s. In jumbled string matching, the goal is to find all occurrences of substrings t' of s with the same multiplicity of characters as the pattern, i.e. all permuted (or jumbled) versions of the pattern. Alternatively, the pattern can be given as a vector specifying the character multiplicities (a so-called Parikh vector), and we want to find all occurrences of substrings with this Parikh vector.
For example, the text s=baacabccab has two occurrences of the pattern (2,1,1), i.e. of substrings consisting of two a's, one b, and one c: in position 1 and in position 3. The online version of the problem can be solved optimally in O(n) time with a simple sliding window algorithm. However, the indexed version, where we want to answer multiple queries fast, using an index of the text, has proved challenging.
Variants of this problem are motivated by applications in mass spectrometry data interpretation, by gene clusters, and by motif search in networks.
I will talk about different algorithms for several variants of the problem, including the case of binary alphabets, which has very specific properties, and which has aroused much interest in recent years.