Trapezoidal words are finite words having at most n+1 distinct factors of
length n, for every n>=0. They encompass finite Sturmian words. We distinguish
trapezoidal words into two disjoint subsets: open and closed trapezoidal words.
A trapezoidal word is closed if its longest repeated prefix has exactly two
occurrences in the word, the second one being a suffix of the word. Otherwise
it is open. We show that open trapezoidal words are all primitive and that
closed trapezoidal words are all Sturmian. We then show that trapezoidal
palindromes are closed (and therefore Sturmian).
The Parikh vector p(s) of a string s is defined as the vector of
multiplicities of the characters. Parikh vector q occurs in s if s has a
substring t with p(t)=q. We present two novel algorithms for searching for a
query q in a text s. One solves the decision problem over a binary text in
constant time, using a linear size index of the text. The second algorithm, for
a general finite alphabet, finds all occurrences of a given Parikh vector q and
has sub-linear expected time complexity; we present two variants, which both
use a linear size index of the text.
Let S be a finite set of words over an alphabet Sigma. The set S is said to
be complete if every word w over the alphabet Sigma is a factor of some element
of S*, i.e. w belongs to Fact(S*). Otherwise if S is not complete, we are
interested in finding bounds on the minimal length of words in Sigma* which are
not elements of Fact(S*) in terms of the maximal length of words in S.