Motivated by applications in bioinformatics, we consider the word collector
problem, i.e. the expected number of calls to a random weighted generator of
words of length $n$ before the full collection is obtained. The originality of
this instance of the non-uniform coupon collector lies in the, potentially
large, multiplicity of the words/coupons of a given probability/composition. We
obtain a general theorem that gives an asymptotic equivalent for the expected
waiting time of a general version of the Coupon Collector. This theorem is
especially well-suited for classes of coupons featuring high multiplicities.
Its application to a given language essentially necessitates some knowledge on
the number of words of a given composition/probability. We illustrate the
application of our theorem, in a step-by-step fashion, on three exemplary
languages, revealing asymptotic regimes in $\Theta(\mu(n)\cdot n)$ and
$\Theta(\mu(n)\cdot \log n)$, where $\mu(n)$ is the sum of weights over words
of length $n$.