Inferring an appropriate DTD or XML Schema Definition (XSD) for a given
collection of XML documents essentially reduces to learning deterministic
regular expressions from sets of positive example words. Unfortunately, there
is no algorithm capable of learning the complete class of deterministic regular
expressions from positive examples only, as we will show. The regular
expressions occurring in practical DTDs and XSDs, however, are such that every
alphabet symbol occurs only a small number of times.