Missing web pages, that is, URIs that return a 404 "Page Not Found" error or
return the HTTP response code 200 but dereference to unexpected content, are
ubiquitous in today's browsing experience. We use Internet search engines to
relocate such missing pages and provide means to automate the rediscovery process. We
propose querying web pages' titles against search engines. We investigate the
retrieval performance of titles and compare them to lexical signatures, which
are derived from the pages' content. Since titles naturally represent the
content of a document, they intuitively change over time as that content changes.
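
As an illustration of the two query types compared above, the following is a minimal sketch, not the authors' implementation. It assumes a lexical signature is the top-k terms of a page ranked by a TF-IDF style score, and that a title is submitted as a quoted phrase query. The function names, the smoothing in the IDF term, and the in-memory corpus used for document frequencies are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF style score.

    corpus is a list of document strings used only to estimate document
    frequency (an assumption; a real system would need a much larger source).
    """
    term_freq = Counter(tokenize(page_text))
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, freq in term_freq.items():
        doc_freq = sum(1 for terms in doc_terms if term in terms)
        # Smoothed IDF so that terms unseen in the corpus do not divide by zero.
        idf = math.log((n_docs + 1) / (doc_freq + 1)) + 1.0
        scores[term] = freq * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def title_query(title):
    """Normalize whitespace in a title and wrap it as a quoted phrase query."""
    return '"' + " ".join(title.split()) + '"'
```

Either result would then be submitted to a search engine as a query string; the choice of search engine and how its results are evaluated are outside the scope of this sketch.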

Dereferencing a URI returns a representation of the current state of the
resource identified by that URI. On the Web, however, representations of prior
states of a resource are also available, for example as resource versions in
Content Management Systems or as archival resources in Web Archives such as the
Internet Archive. This paper introduces a resource versioning mechanism that is
fully based on HTTP and uses datetime as a global version indicator.
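
To make the idea of datetime as a version indicator concrete, here is a minimal sketch under stated assumptions; it is not the mechanism defined in the paper. It assumes that a server holds a list of (datetime, URI) pairs for the prior states of a resource, that a client expresses the desired datetime in the HTTP date format, and that the server answers with the latest version not newer than that datetime. The selection rule, the function name, and the example.org URIs are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def select_version(versions, requested):
    """Pick the latest (datetime, uri) pair that is not after `requested`.

    Falls back to the earliest known version when the requested datetime
    precedes all of them (an assumed policy, chosen for this sketch).
    """
    ordered = sorted(versions)
    chosen = ordered[0]
    for stamp, uri in ordered:
        if stamp <= requested:
            chosen = (stamp, uri)
        else:
            break
    return chosen


# Hypothetical version list for one resource.
versions = [
    (datetime(2009, 1, 1, tzinfo=timezone.utc), "http://example.org/v1"),
    (datetime(2009, 6, 1, tzinfo=timezone.utc), "http://example.org/v2"),
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "http://example.org/v3"),
]

# A datetime as it might arrive in an HTTP header value.
requested = parsedate_to_datetime("Tue, 01 Sep 2009 12:00:00 GMT")
stamp, uri = select_version(versions, requested)
print(uri, format_datetime(stamp, usegmt=True))
```

Using the HTTP date format on both sides keeps the exchange within plain HTTP header values, which is the property the text emphasizes.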