A Survey on Preprocessing Methods for Web Usage Data.

link: http://arxiv.org/abs/1004.1257
Abstract

World Wide Web is a huge repository of web pages and links. It provides
abundance of information for the Internet users. The growth of web is
tremendous as approximately one million pages are added daily. Users' accesses
are recorded in web logs. Because of the tremendous usage of web, the web log
files are growing at a faster rate and the size is becoming huge. Web data
mining is the application of data mining techniques in web data. Web Usage
Mining applies mining techniques in log data to extract the behavior of users
which is used in various applications like personalized services, adaptive web
sites, customer profiling, prefetching, creating attractive web sites etc., Web
usage mining consists of three phases preprocessing, pattern discovery and
pattern analysis. Web log data is usually noisy and ambiguous and preprocessing
is an important process before mining. For discovering patterns sessions are to
be constructed efficiently. This paper reviews existing work done in the
preprocessing stage. A brief overview of various data mining techniques for
discovering patterns, and pattern analysis are discussed. Finally a glimpse of
various applications of web usage mining is also presented.