The estimation of a covariance matrix from an insufficient amount of data is
one of the most common problems in fields as diverse as multivariate
statistics, wireless communications, signal processing, biology, learning
theory and finance. In \cite{MTS}, a new approach to handle singular covariance
matrices was suggested. The main idea was to use dimensionality reduction in
conjunction with an average over the unitary matrices.
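As a rough sketch of what such an estimator can look like (the notation here
is our assumption and is not taken from the text above), let $K$ be the
$N \times N$ singular sample covariance matrix and let $\Phi$ be an
$L \times N$ matrix with orthonormal rows, $\Phi \Phi^{*} = I_L$, drawn from
the Haar measure, with $L$ no larger than the rank of $K$. One candidate
estimate of the inverse covariance is then
\[
  \widehat{K^{-1}}_{L}
    = \mathrm{E}_{\Phi}\!\left[ \Phi^{*} \left( \Phi K \Phi^{*} \right)^{-1} \Phi \right],
\]
where the reduced matrix $\Phi K \Phi^{*}$ is $L \times L$ and invertible with
probability one, and the expectation averages out the arbitrary choice of
projection.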

Web query log data contain information useful to research; however, releasing
such data can allow the search engine users who issued the queries to be
re-identified. These
privacy concerns go far beyond removing explicitly identifying information such
as name and address, since non-identifying personal data can be combined with
publicly available information to pinpoint an individual. In this work we
model web query logs as unstructured transaction data and present a novel
transaction anonymization technique, based on clustering and generalization,
that achieves $k$-anonymity.
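For reference, the standard $k$-anonymity requirement can be stated as
follows; the symbols $D$, $D'$ and $t$ are introduced here only for
illustration. If $D'$ denotes the released, generalized version of a
transaction database $D$, viewed as a multiset, then
\[
  \forall\, t \in D' : \quad
  \bigl| \{\, t' \in D' : t' = t \,\} \bigr| \ge k ,
\]
that is, every released transaction is identical to at least $k-1$ others, so
an adversary cannot narrow a user down to fewer than $k$ candidates from the
released data alone.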

Background knowledge is an important factor in privacy-preserving data
publishing. Distribution-based background knowledge is one of the well-studied
forms of background knowledge. However, to the best of our knowledge, there is no
existing work considering distribution-based background knowledge in the
worst-case scenario, by which we mean that the adversary has accurate knowledge
about the distribution of sensitive values according to some tuple attributes.
Considering this worst-case scenario is essential because we cannot overlook
any possibility of a privacy breach.