A Noise Addition Scheme in Decision Tree for Privacy Preserving Data Mining.

link: http://arxiv.org/abs/1001.3504
Abstract

Data mining deals with automatic extraction of previously unknown patterns
from large amounts of data. Organizations all over the world handle large
amounts of data and are dependent on mining gigantic data sets for expansion of
their enterprises. These data sets typically contain sensitive individual
information, which consequently gets exposed to other parties. Though we
cannot deny the benefits of knowledge discovery that comes through data mining,
we should also ensure that data privacy is maintained while the data is being
mined. Privacy preserving data mining is a specialized activity in which
data privacy is ensured during data mining. Data privacy is as important as the
extracted knowledge, and efforts that guarantee data privacy during data mining
are encouraged. In this paper we propose a strategy that protects data
privacy during the decision tree analysis step of the data mining process. We propose to add
specific noise to the numeric attributes after exploring the decision tree of
the original data. The obfuscated data is then presented to the second party
for decision tree analysis. The decision trees obtained from the original data and
the obfuscated data are similar, but with our method the original data itself is not
revealed to the second party during the mining process, and hence privacy
is preserved.
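
The idea of adding noise to a numeric attribute without disturbing the decision tree can be illustrated with a minimal sketch. The abstract does not specify the noise scheme, so the code below is only one possible reading and not the authors' method. It assumes the split thresholds of the tree built on the original data are already known, and it replaces each value with a random value drawn from the same interval between adjacent thresholds. The function names `perturb_within_intervals` and `same_side`, and the sample `ages` and `splits` values, are hypothetical and chosen only for illustration.

```python
import bisect
import random


def perturb_within_intervals(values, thresholds, rng=None):
    """Replace each value with a random value from the same threshold interval.

    Assumption (not from the paper): a tree node tests "value <= threshold",
    so a value must stay inside (previous threshold, next threshold].
    """
    rng = rng or random.Random()
    cuts = sorted(thresholds)
    lo, hi = min(values), max(values)
    noisy = []
    for v in values:
        # Number of thresholds strictly below v, i.e. the interval index of v.
        i = bisect.bisect_left(cuts, v)
        # Clamp the interval to the observed data range.
        lower = max(cuts[i - 1], lo) if i > 0 else lo
        upper = min(cuts[i], hi) if i < len(cuts) else hi
        # Draw from (lower, upper]; rng.random() is in [0, 1).
        candidate = upper - rng.random() * (upper - lower)
        if candidate <= lower:
            # Guard against rounding or a zero-width interval.
            candidate = upper
        noisy.append(candidate)
    return noisy


def same_side(original, noisy, thresholds):
    """Check that every record falls on the same side of every threshold."""
    return all((a <= t) == (b <= t)
               for a, b in zip(original, noisy)
               for t in thresholds)


# Hypothetical numeric attribute and split points taken from the original tree.
ages = [23.0, 31.0, 38.0, 45.0, 52.0, 67.0]
splits = [35.0, 50.0]
noisy_ages = perturb_within_intervals(ages, splits, random.Random(0))
print(noisy_ages)
print(same_side(ages, noisy_ages, splits))
```

Because every perturbed value stays on the same side of each threshold, a tree that splits at those thresholds partitions the obfuscated records exactly as it partitions the original ones. A tree retrained on the obfuscated data may still choose slightly different thresholds, which is consistent with the abstract's claim that the two trees are similar rather than identical.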