Incorporating domain knowledge into the modeling process is an effective way
to improve learning accuracy. However, as it is provided by humans, domain
knowledge can only be specified with some degree of uncertainty. We propose to
explicitly model such uncertainty through probabilistic constraints over the
parameter space. In contrast to hard parameter constraints, our approach is
effective also when the domain knowledge is inaccurate and generally results in
superior modeling accuracy.
Text documents are complex high dimensional objects. To effectively visualize
such data it is important to reduce its dimensionality and visualize the low
dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore
dimensionality reduction methods that draw upon domain knowledge in order to
achieve a better low dimensional embedding and visualization of documents. We
consider the use of geometries specified manually by an expert, geometries
derived automatically from corpus statistics, and geometries computed from
linguistic resources.