Let $#A$ denote the cardinality of a finite set $A$. Let $t(x)=x$ if $x\geq1$
and 1 otherwise. For any two sets $A,B$ denote by $\delta(A,B)$ $=$
$\log_{2}(t(#(B\cap\overline{A})#A))$. We define a new set distance $d(A,B)$
$=$ $\max\{\delta(A,B),\delta(B,A)\} $ motivated by combinatorial notions of
entropy and information \citep{Kolmogorov65}. We prove that $d$ is a
semi-metric on the space of sets of size at least 2. The triangle inequality.
holds for triplets $A$, $B$, $C$ that are not strictly contained one in
another.
Given a random binary sequence $X^{(n)}$ of random variables, $X_{t},$
$t=1,2,...,n$, for instance, one that is generated by a Markov source (teacher)
of order $k^{*}$ (each state represented by $k^{*}$ bits). Assume that the
probability of the event $X_{t}=1$ is constant and denote it by $\beta$.
Consider a learner which is based on a parametric model, for instance a Markov
model of order $k$, who trains on a sequence $x^{(m)}$ which is randomly drawn
by the teacher.
Given a random binary sequence $X^{(n)}$ of random variables, $X_{t},$
$t=1,2,...,n$, for instance, one that is generated by a Markov source (teacher)
of order $k^{*}$ (each state represented by $k^{*}$ bits). Assume that the
probability of the event $X_{t}=1$ is constant and denote it by $\beta$.
Consider a learner which is based on a parametric model, for instance a Markov
model of order $k$, who trains on a sequence $x^{(m)}$ which is randomly drawn
by the teacher.
We investigate a population of binary mistake sequences that result from
learning with parametric models of different order. We obtain estimates of
their error, algorithmic complexity and divergence from a purely random
Bernoulli sequence. We study the relationship of these variables to the
learner's information density parameter which is defined as the ratio between
the lengths of the compressed to uncompressed files that contain the learner's
decision rule. The results indicate that good learners have a low information
density$\rho$ while bad learners have a high $\rho$.
What is the relationship between the complexity of a learner and the
randomness of his mistakes? This question was posed in \cite{rat0903} who
showed that the more complex the learner the higher the possibility that his
mistakes deviate from a true random sequence. In the current paper we report on
an empirical investigation of this problem. We investigate two characteristics
of randomness, the stochastic and algorithmic complexity of the binary sequence
of mistakes.