Graphical model learning and inference are often performed using Bayesian
techniques. In particular, learning is usually performed in two separate steps.
First, the graph structure is learned from the data; then the parameters of the
model are estimated conditional on that graph structure. While the probability
distributions involved in this second step have been studied in depth, the ones
used in the first step have not been explored in as much detail.
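The second step can be illustrated with a minimal sketch, assuming discrete variables and a graph structure that is already fixed. The function name `fit_cpts`, the pseudo-count `alpha` (a uniform Dirichlet prior, assumed strictly positive) and the toy data are all hypothetical and chosen only for illustration; they are not taken from any particular package.

```python
from collections import Counter
from itertools import product


def fit_cpts(data, parents, levels, alpha=1.0):
    """Posterior-mean estimates of the conditional probability tables,
    conditional on a fixed graph given as a node -> parents mapping.

    alpha is a per-cell pseudo-count (uniform Dirichlet prior); it is
    assumed to be > 0 so that unobserved parent configurations do not
    cause a division by zero.
    """
    cpts = {}
    for node, pa in parents.items():
        # Count joint occurrences of (parent configuration, node value).
        counts = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        table = {}
        # product() over no parents yields a single empty configuration.
        for config in product(*(levels[p] for p in pa)):
            total = sum(counts[(config, v)] for v in levels[node])
            total += alpha * len(levels[node])
            table[config] = {
                v: (counts[(config, v)] + alpha) / total for v in levels[node]
            }
        cpts[node] = table
    return cpts


# Toy example (illustrative data only): the graph A -> B is taken as given.
parents = {"A": (), "B": ("A",)}
levels = {"A": ("a", "b"), "B": ("x", "y")}
data = [
    {"A": "a", "B": "x"},
    {"A": "a", "B": "x"},
    {"A": "b", "B": "x"},
    {"A": "b", "B": "y"},
]
cpts = fit_cpts(data, parents, levels)
print(cpts["B"][("a",)])
```

Because the graph is held fixed, each table can be estimated independently from the counts of a node and its parents, which is what makes this step comparatively well understood.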
Graphical models, and in particular Bayesian networks, have been widely used
to investigate data in the biological and healthcare domains. This can be
attributed to the recent explosion of high-throughput data across these domains
and the importance of understanding the causal relationships between the
variables of interest. However, classic model validation techniques for
identifying significant edges rely on the choice of an ad-hoc threshold, which
is non-trivial and can have a pronounced impact on the conclusions of the
analysis.
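The sensitivity to the threshold can be seen in a small sketch. The confidence values below are made up purely for illustration, and `significant_arcs` is a hypothetical helper, not a function from any package.

```python
def significant_arcs(strengths, threshold):
    """Return the arcs whose confidence is at least the threshold."""
    return sorted(arc for arc, s in strengths.items() if s >= threshold)


# Hypothetical confidence values for four arcs (illustrative only).
strengths = {
    ("A", "B"): 0.92,
    ("B", "C"): 0.61,
    ("C", "D"): 0.48,
    ("A", "D"): 0.15,
}

for threshold in (0.45, 0.50, 0.65):
    print(threshold, significant_arcs(strengths, threshold))
```

With these values, moving the threshold from 0.45 to 0.65 reduces the selected network from three arcs to one, even though the underlying confidence values are unchanged.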
In the literature there are several studies on the performance of Bayesian
network structure learning algorithms. The focus of these studies is almost
always the heuristics that the learning algorithms are based on, i.e. the maximization
algorithms used in score-based algorithms or the techniques for learning the
dependencies of each variable in constraint-based algorithms.
The structure of a Bayesian network includes a great deal of information
about the probability distribution of the data, which is uniquely identified
given some general distributional assumptions. Therefore, it is important to
study its variability, which can be used to compare the performance of
different learning algorithms and to measure the strength of any arbitrary
subset of arcs.
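One simple way to quantify this variability, sketched here under the assumption that a collection of network structures is already available (for example, learned from resampled data), is to measure how often an arc, or a set of arcs, appears across the collection. The function names and the four toy networks are hypothetical.

```python
from collections import Counter


def arc_strength(networks):
    """Fraction of networks in which each directed arc appears."""
    counts = Counter(arc for arcs in networks for arc in set(arcs))
    return {arc: c / len(networks) for arc, c in counts.items()}


def subset_strength(networks, arcs):
    """Fraction of networks that contain every arc in the given subset."""
    wanted = set(arcs)
    return sum(wanted <= set(net) for net in networks) / len(networks)


# Four illustrative structures, each given as a set of (from, to) arcs.
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "B")},
    {("A", "B"), ("B", "C"), ("A", "C")},
    {("B", "C")},
]

strength = arc_strength(networks)
joint = subset_strength(networks, [("A", "B"), ("B", "C")])
print(strength[("A", "B")], joint)
```

The joint strength of a subset is not, in general, the product of the individual arc strengths, since arcs tend to appear together or exclude one another across structures.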
The aim of this chapter is twofold. In the first part we will provide a brief
overview of the mathematical and statistical foundations of graphical models,
along with their fundamental properties, estimation and basic inference
procedures. In particular, we will cover Markov networks (also known as Markov
random fields) and Bayesian networks, which account for most of the past and
current literature on graphical models. In the second part we will review some
applications of graphical models in systems biology.
The structure of a Bayesian network encodes most of the information about the
probability distribution of the data, which is uniquely identified given some
general distributional assumptions. Therefore, it is important to study the
variability of this structure, which can be used to compare the
performance of different learning algorithms and to measure the strength of any
arbitrary subset of arcs.
bnlearn is an R package which includes several algorithms for learning the
structure of Bayesian networks with either discrete or continuous variables.
Both constraint-based and score-based algorithms are implemented, and can use
the functionality provided by the snow package to improve their performance via
parallel computing. Several network scores and conditional independence
tests are available both to the learning algorithms and for independent use.
Advanced plotting options are provided by the Rgraphviz package.