This article proposes a novel approach to the statistical alignment of nucleotide
sequences by introducing a context-dependent structure on the substitution
process in the underlying evolutionary model. We propose to estimate alignments
and context-dependent mutation rates from the observation of two
homologous sequences. The procedure is based on a generalized pair-hidden
Markov structure in which, conditional on the alignment path, the nucleotide
sequences follow a Markov distribution.
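
To make the conditional structure concrete, the following minimal sketch computes the log-probability of two sequences given a fixed alignment path. It is an illustration under stated assumptions, not the model of the article: the path encoding ('M' for an aligned pair, 'X' and 'Y' for a nucleotide present in only one sequence), the i.i.d. background distribution, the choice of the previous nucleotide of the first sequence as the context, and the names `path_log_likelihood` and `toy_subst` are all hypothetical.

```python
import math


def path_log_likelihood(x, y, path, background, subst):
    """Log-probability of sequences x and y conditional on an alignment path.

    Assumptions (illustrative only):
      - path is a string over {'M', 'X', 'Y'}: 'M' emits an aligned pair,
        'X' a nucleotide of x only, 'Y' a nucleotide of y only;
      - nucleotides of x and unaligned nucleotides of y are drawn i.i.d.
        from `background`;
      - in an aligned pair (a, b), b is drawn from subst(context, a, b),
        where context is the previous nucleotide of x (None at the start).
    """
    if path.count("M") + path.count("X") != len(x):
        raise ValueError("path does not match the length of x")
    if path.count("M") + path.count("Y") != len(y):
        raise ValueError("path does not match the length of y")

    i = j = 0
    context = None
    logp = 0.0
    for state in path:
        if state == "M":
            a, b = x[i], y[j]
            logp += math.log(background[a]) + math.log(subst(context, a, b))
            context = a
            i += 1
            j += 1
        elif state == "X":
            logp += math.log(background[x[i]])
            context = x[i]
            i += 1
        elif state == "Y":
            logp += math.log(background[y[j]])
            j += 1
        else:
            raise ValueError(f"unknown path state: {state!r}")
    return logp


def toy_subst(context, a, b):
    """Toy context-dependent substitution: G preceded by C mutates more often."""
    stay = 0.7 if (context == "C" and a == "G") else 0.9
    return stay if a == b else (1.0 - stay) / 3.0


background = {n: 0.25 for n in "ACGT"}
print(path_log_likelihood("ACG", "ATG", "MMM", background, toy_subst))
```

Because the substitution probability of each aligned pair depends on the preceding nucleotide, the emitted columns are no longer independent given the path, which is the departure from a standard pair-hidden Markov model that the text describes.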
In this work, we address parameter estimation in a latent variable model,
namely the multiple-hidden i.i.d. model, which is derived from multiple
alignment algorithms. We first provide a rigorous formalism for the homology
structure of k sequences related by a star-shaped phylogenetic tree in the
context of multiple alignment based on indel evolution models. We discuss
possible definitions of likelihoods and compare them to the criterion used in
multiple alignment algorithms.