Fionn Murtagh

  1. The Future of Search and Discovery in Big Data Analytics: Ultrametric Information Spaces.

    Authors: Fionn Murtagh, Pedro Contreras
    Subjects: Information Retrieval
    Abstract

    Consider observation data comprising n observation vectors with values on
    a set of attributes. This gives us n points in attribute space. Having data
    structured as a tree, implied by having our observations embedded in an
    ultrametric topology, offers great advantage for proximity searching. If we
    have preprocessed data through such an embedding, then an observation's nearest
    neighbor is found in constant computational time, i.e. O(1) time. A further
    powerful approach is discussed in this work: the inducing of a hierarchy, and
    hence a tree, in linear computational time, i.e.

  2. Current Trends in Evolving Specialization in UK Universities.

    Authors: Fionn Murtagh
    Subjects: Applications
    Abstract

    There are very significant changes taking place in the university sector and
    in related higher education institutes in many parts of the world. In this work
    we look at financial data from 2010 and 2011 from the UK higher education
    sector. Situating ourselves to begin with in the context of teaching versus
    research in universities, we look at the data in order to explore the new
    divergence between the broad agendas of teaching and research in universities.
    The innovation agenda has become at least equal to the research and teaching
    objectives of universities.

  3. Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm.

    Authors: Fionn Murtagh, Pierre Legendre
    Subjects: Machine Learning
    Abstract

    The Ward error sum of squares hierarchical clustering method has been very
    widely used since its first description by Ward in a 1963 publication. It has
    also been generalized in various ways. However, there are different
    interpretations in the literature and there are different implementations of
    the Ward agglomerative algorithm in commonly used software systems, including
    differing expressions of the agglomerative criterion. Our survey work and case
    studies will be useful for all those involved in developing software for data
    analysis using Ward's hierarchical clustering method.
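
    As a hedged illustration of why expressions of the criterion can differ,
    the sketch below computes the increase in the error sum of squares for
    merging two clusters in two ways: directly from centroids, and through a
    Lance-Williams style update. This is the common textbook form, not
    necessarily the expression used in the paper; the function names
    ward_increase and lance_williams_ward are hypothetical.

```python
def ward_increase(n_i, c_i, n_j, c_j):
    """Increase in the error sum of squares when merging two clusters.

    n_i, n_j are cluster sizes; c_i, c_j are centroids (sequences of floats).
    Textbook form: (n_i * n_j / (n_i + n_j)) * ||c_i - c_j||^2.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(c_i, c_j))
    return n_i * n_j / (n_i + n_j) * squared_distance


def lance_williams_ward(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Update the Ward dissimilarity between the merged cluster (i, j) and k.

    The inputs are assumed to be Ward dissimilarities in the sense of
    ward_increase above (squared-distance based), not plain distances.
    """
    total = n_i + n_j + n_k
    return ((n_i + n_k) * d_ik + (n_j + n_k) * d_jk - n_k * d_ij) / total


# Three singleton clusters on a line: i = {0}, j = {2}, k = {5}.
d_ij = ward_increase(1, (0.0,), 1, (2.0,))
d_ik = ward_increase(1, (0.0,), 1, (5.0,))
d_jk = ward_increase(1, (2.0,), 1, (5.0,))

# Merging i and j gives a cluster of size 2 with centroid 1.
via_update = lance_williams_ward(d_ik, d_jk, d_ij, 1, 1, 1)
via_centroids = ward_increase(2, (1.0,), 1, (5.0,))
```

    Both routes agree only when the update is fed squared-distance based
    dissimilarities; feeding it plain distances yields a different hierarchy,
    which is one way implementations can diverge.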

  4. Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement.

    Authors: Fionn Murtagh, Pedro Contreras
    Subjects: Machine Learning
    Abstract

    We describe many vantage points on the Baire metric and its use in clustering
    data, or its use in preprocessing and structuring data in order to support
    search and retrieval operations. In some cases, we proceed directly to clusters
    and do not directly determine the distances. We show how a hierarchical
    clustering can be read directly from one pass through the data. We offer
    insights also on practical implications of precision of data measurement. As a
    mechanism for treating multidimensional data, including very high dimensional
    data, we use random projections.
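
    The idea of reading a hierarchy from one pass through the data can be
    sketched as follows. This is a minimal illustration, not the authors'
    implementation. It assumes each value has already been scaled to [0, 1)
    and is given as an equal-length string of its fractional digits; the
    base of the distance and the names baire_distance and prefix_hierarchy
    are assumptions made for this example.

```python
from collections import defaultdict


def longest_common_prefix(a, b):
    """Number of leading characters shared by the strings a and b."""
    count = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        count += 1
    return count


def baire_distance(a, b, base=2.0):
    """Longest-common-prefix distance between two digit strings.

    Returns base ** -k, where k is the length of the shared prefix, and 0
    when the strings are identical. The base is an assumption here.
    """
    if a == b:
        return 0.0
    return base ** (-longest_common_prefix(a, b))


def prefix_hierarchy(digit_strings, max_depth):
    """Bucket items by digit prefix at depths 1..max_depth in one pass.

    Level d maps each prefix of length d to the indices sharing it, so the
    levels form a nested hierarchy. Cost is linear in the number of items
    for a fixed max_depth.
    """
    levels = [defaultdict(list) for _ in range(max_depth)]
    for index, digits in enumerate(digit_strings):
        for depth in range(1, max_depth + 1):
            levels[depth - 1][digits[:depth]].append(index)
    return [dict(level) for level in levels]


data = ["425", "427", "431", "912"]
levels = prefix_hierarchy(data, max_depth=2)
```

    No pairwise distances are computed when building the levels, which
    matches the remark that clusters can be obtained without directly
    determining the distances.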

  5. Fast, Linear Time Hierarchical Clustering using the Baire Metric.

    Authors: Fionn Murtagh, Pedro Contreras
    Subjects: Machine Learning
    Abstract

    The Baire metric induces an ultrametric on a dataset and is of linear
    computational complexity, contrasted with the standard quadratic time
    agglomerative hierarchical clustering algorithm. In this work we empirically
    evaluate this new approach to hierarchical clustering. We compare
    hierarchical clustering based on the Baire metric with (i) agglomerative
    hierarchical clustering, in terms of algorithm properties; (ii) generalized
    ultrametrics, in terms of definition; and (iii) fast clustering through k-means
    partitioning, in terms of quality of results.

  6. Methods of Hierarchical Clustering.

    Authors: Fionn Murtagh, Pedro Contreras
    Subjects: Information Retrieval
    Abstract

    We survey agglomerative hierarchical clustering algorithms and discuss
    efficient implementations that are available in R and other software
    environments. We look at hierarchical self-organizing maps, and mixture models.
    We review grid-based clustering, focusing on hierarchical density-based
    approaches. Finally, we describe a recently developed, very efficient (linear
    time) hierarchical clustering algorithm, which can also be viewed as a
    hierarchical grid-based algorithm.

  7. New Methods of Analysis of Narrative and Semantics in Support of Interactivity.

    Authors: Fionn Murtagh, Adam Ganz, Joe Reddington
    Subjects: Artificial Intelligence
    Abstract

    Our work has focused on support for film or television scriptwriting. Since
    this involves potentially varied story-lines, we note the implicit or latent
    support for interactivity. Furthermore, the film, television, games, publishing
    and other sectors are converging, so that cross-over and re-use of one form of
    product in another of these sectors is ever more common. Technically our work
    has been largely based on mathematical algorithms for data clustering and
    display. Operationally, we also discuss how our algorithms can support
    collective, distributed problem-solving.

  8. Ultrametric and Generalized Ultrametric in Computational Logic and in Data Analysis.

    Authors: Fionn Murtagh
    Subjects: Logic in Computer Science
    Abstract

    Following a review of metric, ultrametric and generalized ultrametric, we
    review their application in data analysis. We show how they allow us to explore
    both geometry and topology of information, starting with measured data. Some
    themes are then developed based on the use of metric, ultrametric and
    generalized ultrametric in logic. In particular we study approximation chains
    in an ultrametric or generalized ultrametric context.
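
    For reference, what separates an ultrametric from a metric is the strong
    triangle inequality, d(x, z) <= max(d(x, y), d(y, z)). The sketch below
    checks it on a small distance matrix; the function name is_ultrametric
    and the tolerance parameter are assumptions made for this example.

```python
def is_ultrametric(d, tol=1e-12):
    """Check the strong triangle inequality on a square distance matrix.

    d is a list of lists with d[i][j] the distance between points i and j.
    Returns True when d[i][k] <= max(d[i][j], d[j][k]) for every triple.
    """
    n = len(d)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][k] > max(d[i][j], d[j][k]) + tol:
                    return False
    return True


# Distances read off a tree: points 0 and 1 merge at height 1, then join
# point 2 at height 3. Every triangle is isosceles with a small base.
tree_distances = [
    [0, 1, 3],
    [1, 0, 3],
    [3, 3, 0],
]

# Three equally spaced points on a line: a metric, but not an ultrametric.
line_distances = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

tree_ok = is_ultrametric(tree_distances)
line_ok = is_ultrametric(line_distances)
```

    The check is cubic in the number of points, so it is suited to small
    examples only.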

  9. Segmentation and Nodal Points in Narrative: Study of Multiple Variations of a Ballad.

    Authors: Fionn Murtagh, Adam Ganz
    Subjects: Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
    Abstract

    The Lady Maisry ballads afford us a framework within which to segment a
    storyline into its major components. Segments and as a consequence nodal points
    are discussed for nine different variants of the Lady Maisry story of a (young)
    woman being burnt to death by her family, on account of her becoming pregnant
    by a foreign personage. We motivate the importance of nodal points in textual
    and literary analysis. We show too how the openings of the nine variants can be
    analyzed comparatively, and also the conclusions of the ballads.

  10. Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets.

    Authors: Fionn Murtagh, Pedro Contreras
    Subjects: Machine Learning
    Abstract

    Data analysis and data mining are concerned with unsupervised pattern finding
    and structure determination in data sets. "Structure" can be understood as
    symmetry and a range of symmetries are expressed by hierarchy. Such symmetries
    directly point to invariants, that pinpoint intrinsic properties of the data
    and of the background empirical domain of interest. We review many aspects of
    hierarchy here, including ultrametric topology, generalized ultrametric,
    linkages with lattices and other discrete algebraic structures and with p-adic
    number representations.

  11. Scale-Based Gaussian Coverings: Combining Intra and Inter Mixture Models in Image Segmentation.

    Authors: Fionn Murtagh, Pedro Contreras, Jean-Luc Starck
    Subjects: Computer Vision and Pattern Recognition
    Abstract

    By a "covering" we mean a Gaussian mixture model fit to observed data.
    Approximations of the Bayes factor can be used to judge model fit to the
    data within a given Gaussian mixture model. Between families of Gaussian
    mixture models, we propose the Rényi quadratic entropy as an excellent and
    tractable model comparison framework.
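
    One reason the Rényi quadratic entropy is tractable for Gaussian
    mixtures is that the integral of the squared density has a closed form:
    a double sum of Gaussian densities evaluated at the differences of the
    component means. The sketch below covers the one-dimensional case only.
    How the paper uses the quantity to compare model families is not stated
    in the abstract, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, variance):
    """Density of a zero-mean Gaussian with the given variance at x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def renyi_quadratic_entropy(weights, means, variances):
    """Rényi quadratic entropy, -log of the integral of p(x)^2, in 1D.

    p is a Gaussian mixture with the given weights, means and variances.
    Uses the identity that the integral of the product of two Gaussian
    densities equals a Gaussian density with summed variances, evaluated
    at the difference of the means.
    """
    integral = 0.0
    for w_i, m_i, v_i in zip(weights, means, variances):
        for w_j, m_j, v_j in zip(weights, means, variances):
            integral += w_i * w_j * gaussian_pdf(m_i - m_j, v_i + v_j)
    return -math.log(integral)


# A single standard Gaussian, and the same density written as a mixture
# of two identical components; both should give the same entropy.
single = renyi_quadratic_entropy([1.0], [0.0], [1.0])
split = renyi_quadratic_entropy([0.5, 0.5], [0.0, 0.0], [1.0, 1.0])

# Moving the components apart spreads the density and raises the entropy.
spread = renyi_quadratic_entropy([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
```

    For a single Gaussian with standard deviation sigma the value reduces
    to log(2 * sigma * sqrt(pi)), which gives a quick sanity check.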
