Over the past decade, topological data analysis has emerged through the pioneering efforts of Edelsbrunner, Carlsson, Ghrist and others, who have applied algebraic topology to data analysis. This approach has already achieved very exciting results, for example the discovery of a new class of breast cancers that are particularly responsive to treatment, insights into gene structure, and an understanding of coverage problems in sensor networks. Some of the topological ideas have found commercial applications, for example through the work of Ayasdi. Supporting these successes is the power of topology, a branch of mathematics that has been developed over the past century specifically to understand the shape of spaces of arbitrary dimension and to describe that shape through numerical invariants.

Topology offers certain advantages over machine learning: it can discern structure in data in an unsupervised way, without the need for a priori dimension reduction or the selection of specific algorithms. Compared to statistics, topological visualisation of spaces offers a broader perspective than probabilistic methods. However, the topological approach has important weaknesses: it lacks statistical inference and prediction strategies, and computing topological invariants is still expensive in terms of computing power and time.

Thanks to intense research effort, we now understand the relative strengths of the three methodologies, and in this proposal we take the logical but very challenging step of moving data analysis to a new level.

In this programme, we will create a powerful fusion of statistics, machine learning and topology able to deal with complex, heterogeneous, time-dependent multi-dimensional data sets. Our investigation is guided by and tested on key problems in medicine and the sciences where the lack of appropriate analytic tools is now a major obstacle to progress. More information is available on the JTD website.