
Wim de Mulder PostDoc
Norwegian University of Science and Technology
Department of Biology
Høgskoleringen 5, Realfagbygget, D1-137, Trondheim, Norway
Research description
Application of Machine Learning to Complex Datasets
In the COLOSYS project, my role is to apply my expertise in machine learning and statistics to complex data sets. The complexity of these data sets has several aspects: (1) the huge amounts of data that have become available and that, according to prominent researchers, exceed the capability of existing tools to analyze them; (2) the strong heterogeneity of the data sets, which represent measurements from very diverse processes; and (3) artifacts caused by inaccurate observations, such as missing values and false positives.
These challenges will be addressed with appropriate machine learning techniques, with statistical tools applied to ensure that the data sets are properly preprocessed and analyzed and that the results are interpreted in a statistically correct way. Furthermore, the heterogeneity of the collected data sets requires methods that can handle this kind of data. Since traditional methods were developed for homogeneous data (typically assuming a limited number of real-valued input variables and one or more real-valued output variables), research will be needed to extend them. Possible exploration paths include deep learning methods, which can learn multiple levels of representation that correspond to different levels of abstraction, and committee machines, which combine the results of different methods into a single response.
Publications
An interpretation of radial basis function networks as zero-mean Gaussian process emulators in cluster space. Wim De Mulder, Geert Molenberghs, Geert Verbeke.
Extending Gaussian process emulation using cluster analysis and artificial neural networks to fit big training sets. Wim De Mulder, Bernhard Rengs, Geert Molenberghs, Thomas Fent, Geert Verbeke.
A generalization of inverse distance weighting and an equivalence relationship to noise-free Gaussian process interpolation via Riesz representation theorem. Wim De Mulder, Geert Molenberghs, Geert Verbeke.
Evaluation of Some Validation Measures for Gaussian Process Emulation: a Case Study with an Agent-Based Model. Wim De Mulder, Geert Molenberghs, Geert Verbeke, Bernhard Rengs, Thomas Fent.
A Comparison of Some Simple and Complex Surrogate Models: Make Everything as Simple as Possible? Wim De Mulder, Geert Molenberghs, Geert Verbeke, Bernhard Rengs, Thomas Fent.
Application of statistical emulation to an agent-based model: assortative mating and the reversal of gender inequality in education in Belgium. Wim de Mulder, André Grow, Geert Molenberghs, Geert Verbeke.
Statistical Emulation Applied to a Very Large Data Set Generated by an Agent-based Model. Wim De Mulder, Geert Molenberghs, Geert Verbeke, Bernhard Rengs, Thomas Fent.
Instability and cluster stability variance for real clusterings. Wim de Mulder.
A Survey on the Application of Recurrent Neural Networks to Statistical Language Modeling. Wim De Mulder, Steven Bethard, Marie-Francine Moens.
Optimal clustering in the context of overlapping cluster analysis. Wim de Mulder.
Spatial Uncertainty Analysis in FE Problems using Interval Fields. David Moens, Wim de Mulder, Wim Verhaeghe, Dirk Vandepitte, Wim Desmet.
Generalized hard cluster analysis. Wim de Mulder.
Robustness and optimality in the context of cluster analysis: theory and applications. Wim de Mulder.
Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. Wim De Mulder, Martin Kuiper, René Boel.