Wim de Mulder PostDoc

Norwegian University of Science and Technology

Department of Biology

Høgskoleringen 5, Realfagbygget, D1-137, Trondheim, Norway

Research description

Application of Machine Learning to Complex Datasets

In the COLOSYS project my role is to apply my expertise in machine learning and statistics to complex data sets. The complexity of these data sets has several aspects: (1) the huge amounts of data that have become available and which, according to prominent researchers, exceed the capability of existing tools to analyze them, (2) the strong heterogeneity of the data sets, which represent measurements from very diverse processes, and (3) artifacts caused by inaccurate observations, such as missing values and false positives.

These challenges will be handled with appropriate machine learning techniques, with statistical tools applied to ensure that the data sets are properly preprocessed and analyzed and that the results are interpreted in a statistically correct way. Furthermore, the heterogeneity of the collected data sets requires methods that can handle this new kind of data. Since traditional methods were developed for homogeneous data (typically assuming a limited number of real-valued input variables and one or more real-valued output variables), research will be needed to extend them. Possible paths to explore include deep learning methods, which can learn multiple levels of representation corresponding to different levels of abstraction, and committee machines, which combine the results of different methods into a single response.

Publications

An interpretation of radial basis function networks as zero-mean Gaussian process emulators in cluster space. Wim De Mulder, Geert Molenberghs, Geert Verbeke.

A reference model for the combination of an arbitrary number of drugs: A generalization of the Bliss independence model. Wim De Mulder, Martin Kuiper, Åsmund Flobak.

Extending Gaussian process emulation using cluster analysis and artificial neural networks to fit big training sets. Wim De Mulder, Bernhard Rengs, Geert Molenberghs, Thomas Fent, Geert Verbeke.

A generalization of inverse distance weighting and an equivalence relationship to noise-free Gaussian process interpolation via Riesz representation theorem. Wim De Mulder, Geert Molenberghs, Geert Verbeke.

Evaluation of Some Validation Measures for Gaussian Process Emulation: a Case Study with an Agent-Based Model. Wim De Mulder, Geert Molenberghs, Geert Verbeke, Bernhard Rengs, Thomas Fent.

Statistical Emulation Applied to a Very Large Data Set Generated by an Agent-based Model. Wim De Mulder, Geert Molenberghs, Geert Verbeke, Bernhard Rengs, Thomas Fent.

Instability and cluster stability variance for real clusterings. Wim de Mulder.

A Survey on the Application of Recurrent Neural Networks to Statistical Language Modeling. Wim De Mulder, Steven Bethard, Marie-Francine Moens.

Optimal clustering in the context of overlapping cluster analysis. Wim de Mulder.

Spatial Uncertainty Analysis in FE Problems using Interval Fields. David Moens, Wim de Mulder, Wim Verhaeghe, Dirk Vandepitte, Wim Desmet.

Generalized hard cluster analysis. Wim de Mulder.

Robustness and optimality in the context of cluster analysis: theory and applications. Wim de Mulder.

Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. Wim De Mulder, Martin Kuiper, René Boel.

Validating Clusterings of Gene Expression Data. Wim De Mulder, René Boel, Martin Kuiper.

Initialization Dependence of Clustering Algorithms. Wim De Mulder, Stefan Schliebs, René Boel, Martin Kuiper.