Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
Posts
portfolio
publications
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery
Published in Proceedings of Machine Learning Research (PMLR), 2024
Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real and complex data sources true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To tackle these challenges, we introduce causalAssembly, a semisynthetic data generator designed to facilitate the benchmarking of causal discovery methods. The tool is built using a complex real-world dataset comprised of measurements collected along an assembly line in a manufacturing setting. For these measurements, we establish a partial set of ground truth causal relationships through a detailed study of the physics underlying the processes carried out in the assembly line. The partial ground truth is sufficiently informative to allow for estimation of a full causal graph by mere nonparametric regression. To overcome potential confounding and privacy concerns, we use distributional random forests to estimate and represent conditional distributions implied by the ground truth causal graph. These conditionals are combined into a joint distribution that strictly adheres to a causal model over the observed variables. Sampling from this distribution, causalAssembly generates data that are guaranteed to be Markovian with respect to the ground truth. Using our tool, we showcase how to benchmark several well-known causal discovery algorithms.
Recommended citation: Göbler, K., Windisch, T., Drton, M., Pychynski, T., Roth, M. & Sonntag, S. (2024). causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery. Proceedings of the Third Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 236:609-642. Available from https://proceedings.mlr.press/v236/gobler24a.html
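The sampling idea in the abstract above (combine per-node conditionals into a joint distribution, then draw from it) can be illustrated with a minimal sketch of ancestral sampling. This is not the causalAssembly API: the graph `PARENTS`, the node names, and the toy conditional (a `tanh` of the parent sum plus Gaussian noise) are hypothetical stand-ins, and the toy conditional replaces the distributional random forests used in the paper.

```python
import math
import random
from graphlib import TopologicalSorter

# Hypothetical toy DAG (not from the paper): each node maps to its parents.
PARENTS = {
    "station1_pressure": [],
    "station1_force": ["station1_pressure"],
    "station2_torque": ["station1_pressure", "station1_force"],
}


def sample_once(parents, rng):
    """Draw one joint sample by visiting nodes in topological order.

    Each node is drawn from a conditional that depends only on its
    parents, so the joint distribution factorizes along the DAG.
    """
    values = {}
    for node in TopologicalSorter(parents).static_order():
        parent_values = [values[p] for p in parents[node]]
        # Toy conditional (an assumption for illustration):
        # nonlinear function of the parents plus Gaussian noise.
        values[node] = math.tanh(sum(parent_values)) + rng.gauss(0.0, 0.5)
    return values


def sample_dataset(parents, n, seed=0):
    """Draw n independent joint samples with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_once(parents, rng) for _ in range(n)]


if __name__ == "__main__":
    for row in sample_dataset(PARENTS, n=3, seed=1):
        print(row)
```

Because every variable is generated from its parents alone, data drawn this way are Markovian with respect to the chosen graph by construction, which is the property that makes such samples usable as a benchmark with known ground truth.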
High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data
Published in Electronic Journal of Statistics (EJS), 2024
Graphical models are an important tool in exploring relationships between variables in complex, multivariate data. Methods for learning such graphical models are well-developed in the case where all variables are either continuous or discrete, including in high dimensions. However, in many applications, data span variables of different types (e.g., continuous, count, binary, ordinal, etc.), whose principled joint analysis is nontrivial. Latent Gaussian copula models, in which all variables are modeled as transformations of underlying jointly Gaussian variables, represent a useful approach. Recent advances have shown how the binary-continuous case can be tackled, but the general mixed variable type regime remains challenging. In this work, we make the simple but useful observation that classical ideas concerning polychoric and polyserial correlations can be leveraged in a latent Gaussian copula framework. Building on this observation, we propose a flexible and scalable methodology for data with variables of entirely general mixed type. We study the key properties of the approaches theoretically and empirically.
Recommended citation: Konstantin Göbler, Mathias Drton, Sach Mukherjee, Anne Miloschewski. "High-dimensional undirected graphical models for arbitrary mixed data." Electronic Journal of Statistics 18(1), 2339-2404 (2024). https://doi.org/10.1214/24-EJS2254
Nonlinear Causal Discovery for Grouped Data
Published in UAI'25, 2025
Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather than individual scalar measurements. Motivated by these applications, we extend nonlinear additive noise models to handle random vectors, establishing a two-step approach for causal graph learning: First, infer the causal order among random vectors. Second, perform model selection to identify the best graph consistent with this order. We introduce effective and novel solutions for both steps in the vector case, demonstrating strong performance in simulations. Finally, we apply our method to real-world assembly line data with partial knowledge of causal ordering among variable groups.
Recommended citation: Göbler, K., Windisch, T., Drton, M. (2025). Nonlinear Causal Discovery for Grouped Data. URL: https://arxiv.org/abs/2506.05120
talks
Spotlight talk at CLeaR conference
Spotlight talk at UCLA.
Talk at the 2025 conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat)
Talk at DAGStat.
teaching
TA - Causal Structure Learning Seminar
Graduate seminar, TUM, Mathematics Department, 2020
Master Seminar on Causal Structure Learning (CSL)
TA - Introduction to Probability Theory and Statistics
Undergraduate course, TUM, Mathematics Department, 2020
Basic course covering key concepts in probability theory and statistics.
TA - High-Dimensional Statistics
Graduate course, TUM, Mathematics Department, 2022
Graduate course covering key concepts in high-dimensional statistics.