Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

portfolio

publications

causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery

Published in Proceedings of Machine Learning Research (PMLR), 2024

Algorithms for causal discovery have recently undergone rapid advances and increasingly draw on flexible nonparametric methods to process complex data. With these advances comes a need for adequate empirical validation of the causal relationships learned by different algorithms. However, for most real and complex data sources, true causal relations remain unknown. This issue is further compounded by privacy concerns surrounding the release of suitable high-quality data. To tackle these challenges, we introduce causalAssembly, a semisynthetic data generator designed to facilitate the benchmarking of causal discovery methods. The tool is built using a complex real-world dataset composed of measurements collected along an assembly line in a manufacturing setting. For these measurements, we establish a partial set of ground truth causal relationships through a detailed study of the physics underlying the processes carried out in the assembly line. The partial ground truth is sufficiently informative to allow for estimation of a full causal graph by mere nonparametric regression. To overcome potential confounding and privacy concerns, we use distributional random forests to estimate and represent conditional distributions implied by the ground truth causal graph. These conditionals are combined into a joint distribution that strictly adheres to a causal model over the observed variables. Sampling from this distribution, causalAssembly generates data that are guaranteed to be Markovian with respect to the ground truth. Using our tool, we showcase how to benchmark several well-known causal discovery algorithms.

Recommended citation: Göbler, K., Windisch, T., Drton, M., Pychynski, T., Roth, M. & Sonntag, S. (2024). causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery. Proceedings of the Third Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 236:609-642. Available from https://proceedings.mlr.press/v236/gobler24a.html
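The key guarantee in the causalAssembly abstract is that data sampled node by node from conditionals of the ground-truth graph are Markovian with respect to that graph. The Python sketch below illustrates only this generic ancestral-sampling idea; it is not the causalAssembly API. The function name `ancestral_sample`, the toy chain X -> Y -> Z, and its hand-written additive-noise conditionals are hypothetical stand-ins for the distributional-random-forest conditionals described in the abstract.

```python
import numpy as np
from graphlib import TopologicalSorter


def ancestral_sample(parents, conditionals, n, seed=0):
    """Draw n joint samples from a DAG model, one node at a time (hypothetical helper).

    parents: dict mapping every node to the list of its parents.
    conditionals: dict mapping every node to a callable
        f(parent_values, rng, n) -> array of shape (n,),
        where parent_values has shape (n, number_of_parents).
    """
    rng = np.random.default_rng(seed)
    # static_order() yields each node only after all of its parents.
    order = TopologicalSorter(parents).static_order()
    data = {}
    for node in order:
        pa = parents[node]
        pa_vals = np.column_stack([data[p] for p in pa]) if pa else np.empty((n, 0))
        data[node] = conditionals[node](pa_vals, rng, n)
    return data


# Hypothetical chain X -> Y -> Z; these simple additive-noise conditionals are
# stand-ins for learned conditional distributions, not the tool's models.
parents = {"X": [], "Y": ["X"], "Z": ["Y"]}
conditionals = {
    "X": lambda pa, rng, n: rng.normal(size=n),
    "Y": lambda pa, rng, n: np.tanh(2.0 * pa[:, 0]) + 0.5 * rng.normal(size=n),
    "Z": lambda pa, rng, n: -pa[:, 0] + 0.5 * rng.normal(size=n),
}
samples = ancestral_sample(parents, conditionals, n=10_000)
```

Because each node is drawn only after its parents, any conditional sampler can be plugged in without changing the loop. In this toy chain, Z depends on Y linearly, so the sample partial correlation of X and Z given Y should be close to zero, as the Markov property requires.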

High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data

Published in Electronic Journal of Statistics (EJS), 2024

Graphical models are an important tool in exploring relationships between variables in complex, multivariate data. Methods for learning such graphical models are well-developed in the case where all variables are either continuous or discrete, including in high dimensions. However, in many applications, data span variables of different types (e.g., continuous, count, binary, ordinal), whose principled joint analysis is nontrivial. Latent Gaussian copula models, in which all variables are modeled as transformations of underlying jointly Gaussian variables, represent a useful approach. Recent advances have shown how the binary-continuous case can be tackled, but the general mixed variable type regime remains challenging. In this work, we make the simple but useful observation that classical ideas concerning polychoric and polyserial correlations can be leveraged in a latent Gaussian copula framework. Building on this observation, we propose a flexible and scalable methodology for data with variables of entirely general mixed type. We study the key properties of the approaches theoretically and empirically.

Recommended citation: Göbler, K., Drton, M., Mukherjee, S. & Miloschewski, A. (2024). High-dimensional undirected graphical models for arbitrary mixed data. Electronic Journal of Statistics 18(1):2339-2404. https://doi.org/10.1214/24-EJS2254
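The mixed-data abstract builds on polychoric and polyserial correlations within a latent Gaussian copula. As a rough illustration of the polyserial idea only, the sketch below implements the classical ad hoc polyserial estimator for one continuous and one ordinal variable, after mapping the continuous variable to normal scores through its ranks. It is a sketch under stated assumptions, not necessarily the estimator used in the paper: the function names are hypothetical, and the ordinal codes are assumed to be consecutive integers.

```python
import numpy as np
from scipy.stats import norm, rankdata


def normal_scores(x):
    """Map a continuous variable to approximate N(0, 1) scores via its ranks.

    Under a latent Gaussian copula, x = f(z) for an unknown increasing f,
    so the ranks of x equal the ranks of the latent Gaussian z.
    """
    n = len(x)
    return norm.ppf(rankdata(x) / (n + 1))


def adhoc_polyserial(x, y):
    """Classical ad hoc polyserial estimate of the latent correlation (hypothetical helper).

    x: continuous observations; y: ordinal codes that are consecutive integers.
    Model: y counts how many thresholds tau_j a latent Gaussian z exceeds, and
    (latent x, z) are standard bivariate normal with correlation rho.
    Since Cov(x, 1{z > tau}) = rho * phi(tau) for standardized x,
    rho = corr(x, y) * sd(y) / sum_j phi(tau_j).
    """
    xs = normal_scores(np.asarray(x, dtype=float))
    y = np.asarray(y)
    levels = np.unique(y)  # sorted ordinal levels
    # Thresholds from cumulative proportions: tau_j = Phi^{-1}(P(y <= level_j)).
    tau = norm.ppf([np.mean(y <= lv) for lv in levels[:-1]])
    r_xy = np.corrcoef(xs, y)[0, 1]
    return r_xy * np.std(y) / np.sum(norm.pdf(tau))


# Simulated example with a known latent correlation of 0.6.
rng = np.random.default_rng(0)
n, rho = 20_000, 0.6
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
x = np.exp(latent[:, 0])                    # skewed continuous margin
y = np.digitize(latent[:, 1], [-0.5, 0.7])  # ordinal codes 0, 1, 2
estimate = adhoc_polyserial(x, y)
```

The continuous margin is deliberately skewed by `np.exp`; because rank-based normal scores are unchanged by increasing transformations, the estimate should still land close to the latent correlation of 0.6, whereas the raw Pearson correlation between `x` and `y` would not.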

Nonlinear Causal Discovery for Grouped Data

Published in Conference on Uncertainty in Artificial Intelligence (UAI), 2025

Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social science, and industrial manufacturing, the causal units of interest are groups of variables rather than individual scalar measurements. Motivated by these applications, we extend nonlinear additive noise models to handle random vectors, establishing a two-step approach for causal graph learning: First, infer the causal order among random vectors. Second, perform model selection to identify the best graph consistent with this order. We introduce effective and novel solutions for both steps in the vector case, demonstrating strong performance in simulations. Finally, we apply our method to real-world assembly line data with partial knowledge of causal ordering among variable groups.

Recommended citation: Göbler, K., Windisch, T. & Drton, M. (2025). Nonlinear Causal Discovery for Grouped Data. arXiv:2506.05120. https://arxiv.org/abs/2506.05120
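The grouped-data abstract describes a two-step approach whose first step is to infer a causal order among random vectors. The sketch below shows one generic way to do that under an additive noise assumption: a RESIT-style loop (regression with subsequent independence test) in which the group whose regression residuals look most independent of the remaining groups is peeled off as the current sink. It is not the method proposed in the paper. Kernel ridge regression, the raw biased HSIC statistic used in place of a formal test, the median bandwidth heuristic, `lam`, and all function names are assumptions made for illustration, and the second step (model selection given the order) is not shown.

```python
import numpy as np


def rbf_gram(X):
    """Gaussian-kernel Gram matrix, bandwidth set by the median squared distance."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / np.median(d2[d2 > 0]))


def hsic(X, Y):
    """Biased empirical HSIC statistic; larger values suggest stronger dependence."""
    n = X.shape[0]
    H = np.eye(n) - 1.0 / n  # centering matrix I - 11^T / n
    return np.trace(rbf_gram(X) @ H @ rbf_gram(Y) @ H) / n**2


def kernel_ridge_residuals(X, Y, lam=1e-3):
    """Residuals of a kernel ridge regression of Y (n x d) on X (n x p)."""
    n = X.shape[0]
    K = rbf_gram(X)
    Yc = Y - Y.mean(axis=0)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), Yc)
    return Yc - K @ alpha


def causal_order(groups):
    """Return group names ordered from (estimated) source to sink (hypothetical helper).

    groups: dict mapping a group name to an (n, d_g) array.
    At each step, the group whose residuals on all other remaining groups
    have the smallest HSIC with those groups is taken as the next sink.
    """
    remaining = list(groups)
    sinks = []
    while len(remaining) > 1:
        scores = {}
        for g in remaining:
            rest = np.hstack([groups[h] for h in remaining if h != g])
            scores[g] = hsic(kernel_ridge_residuals(rest, groups[g]), rest)
        sink = min(scores, key=scores.get)
        sinks.insert(0, sink)
        remaining.remove(sink)
    return remaining + sinks


# Simulated two-dimensional groups with nonlinear chain A -> B -> C.
rng = np.random.default_rng(1)
n = 300
A = rng.normal(size=(n, 2))
B = np.column_stack([np.sin(2 * A[:, 0]), A[:, 0] * A[:, 1]]) + 0.3 * rng.normal(size=(n, 2))
C = np.tanh(B) + 0.3 * rng.normal(size=(n, 2))
order = causal_order({"A": A, "B": B, "C": C})
```

Comparing raw HSIC values across candidates is only a heuristic and can be noisy on small samples; a permutation test or a calibrated independence test would be a natural refinement.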

talks

Spotlight talk at CLeaR conference

Published: April 01, 2024

Spotlight talk at UCLA.

Talk at the 2025 conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat)

Published: March 24, 2025

Talk at DAGStat.

Konstantin Göbler


Pages

Page Not Found

Overview

Archive Layout with Content

Posts by Category

Posts by Collection

CV

Markdown

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

portfolio

publications

causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery

High-Dimensional Undirected Graphical Models for Arbitrary Mixed Data

Nonlinear Causal Discovery for Grouped Data

talks

Spotlight talk at CLeaR conference

Talk at the 2025 conference of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat)

teaching

TA - Causal Structure Learning Seminar

TA - Introduction to Probability Theory and Statistics

TA - High-Dimensional Statistics