
Presentation

pystacked and ddml: Machine learning for prediction and causal inference in Stata

Mark E. Schaffer

7 September 2023

Session

pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn.

Stacking is an ensemble method that combines multiple supervised machine learners (the “base” or “level-0” learners) into a single learner by training a final learner on the base learners’ cross-validated predictions. The currently supported base learners include regularized regression (lasso, ridge, elastic net), random forests, gradient-boosted trees, support vector machines, and feed-forward neural nets (multilayer perceptrons). pystacked can also be used to fit a single base learner and thus provides an easy-to-use API for scikit-learn’s machine learning algorithms.
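The stacking idea can be illustrated with a toy sketch in plain Python. It is not pystacked’s implementation or API. Every function name here (fit_mean, fit_ols, fit_knn, simplex_weights, fit_stack) is hypothetical. The three base learners are deliberately simple stand-ins: a constant mean, a one-regressor OLS fit, and one-dimensional k-nearest neighbours. Each base learner first produces out-of-fold predictions. A final learner then chooses non-negative weights that sum to one and that minimise the squared error of the combined prediction. That combiner is an assumption chosen for brevity and is solved here by a coarse grid search. The combiner pystacked actually uses may differ.

```python
import random
from statistics import fmean

# Base (level-0) learners: each fits on (xs, ys) and returns a predict function.
def fit_mean(xs, ys):
    """Constant learner: always predicts the training mean of y."""
    c = fmean(ys)
    return lambda x: c

def fit_ols(xs, ys):
    """Simple linear regression of y on a single regressor x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=5):
    """k-nearest-neighbours regression in one dimension."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean(y for _, y in nearest)

    return predict

BASE_LEARNERS = {"mean": fit_mean, "ols": fit_ols, "knn": fit_knn}

def out_of_fold_predictions(xs, ys, folds):
    """Cross-validated predictions: each point is predicted by models that never saw it."""
    n = len(xs)
    preds = {name: [0.0] * n for name in BASE_LEARNERS}
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for name, fit in BASE_LEARNERS.items():
            model = fit(tx, ty)
            for i in test:
                preds[name][i] = model(xs[i])
    return preds

def simplex_weights(preds, ys, steps=20):
    """Final (level-1) learner: grid-search non-negative weights summing to one
    that minimise the squared error of the weighted out-of-fold predictions.
    Written for exactly three base learners."""
    names = list(preds)
    rows = list(zip(*(preds[name] for name in names)))  # one tuple per observation
    best, best_sse = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            sse = sum((y - (w[0] * p[0] + w[1] * p[1] + w[2] * p[2])) ** 2
                      for p, y in zip(rows, ys))
            if sse < best_sse:
                best, best_sse = w, sse
    return dict(zip(names, best))

def fit_stack(xs, ys, k_folds=5, seed=0):
    """Learn stacking weights from out-of-fold predictions, then refit on all data."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]
    weights = simplex_weights(out_of_fold_predictions(xs, ys, folds), ys)
    # Refit each base learner on the full sample and combine with the learned weights.
    models = {name: fit(xs, ys) for name, fit in BASE_LEARNERS.items()}

    def predict(x):
        return sum(weights[name] * models[name](x) for name in models)

    return weights, predict

# Usage on simulated data from a linear model.
rng = random.Random(42)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 0.1) for x in xs]
weights, predict = fit_stack(xs, ys)
print(weights)
print(predict(0.5))
```

The data in this example come from a linear model, so the constant-mean learner should receive little or no weight. The OLS and k-NN learners should share the rest. The grid uses steps of 0.05 and only stands in for a proper constrained least-squares solver.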

ddml implements algorithms for causal inference aided by supervised machine learning, as proposed in “Double/debiased machine learning for treatment and structural parameters” (Chernozhukov et al., Econometrics Journal, 2018). Five different models are supported. Together they allow for binary or continuous treatment variables and for endogenous treatments, in the presence of high-dimensional controls and/or instrumental variables. ddml is compatible with many existing supervised machine learning programs in Stata. In particular, it has integrated support for pystacked, which makes it straightforward to use ensembles of machine learners in causal inference applications.
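The core idea behind the simplest double/debiased ML model can be sketched in plain Python. That model is the partially linear model y = θ·d + g(X) + ε. The sketch is a toy under stated assumptions, not ddml’s code or syntax. The function names (fit_knn, cross_fit_residuals, dml_partial_linear, ols_slope) are hypothetical. A one-dimensional k-nearest-neighbours regression stands in for pystacked or any other learner, and five folds are assumed. Both nuisance functions, E[y|X] and E[d|X], are estimated by cross-fitting: each observation is predicted by a model trained only on the other folds. θ is then the no-intercept regression of the y-residuals on the d-residuals.

```python
import math
import random
from statistics import fmean

def fit_knn(xs, ys, k=10):
    """1-D k-nearest-neighbours regression, standing in for any ML learner."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean(v for _, v in nearest)

    return predict

def cross_fit_residuals(xs, vs, folds, learner):
    """Return v - E[v|x], where each observation's prediction comes from a
    model trained only on the other folds (cross-fitting)."""
    n = len(xs)
    resid = [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        model = learner([xs[i] for i in train], [vs[i] for i in train])
        for i in test:
            resid[i] = vs[i] - model(xs[i])
    return resid

def dml_partial_linear(y, d, x, k_folds=5, seed=0, learner=fit_knn):
    """Partialling-out estimate of theta in y = theta*d + g(x) + e."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]
    y_res = cross_fit_residuals(x, y, folds, learner)  # y - E[y|x]
    d_res = cross_fit_residuals(x, d, folds, learner)  # d - E[d|x]
    # No-intercept OLS of residualised y on residualised d.
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)

def ols_slope(y, d):
    """Naive OLS slope of y on d (with intercept), ignoring x."""
    md, my = fmean(d), fmean(y)
    return (sum((a - md) * (b - my) for a, b in zip(d, y))
            / sum((a - md) ** 2 for a in d))

# Simulated data: x confounds d and y; the true theta is 0.5.
rng = random.Random(1)
n, theta = 500, 0.5
x = [rng.uniform(0, 1) for _ in range(n)]
d = [math.sin(2 * math.pi * xi) + rng.gauss(0, 1) for xi in x]
y = [theta * di + 2 * math.sin(2 * math.pi * xi) + rng.gauss(0, 1)
     for xi, di in zip(x, d)]

theta_dml = dml_partial_linear(y, d, x)
theta_naive = ols_slope(y, d)
print(f"DML: {theta_dml:.3f}  naive OLS: {theta_naive:.3f}")
```

In this assumed data-generating process, X drives both d and y. A naive regression of y on d is therefore biased upward: its population slope is 0.5 + 1/1.5, roughly 1.17. The cross-fitted estimate should land near the true θ = 0.5.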

Speaker

Mark E. Schaffer