
- 2 - 4 Days (Flexible)
- Online
- Stata
Course Overview
In a data-rich world, the ability to transform complex, high-volume datasets into meaningful insights is a vital skill across scientific, policy, and business domains. Machine Learning with Stata is a 4-day training course that introduces machine learning techniques in Stata and then builds on them. The format is flexible: you can attend the full course, or only the introductory or the advanced 2-day module.
The course balances theory with hands-on practice and uses Stata's machine learning packages to cover both foundational and advanced techniques. Whether you are just beginning or looking to expand your ML toolkit, it will help you use Stata for predictive modeling, classification, variable selection, and more. Concepts are taught through intuitive, graphical explanations rather than abstract algebra.
The course is suitable for researchers, analysts, and professionals across disciplines, especially those working with social, economic, and health data.
Course Structure and Highlights
Day 1–2: Introduction to Machine Learning with Stata (5 & 6 June 2025)
Ideal for beginners or those looking to refresh foundational concepts.
- Introduction to Machine Learning: concepts, goals, and key distinctions (e.g. supervised vs. unsupervised learning)
- Inference vs. prediction, sampling vs. specification error
- Goodness-of-fit, validation techniques, bias-variance trade-off
- Model selection and regularization
- Hands-on Stata sessions with real-world examples
Day 3–4: Advanced Machine Learning with Stata (12 & 13 June 2025)
For participants ready to explore more powerful and nuanced methods.
- Classification techniques
- Neural networks
- Ensemble methods and non-linear modeling
- Kernel-based and global regression methods
- Polynomial, spline, and series regressions
- Practical applications in Stata with an emphasis on interpretability and prediction power
What You Will Learn
By the end of the course, participants will be able to:
- Understand and apply a broad range of machine learning methods in Stata
- Select and tune models using cross-validation and regularisation techniques
- Classify and predict outcomes using supervised learning tools
- Extract signal from noise and detect variable importance
- Apply machine learning in real-world research scenarios, from causal inference to data mining
Who Should Attend?
This course is open to participants from all scientific disciplines, though it is especially designed for:
- Researchers in medical, epidemiological, and socio-economic fields
- Data analysts and policy professionals
- Students and academics looking to integrate machine learning into their research
No prior knowledge of machine learning is required for the introductory sessions. The advanced sessions assume a basic familiarity with regression and classification concepts.
Format & Delivery
- Duration: 4 days (can be attended as two separate 2-day modules)
- Mode: Live, instructor-led with practical Stata exercises
- Materials Provided: Lecture slides, datasets, Stata code templates, and reference guides
Agenda
An Introduction to Machine Learning
Day One: 5 June 2025
The Basics of Machine Learning
Machine Learning: definition, rationale, usefulness
- Supervised vs. unsupervised learning
- Regression vs. classification problems
- Inference vs. prediction
- Sampling vs. specification error
Coping with the fundamental non-identifiability of E(y|x)
- Parametric vs. non-parametric models
- The trade-off between prediction accuracy and model interpretability
Goodness-of-fit measures
- Measuring the quality of fit: in-sample vs. out-of-sample prediction power
- The bias-variance trade-off and Mean Squared Error (MSE) minimization
- Training vs. test mean square error
- The information criteria approach
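As a pointer to what the bias-variance item refers to, the standard decomposition of expected test error can be written as below. The notation is an assumption made for illustration and is not taken from the course materials: $y = f(x) + \varepsilon$ with $f(x) = E(y|x)$, $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ a model fitted on a random training sample.

```latex
% Expected squared prediction error at a new point x_0
% (expectation taken over training samples and the new noise draw)
E\!\left[\big(y_0 - \hat{f}(x_0)\big)^2\right]
  = \underbrace{\mathrm{Var}\big(\hat{f}(x_0)\big)}_{\text{variance}}
  + \underbrace{\big(E[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{squared bias}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

More flexible models tend to lower the bias term and raise the variance term, which is why training error alone is a poor guide to test error.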
Estimating training and test error
- Validation set, K-fold cross-validation, and the Bootstrap
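To make the K-fold idea concrete, here is a minimal sketch in plain Python rather than Stata. The function names (`k_fold_indices`, `fit_ols`, `cv_mse`), the one-regressor linear model, and the synthetic data are all assumptions chosen for illustration; they are not part of the course materials or of any Stata command.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ols(x, y):
    """Least squares with a single regressor; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def cv_mse(x, y, k=5, seed=0):
    """Average out-of-fold mean squared error across k folds."""
    folds = k_fold_indices(len(x), k, seed)
    fold_mse = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        # Fit on the k-1 training folds only
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        # Evaluate on the fold that was held out
        errs = [(y[i] - (a + b * x[i])) ** 2 for i in test]
        fold_mse.append(sum(errs) / len(errs))
    return sum(fold_mse) / k

# Synthetic data (illustrative only): y = 1 + 2x + Gaussian noise
rng = random.Random(42)
x = [i / 10 for i in range(100)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
print(round(cv_mse(x, y, k=5), 3))
```

Each observation is used for testing exactly once, so the averaged error estimates test MSE without setting aside a separate validation sample.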
Model Selection as a Correct Specification Procedure
- Model selection as a correct specification procedure
- The information criteria approach
Subset Selection
- Best subset selection
- Backward stepwise selection
- Forward stepwise selection
Shrinkage Methods
- Lasso, Ridge, and Elastic Net regression
- Adaptive Lasso
- Information criteria and cross-validation for Lasso
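For orientation, the shrinkage estimators listed above differ only in the penalty added to the least-squares objective. The formulas below use one common textbook parameterisation; the symbols ($\lambda$, $\alpha$, $w_j$, $\gamma$) are assumptions for illustration, and scaling conventions differ across software, so they should not be read as the exact objective used by any particular Stata command.

```latex
% Penalised least squares: loss plus a penalty P(beta) scaled by lambda >= 0
\hat{\beta} = \arg\min_{\beta} \;
  \sum_{i=1}^{n} \big(y_i - x_i'\beta\big)^2 + \lambda \, P(\beta)

% Ridge:          P(\beta) = \sum_{j=1}^{p} \beta_j^2
% Lasso:          P(\beta) = \sum_{j=1}^{p} |\beta_j|
% Elastic net:    P(\beta) = \alpha \sum_{j=1}^{p} |\beta_j|
%                          + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2,
%                 \quad 0 \le \alpha \le 1
% Adaptive lasso: P(\beta) = \sum_{j=1}^{p} w_j |\beta_j|,
%                 \quad w_j = 1 / |\tilde{\beta}_j|^{\gamma}
%                 with \tilde{\beta} a first-step estimate
```

The tuning parameter $\lambda$ is the quantity typically chosen by information criteria or cross-validation.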
Stata implementation
An Introduction to Machine Learning
Day Two: 6 June 2025
Discriminant Analysis and Nearest-neighbor Classification
- The classification setting
- Bayes optimal classifier and decision boundary
- Misclassification error rate
Discriminant analysis
- Linear and quadratic discriminant analysis
- Naive Bayes classifier
- The K-nearest neighbors classifier
- Stata implementation
Neural Networks
The neural network model
- Neurons, hidden layers, and multiple outcomes
Training a neural network
- Back-propagation via gradient descent
- Fitting with high-dimensional data
- Fitting remarks
- Cross-validating neural network hyperparameters
Final Session: 1-hour Q&A with the instructor
Advanced Machine Learning
Day One: 12 June 2025
Recap of the Basics of Machine Learning
- Machine Learning: definition, rationale, usefulness
- Coping with the fundamental non-identifiability of E(y|x)
- Goodness-of-fit measures
- Estimating training and test error
- Tuning hyperparameters optimally
Nonparametric Regression
Beyond parametric models: an overview
Local, semi-global, and global approaches:
- Kernel-based and nearest-neighbor regression
- Polynomial and series estimators
- Piecewise polynomials and spline regression
- Generalised additive models
- Partially linear models
Stata Implementation
Advanced Machine Learning
Day Two: 13 June 2025
Tree-Based Methods
Regression and classification trees: an introduction
- Growing a tree via recursive binary splitting
- Optimal tree pruning via cross-validation
Tree-based ensemble methods
- Bagging
- Random forests
- Boosting
Stata implementation
Practicing Machine Learning with Stata
- Stata commands for supervised Machine Learning: an overview
- The Stata commands r_ml_stata_cv and c_ml_stata_cv
- Application to real datasets
Final Session: 1-hour Q&A with the instructor
Prerequisites
Knowledge of basic statistics, Stata, and econometrics is required, including:
- The notion of conditional expectation and related properties
- Point and interval estimation
- The regression model and related properties
- Probit and logit regression
Reading List:
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, T., Tibshirani, R., Friedman, J., Springer (2009)
- An Introduction to Statistical Learning, James, G., Witten, D., Hastie, T., Tibshirani, R., Springer (2013)
- Microeconometrics Using Stata, Cameron, A. C., Trivedi, P. K., Revised Edition, Stata Press (2010)
- A Super-Learning Machine for Predicting Economic Outcomes, Giovanni Cerulli
Terms
- Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
- Additional discounts are available for multiple registrations.
- Temporary, time-limited licences for the software used in the course will be provided. You are required to install the software before the course starts.
- Payment of course fees is required prior to the course start date.
- Registration closes 1 calendar day prior to the start of the course.
- 100% of the fee is returned for cancellations made more than 28 calendar days prior to the start of the course.
- 50% of the fee is returned for cancellations made 14 to 28 calendar days prior to the start of the course.
- No fee is returned for cancellations made less than 14 calendar days prior to the start of the course.