Online Machine Learning with Stata Masterclass


Master both the fundamentals and advanced techniques of machine learning using Stata in this flexible 4-day course. Students can choose to attend the full course or select either the Introductory (5–6 June) or Advanced (12–13 June) sessions individually. With hands-on training and expert instruction, you'll gain practical skills to extract insights from complex data using Stata's powerful machine-learning tools.

Dr Giovanni Cerulli


360,00 €


Dates: 5–13 June 2025
Duration: 2–4 days (flexible)
Format: Online
Software: Stata

Course Overview

In a data-rich world, the ability to transform complex, high-volume datasets into meaningful insights is a vital skill across scientific, policy, and business domains. Machine Learning with Stata is a comprehensive 4-day training course that introduces and deepens your understanding of machine learning techniques using Stata — with a flexible format that allows attendance of either the full course or just the introductory or advanced 2-day components.

Designed with a balance of theory and hands-on practice, this course leverages Stata’s powerful machine learning packages to cover both foundational and advanced techniques. Whether you're just beginning or looking to expand your ML toolkit, this course will help you harness Stata for predictive modeling, classification, variable selection, and more — all through intuitive, graphical learning approaches rather than abstract algebra.

The course is suitable for researchers, analysts, and professionals across disciplines, especially those working with social, economic, and health data.

Course Structure and Highlights

Days 1–2: Introduction to Machine Learning with Stata (5 & 6 June 2025)

Ideal for beginners or those looking to refresh foundational concepts.

Introduction to Machine Learning: concepts, goals, and key distinctions (e.g. supervised vs. unsupervised learning)
Inference vs. prediction, sampling vs. specification error
Goodness-of-fit, validation techniques, bias-variance trade-off
Model selection and regularization
Hands-on Stata sessions with real-world examples

Days 3–4: Advanced Machine Learning with Stata (12 & 13 June 2025)

For participants ready to explore more powerful and nuanced methods.

Classification techniques
Neural networks
Ensemble methods and non-linear modeling
Kernel-based and global regression methods
Polynomial, spline, and series regressions
Practical applications in Stata with an emphasis on interpretability and prediction power

What You Will Learn

By the end of the course, participants will be able to:

Understand and apply a broad range of machine learning methods in Stata
Select and tune models using cross-validation and regularisation techniques
Classify and predict outcomes using supervised learning tools
Extract signal from noise and detect variable importance
Apply machine learning in real-world research scenarios, from causal inference to data mining

Who Should Attend?

This course is open to participants from all scientific disciplines, though it is especially designed for:

Researchers in medical, epidemiological, and socio-economic fields
Data analysts and policy professionals
Students and academics looking to integrate machine learning into their research

No prior knowledge of machine learning is required for the introductory sessions. The advanced sessions assume a basic familiarity with regression and classification concepts.

Format & Delivery

Duration: 4 days (can be attended as two separate 2-day modules)
Mode: Live, instructor-led with practical Stata exercises
Materials Provided: Lecture slides, datasets, Stata code templates, and reference guides

Agenda

An Introduction to Machine Learning

Day One: 5 June 2025

The Basics of Machine Learning

Machine Learning: definition, rationale, usefulness

Supervised vs. unsupervised learning
Regression vs. classification problems
Inference vs. prediction
Sampling vs. specification error

Coping with the fundamental non-identifiability of E(y|x)

Parametric vs. non-parametric models
The trade-off between prediction accuracy and model interpretability

Goodness-of-fit measures

Measuring the quality of fit: in-sample vs. out-of-sample prediction power
The bias-variance trade-off and the Mean Square Error (MSE) minimization
Training vs. test mean square error
The information criteria approach

Estimating training and test error

Validation set, K-fold cross-validation, and the Bootstrap

Model Selection as a Correct Specification Procedure

Model selection as a correct specification procedure
The information criteria approach

Subset Selection

Best subset selection
Backward stepwise selection
Forward stepwise selection

Shrinkage Methods

Lasso, Ridge, and Elastic-Net regression
Adaptive Lasso
Information criteria and cross-validation for Lasso

Stata implementation

An Introduction to Machine Learning

Day Two: 6 June 2025

Discriminant Analysis and Nearest-neighbor Classification

The classification setting
Bayes optimal classifier and decision boundary
Misclassification error rate

Discriminant analysis

Linear and quadratic discriminant analysis
Naive Bayes classifier

The K-nearest neighbors classifier
Stata implementation

Neural Networks

The neural network model
Training a neural network

Final Session: 1 hour Q&A with the instructor

Advanced Machine Learning

Day One: 12 June 2025

Recap of the Basics of Machine Learning

Machine Learning: definition, rationale, usefulness
Coping with the fundamental non-identifiability of E(y|x)
Goodness-of-fit measures
Estimating training and test error
Tuning hyper-parameters optimally

Nonparametric Regression

Beyond parametric models: an overview

Local, semi-global, and global approaches:

Kernel-based and nearest-neighbor regression
Polynomial and series estimators
Piecewise polynomials and spline regression
Generalised additive models
Partially linear models

Stata Implementation

Advanced Machine Learning

Day Two: 13 June 2025

Tree-Based Methods

Regression and classification trees: an introduction

Growing a tree via recursive binary splitting
Optimal tree pruning via cross-validation

Tree-based ensemble methods

Bagging
Random forests
Boosting

Stata implementation

Practicing Machine Learning with Stata

Stata commands for supervised Machine Learning: an overview
The Stata commands r_ml_stata_cv and c_ml_stata_cv
Application to real datasets

Final Session: 1 hour Q&A with the instructor

Prerequisites

Knowledge of basic statistics, Stata and econometrics is required, including:

the notion of conditional expectation and related properties;
point and interval estimation;
the regression model and related properties;
probit and logit regression.

Reading List:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, T., Tibshirani, R., Friedman, J., Springer (2009)
An Introduction to Statistical Learning, James, G., Witten, D., Hastie, T., Tibshirani, R., Springer (2013)
Microeconometrics Using Stata, Cameron and Trivedi, Revised Edition, Stata Press (2010)
A Super-Learning Machine for Predicting Economic Outcomes, Giovanni Cerulli

Course Timetable

*Subject to minor changes*
Morning Session	Afternoon Session	Q&A with Instructor
10am-12pm (London time)	2pm-4pm (London time)	4pm-4:30pm (London time)
10am-12pm (London time)	2pm-4pm (London time)	4pm-4:30pm (London time)

Terms

Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
Additional discounts are available for multiple registrations.
Temporary, time-limited licences for the software used in the course will be provided. You are required to install the software provided prior to the start of the course.
Payment of course fees is required prior to the course start date.
Registration closes 1 calendar day prior to the start of the course.

100% fee returned for cancellations made over 28 calendar days prior to the start of the course.
50% fee returned for cancellations made 14 calendar days prior to the start of the course.
No fee returned for cancellations made less than 14 calendar days prior to the start of the course.

Delivered By

Dr Giovanni Cerulli

IRCrES–CNR


Student Testimonials

Giovanni's delivery is fantastic; makes great connections between new and prior knowledge and focuses on the key strengths and limitations of the discussed methods. Excellent course design that builds on the Introductory Machine Learning course and knowledge acquired in the PhD Econometrics sequences of courses. This is all nicely supplemented by detailed Stata code with explanations and sample datasets.

Excellent course and great explanations on ML techniques and applications from Giovanni! I learned so much, including the coding and applications plus the fundamentals of ML.

The 'Advanced Machine Learning (AML)' experience was excellent for trying to gain more experience in Statistics using the links between Python and Stata.

I'm not a Statistician! However, Giovanni managed to link the 'Fundamentals of Machine Learning (FML)' to 'Advanced Machine Learning' in his usual excellent way. When starting the AML, I was pleased that the FML was a tremendous help and allowed me to use my mathematical knowledge for Physics and Science. I'm looking forward to Giovanni's next course (using large datasets) and his book.

Linking my knowledge of mathematics (from Science and Engineering) to Statistics. I do hope it is leading towards becoming better at 'Medical Statistics' that require very large datasets...and a big thank you to Giovanni!

Very well organized, very useful and relevant content, looking forward to joining future events!

As always great service and real good courses. In addition, thanks to Professor Cerulli for making himself understood in the best way.

The delivery of this course was exceptionally well done. It really helped me to appreciate the concepts as well as the practical applications in Stata. If you are new to this topic, this will provide a good introduction to complex issues.

Very easy to communicate, all emails contained all the information necessary. I think that the course was very well structured and organized. The tutor provided a number of codes that were extremely helpful for understanding. Overall, very useful and easy to follow!

I highly appreciated Professor Giovanni Cerulli's course. The class notes are very clear and well prepared, with extensive coverage of the course subjects. They are also quite objective, focusing on the most important content. Professor Giovanni Cerulli's lectures are very didactic, which greatly helps in assimilating the corresponding knowledge. Furthermore, the course materials are quite comprehensive: they include not only the class notes but also the referenced papers, as well as data and Stata programs to estimate the models in this software. All in all, I greatly recommend this course, as it really speeds up acquaintance with the underlying theory and applied applications in a very short period of time.

I found the Stata Summer School 2021 very useful and interesting. The course was perfectly structured and organised, with a good progression during the week. The instructors presented the topics covered in an easy and understandable way. There was room for questions and answers when needed. Materials shared for the course were tidy and informative, and I am sure I will use them frequently. This course was arranged online, which in my opinion worked very well. I believe the course delivered as promised and according to information found online when I signed up for the course. Easy to purchase/sign up for the course. User friendly. Quick and timely response.

Very efficient in terms of communication and delivery. Provides a very comprehensive applied knowledge of Stata. I would definitely recommend others to buy from them.

I went to the University of Cambridge, UK, for a summer school with Timberlake; it was excellent.

It was a great course and I thoroughly enjoyed it. Many of my fellow participants were eager to share their ideas. I thought the course could help further many people in a similar stage to my career!
