Advanced Machine Learning with Stata

Continue your machine learning training with part two of our “Machine Learning with Stata” series. This course builds on the fundamental methods taught in the introductory course and focuses on advanced techniques such as regression and classification trees, kernel regression, and global methods.

Dr. Giovanni Cerulli

230,00 €

Dates: 8–9 April 2026
Duration: 2 days
Delivery: Online via Teams
Software: Stata

Overview

Recent years have witnessed an unprecedented availability of information on social, economic, and health-related phenomena. Researchers, practitioners, and policymakers now have access to huge datasets (so-called “Big Data”) on people, companies and institutions, web and mobile devices, satellites, etc., at increasing speed and detail.

Machine learning is a relatively new approach to data analytics that sits at the intersection of statistics, computer science, and artificial intelligence. Its primary objective is to turn information into knowledge and value by “letting the data speak”. Machine learning limits prior assumptions about data structure and relies on a model-free philosophy that favours algorithm development, computational procedures, and graphical inspection over tight assumptions, algebraic development, and analytical solutions. Computationally infeasible a few years ago, machine learning is a product of the computer era and of today's machines.

Today, various machine learning packages are available within Stata, but some of these are not known to all Stata users. This course fills this gap by making participants familiar with Stata's potential to draw knowledge and value from rows of large and possibly noisy data. The teaching approach will be based on graphical language and intuition more than on algebra. The sessions will make use of instructional as well as real-world examples and will balance theory and practical sessions evenly.

How It Works

What You’ll Learn

This training course is part two of our Machine Learning in Stata series and builds on the methods taught in our Introduction to Machine Learning using Stata training.

After the course, participants are expected to have an improved understanding of Stata's potential to perform some of the most used machine learning techniques, thus becoming able to master research tasks including:

Factor-importance detection,
Signal-from-noise extraction,
Model-free regression and classification, both from a data-mining and a causal perspective.

Why This Course?

Meet Dr Giovanni Cerulli, giving an overview of the course.

Some prior knowledge of machine learning techniques is required to attend this course. However, the first session will start from scratch with a fresh introduction to the subject to refresh your knowledge. The course focuses on three sets of techniques not covered in the first part of the series: regression and classification trees (including bagging, random forests, and boosting), kernel-based regression, and global methods (stepwise, polynomial, spline, and series regressions).

The teaching approach will rely on graphical language and intuition more than on algebra. The training will use instructional and real-world examples and will balance theory and practical sessions evenly.

Watch Dr Giovanni Cerulli's Machine Learning Regression guide. In this video demonstration, Giovanni uses the command r_ml_Stata. Model types available through this command include elastic net, regression trees, neural networks, boosting, support vector machines, bagging, and random forests.

Real-world applications

Informed Decision-Making in Various Domains: Participants will be empowered to apply machine learning techniques in diverse fields, such as social sciences, economics, and health. This knowledge will enable them to make informed decisions based on insights extracted from large datasets.
Enhanced Research Capabilities: Researchers can apply the learned techniques to enhance their research methodologies. The course's focus on correct model specification and model-free classification ensures robust analysis, contributing to the reliability of research findings.
Efficient Data Utilization: Professionals and policymakers will benefit from the ability to extract valuable information from large and possibly noisy datasets. This efficiency in data utilization can lead to improved policy formulation, strategic planning, and business decision-making.

Who Should Attend?

The course is open to people from all scientific fields, but it is mainly targeted at researchers working in the medical, epidemiological and socio-economic sciences.

Agenda

Day 1:

The Basics of Machine Learning

Machine Learning: definition, rationale, usefulness
Coping with the fundamental non-identifiability of E(y|x)
Goodness-of-fit measures
Estimating training and test error
Tuning hyper-parameters optimally

Nonparametric Regression

Beyond parametric models: an overview

Local, semi-global, and global approaches:

Kernel-based and nearest-neighbor regression
Polynomial and series estimators
Piecewise polynomials and spline regression
Generalised additive models
Partially linear models

Stata Implementation

Day 2:

Tree-Based Methods

Regression and Classification trees: an introduction

Growing a tree via recursive binary splitting
Optimal tree pruning via cross-validation

Tree-based ensemble methods

Bagging
Random forests
Boosting

Stata implementation

Practicing Machine Learning with Stata

Stata commands for supervised Machine Learning: an overview
The Stata commands r_ml_stata_cv and c_ml_stata_cv
Application to real datasets

Final Session: 1 hour Q&A with the instructor

Prerequisites

Knowledge of basic statistics, Stata and econometrics is required, including:

The notion of conditional expectation and related properties;
point and interval estimation;
regression model and related properties;
probit and logit regression.

Reading List:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, T., Tibshirani, R., Friedman, J., Springer (2009)
An Introduction to Statistical Learning, James, G., Witten, D., Hastie, T., Tibshirani, R., Springer (2013)
Microeconometrics Using Stata, Cameron, A. C., Trivedi, P. K., Revised Edition, Stata Press (2010)
A Super-Learning Machine for Predicting Economic Outcomes, Giovanni Cerulli

Course Timetable

*Subject to minor changes*
Day	Morning Session	Afternoon Session	Q&A with Instructor
Day 1	10am-12pm (London time)	2pm-4pm (London time)	4pm-4:30pm (London time)
Day 2	10am-12pm (London time)	2pm-4pm (London time)	4pm-4:30pm (London time)

Terms:

Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
Additional discounts are available for multiple registrations.
Temporary, time-limited licences for the software used in the course will be provided. You are required to install the software provided prior to the start of the course.
Payment of course fees is required prior to the course start date.
Registration closes 1 calendar day prior to the start of the course.

100% fee returned for cancellations made over 28 calendar days prior to the start of the course.
50% fee returned for cancellations made 14 calendar days prior to the start of the course.
No fee returned for cancellations made less than 14 calendar days prior to the start of the course.

Delivered by

Dr. Giovanni Cerulli

IRCrES–CNR
