Event Date
This is a ‘LIVE COURSE’: the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link, so a good internet connection is essential.
TIME ZONE – Central Time Zone. However, all sessions will be recorded and made available, allowing attendees in other time zones to follow.
Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.
In this six-day course, we provide a comprehensive practical and theoretical introduction to statistical machine learning using R. We start by introducing the concepts of supervised and unsupervised learning. We first explore unsupervised learning, introducing k-means and hierarchical clustering as well as principal components analysis. We then move on to supervised learning methods, covering logistic regression and regularisation methods (such as ridge regression and the LASSO). After that, we introduce the k-nearest neighbours method and classification and regression trees (CART). Finally, we explore extensions to CART, such as random forests and, if time allows, Bayesian additive regression trees (BART).
This course is aimed at anyone who is interested in statistical machine learning methods for clustering, classification or prediction, and in using R for data science or statistics. R is widely used in all areas of academic scientific research, as well as throughout the public and private sectors.
Delivered remotely
Time zone – Central Time Zone
Availability – TBC
Duration – 6 days
Contact hours – Approx. 24 hours
ECTS – Equal to 2 ECTS credits
Language – English
This course will be largely practical, hands-on, and workshop based. For each topic, there will first be a lecture-style presentation (using slides or a blackboard) to introduce and explain key concepts and theories. Then, we will cover how to perform the various statistical analyses using R.
Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session. The course will take place online using Zoom. On each day, the live video broadcasts will run from 12:00 to 16:00 (Central Time Zone).
All sessions will be video recorded and made available to all attendees as soon as possible. If some sessions fall at an inconvenient time because of time-zone differences, attendees are still encouraged to join as many of the live broadcasts as possible.
At the start of the first day, we will ensure that everyone is comfortable with how Zoom works, and we’ll discuss the procedure for asking questions and raising comments.
A basic understanding of R and of statistical concepts, specifically linear regression models, statistical significance, and hypothesis testing.
Familiarity with R: the ability to import/export data, manipulate data frames, fit basic statistical models, and generate simple exploratory and diagnostic plots.
A laptop computer with a working installation of R (optionally with RStudio) is required. R and RStudio are both available as free and open-source software for PCs, Macs, and Linux computers.
Participants should be able to install additional software on their own computer during the course (please make sure you have administrator rights on your computer).
A large monitor and a second screen, although not absolutely necessary, could improve the learning experience. Participants are also encouraged to keep their webcam active to increase the interaction with the instructor and other students.
Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Later cancellations may be considered; please contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 1
Section 1: Introductory concepts in statistical machine learning. Unsupervised vs. supervised learning. Useful plots in classification and clustering tasks.
Section 2: Unsupervised learning methods: hierarchical clustering and the k-means method.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 2
Section 3: Dimension reduction techniques and principal components analysis.
Section 4: Regression and classification tasks. Supervised learning methods: linear and logistic regression.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 3
Section 5: Tree-based methods. Classification and regression trees (CART), random forests.
Section 6: Extensions to tree-based methods. Bayesian additive regression trees (BART). Combining tree-based methods with a parametric regression framework.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 4
Section 7: Generalized additive models and cross-validation techniques.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 5
Section 8: Tree-based methods. Classification and regression trees (CART), random forests.
Section 9: Extensions to tree-based methods. Bayesian additive regression trees (BART). Boruta.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 6
Section 10: Neural networks. Fitting feedforward neural networks and multilayer perceptrons using R. Selecting the number of neurons based on cross-validation and information criteria. Neural networks as statistical models.
Section 11: Generalized additive models for location, scale, and shape (GAMLSS). Combining regression trees and neural networks within the GAMLSS regression framework.
Dr. Rafael De Andrade Moral
Rafael is an Associate Professor of Statistics at Maynooth University, Ireland. With a background in Biology and a PhD in Statistics from the University of São Paulo, Rafael has a deep passion for teaching and conducting research in statistical modelling applied to Ecology, Wildlife Management, Agriculture, and Environmental Science. As director of the Theoretical and Statistical Ecology Group, Rafael brings together a community of researchers who use mathematical and statistical tools to better understand the natural world. As an alternative teaching strategy, Rafael has been producing music videos and parodies to promote Statistics on social media and in the classroom. His personal webpage can be found here