Event Date
This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.
TIME ZONE – Central Time Zone – however, all sessions will be recorded and made available, allowing attendees in other time zones to follow.
Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.
In this three-day course, we provide a comprehensive practical and theoretical introduction to statistical machine learning using R. We start by introducing the concepts of supervised and unsupervised learning. We first explore unsupervised learning, introducing k-means and hierarchical clustering, as well as principal components analysis. We then move to supervised learning methods, and cover logistic regression and regularisation methods (such as ridge regression and the LASSO). After that, we introduce the k-nearest neighbours method, and classification and regression trees (CART). Finally, we explore extensions to CART, such as random forests and, if time allows, Bayesian additive regression trees (BART).
This course is aimed at anyone who is interested in statistical machine learning methods for clustering, classification or prediction, and in using R for data science or statistics. R is widely used in all areas of academic scientific research, and throughout the public and private sectors.
Delivered remotely
Time zone – Central Time Zone
Availability – TBC
Duration – 3 days
Contact hours – Approx. 14 hours
ECTS – Equal to 1 ECTS credit
Language – English
This course will be largely practical, hands-on, and workshop based. For each topic, there will first be a lecture-style presentation, using slides or a blackboard, to introduce and explain key concepts and theories. Then, we will cover how to perform the various statistical analyses using R.
Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session. For the breaks between sessions, and between days, optional exercises will be provided. Solutions to these exercises will be presented and briefly discussed after each break.
The course will take place online using Zoom. On each day, the live video broadcasts will run from 6pm to 10pm UK local time.
All sessions will be video recorded and made available to all attendees as soon as possible.
Although not strictly required, using a large monitor or, preferably, a second monitor will improve the learning experience, as you will be able to see the instructor's RStudio window and your own simultaneously.
The recordings will be hosted on a private video hosting website. Any materials, such as slides, data sets, etc., will be shared via GitHub.
A basic understanding of R and statistical concepts, specifically linear regression models, statistical significance, and hypothesis testing.
Familiarity with R: the ability to import and export data, manipulate data frames, fit basic statistical models, and generate simple exploratory and diagnostic plots.
A laptop computer with a working version of R or RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers.
Participants should be able to install additional software on their own computer during the course (please make sure you have administrator rights on your computer).
A large monitor or a second screen, although not absolutely necessary, could improve the learning experience. Participants are also encouraged to keep their webcam active to increase interaction with the instructor and other students.
Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; please contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 1
Section 1: Introductory concepts in statistical machine learning. Unsupervised vs. supervised learning. Useful plots in classification and clustering tasks. Unsupervised learning methods: hierarchical clustering and the k-means method.
Section 2: Dimension reduction techniques and principal components analysis.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 2
Section 3: Regression and classification tasks. Supervised learning methods: linear and logistic regression, regularisation methods (ridge, LASSO and elastic net).
Section 4: More supervised learning methods: smoothing methods, splines, and generalised additive models. Cross-validation techniques.
Classes from 12:00 to 16:00 (Central Time Zone)
Day 3
Section 5: Tree-based methods. Classification and regression trees (CART), random forests.
Section 6: Extensions to tree-based methods. Bayesian additive regression trees (BART). Combining tree-based methods with a parametric regression framework.
Dr. Rafael De Andrade Moral