This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link, so a good internet connection is essential.
TIME ZONE – GMT. However, all sessions will be recorded and made available, allowing attendees in other time zones to follow.
Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.
This intensive 4-day course provides an in-depth exploration of machine learning using the popular open-source statistical software R. Participants are assumed to have a basic working knowledge of regression and supervised learning techniques; building on this, they will gain a deeper understanding of various intermediate and advanced machine learning algorithms, how they work, and how to implement them using R’s ecosystem of packages. Real-world data sets will be used to offer hands-on experience and help participants understand the practical applications of the concepts covered.
By the end of this course, students should be able to:
Academics and post-graduate students working on projects where advanced machine learning and predictive modelling will be useful.
Delivered remotely
Availability – TBC
Duration – 4 days
Contact hours – Approx. 28 hours
ECTS – Equal to 3 ECTS credits
Language – English
Each day will consist of 2-3 lectures with regular discussion and Q&A sessions. In the afternoons we will cover guided practicals (tutors and students running code and explaining results through worked examples and case studies) and self-guided exercise sheets. Students are welcome to bring their own data and discuss it with the tutors.
A basic understanding of statistical concepts such as linear and logistic regression models, and of basic machine learning techniques such as random forests, gradient boosting, k-NN, and SVMs.
Good familiarity with R, including the ability to import and export data, manipulate data frames, fit basic machine learning models (such as random forests, gradient boosting, k-NN, and SVMs), and generate simple exploratory and diagnostic plots.
A laptop computer with a working installation of R, ideally together with RStudio, is required. R and RStudio are both available as free and open-source software for PCs, Macs, and Linux computers.
Participants should be able to install additional software on their own computer during the course (please make sure you have administrator rights on your computer).
A large monitor and a second screen, although not absolutely necessary, could improve the learning experience. Participants are also encouraged to keep their webcam active to increase the interaction with the instructor and other students.
Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; please contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.
Classes from 09:30 to 17:30 GMT+1
DAY 1
Deep Dive into Supervised Learning
We begin with an introduction to deep learning, covering its basic concepts and how it differs from traditional machine learning. We then move on to Convolutional Neural Networks (CNNs), exploring their architecture, their use in image and video processing, and their role in object detection and recognition. Finally, we cover Recurrent Neural Networks (RNNs) and their application to time series, other sequential data, and natural language processing.
In the afternoon sessions we implement CNNs and RNNs using real data sets.
R Packages: keras, tensorflow
Classes from 09:30 to 17:30 GMT+1
DAY 2
Advanced Supervised Learning Techniques
On day 2 we cover Transformer models and Bayesian machine learning techniques. We start by understanding the transformer architecture, its self-attention mechanism, and its use in natural language processing tasks. We then cover the basics of Bayesian inference and explore its use in classification and regression tasks, and compare it to traditional machine learning methods.
In the afternoon sessions students can choose to explore either the Transformer or the Bayesian methods further by following and extending some example R scripts.
R Packages: keras, tensorflow, rstan, brms, BART
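The self-attention mechanism mentioned for Day 2 can be illustrated in a few lines of base R. This is only a minimal sketch, not the course's own material: the dimensions, the random "embeddings", and the random projection matrices (standing in for learned weights) are all assumptions made for illustration, and a real model would be built with keras/tensorflow as listed above.

```r
# Minimal scaled dot-product self-attention in base R (illustrative only).
set.seed(1)

n_tokens <- 4   # sequence length (hypothetical)
d_model  <- 8   # embedding dimension (hypothetical)
d_k      <- 3   # dimension of queries, keys and values (hypothetical)

# Toy token embeddings: one row per token
X <- matrix(rnorm(n_tokens * d_model), nrow = n_tokens)

# Random projection matrices stand in for learned weights
W_q <- matrix(rnorm(d_model * d_k), nrow = d_model)
W_k <- matrix(rnorm(d_model * d_k), nrow = d_model)
W_v <- matrix(rnorm(d_model * d_k), nrow = d_model)

# Row-wise softmax; subtracting each row's maximum improves numerical stability
row_softmax <- function(m) {
  e <- exp(m - apply(m, 1, max))
  e / rowSums(e)
}

self_attention <- function(X, W_q, W_k, W_v) {
  Q <- X %*% W_q                           # queries
  K <- X %*% W_k                           # keys
  V <- X %*% W_v                           # values
  scores  <- Q %*% t(K) / sqrt(ncol(K))    # scaled token-to-token similarity
  weights <- row_softmax(scores)           # each row sums to 1
  list(weights = weights, output = weights %*% V)
}

att <- self_attention(X, W_q, W_k, W_v)
print(round(att$weights, 3))
```

Each row of `att$weights` shows how strongly one token attends to every token in the sequence, and `att$output` is the corresponding weighted mix of value vectors.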
Classes from 09:30 to 17:30 GMT+1
DAY 3
Unsupervised Learning – Clustering and Dimension Reduction
The third day will focus on advanced clustering techniques and dimension reduction. We start by exploring clustering techniques, including hierarchical clustering and DBSCAN, and their use in segmentation. We then cover dimension reduction techniques, starting with PCA and extending to t-SNE and UMAP. We explain how these techniques work and explore their use in visualising high-dimensional data sets.
In the afternoon session students will explore the use of these techniques through real-world data sets.
R Packages: cluster, dbscan, factoextra, Rtsne, umap
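As a rough idea of the kind of workflow Day 3 describes, the sketch below runs PCA and then hierarchical clustering on the reduced data. It is an assumption-laden illustration rather than course code: it uses R's built-in `iris` data and only base R functions (`prcomp`, `hclust`, `cutree`) so that it runs without installing the packages listed above, and the choices of two components, Ward linkage, and three clusters are arbitrary.

```r
# PCA followed by hierarchical clustering, using only base R (stats package).
X <- scale(iris[, 1:4])                 # standardise the four numeric columns

pca <- prcomp(X)                        # principal component analysis
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))          # proportion of variance per component

scores <- pca$x[, 1:2]                  # keep the first two components

# Agglomerative clustering on the reduced data (Ward linkage is an assumption)
hc <- hclust(dist(scores), method = "ward.D2")
clusters <- cutree(hc, k = 3)           # cut the dendrogram into 3 groups

# Cross-tabulate the clusters against the known species labels
print(table(cluster = clusters, species = iris$Species))
```

In an interactive session, `plot(scores, col = clusters, pch = 19)` shows the clusters in the space of the first two principal components, and `plot(hc)` draws the dendrogram.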
Classes from 09:30 to 17:30 GMT+1
DAY 4
Unsupervised Learning – Anomaly Detection and Course Wrap-up
On the final day we will focus on anomaly detection techniques and on bringing together the topics covered throughout the course. We start with various anomaly detection techniques and demonstrate their use in areas such as fraud detection, network security, and health monitoring. We then hold a discussion session in which we review the content of the course and talk about future steps in machine learning.
In the afternoon students have the opportunity to work on their own data sets and ask questions of the course instructor.
R Packages: anomalize, forecast, e1071
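One very simple form of anomaly detection can be sketched in base R with a robust z-score. This is not necessarily one of the techniques taught on the day, and it does not use the packages listed above: the simulated readings, the planted anomalies, and the cut-off of 3.5 are all assumptions chosen for illustration.

```r
# Flag anomalies in a simulated series using a robust z-score (base R only).
set.seed(42)

n <- 200
x <- rnorm(n, mean = 10, sd = 1)        # simulated "normal" readings
injected <- c(50, 120, 180)             # positions of the planted anomalies
x[injected] <- x[injected] + c(10, -10, 12)

# The median and MAD are far less affected by outliers than the mean and sd
robust_z <- (x - median(x)) / mad(x)

threshold <- 3.5                        # hypothetical cut-off
flagged <- which(abs(robust_z) > threshold)

print(flagged)                          # indices flagged as anomalous
print(all(injected %in% flagged))       # were the planted anomalies found?
```

Because the median and MAD are barely moved by a handful of extreme values, the planted points stand out clearly; a mean-and-standard-deviation version of the same idea is more easily masked by the anomalies themselves.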
Andrew Parnell is the Hamilton Professor of Statistics in the Hamilton Institute at Maynooth University. His research is in statistics and machine learning for large structured data sets in a variety of application areas. He has co-authored over 90 peer-reviewed papers in journals such as Science, Nature Communications, and Proceedings of the National Academy of Sciences, and has methodological publications in journals such as Statistics and Computing, Journal of Computational and Graphical Statistics, The Annals of Applied Statistics, and Journal of the Royal Statistical Society: Series C. He has many years’ experience in teaching Bayesian statistics, time series modelling, and statistical machine learning to students at every level from undergraduate to PhD. He enjoys collaborating with other scientists in areas as diverse as climate change, 3D printing, and bioinformatics.