This is a ‘LIVE COURSE’ – the instructor will be delivering lectures and coaching attendees through the accompanying computer practicals via video link. A good internet connection is essential.
TIME ZONE – Central Time Zone – however all sessions will be recorded and made available allowing attendees from different time zones to follow.
Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.
COURSE DETAILS
This course provides a comprehensive practical and theoretical introduction to generalized linear models using R. Generalized linear models are generalizations of linear regression models for situations where the outcome variable is, for example, a binary, ordinal, or count variable. The specific models we cover include binary, binomial, and categorical logistic regression, Poisson and negative binomial regression for count variables, as well as extensions for overdispersed and zero-inflated data. We begin by providing a brief overview of the normal general linear model. Understanding this model is vital for understanding how it is generalized in generalized linear models. Next, we introduce the widely used binary logistic regression model, which is a regression model for when the outcome variable is binary. Next, we cover binomial logistic regression, and the multinomial case, which is for modelling outcome variables that are polychotomous, i.e., have more than two categorically distinct values. We will then cover Poisson regression, which is widely used for modelling outcome variables that are counts (i.e., the number of times something has happened). We then cover extensions to accommodate overdispersion, starting with the quasi-likelihood approach, then covering the negative binomial and beta-binomial models for counts and discrete proportions, respectively. Finally, we will cover zero-inflated Poisson and negative binomial models, which are for count data with excessive numbers of zero observations.
This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, and also throughout the public and private sectors.
Delivered remotely
Time zone – GMT+1
Availability – TBC
Duration – 3 x 1/2 days
Contact hours – Approx. 12 hours
ECTS – Equal to 1 ECTS credit
Language – English
This course will be largely practical, hands-on, and workshop-based. For each topic, there will first be some lecture-style presentation, i.e., using slides or blackboard, to introduce and explain key concepts and theories. Then, we will cover how to perform the various statistical analyses using R. Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session. For the breaks between sessions, and between days, optional exercises will be provided. Solutions to these exercises will be presented and briefly discussed after each break.
The course will take place online using Zoom. On each day, the live video broadcasts will occur during UK local time at:
• 10am-12pm
• 1pm-3pm
• 4pm-6pm
All sessions will be video recorded and made available to all attendees as soon as possible, hopefully soon after each 2hr session.
If some sessions are not at a convenient time due to different time zones, attendees are encouraged to join as many of the live broadcasts as possible. For example, attendees from North America may be able to join the live sessions from 3pm-5pm and 6pm-8pm, and then catch up with the 12pm-2pm recorded session once it is uploaded. Joining any live sessions that are possible will allow attendees to benefit from asking questions and having discussions, rather than just watching prerecorded sessions.
At the start of the first day, we will ensure that everyone is comfortable with how Zoom works, and we’ll discuss the procedure for asking questions and raising comments.
Although not strictly required, using a large monitor or preferably even a second monitor will make the learning experience better, as you will be able to see my RStudio and your own RStudio simultaneously.
All the sessions will be video recorded and made available immediately on a private video hosting website. Any materials, such as slides, data sets, etc., will be shared via GitHub.
A basic understanding of statistical concepts, specifically generalized linear regression models, statistical significance, and hypothesis testing.
Familiarity with R. Ability to import/export data, manipulate data frames, fit basic statistical models & generate simple exploratory and diagnostic plots.
A laptop computer with a working version of R or RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers.
Participants should be able to install additional software on their own computer during the course (please make sure you have administration rights to your computer).
A large monitor and a second screen, although not absolutely necessary, could improve the learning experience. Participants are also encouraged to keep their webcam active to increase the interaction with the instructor and other students.
PLEASE READ – CANCELLATION POLICY
Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; please contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.
If you are unsure about course suitability, please get in touch by email to find out more: oliverhooker@prstatistics.com
Classes from 12:00 to 16:00 (Central Time Zone)
Topic 1: The general linear model. We begin by providing an overview of the normal (as in normal distribution) general linear model, including the use of categorical predictor variables. Although this model is not the focus of the course, it is the foundation on which generalized linear models are based, so a good grasp of it is needed before moving on to generalized linear models.
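As a rough illustration of why the normal linear model is the starting point, the sketch below fits the same model twice in R, once with lm and once with glm using a normal distribution and identity link. The data are simulated and the variable names (y, x, group) are made up for this example; it is not taken from the course materials.

```r
# Simulated data with hypothetical variable names, for illustration only.
set.seed(101)
n <- 200
group <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- rnorm(n)
group_effect <- c(0, 1, -1)[as.integer(group)]  # effects for levels a, b, c
y <- 1 + 0.5 * x + group_effect + rnorm(n)
d <- data.frame(y, x, group)

# Normal linear model with one continuous and one categorical predictor.
m_lm <- lm(y ~ x + group, data = d)

# The same model written as a generalized linear model:
# normal (Gaussian) outcome distribution with an identity link.
m_glm <- glm(y ~ x + group, data = d, family = gaussian(link = "identity"))

# The two sets of coefficient estimates agree up to numerical tolerance.
all.equal(coef(m_lm), coef(m_glm))
```

Changing the family and link arguments of glm is what gives the other models listed in the following topics.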
Topic 2: Binary logistic regression. Our first generalized linear model is the binary logistic regression model, for use when modelling binary outcome data. We will present the assumed theoretical model behind logistic regression, implement it using R’s glm, and then show how to interpret its results, make predictions, and perform (nested) model comparisons.
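A minimal sketch of the kind of workflow described here, using simulated data rather than the course's own data sets. The variable names (y, x1, x2) and the true coefficient values are assumptions made for this example.

```r
# Simulated binary outcome; only x1 truly affects the outcome.
set.seed(42)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x1))
d <- data.frame(y, x1, x2)

# Two nested binary logistic regression models.
m0 <- glm(y ~ x1, data = d, family = binomial(link = "logit"))
m1 <- glm(y ~ x1 + x2, data = d, family = binomial(link = "logit"))

# Coefficients are on the log odds scale; exponentiating gives odds ratios.
summary(m1)$coefficients
exp(coef(m1))

# Predicted probabilities for new values of the predictors.
new_d <- data.frame(x1 = c(-1, 0, 1), x2 = 0)
preds <- predict(m1, newdata = new_d, type = "response")
preds

# Nested model comparison using a likelihood ratio (deviance) test.
anova(m0, m1, test = "Chisq")
```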
Topic 3: Binomial logistic regression. Here, we show how binary logistic regression can be extended to deal with data on discrete proportions. We will also present alternative link functions to the logit, such as the probit and complementary log-log links.
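The following sketch shows one common way to fit a binomial regression to discrete proportions in R, and how the link function can be swapped. The data are simulated and the names (dose, successes, failures) are hypothetical.

```r
# Simulated discrete proportions: number of successes out of 20 trials
# at each value of a hypothetical predictor called dose.
set.seed(7)
dose <- rep(seq(-2, 2, by = 0.5), each = 3)
n_trials <- rep(20, length(dose))
successes <- rbinom(length(dose), size = n_trials,
                    prob = plogis(0.3 + 1.1 * dose))
d <- data.frame(dose, successes, failures = n_trials - successes)

# Binomial logistic regression: the outcome is a two-column matrix of
# successes and failures.
m_logit <- glm(cbind(successes, failures) ~ dose,
               data = d, family = binomial(link = "logit"))

# The same model with alternative link functions.
m_probit  <- update(m_logit, family = binomial(link = "probit"))
m_cloglog <- update(m_logit, family = binomial(link = "cloglog"))

# Compare the three fits by AIC.
aic_table <- AIC(m_logit, m_probit, m_cloglog)
aic_table
```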
Classes from 12:00 to 16:00 (Central Time Zone)
Topic 4: Categorical logistic regression. Categorical logistic regression, also known as multinomial logistic regression, is for modelling polychotomous data, i.e., data taking more than two categorically distinct values. Like ordinal logistic regression, categorical logistic regression is based on an extension of the binary logistic regression case.
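One possible way to fit a multinomial logistic regression in R is with multinom from the nnet package, which is one of the recommended packages normally distributed with R. Whether the course uses this function is not stated above, so treat this as an assumption; the data and variable names are simulated for illustration.

```r
library(nnet)  # assumed to be installed; provides multinom()

# Simulated outcome with three categories; "a" is the baseline category.
set.seed(11)
n <- 600
x <- rnorm(n)
eta_b <- 0.5 + 1.0 * x    # log odds of "b" relative to "a"
eta_c <- -0.5 - 1.0 * x   # log odds of "c" relative to "a"
probs <- cbind(a = 1, b = exp(eta_b), c = exp(eta_c))
probs <- probs / rowSums(probs)
y <- factor(apply(probs, 1, function(p) sample(c("a", "b", "c"), 1, prob = p)))
d <- data.frame(y, x)

# Multinomial logistic regression.
m <- multinom(y ~ x, data = d, trace = FALSE)

# One row of coefficients per non-baseline category.
coef(m)

# Predicted probability of each category for the first few observations.
pred_probs <- predict(m, type = "probs")
head(pred_probs)
```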
Topic 5: Poisson regression. Poisson regression is a widely used technique for modelling count data, i.e., data where the variable denotes the number of times an event has occurred.
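As a small sketch, a Poisson regression with a log link can be fitted with glm as follows. The data are simulated and the variable names are hypothetical.

```r
# Simulated count data: the expected count increases with x.
set.seed(3)
n <- 300
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.2 + 0.8 * x))
d <- data.frame(y, x)

# Poisson regression with the (default) log link.
m <- glm(y ~ x, data = d, family = poisson(link = "log"))

# Coefficients are on the log scale; exponentiating the slope gives the
# multiplicative change in the expected count per unit increase in x.
coef(m)
exp(coef(m)["x"])

# Predicted expected counts at selected values of x.
preds <- predict(m, newdata = data.frame(x = c(0, 1, 2)), type = "response")
preds
```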
Classes from 12:00 to 16:00 (Central Time Zone)
Topic 6: Overdispersion models. We cover the quasi-likelihood approach for both the Poisson and binomial models, negative binomial regression, and beta-binomial regression. The negative binomial model is, like the Poisson regression model, used for unbounded count data, but it is less restrictive than Poisson regression because it can accommodate overdispersed data. The beta-binomial model is an overdispersed alternative to the binomial.
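The sketch below illustrates overdispersion in count data with simulated values. It uses glm.nb from the MASS package (a recommended package normally shipped with R) for the negative binomial fit; that choice, like the variable names, is an assumption for this example. Beta-binomial regression is not shown because it needs an additional package.

```r
library(MASS)  # assumed to be installed; provides glm.nb()

# Simulated overdispersed counts: variance = mu + mu^2 / size.
set.seed(5)
n <- 400
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1.5)
d <- data.frame(y, x)

# Ordinary Poisson regression and a Pearson-based dispersion estimate.
# Values well above 1 suggest overdispersion.
m_pois <- glm(y ~ x, data = d, family = poisson)
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
dispersion

# Quasi-likelihood approach: same coefficient estimates as the Poisson
# model, but standard errors are scaled by the estimated dispersion.
m_quasi <- glm(y ~ x, data = d, family = quasipoisson)
summary(m_quasi)$dispersion

# Negative binomial regression, which estimates an extra parameter (theta).
m_nb <- glm.nb(y ~ x, data = d)
m_nb$theta
```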
Topic 7: Zero-inflated models. Zero-inflated count data contain more zero counts than a Poisson or negative binomial model can account for. Zero-inflated Poisson and negative binomial models are types of latent variable models.
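To give a feel for the latent variable idea, the base R sketch below simulates zero-inflated counts from a hidden indicator and compares the observed proportion of zeros with the proportion an ordinary Poisson model would expect. The mixing probability and Poisson mean are arbitrary values chosen for illustration. Fitting a zero-inflated model itself requires an add-on package (for example, pscl provides a zeroinfl function) and is not shown here.

```r
# Simulated zero-inflated Poisson data.
set.seed(9)
n <- 1000

# Latent (unobserved) indicator: TRUE means the observation is always zero.
is_structural_zero <- rbinom(n, size = 1, prob = 0.3) == 1

# Otherwise the count comes from an ordinary Poisson distribution.
counts <- rpois(n, lambda = 3)
y <- ifelse(is_structural_zero, 0L, counts)

# Fit an intercept-only Poisson model that ignores the zero inflation.
m_pois <- glm(y ~ 1, family = poisson)

# Compare the observed proportion of zeros with the Poisson expectation.
observed_zero_prop <- mean(y == 0)
expected_zero_prop <- mean(dpois(0, lambda = fitted(m_pois)))
c(observed = observed_zero_prop, poisson = expected_zero_prop)
```

The observed proportion of zeros is well above what the Poisson fit implies, which is the pattern zero-inflated models are designed for.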
Dr. Rafael De Andrade Moral
Rafael is an Associate Professor of Statistics at Maynooth University, Ireland. With a background in Biology and a PhD in Statistics from the University of São Paulo, Rafael has a deep passion for teaching and conducting research in statistical modelling applied to Ecology, Wildlife Management, Agriculture, and Environmental Science. As director of the Theoretical and Statistical Ecology Group, Rafael brings together a community of researchers who use mathematical and statistical tools to better understand the natural world. As an alternative teaching strategy, Rafael has been producing music videos and parodies to promote Statistics on social media and in the classroom.