This event has passed.

ONLINE COURSE – Data wrangling using R and RStudio (DWRS03). This course will be delivered live.

11th December 2023 - 14th December 2023

£250.00

Event Date

Monday, December 11th, 2023

COURSE FORMAT

This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.

COURSE PROGRAM

TIME ZONE – Central Time Zone. All sessions will be recorded and made available, so attendees in other time zones can still follow the course.

Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.

Course Details

This course provides a comprehensive, practical introduction to data wrangling using R. In particular, we focus on the tools provided by R’s tidyverse, including dplyr, tidyr, and purrr. Data wrangling is the art of taking raw, messy data and formatting and cleaning it so that it can be analysed and visualized. Done poorly, it can be time-consuming, laborious, and error-prone. Fortunately, the tidyverse tools allow us to wrangle data in a fast, efficient, and high-level manner, which can dramatically improve the ease and speed with which we analyse data. We start with how to read data of different types into R. We then cover the main dplyr tools in detail, such as select, filter, and mutate. Here, we also cover the pipe operator (%>%), which is used to build data wrangling pipelines that take raw, messy data in at one end and return cleaned, tidy data at the other. Next, we cover how to compute descriptive or summary statistics using dplyr’s summarize and group_by functions. We then turn to combining and merging data: how to concatenate data frames, including concatenating all the data files in a folder, and how to use the powerful SQL-like join operations that merge information held in different data frames. The final topic is how to “pivot” data from a “wide” to a “long” format and back using tidyr’s pivot_longer and pivot_wider.

Intended Audiences

This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, and throughout the public and private sectors.

Venue

Delivered remotely

Course Information

Time zone – GMT+1

Availability – TBC

Duration – 3 x 1/2 days

Contact hours – Approx. 12 hours

ECTS – Equal to 1 ECTS credit

Language – English

Teaching Format

Assumed quantitative knowledge

Coming soon..

Assumed computer background

Minimal prior experience with R and RStudio is required. Attendees should be familiar with some basic R syntax and commands, how to write code in the RStudio console and script editor, how to load up data from files, etc.

Equipment and software requirements

A laptop computer with working versions of R and RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers.

Participants should be able to install additional software on their own computer during the course (please make sure you have administration rights to your computer).

A large monitor and a second screen, although not absolutely necessary, could improve the learning experience. Participants are also encouraged to keep their webcam active to increase the interaction with the instructor and other students.

Download R

Download RStudio

Download Zoom

PLEASE READ – CANCELLATION POLICY

Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; please contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.

If you are unsure about course suitability, please get in touch by email to find out more

oliverhooker@prstatistics.com

COURSE PROGRAMME

Classes from 12:00 to 16:00 (Central Time Zone)

DAY 1

Topic 1: Reading in data. We will begin by reading data into R using tools such as readr and readxl. Almost all types of data can be read into R, and here we will consider many of the main types, such as csv, xlsx, and sav. We will also consider how to control how data are parsed, e.g., so that they are read as dates, numbers, strings, etc.
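As a rough illustration of what controlling the parsing can look like (this is a sketch, not course material; the file contents and the column names id, date, and score are invented for the example), readr's col_types argument lets each column be read as a chosen type:

```r
library(readr)

# Write a small, invented CSV file to a temporary location so the
# example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("id,date,score",
    "001,2023-12-11,3.5",
    "002,2023-12-12,4.0"),
  tmp
)

# Tell readr how to parse each column rather than letting it guess:
# ids stay as strings (keeping leading zeros), dates become Date
# objects, and scores are read as doubles.
dat <- read_csv(
  tmp,
  col_types = cols(
    id    = col_character(),
    date  = col_date(format = "%Y-%m-%d"),
    score = col_double()
  )
)
```

Without the explicit column specification, a column such as id would typically be guessed from its contents, which is why stating the types up front can be the safer choice.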

Topic 2: Wrangling with dplyr. For the remainder of Day 1, we will cover the very powerful dplyr R package. This package supplies a number of so-called “verbs” (select, rename, slice, filter, mutate, arrange, etc.), each of which focuses on a key data manipulation task, such as selecting or changing variables. All of these verbs can be chained together using “pipes” (represented by %>%). Together, these create powerful data wrangling pipelines that take raw data as input and return cleaned data as output. Here, we will also learn about the key concept of “tidy data”, which is roughly where each row of a data frame is an observation and each column is a variable.
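A minimal sketch of such a pipeline is shown below. The data frame and its columns (id, Height, weight) are invented for illustration and are not taken from the course materials.

```r
library(dplyr)

# Invented raw data: an inconsistently named column and a missing value.
raw <- data.frame(
  id     = c(1, 2, 3, 4),
  Height = c(172, NA, 181, 165),  # centimetres
  weight = c(70, 82, 95, 58)      # kilograms
)

cleaned <- raw %>%
  rename(height = Height) %>%                    # use consistent lower-case names
  filter(!is.na(height)) %>%                     # drop rows with a missing height
  mutate(bmi = weight / (height / 100)^2) %>%    # add a derived variable
  select(id, bmi) %>%                            # keep only the columns we need
  arrange(desc(bmi))                             # sort from highest to lowest BMI
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained with %>% and read from top to bottom.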

Classes from 12:00 to 16:00 (Central Time Zone)

DAY 2

Topic 2 continued:

Topic 3: Summarizing data. The summarize and group_by tools in dplyr can be used to great effect to summarize data using descriptive statistics, either for a whole data frame or separately for each group within it.
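For example, a grouped summary might look like the sketch below, which assumes an invented data frame with a grouping column named group and a numeric column named score.

```r
library(dplyr)

# Invented example data: five scores in two groups.
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(10, 20, 5, 15, 25)
)

summary_tbl <- scores %>%
  group_by(group) %>%            # subsequent summaries are computed per group
  summarize(
    n          = n(),            # number of rows in each group
    mean_score = mean(score),    # group mean
    sd_score   = sd(score),      # group standard deviation
    .groups    = "drop"          # return an ungrouped result
  )
```

The result has one row per group, with one column for each summary statistic requested.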

Classes from 12:00 to 16:00 (Central Time Zone)

DAY 3

Topic 4: Merging and joining data frames. There are multiple ways to combine data frames, with the simplest being “bind” operations, which are effectively horizontal or vertical concatenations. Much more powerful are the SQL-like “join” operations. Here, we will consider the inner_join, left_join, right_join, and full_join operations. In this section, we will also consider how to use purrr to read in and automatically merge large sets of files.
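The sketch below illustrates these operations on two small invented data frames (people and visits, sharing an id column), followed by one possible way of reading and stacking every csv file in a folder. The folder name and file contents are hypothetical, and base R's read.csv is used here only to keep the example short.

```r
library(dplyr)
library(purrr)

# Two invented data frames that share the key column `id`.
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cat"))
visits <- data.frame(id = c(2, 3, 4), visits = c(5, 1, 7))

inner <- inner_join(people, visits, by = "id")  # only ids found in both
left  <- left_join(people, visits, by = "id")   # all of people; NA where no match
right <- right_join(people, visits, by = "id")  # all of visits; NA where no match
full  <- full_join(people, visits, by = "id")   # every id from either data frame

# A vertical "bind": stack rows of data frames with the same columns.
stacked <- bind_rows(people, data.frame(id = 4, name = "Dan"))

# Reading and stacking all csv files in a folder. The folder and files
# are created here so that the example is self-contained.
tmp_dir <- file.path(tempdir(), "wrangling_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, x = c(0.1, 0.2)),
          file.path(tmp_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, x = c(0.3, 0.4)),
          file.path(tmp_dir, "part2.csv"), row.names = FALSE)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

all_parts <- files %>%
  map(read.csv) %>%   # read each file into its own data frame
  bind_rows()         # then stack them into a single data frame
```

The four joins differ only in which unmatched rows they keep, so comparing the row counts of inner, left, right, and full is a quick way to see the difference.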

Topic 5: Pivoting data. Sometimes we need to change data frames between “long” and “wide” formats. The R package tidyr provides the tools pivot_longer and pivot_wider for doing this.
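As a small sketch of a round trip between the two formats (the data frame and the column names subject, t1, and t2 are invented for the example):

```r
library(tidyr)

# Invented "wide" data: one row per subject, one column per time point.
wide <- data.frame(
  subject = c("s1", "s2"),
  t1 = c(5.1, 4.8),
  t2 = c(5.6, 5.0)
)

# Wide to long: one row per subject and time point.
long <- pivot_longer(
  wide,
  cols      = c(t1, t2),
  names_to  = "time",
  values_to = "value"
)

# Long back to wide: one column per time point again.
wide_again <- pivot_wider(long, names_from = time, values_from = value)
```

The long format is often the more convenient one for grouped summaries and plotting, while the wide format is often how data arrive from spreadsheets.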

Course Instructor

- Dr. Rafael De Andrade Moral
- Rafael is an Associate Professor of Statistics at Maynooth University, Ireland. With a background in Biology and a PhD in Statistics from the University of São Paulo, Rafael has a deep passion for teaching and conducting research in statistical modelling applied to Ecology, Wildlife Management, Agriculture, and Environmental Science. As director of the Theoretical and Statistical Ecology Group, Rafael brings together a community of researchers who use mathematical and statistical tools to better understand the natural world. As an alternative teaching strategy, Rafael has been producing music videos and parodies to promote Statistics on social media and in the classroom.
- ResearchGate
  GoogleScholar
  ORCID
  GitHub


Details

Start: 11th December 2023
End: 14th December 2023
Cost: £250.00
Event Categories: All Live Courses, Home Courses, Live Online Courses

Organisers

Prof. David Warton
Oliver Hooker (Course Organiser)

Venue

Delivered remotely (United Kingdom)
Western European Time, United Kingdom
