ONLINE COURSE – Data Exploration and Visualization for Ecologists and Evolutionary Biologists using Python (DEVP01)

This course will be delivered live.

24 June 2025 - 27 June 2025

£480.00
Event Date

Tuesday, 24th June, 2025

Course Format

This is a ‘LIVE COURSE’ – the instructors will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.

COURSE PROGRAM

TIME ZONE – UK local time (GMT+1). All sessions will be recorded and made available, allowing attendees in other time zones to follow.

Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.

About This Course

This workshop aims to give novice programmers an introduction to data visualisation using Python for research in evolutionary biology and genomics by using biological examples throughout. We will use example datasets and problems themed around sequence analysis, taxonomy and ecology, with plenty of time for participants to work on their own research data.

Much of the popularity of Python stems from the availability of high quality libraries of existing code that we can use for our own projects. Libraries (“packages” in Python terminology) are even more useful when they are designed to work together. For scientific programming, we are lucky to have a collection of mature packages which work together to form a stack:

  1. NumPy for numerical processing
  2. pandas for reading, cleaning and processing tabular data files
  3. Matplotlib as a low-level charting library
  4. seaborn as a high-level charting library for rapid dataset exploration through visualization

In this course we will learn how to use these packages together to quickly explore large biological datasets, find meaningful patterns in the data, and present our results clearly. We will focus on the high-level packages – pandas and seaborn – as this will allow us to do the most work with the smallest amount of code. By concentrating on just two packages for an entire course, we will be able to cover a large part of what these tools can do.

Intended Audiences

The course is intended for anyone interested in using Python for analysis and visualization of biological datasets. Some previous experience of Python IS required, as we won’t cover the absolute basics of the language, so you will need to know the very basic syntax. The introduction to Python for Biologists course gives a suitable background. If you want to come on this course but have no Python experience, get in touch at martin@pythonforbiologists.com and I can suggest resources to get up to speed.

This course includes plenty of practical time, including opportunities to work on your own datasets, so it might be particularly suitable for people at the start of the data analysis stage of a project.

Venue

Delivered remotely

Course Details

Time zone – UK (GMT+1) local time

Availability – 20

Duration – 4 days, 8 hours per day

Contact hours – Approx. 28 hours

ECTS – Equal to 3 ECTS credits

Language – English

Teaching Format

Lectures/discussions of Python code, libraries and techniques delivered using interactive notebooks. Workshop/practical time for students to tackle carefully designed programming challenges that use the material from the discussion sessions. Usually followed up by discussion of solutions, wrap up and summarisation.

Assumed quantitative knowledge

Very modest; you should be familiar with basic descriptive statistics and how to read common chart types like box plots and scatter plots.

Assumed computer background

This course assumes a background knowledge of Python syntax, so is not suitable for complete beginners to programming. If you have any questions about whether the course is suitable, don’t hesitate to email martin@pythonforbiologists.com to chat.

Equipment and software requirements
A laptop computer with a working version of Python is required. Python is free and open-source software for PCs, Macs, and Linux computers.
Participants should be able to install additional software on their computers during the course (please ensure you have administration rights to your computer).

Although not absolutely necessary, a large monitor and a second screen could improve the learning experience. Participants are also encouraged to keep their webcams active to increase their interaction with the instructor and other students.

Tickets

DEVP01 ONLINE
£ 480.00
20 available

PLEASE READ – CANCELLATION POLICY

Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.

If you are unsure about course suitability, please email oliverhooker@prstatistics.com to find out more.

COURSE PROGRAMME

Tuesday 24th

Day 1 – Classes from 09:30 – 17:30

Session 1: Environment, packages, data files and data model

  • The first session is mostly concerned with setting up our analysis environment and understanding how all the parts of the software stack work together. We'll introduce the Jupyter notebook system, make sure we have the correct versions of all the packages we need, and get an overview of the various data files that we will be using for examples. This preparatory work will help to make the rest of the course go smoothly. We will see how to load the different data files into pandas, and look at some tools for getting very high-level overviews of datasets. This will allow us to introduce some core ideas in pandas – series, indices and data types – that will be necessary in order to understand later material.
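As a rough illustration of the kind of first look described above, here is a minimal sketch. The table is invented for this example (it is not one of the course data files), and the column names are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for a course data file (values are illustrative only).
csv_text = """species,genome_size_mb,gc_content,kingdom
Homo sapiens,3100,0.41,Animalia
Drosophila melanogaster,144,0.42,Animalia
Arabidopsis thaliana,135,0.36,Plantae
Saccharomyces cerevisiae,12,0.38,Fungi
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first few rows
print(df.dtypes)      # data type of each column
print(df.describe())  # summary statistics for the numeric columns
df.info()             # column names, non-null counts and memory use

# Selecting a single column gives a Series, which carries its own index.
genome_sizes = df["genome_size_mb"]
print(type(genome_sizes).__name__, genome_sizes.index)
```

Replacing the `io.StringIO` object with a file name would read the same kind of table from disk.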

Session 2: Series objects and thinking in columns

  • In this session we address the main difference between working with core Python objects and working with pandas: the need to think about operating on entire columns of values rather than one value at a time. Looking at a large number of examples will help to make this clear. Once we start thinking in this way, we will find that we can do many common data processing tasks – filtering rows and columns, creating new columns, sorting, and summarizing columns – with very little code. After a look at some special types of filtering that require slightly different syntax, we are in a position to practice solving some fairly tricky data analysis questions that involve a mixture of selecting, filtering and aggregating columns.
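To make "thinking in columns" concrete, here is a small sketch using made-up values. The dataframe and its column names are assumptions for illustration, and `isin` is shown only as one example of filtering with different syntax; the course may cover others.

```python
import pandas as pd

# Invented example data; species are labelled A to D for brevity.
df = pd.DataFrame({
    "species": ["A", "B", "C", "D"],
    "kingdom": ["Animalia", "Animalia", "Plantae", "Fungi"],
    "genome_size_mb": [3100, 144, 135, 12],
    "gene_count": [20000, 14000, 27000, 6000],
})

# Create a new column by operating on two whole columns at once.
df["genes_per_mb"] = df["gene_count"] / df["genome_size_mb"]

# Filter rows with a boolean mask built from a column comparison.
small = df[df["genome_size_mb"] < 200]

# Membership tests use a method rather than a comparison operator.
non_animals = df[df["kingdom"].isin(["Plantae", "Fungi"])]

# Sort the filtered rows and summarise a column.
ranked = small.sort_values("genes_per_mb", ascending=False)
mean_density = small["genes_per_mb"].mean()

print(ranked[["species", "genes_per_mb"]])
print(mean_density)
```

None of these steps needs an explicit loop over rows, which is the point of the column-oriented style.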

Wednesday 25th

Day 2 – Classes from 09:30 – 17:30

Session 3: Introducing seaborn

  • In this session we will turn our attention from data *analysis* (which normally produces tables of values as output) to data *visualization* (which produces figures as output). We'll start with an overview of the seaborn package then dive straight in to the core chart types for looking at distributions and relationships. Histograms, kernel density plots and scatter plots are covered in this session, along with a few more exotic chart types like hex plot and contour plots, which can be useful alternatives to scatter plots when we have very large numbers of points to deal with. In this session we will also explore the power of seaborn’s ability to map dataframe columns to things like marker size, shape and colour, and to easily make small multiple plots. Just like with pandas, by the end of this session we will understand how to make complex charts with only a small amount of code.
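The mapping of dataframe columns to colour and marker size, and the small multiple layout, might look something like the sketch below. The measurements are invented and the column names are assumptions. The `Agg` backend line is only there so the script runs without a display and is not needed inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; unnecessary in a notebook

import pandas as pd
import seaborn as sns

# Invented measurements for illustration only.
df = pd.DataFrame({
    "species": ["fortis", "fortis", "scandens", "scandens", "fortis", "scandens"],
    "habitat": ["lowland", "highland", "lowland", "highland", "lowland", "highland"],
    "body_mass_g": [15.2, 16.8, 20.1, 21.5, 14.9, 22.3],
    "wing_length_mm": [66.0, 68.5, 72.0, 74.2, 65.1, 75.0],
    "age_years": [1, 2, 1, 3, 2, 4],
})

# One call maps columns to position, colour and marker size,
# and draws one panel per habitat (a "small multiples" layout).
grid = sns.relplot(
    data=df,
    x="body_mass_g",
    y="wing_length_mm",
    hue="species",
    size="age_years",
    col="habitat",
    height=3,
)
```

Calling `grid.savefig("scatter.png")` (the file name is arbitrary) would write the figure to disk.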

Session 4: Categorical axes with seaborn

  • In this session we will survey the surprisingly large number of options we have for displaying categorical data. These include very common chart types like strip plots, box plots and bar plots, along with less common types like swarm, violin and boxen plots. This diversity of chart types will be a good chance to discuss the trade offs involved in creating visualizations of differing levels of detail. We will also look at a few more options for determining the style and appearance of charts, with a particular focus on the use of colour. We will learn about best practices for using colour effectively along with pitfalls to avoid.
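As a sketch of the trade-off between levels of detail, the same made-up data can be drawn as a strip plot (every observation) and as a box plot (a summary). Column names and values are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; unnecessary in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented beak measurements for two groups.
df = pd.DataFrame({
    "species": ["fortis"] * 5 + ["scandens"] * 5,
    "beak_depth_mm": [8.9, 9.4, 9.1, 10.2, 9.8, 8.1, 8.6, 9.0, 8.4, 8.8],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Left: one marker per observation.
sns.stripplot(data=df, x="species", y="beak_depth_mm", ax=axes[0])
axes[0].set_title("strip plot: every observation")

# Right: the same data reduced to quartiles and whiskers.
sns.boxplot(data=df, x="species", y="beak_depth_mm", ax=axes[1])
axes[1].set_title("box plot: summary")

fig.tight_layout()
```

With only a handful of points the strip plot loses nothing, whereas with thousands of points the summary view usually becomes easier to read.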

Thursday 26th

Day 3 – Classes from 09:30 – 17:30

Session 5: Grouping and categories with pandas

  • Now that we have a good overview of seaborn, we will return to pandas to start looking at some more advanced techniques, which we will now be able to illustrate with charts. We will start this session with a detailed look at types of categorical data, before diving into a very important part of the pandas API: grouping. We will see how to divide data points into groups in many different ways: using existing columns, custom functions, and bins. The last option – binning data – is particularly powerful as it allows us to effectively turn continuous numerical values into categorical ones, opening up new options for visualization.
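A minimal sketch of two of the grouping approaches mentioned above, grouping by an existing column and by bins, using invented data. The bin edges and labels are arbitrary assumptions chosen for the example.

```python
import pandas as pd

# Invented body mass measurements.
df = pd.DataFrame({
    "species": ["fortis", "fortis", "fortis", "scandens", "scandens", "scandens"],
    "body_mass_g": [15.0, 17.0, 19.0, 20.0, 22.0, 24.0],
})

# Group by an existing column and summarise each group.
mean_mass = df.groupby("species")["body_mass_g"].mean()

# Bin a continuous column into categories. Intervals are right-inclusive
# by default: (0, 18], (18, 21], (21, 30].
df["mass_class"] = pd.cut(
    df["body_mass_g"],
    bins=[0, 18, 21, 30],
    labels=["light", "medium", "heavy"],
)

# The new categorical column can be used for grouping like any other.
counts = df.groupby("mass_class", observed=False).size()

print(mean_mass)
print(counts)
```

The binned column could equally be passed to a seaborn categorical plot as the x variable.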

Session 6: Long vs. wide form data and heatmaps

  • In this session we will take the tools that we have already seen for reshaping dataframes in pandas and put them into the framework of tidy or long-form data versus summary or wide-form data. As always in data exploration, both options are useful in different contexts. We will also take a look at the last major chart type offered by seaborn: the heatmap (and closely related clustermap). Heatmaps are unusual in that they are a one-to-one reflection of cells in a summary table. We will see how we can use heatmaps to solve several common tricky data visualization problems.
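The relationship between long-form data, wide-form data and a heatmap can be sketched as follows. The counts are invented, and `pivot` and `melt` are used here as one possible pair of reshaping tools; the course may use others.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; unnecessary in a notebook

import pandas as pd
import seaborn as sns

# Long (tidy) form: one row per observation. Values are invented.
long_df = pd.DataFrame({
    "site": ["A", "A", "B", "B", "C", "C"],
    "species": ["fortis", "scandens"] * 3,
    "count": [12, 3, 7, 9, 0, 15],
})

# Wide (summary) form: one row per site, one column per species.
wide_df = long_df.pivot(index="site", columns="species", values="count")

# Each cell of the wide table becomes one cell of the heatmap.
ax = sns.heatmap(wide_df, annot=True, cmap="viridis")

# melt goes back from wide to long form.
back_to_long = wide_df.reset_index().melt(
    id_vars="site", var_name="species", value_name="count"
)

print(wide_df)
print(back_to_long)
```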

Friday 27th

Day 4 – Classes from 09:30 – 17:30

Session 7: Complex data files with pandas

  • To make things easier when getting started, all of the data files we've used in the course up to this point have been designed to be straightforward to use. However, real life datasets will often not be so cooperative. In this session we'll look at some common features of datasets that can be difficult to work with, and see what tools pandas gives us to overcome these difficulties. We'll look at examples of datasets that have missing and invalid values; that are spread over multiple files; and that are too large to easily fit into memory. All of these challenges can be overcome with some careful use of the pandas API.
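The three difficulties listed above can each be sketched in a few lines. The in-memory "files", the `-999` placeholder for a missing value and the chunk size are all assumptions made up for this example.

```python
import io

import pandas as pd

# Two hypothetical files with the same columns. "NA" and "n/a" are recognised
# as missing by default; -999 is an invented placeholder for a missing value.
file_a = io.StringIO("sample,length_mm\ns1,10.5\ns2,NA\ns3,-999\n")
file_b = io.StringIO("sample,length_mm\ns4,12.0\ns5,n/a\n")

# Missing and invalid values: tell read_csv about the extra placeholder.
frames = [pd.read_csv(f, na_values=["-999"]) for f in (file_a, file_b)]

# Data spread over multiple files: stack the pieces into one dataframe.
combined = pd.concat(frames, ignore_index=True)
n_missing = combined["length_mm"].isna().sum()
cleaned = combined.dropna(subset=["length_mm"])

# Data too large for memory: process the file a few rows at a time.
big_file = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)) + "\n")
total = 0
for chunk in pd.read_csv(big_file, chunksize=4):
    total += chunk["value"].sum()

print(combined)
print(n_missing, len(cleaned), total)
```

With real data, the `io.StringIO` objects would be file paths and the chunk size would be much larger.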

Session 8: High performance pandas

  • One of the core themes running through this course is the importance of being able to quickly iterate on tables and charts. Being able to evaluate an expression or draw a chart quickly is key to this process: if we have to wait a long time for the results of a piece of code then our work becomes much harder. In this session we will look at some best practices for making sure that code runs quickly, and some options for what to do when code is too slow. Some of these ideas are quite technical, but the principles are useful for many different types of programming. We'll discuss the difference between loops and vectorized code, *caching* and *memoization*, sampling, and how pandas-specific tools like indices and categories influence performance.
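Three of these ideas can be sketched briefly: a loop versus a vectorized expression, memoization with the standard library's `functools.lru_cache`, and the memory effect of the category data type. The data, the `gc_content` helper and the sizes used are assumptions for illustration.

```python
import functools

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"length": rng.random(10_000), "width": rng.random(10_000)})


def area_loop(frame):
    """Multiply the two columns one pair of values at a time."""
    return [a * b for a, b in zip(frame["length"], frame["width"])]


# Vectorized version: a single expression on whole columns.
area_vec = df["length"] * df["width"]


@functools.lru_cache(maxsize=None)
def gc_content(seq):
    """Hypothetical helper: fraction of G and C bases; repeat calls hit the cache."""
    return (seq.count("G") + seq.count("C")) / len(seq)


seqs = ["ATGC", "GGCC", "ATGC", "ATGC"]
values = [gc_content(s) for s in seqs]

# A column with few distinct values usually takes less memory as a category.
kingdoms = pd.Series(["Animalia", "Plantae", "Fungi"] * 10_000)
as_category = kingdoms.astype("category")

print(kingdoms.memory_usage(deep=True), as_category.memory_usage(deep=True))
print(gc_content.cache_info())
```

In a Jupyter notebook, the `%timeit` magic is one way to compare the running time of `area_loop(df)` with the vectorized expression.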

Dr. Martin Jones

Martin is a freelance trainer specialising in teaching programming (mostly Python) and Linux skills to researchers in the field of biology. He trained as a biologist and completed his PhD in large-scale phylogenetics in 2007, then held a number of academic positions at the University of Edinburgh, ending in a two-year stint as Lecturer in Bioinformatics. He launched Python for Biologists in 2015 and has been teaching and writing full-time ever since.

Details

Start:
24 June 2025
End:
27 June 2025
Cost:
£480.00

Venue

Delivered remotely (United Kingdom)
Western European Time Zone, United Kingdom
