This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link, so a good internet connection is essential.
TIME ZONE – Ireland local time. However, all sessions will be recorded and made available, allowing attendees in other time zones to follow.
Please email oliverhooker@prstatistics.com for full details or to discuss how we can accommodate you.
This course comprehensively introduces the Tidyverse and focuses on its use in data science projects. It is designed to give participants a strong foundation in R programming, core Tidyverse packages, and the Tidymodels framework. The course emphasises hands-on projects to apply learned concepts to real-world data analysis and modelling tasks applied to biology. By the end of the course, participants should:
Understand the fundamentals of R programming for data analysis.
Be proficient in using core Tidyverse packages to clean, transform, and visualise data.
Gain an introduction to basic machine learning concepts through the Tidymodels framework.
Learn to preprocess, build, evaluate, and interpret models using Tidymodels.
Apply Tidyverse and Tidymodels tools to solve real-world problems through hands-on projects.
Delivered remotely
Time zone – Ireland local time
Availability – TBC
Duration – 5 days
Contact hours – Approx. 35 hours
ECTS – Equal to 3 ECTS credits
Language – English
Introductory and intermediate-level lectures interspersed with hands-on projects. The instructors will provide datasets, but participants are welcome to bring their own data. Any code that the instructor produces during these sessions will be uploaded to a publicly available GitHub site after each session.
All sessions will be video recorded and made available to all attendees as soon as possible. If some sessions are not at a convenient time due to different time zones, attendees are encouraged to join as many of the live broadcasts as possible.
At the start of the first day, we will ensure that everyone is comfortable with how Zoom works, and we’ll discuss the procedure for asking questions and raising comments.
No quantitative knowledge is required for this module.
Day one will cover the basics of R for the module. However, some familiarity with any other programming language is welcome.
A computer with a working installation of R, and ideally RStudio, is required. R and RStudio are free and open-source software for PCs, Macs, and Linux computers.
Participants should be able to install additional software on their computers during the course (please ensure you have computer administration rights).
Although not absolutely necessary, a large monitor and a second screen could improve the learning experience. Participants are also encouraged to keep their webcams active to increase their interaction with the instructor and other students.
Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; please contact oliverhooker@prstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees will be credited.
Day 1: A Short Course in R Basics (9:30 – 17:30)
This day provides participants with the foundational R skills required for working with Tidyverse and Tidymodels. It is designed for beginners or those needing a refresher in R programming.
Section 1 (R Essentials): This section focuses on R syntax, variables, data types, conditionals (`if`, `else if`, `else`), loops (`for`, `while`), and writing reusable code using functions.
Section 2 (Data Structures and File Handling in R): This section emphasises understanding data structures (e.g., vectors, data frames, lists) and handling files by reading/writing data (e.g., CSVs) for manipulation and analysis.
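As a rough illustration of the kind of material Day 1 describes, the sketch below uses base R only. The data values, the `size_class()` function, and its thresholds are hypothetical and chosen purely for illustration; they are not taken from the course materials.

```r
# Hypothetical example data: body mass (g) of a few sampled individuals
mass_g <- c(12.5, 30.1, 8.4, 22.0)

# A reusable function that uses if / else if / else to label a single value
# (the thresholds 10 and 25 are arbitrary, for illustration only)
size_class <- function(x) {
  if (x < 10) {
    "small"
  } else if (x < 25) {
    "medium"
  } else {
    "large"
  }
}

# A for loop that applies the function to each element in turn
classes <- character(length(mass_g))
for (i in seq_along(mass_g)) {
  classes[i] <- size_class(mass_g[i])
}

# Combine the vectors into a data frame
samples <- data.frame(id = seq_along(mass_g), mass_g = mass_g, size = classes)

# Write the data frame to a temporary CSV file, then read it back in
csv_path <- tempfile(fileext = ".csv")
write.csv(samples, csv_path, row.names = FALSE)
samples_in <- read.csv(csv_path)
```

Writing to `tempfile()` keeps the example from touching any real project files; in practice a path inside the project folder would be used instead.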
Day 2: Fundamentals of Tidyverse I (9:30 – 17:30)
This day introduces participants to the foundational concepts of Tidyverse packages and their applications to data science projects.
Section 3 (Data Manipulation I): This section covers the basics of data manipulation using `dplyr` functions such as `filter()`, `select()`, `mutate()`, `arrange()`, and `summarise()`. Participants will learn how to clean, transform, and prepare datasets for analysis.
Section 4 (Data Visualisation I): This section introduces the principles of data visualisation using `ggplot2`. Participants will learn how to create basic plots such as scatterplots, bar charts, and line graphs while exploring the grammar of graphics.
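The verbs named in Section 3 and the scatterplot mentioned in Section 4 might look something like the sketch below. It assumes the `dplyr` and `ggplot2` packages are installed and uses R's built-in `iris` dataset rather than any course data; the `petal_area` column is a crude length-times-width proxy invented for this example.

```r
library(dplyr)
library(ggplot2)

# filter(): keep only rows for two of the three species
kept <- filter(iris, Species != "setosa")

# select(): keep only the columns needed below
kept <- select(kept, Species, Petal.Length, Petal.Width)

# mutate(): add a derived column (a rough proxy, not a true area)
kept <- mutate(kept, petal_area = Petal.Length * Petal.Width)

# arrange(): sort rows from largest to smallest petal_area
kept <- arrange(kept, desc(petal_area))

# summarise(): collapse the table to a single summary row
overall <- summarise(kept, mean_area = mean(petal_area), n = n())

# A basic ggplot2 scatterplot: data, aesthetic mappings, then a geom layer
p <- ggplot(kept, aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point() +
  labs(x = "Petal length (cm)", y = "Petal width (cm)")
```

Each step is written as a separate call here, without a pipeline, so that the effect of every verb can be inspected on its own. Printing `p` draws the plot.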
Day 3: Fundamentals of Tidyverse II (9:30 – 17:30)
This day builds on the foundations established in Day 2 and dives deeper into advanced data manipulation and visualisation techniques.
Section 5 (Data Manipulation II): This section extends the use of `dplyr` by introducing more complex operations such as joins, grouping with `group_by()`, and building pipelines with `%>%`. Finally, additional packages that extend data manipulation will be presented.
Section 6 (Data Visualisation II): Participants will explore advanced visualisation techniques using extensions of `ggplot2`, such as creating animated plots with the `gganimate` package and interactive visualisations with additional tools.
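The operations listed in Section 5 (a join, `group_by()`, and a `%>%` pipeline) can be sketched as follows. The `counts` and `sites` tables are made-up toy data, and the example assumes only that `dplyr` is installed.

```r
library(dplyr)

# Hypothetical survey records: one row per species observation at a site
counts <- data.frame(
  site    = c("A", "A", "B", "B", "C"),
  species = c("sp1", "sp2", "sp1", "sp3", "sp2"),
  count   = c(4, 7, 2, 9, 5)
)

# Hypothetical site metadata, linked to the records by the `site` column
sites <- data.frame(
  site    = c("A", "B", "C"),
  habitat = c("forest", "meadow", "forest")
)

# Pipeline: join the two tables, group by habitat, then summarise per group
habitat_totals <- counts %>%
  left_join(sites, by = "site") %>%
  group_by(habitat) %>%
  summarise(total = sum(count), n_records = n()) %>%
  arrange(desc(total))
```

Reading the pipeline top to bottom gives the order of operations: each `%>%` passes the result on its left as the first argument of the function on its right.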
Day 4: Applying Tidyverse Fundamentals to Data Modelling (9:30 – 17:30)
This day introduces participants to machine learning concepts using core libraries for statistical modelling and deep learning.
Section 7 (Introduction to Regression): This section focuses on regression modelling using Tidymodels. Participants will learn to implement linear regression models, evaluate model performance, and interpret results.
Section 8 (Introduction to Classification): This section introduces techniques such as support vector machines and neural networks using Tidymodels. Participants will also explore methods for assessing the performance of classification models.
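To give a flavour of the regression workflow described in Section 7, here is a minimal sketch that fits, evaluates, and inspects a linear model. It assumes the `tidymodels` meta-package is installed and uses the built-in `mtcars` dataset as a stand-in; the choice of `mpg ~ wt`, the 80/20 split, and the seed are illustrative assumptions, not the course's own example.

```r
library(tidymodels)

# Fix the random seed so the train/test split is reproducible
set.seed(123)

# Split the data: roughly 80% for training, the rest held out for testing
data_split <- initial_split(mtcars, prop = 0.8)
train_data <- training(data_split)
test_data  <- testing(data_split)

# Specify a linear regression model with the "lm" engine, then fit it
lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = train_data)

# Interpret: a tidy table of estimated coefficients
coefs <- tidy(lm_fit)

# Evaluate: predict on the held-out rows and compute RMSE
preds <- predict(lm_fit, new_data = test_data) %>%
  bind_cols(test_data)
rmse_out <- rmse(preds, truth = mpg, estimate = .pred)
```

The same specify, fit, predict, and measure pattern carries over to other model types by swapping the model specification and engine.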
Day 5: Data Science Workflow with Tidyverse (9:30 – 17:30)
On the final day, participants will apply all their newly acquired skills to solve real-world problems inspired by ecological datasets.
Section 9 (The Data Science Workflow): The workflow will be illustrated using the core packages introduced during the course. The book "R for Data Science" will serve as the core reference for this day.
Section 10 (Hands-on project): Participants will work through a complete data science workflow, including data cleaning, transformation, visualisation, modelling, and communication of results.
Dr. Gabriel Palma
Gabriel R. Palma obtained a B.Sc. in Biology from the University of São Paulo, Brazil in 2021. He is currently a PhD researcher at the Hamilton Institute at Maynooth University, Ireland, funded by the Science Foundation Ireland’s Centre for Research Training in Foundations of Data Science. His research interests include statistical and mathematical modelling, machine vision, machine learning, and applications to ecology and entomology.