NTB-T05

Title

Guidelines for the assessment and analysis of lrRNA-seq data for transcript identification and quantification (LRGASP challenge)

Tutorial details

Date: Sunday, September 18th
Time: 09:00 to 13:00 CEST (Slot 22)
Format: Face-to-face
Room: TBD

Instructors

Ana Conesa/Alessandra Martinez, Institute for Integrative Systems Biology, Paterna (Spain)

Fairlie Reese, University of California at Irvine | Mortazavi Lab
Colette Felton, University of California at Santa Cruz | Brooks Lab
Dennis Mulligan, University of California at Santa Cruz | Brooks Lab
Hagen Tilgner, Weill Cornell Medicine
Chen Ying, Genome Institute of Singapore | Göke Lab
Matthias Lienhard/Ralf Herwig, Max Planck Institute for Molecular Genetics
Silvia Carbonell/Julien Lagarde, Center for Genomic Regulation | Guigó Lab
Toby Hunt, European Bioinformatics Institute, GENCODE

Summary

Long-read, single-molecule sequencing platforms such as Nanopore and PacBio are increasingly used for transcriptome analysis, producing long-read RNA-seq (lrRNA-seq) datasets.

Over the last few years, these two sequencing platforms have improved in throughput and accuracy, and novel algorithms have been developed to analyze the data. Still, there are no clear guidelines yet on which methods are best for accurate lrRNA-seq analysis or on how different methods compare to each other. LRGASP is a community-wide initiative to benchmark long-read sequencing platforms, library preparation methods and analysis pipelines using lrRNA-seq (https://www.gencodegenes.org/pages/LRGASP/). More than 50 different datasets were created and analyzed by a dozen lrRNA-seq analysis tools.

Evaluation metrics were created to assess the accuracy of both the predicted transcript models and the quantification of gene and transcript expression.

In this tutorial we will present the LRGASP analysis framework, discuss relevant results and lessons learned from the LRGASP project, and train participants to use a range of pipelines for analyzing both Nanopore and PacBio lrRNA-seq data, as well as the LRGASP evaluation tools for assessing data quality. The aim of the tutorial is to provide intensive training in tools for lrRNA-seq analysis and to discuss best practices for analyzing these data.

We will also discuss how the LRGASP evaluation tools can be used to benchmark new or updated lrRNA-seq analysis tools developed by the community beyond the LRGASP contest.
The tutorial will introduce the LRGASP contest, datasets and benchmarking tools. Developers of different lrRNA-seq analysis tools will give seminars on how to use their methods for transcript identification, quality control, quantification and differential expression analysis.

These include FLAIR (Brooks Lab), TALON (Mortazavi Lab), IsoTools (Herwig Lab), IsoQuant (Tilgner Lab), Bambu (Göke Lab), LyRic (Guigó Lab), SQANTI (Conesa Lab) and tappAS (Conesa Lab). Finally, we will discuss guidelines and best practices in lrRNA-seq data analysis with participants.

Dr. Conesa presented an Iso-Seq tutorial (PacBio RNA-seq) at ISMB2020 and ECCB2020. This proposal extends that tutorial to include a wide variety of tools for both Nanopore and PacBio RNA-seq data, as well as an extensive benchmarking framework for this type of application. Our tutorial will help disseminate good analysis practices among members of the transcriptomics community who are starting to use long-read sequencing in their analyses.

Intended audience

Beginner or intermediate. This tutorial will be of broad interest to researchers from academia or industry who have started to analyze Nanopore and PacBio long-read transcriptomics datasets and need guidance on alternative analysis methods. The tutorial is also useful for developers of lrRNA-seq analysis tools, as it presents an extensive benchmarking platform and shows how competing tools are used, as presented by their own developers.

Prerequisites

Attendees are expected to have basic Unix command-line skills. Programming knowledge is not required, though most of the tools are written in Python.

Maximum number of attendees

Material required (for participants)

Attendees are expected to bring their own laptops with SQANTI installed, as the tutorial includes a hands-on part. Attendees who wish to follow the demos on their own computers may also install the other tools presented. Instructions on how to install all software tools will be provided to participants ahead of the tutorial, and a Zoom session will be organized before the meeting to troubleshoot any installation problems.

Provisional schedule

9:00am-9:30am – Introduction to lrRNA-seq, library preparation methods and analysis pipelines
9:30am-10:30am – Short presentations of lrRNA-seq tools by their creators (focus on detection)
- Bambu (Chen Ying)
- FLAIR (Dennis Mulligan)
- TALON (Fairlie Reese)

10:30am-10:45am – Coffee break

10:45am-11:45am – Short presentations of lrRNA-seq tools by their creators (focus on quantification)
- IsoTools (Ralf Herwig)
- IsoQuant (Hagen Tilgner)
- tappAS (Ana Conesa)

11:45am-12:30pm – Evaluation of lrRNA-seq data
- LRGASP competition
- SQANTI: demo / hands-on
- Conclusion and best-practice recommendations

TIME	CONTENT
9:00 – 9:30	Introduction to lrRNA-seq, library preparation methods, analysis pipelines and LRGASP
9:30 – 10:50	Short presentations of lrRNA-seq tools by their creators (focus on detection): Bambu (Chen Ying); TALON (Fairlie Reese); FLAIR (Dennis Mulligan); LyRic (Julien Lagarde)
10:50 – 11:15	Break
11:15 – 12:15	Short presentations of lrRNA-seq tools by their creators (focus on quantification): IsoTools (Ralf Herwig); IsoQuant (Hagen Tilgner); tappAS (Ana Conesa)
12:15 – 12:50	Evaluation of lrRNA-seq data: SQANTI demo / hands-on (12:15 – 12:30); evaluation of quantification (12:30 – 12:50)
12:50 – 13:00	Conclusion and best-practice recommendations

