NTB-T05

Title

Guidelines for the assessment and analysis of lrRNA-seq data for transcript identification and quantification (LRGASP challenge)

Tutorial details
  • Date: Sunday, September 18th
  • Time: 09:00 to 13:00 CEST (Slot 22)
  • Format: Face-to-face
  • Room: TBD
Instructors
  • Ana Conesa/Alessandra Martinez, Institute for Integrative Systems Biology, Paterna (Spain)
  • Fairlie Reese, University of California at Irvine | Mortazavi Lab
  • Colette Felton, University of California at Santa Cruz | Brooks Lab
  • Dennis Mulligan, University of California at Santa Cruz | Brooks Lab
  • Hagen Tilgner, Weill Cornell Medicine
  • Chen Ying, Genome Institute of Singapore | Göke Lab
  • Matthias Lienhard/Ralf Herwig, Max Planck Institute for Molecular Genetics
  • Silvia Carbonell/Julien Lagarde, Center for Genomic Regulation | Guigó Lab
  • Toby Hunt, European Bioinformatics Institute, GENCODE
Summary

Long-read, single-molecule sequencing platforms such as Nanopore and PacBio are increasingly used for transcriptomics analysis, producing long-read RNA-seq (lrRNA-seq) datasets.

Over the last few years, these two sequencing platforms have improved in throughput and accuracy, and novel algorithms have been developed to analyze the data. Still, there are no clear guidelines yet on which method is best for accurate lrRNA-seq analysis or on how different methods compare to each other. LRGASP is a community-wide initiative to benchmark long-read sequencing platforms, library preparation methods and analysis pipelines using lrRNA-seq (https://www.gencodegenes.org/pages/LRGASP/). More than 50 different datasets were created and analyzed by about a dozen lrRNA-seq analysis tools.

Evaluation metrics were created to assess the accuracy of predicted transcript models and quantification of gene and transcript expression.

In this tutorial we will present the LRGASP analysis framework, discuss relevant results and lessons learned from the LRGASP project, and train participants in using a range of pipelines for the analysis of both Nanopore and PacBio lrRNA-seq data, as well as the LRGASP evaluation tools for assessing the quality of the data. The aim of the tutorial is to provide intensive training in tools for lrRNA-seq analysis and to discuss best practices for the analysis of these data.

We will also discuss how LRGASP evaluation tools can be used to benchmark new or updated lrRNA-seq analysis tools developed by the community beyond the LRGASP contest.
The tutorial will introduce the LRGASP contest, datasets and benchmarking tools. Seminars will be given by developers of different lrRNA-seq analysis tools on how to use their methods for transcript identification, quality control, quantification and differential expression analysis.

These include FLAIR (Brooks Lab), TALON (Mortazavi Lab), IsoTools (Herwig Lab), IsoQuant (Tilgner Lab), Bambu (Göke Lab), LyRic (Guigó Lab), SQANTI (Conesa Lab) and tappAS (Conesa Lab). Finally, we will discuss guidelines and best practices in lrRNA-seq data analysis with participants.

Dr. Conesa presented an Iso-Seq tutorial (PacBio RNA-seq) at ISMB2020 and ECCB2020. This proposal extends that tutorial to include a wide variety of tools for both Nanopore and PacBio RNA-seq data, as well as an extensive benchmarking framework for this type of application. Our tutorial will help disseminate good analysis practice among members of the transcriptomics community who are starting to use long-read sequencing in their analyses.

Intended audience

Beginner or intermediate. This tutorial will be of broad interest to researchers from academia or industry who have started to analyze Nanopore and PacBio long-read transcriptomics datasets and need guidance on alternative analysis methods. The tutorial is also useful for developers of lrRNA-seq analysis tools, as it presents an extensive benchmarking platform and offers the chance to see competing tools used by their own developers.

Prerequisites

Attendees are expected to have basic Unix command-line skills. Programming knowledge is not required, though most of the tools are written in Python.

Maximum number of attendees

40

Material required (for participants)

Attendees are expected to bring their own laptops with SQANTI installed, as the SQANTI session will include a hands-on part. Other presented tools can also be installed if attendees wish to reproduce the demos on their own computers. Instructions on how to install all software tools will be provided to participants ahead of the tutorial, and a Zoom session will be organized before the meeting to troubleshoot any installation problems.

Provisional schedule
  • 9:00am-9:30am – Introduction to lrRNA-seq, library preparation methods and analysis pipelines
  • 9:30am-10:30am – Short presentations of lrRNA-seq tools by their creators (focus: detection)
    • Bambu (Chen Ying)
    • FLAIR (Dennis Mulligan)
    • TALON (Fairlie Reese)
  • 10:30am-10:45am – Coffee break
  • 10:45am-11:45am – Short presentations of lrRNA-seq tools by their creators (focus: quantification)
    • IsoTools (Ralf Herwig)
    • IsoQuant (Hagen Tilgner)
    • tappAS (Ana Conesa)
  • 11:45am-12:30pm – Evaluation of lrRNA-seq data
    • LRGASP competition
    • SQANTI. Demo / Hands-on
    • Conclusion and best practices recommendations
TIME             CONTENT
9:00 – 9:30      Introduction to lrRNA-seq, library preparation methods and analysis pipelines & LRGASP
9:30 – 10:50     Short presentations of lrRNA-seq tools by their creators (focus: detection)
                 – Bambu (Chen Ying)
                 – TALON (Fairlie Reese)
                 – FLAIR (Dennis Mulligan)
                 – LyRic (Julien Lagarde)
                 Break
11:15 – 12:15    Short presentations of lrRNA-seq tools by their creators (focus: quantification)
                 – IsoTools (Ralf Herwig)
                 – IsoQuant (Hagen Tilgner)
                 – tappAS (Ana Conesa)
12:15 – 12:50    Evaluation of lrRNA-seq data
                 – SQANTI. Demo / Hands-on (12:15 – 12:30)
                 – Evaluation of quantification (12:30 – 12:50)
12:50 – 13:00    Conclusion and best practices recommendations