Title
Deep Learning For Biological Sequence Data: From Convolutional Neural Networks To Transformers
Tutorial details
- Date: Sunday, September 18th
- Time: 09:00 to 13:00 CEST (Slot 21)
- Format: Face-to-face
- Room: TBD
Instructors
- Panagiotis Alexiou, PhD. Central European Institute of Technology, Masaryk University (Czech Republic)
- Petr Simecek, PhD. Central European Institute of Technology, Masaryk University (Czech Republic)
- David Cechak, PhD student. Central European Institute of Technology, Masaryk University (Czech Republic)
- Vlastimil Martinek, PhD student. Central European Institute of Technology, Masaryk University (Czech Republic)
Summary
Computational biologists have been using machine learning techniques based on artificial neural networks for decades. Developments in the machine learning field over the past years have revolutionized the efficiency of neural networks and brought us into the era of Deep Learning. In the news, you can read about Deep Learning beating experts at Go, Chess and StarCraft, translating text and speech between languages, turning the steering wheels of self-driving cars, and even tagging kittens (and Not-Hotdogs) in images. In our field, we have witnessed such systems reaching accuracy competitive with experienced radiologists, predicting the folding of proteins, and calling single nucleotide polymorphisms in genomic data better than any other method.
In this tutorial we use four powerful components that are freely available:
TensorFlow is an open-source library for deep learning and machine learning in general. The second, Google Colaboratory, provides the computational resources needed to train TensorFlow models at no cost. The third, TensorFlow.js, will enable us to deploy the trained model as a static web page that can be easily hosted, e.g. on GitHub Pages. And finally, Hugging Face libraries, datasets, and models will enable us to run complex transformer models with four lines of code.
A key part of the tutorial will be the evaluation and interpretation of the trained model: what could go wrong, and how do we diagnose it? We will start with simple techniques, such as measuring the impact of small perturbations, and end with the Integrated Gradients method to identify the parts of the input that contributed most to the decision.
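The perturbation idea can be sketched in a few lines of plain Python. The motif-matching `score` function below is a hypothetical stand-in for a trained model, not part of the tutorial material; the same loop works with any scoring function:

```python
def score(seq):
    # Toy stand-in for a trained model: "detects" a TATA motif.
    return 1.0 if "TATA" in seq else 0.0

def perturbation_importance(seq, score_fn, alphabet="ACGT"):
    """For each position, report the largest score drop caused by
    substituting the nucleotide there with any other letter."""
    base = score_fn(seq)
    importance = []
    for i in range(len(seq)):
        drops = []
        for letter in alphabet:
            if letter == seq[i]:
                continue
            mutated = seq[:i] + letter + seq[i + 1:]
            drops.append(base - score_fn(mutated))
        importance.append(max(drops))
    return importance

imp = perturbation_importance("GGTATAGG", score)
# Positions inside the TATA motif score 1.0; flanking positions score 0.0.
```

In practice `score_fn` would be a forward pass of the trained network, and the score drop per position can be plotted along the sequence as a first, model-agnostic importance profile.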
Intended audience
This tutorial is intended for students and practitioners interested in getting their hands dirty with neural networks. It is designed as an introduction and a starting point for further work and study. Beginners are welcome.
Prerequisites
Familiarity with Python is necessary; experience with Jupyter Notebooks, pandas and NumPy will be useful.
Maximum number of attendees
35
Material required (for participants)
Participants should bring their own laptops with a modern web browser (not IE 6.0). They will need a Google account (e.g. Gmail) to access Colaboratory.
Provisional schedule
- 09:00 – 10:00 Part I: Introduction
Intro to deep learning: What are neural networks and why have they become so popular?
MNIST dataset: Fully connected vs. convolutional neural networks
Classification of biomedical images: ImageNet, ResNet, transfer learning
CNN applied to sequential (genomic) data
- 10:00 – 10:30 Part II: Recurrent Neural Networks
- 10:30 – 11:00 Break
- 11:00 – 12:00 Part III: Deployment and interpretation
Save & convert: how to save a TF model and how to convert it to TF.js
Interpretation: simple techniques such as measuring the impact of a random change at a position, or interpreting the first-layer convolutions
Integrated Gradients: explanation and examples of use
- 12:00 – 12:30 Part IV: Transformers
What is a transformer? Intro to the architecture, differences from CNNs and RNNs, and the Hugging Face transformers library
How to use pre-trained models? Example using pre-trained protein transformers from Hugging Face
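The Integrated Gradients method covered in Part III can be sketched on a toy differentiable function. Here `f` and its analytic gradient are illustrative assumptions standing in for a real model and automatic differentiation; the path integral is approximated with a midpoint Riemann sum:

```python
def f(x):
    # Toy differentiable "model": f(x) = x0^2 + x1.
    return x[0] ** 2 + x[1]

def grad_f(x):
    # Analytic gradient of the toy model (a framework would supply this).
    return [2 * x[0], 1.0]

def integrated_gradients(x, baseline, grad_fn, steps=100):
    """Attribution_i = (x_i - baseline_i) * average gradient along the
    straight-line path from baseline to x (midpoint Riemann sum)."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

x, baseline = [3.0, 2.0], [0.0, 0.0]
attr = integrated_gradients(x, baseline, grad_f)
# Completeness axiom: attributions sum to f(x) - f(baseline).
```

For a genomic model, `x` would be the one-hot encoded sequence, `baseline` typically an all-zeros input, and the per-position attributions highlight which nucleotides drove the prediction.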