NTB-T03

Title

Deep Learning For Biological Sequence Data: From Convolutional Neural Networks To Transformers

Tutorial details
  • Date: Sunday, September 18th
  • Time: 09:00 to 13:00 CEST (Slot 21)
  • Format: Face-to-face
  • Room: TBD
Instructors
  • Panagiotis Alexiou, PhD. Central European Institute of Technology, Masaryk University (Czech Republic)
  • Petr Simecek, PhD. Central European Institute of Technology, Masaryk University (Czech Republic)
  • David Cechak, PhD student. Central European Institute of Technology, Masaryk University (Czech Republic)
  • Vlastimil Martinek, PhD student. Central European Institute of Technology, Masaryk University (Czech Republic)
Summary

Computational biologists have been using machine learning techniques based on artificial neural networks for decades. Developments in the machine learning field over the past years have revolutionized the efficiency of neural networks and brought us to the era of Deep Learning. In the news, you can read about Deep Learning beating experts in Go, Chess and StarCraft, translating texts and speech between languages, turning the steering wheels of self-driving cars, and even tagging kittens (and Not-Hotdogs) in images. In our field, we have witnessed such systems reaching accuracy competitive with experienced radiologists, predicting the folding of proteins, and calling single nucleotide polymorphisms in genomic data better than any other method.
In this tutorial we utilize four powerful components that are freely available for use. The first, TensorFlow, is an open-source library for machine learning in general and deep learning in particular. Thanks to the second, Google Colaboratory, the computational resources needed to train TensorFlow models are available without cost. The third, TensorFlow.js, will enable us to deploy the trained model as a static web page that can be easily hosted, e.g. on GitHub Pages. And finally, the Hugging Face libraries, datasets, and models will enable us to run complex transformer models with four lines of code.
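To give a flavour of the hands-on material, applying a convolutional model to genomic data typically starts with one-hot encoding the sequence, so that a CNN can treat it as an "image" with one spatial dimension and four channels. A minimal NumPy sketch (the function name `one_hot` is ours, not from the tutorial materials):

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    return np.eye(4)[idx]

x = one_hot("ACGTA")
print(x.shape)  # (5, 4)
print(x[0])     # A -> [1. 0. 0. 0.]
```

A batch of such matrices is exactly the input shape expected by a 1-D convolutional layer (batch, length, channels).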
The key part of the tutorial will be the evaluation and interpretation of the trained model: what can go wrong, and how do we diagnose it? We will start with simple techniques, such as measuring the impact of a simple perturbation, and end with the Integrated Gradients method, which identifies the parts of the input that contributed most to the decision.
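The simple-perturbation idea can be sketched in a few lines: substitute one position at a time and record how much the model's score moves. Here the "model" is a stand-in linear scorer of our own, and `score` and `position_importance` are illustrative names rather than code from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = "ACGT"

# Stand-in for a trained model: a fixed linear scorer over one-hot input.
W = rng.normal(size=(8, 4))

def score(seq: str) -> float:
    onehot = np.eye(4)[[ALPHABET.index(b) for b in seq]]
    return float((W * onehot).sum())

def position_importance(seq: str) -> list:
    """Max absolute score change when substituting each position."""
    base = score(seq)
    importance = []
    for i, original in enumerate(seq):
        deltas = [abs(score(seq[:i] + b + seq[i + 1:]) - base)
                  for b in ALPHABET if b != original]
        importance.append(max(deltas))
    return importance

imp = position_importance("ACGTACGT")
print(imp)  # one non-negative importance value per position
```

With a real trained network, `score` would wrap a forward pass; the perturbation loop itself is unchanged.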

Intended audience

This tutorial is intended for students and practitioners interested in getting their hands dirty with neural networks. It is designed to be an introduction and a starting point for further work and study. Beginners are welcome. Familiarity with Python is necessary, and experience with Jupyter Notebooks, pandas & NumPy will be useful.

Prerequisites
Maximum number of attendees

35

Material required (for participants)

Participants should bring their own laptops with a modern web browser (not IE 6.0). They will need a Google account (e.g. Gmail) to access Colaboratory.

Provisional schedule
  • 09:00 – 10:00 Part I: Introduction

Intro to deep learning: What are neural networks and why have they become so popular?
MNIST dataset: Fully connected vs. convolutional neural networks
Classification of biomedical images: ImageNet, ResNet, transfer learning
CNN applied to sequential (genomic) data

  • 10:00 – 10:30 Part II: Recurrent Neural Networks
  • 10:30 – 11:00 Break
  • 11:00 – 12:00 Part III: Deployment and interpretation

Save & convert: how to save a TensorFlow model and how to convert it to TensorFlow.js
Interpretation: simple techniques such as measuring the impact of a random change at a position, or interpreting the first-layer convolutions
Integrated Gradients: explanation and examples of use

  • 12:00 – 12:30 Part IV: Transformers

What is a transformer? Intro to the architecture, differences from CNNs and RNNs, and the Hugging Face transformers library
How to use pre-trained models? Example of using pre-trained protein transformers via Hugging Face
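As a hedged preview of the Integrated Gradients material in Part III, here is a NumPy sketch on a toy differentiable scorer rather than a trained TensorFlow model (all names are ours): attributions are gradients averaged along a straight path from a baseline to the input, scaled by (input − baseline), and by the completeness property they sum to approximately the score difference between input and baseline.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 4))

def model(x: np.ndarray) -> float:
    # Toy differentiable score: linear + quadratic in the (6, 4) input.
    return float((W * x).sum() + 0.5 * (x ** 2).sum())

def grad(x: np.ndarray) -> np.ndarray:
    # Analytic gradient of the toy score.
    return W + x

def integrated_gradients(x, baseline, steps=100):
    """Average gradients along the straight line baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint rule
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.eye(4)[[0, 1, 2, 3, 0, 1]]       # one-hot "ACGTAC"
baseline = np.full_like(x, 0.25)        # uniform-background baseline
attr = integrated_gradients(x, baseline)
# Completeness: attributions sum to model(x) - model(baseline)
print(round(attr.sum() - (model(x) - model(baseline)), 4))  # 0.0
```

In the tutorial itself the gradient comes from automatic differentiation of the trained network (e.g. `tf.GradientTape`) instead of an analytic formula, but the path-integral approximation is the same.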
