NTB-T01

Title

Computational Challenges in Phospho-Proteomics and Systems Biology of Cellular Signaling

Tutorial details
  • Date: Sunday, September 18th
  • Time: 14:00 to 18:00 CEST
  • Format: Face-to-face
  • Room: TBD
Instructors
  • Filipa Blasco Tavares Pereira-Lopes, Case Western Reserve University (USA)
  • Marzieh Ayati, University of Texas Rio Grande Valley (USA)
  • Serhan Yilmaz, Case Western Reserve University (USA)
  • Daniela Schlatzer, Case Western Reserve University (USA) – Helper
  • Mehmet Koyutürk, Case Western Reserve University (USA) – Helper
  • Mark Chance, Case Western Reserve University (USA) – Helper
Summary

Module 1. Processing of Mass-Spectrometry Based Phospho-Proteomic Data 

In this module, we will introduce the different types of mass spectrometry data files and summarize workflows to extract final peptide and/or protein reports from raw MS data files. 

● Understanding different types of MS data files and which information can be extracted is essential to any proteomics analysis workflow. Here, we will introduce all the different MS data types you might find deposited in phosphoproteomic repositories. 

● We will define workflows to process mass spectrometry data for biostatistical analysis. This will include data QC, various statistical applications, and types of quantification. We will also introduce open-source tools to provide these functions.

● The identification of biologically meaningful changes in phosphoproteomic data is highly dependent on a rigorous biostatistical treatment. We will provide methods to define variance, remove outliers, and identify differentially phosphorylated peptides and/or proteins. 
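A minimal sketch of the last step above, assuming log2-transformed site intensities for two conditions with at least two replicates each. The site identifiers, toy values, and function names are hypothetical, and Welch's t statistic is used only as one common choice; a full analysis would also derive p-values and correct for multiple testing.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances.

    Assumes each group has at least two values and that the combined
    standard error is nonzero.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


def rank_sites(log2_intensities):
    """Rank phosphosites by the absolute value of their t statistic.

    log2_intensities maps a site id to (case_values, control_values).
    Returns a list of (site, log2 fold change, t) tuples, strongest first.
    """
    rows = []
    for site, (case, control) in log2_intensities.items():
        log2_fold_change = statistics.mean(case) - statistics.mean(control)
        rows.append((site, log2_fold_change, welch_t(case, control)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical toy data: log2 intensities for two phosphosites.
toy_data = {
    "PROT1_S10": ([12.0, 12.5, 12.2], [10.0, 10.4, 10.1]),
    "PROT2_T55": ([9.0, 9.3, 8.8], [9.1, 9.0, 9.2]),
}
ranked = rank_sites(toy_data)
```

Here `ranked[0]` holds the site with the largest absolute t statistic, which for the toy data is the site whose case replicates sit about two log2 units above the controls.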

Module 2. Resources, Databases, and Algorithms for Kinase-Phosphosite Annotations

In this module, we first introduce the resources and databases used to explore phosphoproteomics data and the annotations of phosphorylated sites. We then discuss the following challenges and how different algorithms approach them: 

● Understanding the association of kinases and their substrates. Currently, more than 95% of the phosphoproteome has no known associated kinase. Predicting kinases for this large number of unannotated phosphosites is crucial for expanding our knowledge of cell signaling and for future therapeutic development. We present state-of-the-art approaches for predicting kinase-substrate associations and discuss the advantages and disadvantages of the different methods. 

● Imbalanced knowledge and biases. Knowledge about the targets of kinases is distributed extremely unevenly: just 20% of kinases account for the phosphorylation of 87% of currently annotated substrates. Examining the substrates of the remaining, less-studied kinases may therefore reveal major insights into cell signaling and cell function. We discuss different approaches that try to address this issue and potential opportunities for further research. 
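The kind of imbalance described above can be quantified with a short sketch like the following. It assumes the annotations are available as a mapping from each kinase to its set of known substrate sites; the kinase and site names are hypothetical toy values, not data from any real database.

```python
import math


def coverage_of_top_kinases(kinase_to_sites, top_fraction=0.2):
    """Fraction of annotated sites covered by the most-annotated kinases.

    kinase_to_sites maps a kinase name to a set of substrate site ids.
    The top `top_fraction` of kinases (by number of known sites) is
    selected, and the share of all unique annotated sites they cover
    is returned.
    """
    if not kinase_to_sites:
        return 0.0
    ranked = sorted(
        kinase_to_sites, key=lambda k: len(kinase_to_sites[k]), reverse=True
    )
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    all_sites = set().union(*kinase_to_sites.values())
    top_sites = set().union(*(kinase_to_sites[k] for k in ranked[:n_top]))
    if not all_sites:
        return 0.0
    return len(top_sites) / len(all_sites)


# Hypothetical toy annotations: one well-studied kinase, four sparse ones.
toy_annotations = {
    "KIN_A": {"s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"},
    "KIN_B": {"s9"},
    "KIN_C": {"s10"},
    "KIN_D": {"s1"},
    "KIN_E": {"s2"},
}
coverage = coverage_of_top_kinases(toy_annotations, top_fraction=0.2)
```

In the toy data, the top 20% of kinases is the single kinase `KIN_A`, which covers 8 of the 10 annotated sites.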

Module 3. Kinase Activity Inference 

In this module, we introduce approaches that combine the available kinase-phosphosite annotations with mass-spectrometry based phospho-proteomic data to infer the activity of kinases for a given condition of interest. We then discuss the pros/cons of different statistical models, challenges related to the reliability of the inferences, and ways in which additional sources of functional information can be incorporated to improve that reliability. 

● Pros & cons of different statistical, mathematical, or machine learning models. Various models exist to infer the activity of kinases from the phosphorylation of their substrates. They range from simple approaches, such as mean substrate phosphorylation and statistical tests like the z-test or the Kolmogorov-Smirnov test, to more sophisticated ones, such as kinase-set enrichment analysis or machine learning based models. Here, we discuss the pros & cons of these approaches in terms of both accuracy and interpretability of the results. 

● Challenges related to reliability of the inferences. Many of the available computational approaches suffer from a bias toward "rich" kinases, i.e., those with many known substrates, that is learned from the imbalance of the available kinase-phosphosite annotations. Here, we discuss strategies to detect these issues during validation and introduce ways in which such biases can be avoided. 

● Additional sources of functional information for improving the reliability of the inference. While the main source of information for kinase activity inference is the set of known kinase-phosphosite annotations, recent studies emphasize the importance of functional information to aid the inference. Here, we discuss the utility of various functional networks, such as protein-protein interactions, co-evolution evidence, and structural similarity information, and explain how this information can be used to improve the reliability of the inference without introducing a bias. 
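As an illustration of the simpler end of the model range mentioned in this module, the sketch below computes a z-score per kinase from the mean log fold change of its known substrates. This is one common formulation, assumed here for illustration rather than taken from the tutorial material; the function name, kinase names, and toy values are hypothetical.

```python
import math
import statistics


def kinase_z_scores(site_lfc, kinase_to_sites):
    """Z-score based kinase activity inference.

    site_lfc maps a phosphosite id to its log fold change.
    kinase_to_sites maps a kinase to the ids of its known substrate sites.
    For each kinase with at least one quantified substrate:
        z = (mean(substrate LFCs) - mu) * sqrt(n) / sigma
    where mu and sigma are the mean and population standard deviation
    of all quantified sites and n is the number of quantified substrates.
    """
    values = list(site_lfc.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    scores = {}
    for kinase, sites in kinase_to_sites.items():
        observed = [site_lfc[s] for s in sites if s in site_lfc]
        if not observed:
            continue  # no quantified substrates, so no inference is possible
        mean_lfc = statistics.mean(observed)
        scores[kinase] = (mean_lfc - mu) * math.sqrt(len(observed)) / sigma
    return scores


# Hypothetical toy data: four quantified sites and three kinases.
toy_lfc = {"s1": 2.0, "s2": 1.0, "s3": -1.0, "s4": -2.0}
toy_kinases = {
    "KIN_A": ["s1", "s2"],
    "KIN_B": ["s3", "s4"],
    "KIN_C": ["s9"],  # substrate not quantified in this dataset
}
scores = kinase_z_scores(toy_lfc, toy_kinases)
```

Because the score scales with the square root of the number of quantified substrates, kinases with many annotated sites reach large absolute scores more easily, which is one way the annotation imbalance can surface in the results.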

Module 4. Applications: Alzheimer’s disease 

In this last module, we will use a rich Alzheimer’s disease (AD) phosphoproteome dataset, whose design includes time, biological sex, and genetic background as variables, to demonstrate how such data can inform therapeutics development. We will summarize meaningful phosphorylation sites that are sensitive to disease progression in males and females. Next, we will categorize this information into GO classes to better understand which signaling pathways are impacted by AD throughout disease progression. Lastly, kinase activity inference analysis will identify key drivers of AD phospho-dysregulation. Integrating these data will enable the selection of candidate kinases to be further explored as drug targets for AD therapeutic development.

Target audience

Computational biologists who are familiar with other omics datasets and would like to learn about proteomics to broaden the scope of their algorithms, as well as life scientists who use phospho-proteomics and would like to apply systems approaches to better utilize their data.

Prerequisites
Maximum number of attendees

50

Material required (for participants)

N/A

Programme
TIME             CONTENT
14:00 – 14:30    Processing of Mass-Spectrometry Based Phospho-Proteomic Data
                 Instructor: Filipa Lopes, Helper: Daniela Schlatzer
14:30 – 15:30    Resources, Databases, and Algorithms for Kinase-Phosphosite Annotations
                 Instructor: Marzieh Ayati, Helper: Mehmet Koyutürk
                 Break
15:45 – 16:45    Kinase Activity Inference
                 Instructor: Serhan Yilmaz, Helper: Mehmet Koyutürk
16:45 – 17:15    Applications: Alzheimer’s disease
                 Instructor: Filipa Lopes, Helper: Mark Chance