General data
Credits (SWS) | 12 (10 SWS) |
Module level | Master |
Language | German/English |
Total hours | 360 h |
Weekly time slot | 2-3 days/week |
Block part in semester break | to be determined |
Time schedule of the internship
Feb - Mar 2025 | Kickoff meeting and assignment of projects and teams |
Apr - Jul 2025 | Division of the project work, interim presentations |
Jul - Aug 2025 | Block part for finalizing the project work, writing the report, and preparing the final presentation |
Requirements and prior knowledge
Bachelor's degree in Bioinformatics. Good programming skills. Interest in data visualization and network medicine. Previous experience in software development is an advantage, but not a must.
Background
Proteins are vital in human biochemistry and carry out biological processes through complex and transient interactions. These interactions depend on the context in which they occur and form the basis of protein-protein interaction (PPI) networks. PPI networks are used in drug development, in the study of cell differentiation and communication, in disease module detection, and in many other areas.
Motivation
Measuring these interactions experimentally is time-intensive, costly, and error-prone: studies describe false positive rates as high as 40%-60% [1,2]. PPI networks also suffer from study bias, where over-studied proteins skew the distributions and seem more important than they are, and active module identification methods have been shown to perform just as well on random networks [3]. Finally, current PPI networks do not consider alternative splicing, a crucial process that produces different protein isoforms, which may show different or even opposite binding preferences.
To address these problems with PPI networks, we started CoBiNet, a joint project with the FAU in Erlangen and the IEO in Milan. At DaiSyBio, we focus on adding isoform-level information to PPI networks. As testing all pairwise interactions of protein isoforms experimentally is prohibitively expensive, computational tools can help to infer isoform-specific interactions. To this end, we previously built DIGGER [4], a database that considers not only experimentally validated PPIs but also known domain-domain interactions (DDIs) to infer whether protein isoforms are likely to interact. Only a few DDIs have been experimentally validated, so computational prediction is necessary to expand DIGGER towards covering a significant portion of the interactome. The quality of the predicted DDIs therefore limits the quality of the DIGGER database. While various methods have been developed for predicting DDIs, they have not been systematically validated.
Objectives
One important aspect of improving and extending DDI prediction is to evaluate existing methods. The project aims to create a Nextflow pipeline that benchmarks the existing methods and compares their results. Some methods might no longer run and will need to be re-implemented, while others might not perform as well as claimed. The project also leaves room for implementing additional computational strategies.
Tasks
- Research of existing methods and the status of their code
- Selection of methods to implement
- Creation of a Nextflow benchmarking pipeline
- Deployment of pipeline with the provided datasets (making use of data splitting to retain information for benchmarking)
- Analysis and visualization of benchmarking results
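To illustrate the data splitting and evaluation steps listed above, here is a minimal Python sketch. It is not part of the project code: the function names, the domain identifiers (`D1`, `D2`, ...), the 20% test fraction, and the use of an explicit set of negative pairs are all assumptions made for this example. In the actual project, the splitting strategy and the choice of negatives are design questions to be worked out.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Hold out a fraction of known DDI pairs for benchmarking.

    `pairs` is an iterable of (domain_a, domain_b) tuples.
    Returns (train, test) as two disjoint lists.
    """
    ordered = sorted(pairs)          # fixed order so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_test = max(1, int(len(ordered) * test_fraction))
    return ordered[n_test:], ordered[:n_test]


def precision_recall(predicted, held_out_pos, held_out_neg):
    """Score a set of predicted pairs against held-out positives and negatives."""
    tp = len(predicted & held_out_pos)   # predicted and truly interacting
    fp = len(predicted & held_out_neg)   # predicted but labeled non-interacting
    fn = len(held_out_pos - predicted)   # interacting but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical known DDIs; real data would come from the provided datasets.
    known = {("D1", "D2"), ("D1", "D3"), ("D2", "D4"), ("D3", "D5"), ("D4", "D5")}
    train, test = split_pairs(known)
    print(len(train), "pairs for training,", len(test), "held out")
```

A prediction method would see only the training pairs, and its output would then be scored on the held-out pairs with `precision_recall`. Keeping the seed fixed makes the split reproducible across methods, so every method is evaluated on the same held-out data.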
References
- [1] Armean IM, Lilley KS, Trotter MW (2012). Popular computational methods to assess multiprotein complexes derived from label-free affinity purification and mass spectrometry (AP-MS) experiments.
- [2] Berggård T, Linse S, James P (2007). Methods for the detection and analysis of protein-protein interactions.
- [3] Lazareva O, Baumbach J, List M, Blumenthal DB (2021). On the limits of active module identification.
- [4] Louadi Z, Yuan K, Gress A, Tsoy O, Kalinina O, Baumbach J, Kacprowski T, List M (2020). DIGGER: exploring the functional role of alternative splicing in protein interactions.