ZHAW Academia

PhD position in Computational Phylogenetics 100 %

CHF 50'000 – 70'000 / year
WÄDENSWIL
NEURAL NETWORK, LARGE LANGUAGE MODEL, LLMS

Description

In this exciting PhD project, you will pioneer neuro-symbolic methods that retain the mechanistic grounding of classical phylogenetics while integrating the representational richness of genomic LLMs.

Your role

Genomic sequences are modeled as evolving along binary phylogenetic trees through stochastic string-valued substitution and insertion-deletion (indel) processes. Given a set of present-day sequences, classical inference problems in phylogenetics are:

  • homology inference
  • tree inference
  • ancestral sequence reconstruction

A central focus of our recent work has been to develop fast, frequentist, indel-aware approaches to these problems. For tractability, most models must assume that residues evolve independently across sites. In reality, mutation probabilities are influenced by sequence context, including position-specific structural and functional constraints.

In recent years, the convergence of computational biology and data-driven methods has led to genomic large language models (gLLMs), which can model sequence-context dependencies. Building on our previous work, we aim to develop neuro-symbolic methods that retain the mechanistic grounding of classical phylogenetics while integrating the representational richness of gLLMs.

As a PhD student, you will devise mutation models, develop inference algorithms, implement them in our Rust codebase, and evaluate the methods by simulation and on real data.

Selection of relevant articles:

  1. Maiolo M, Zhang X, Gil M, Anisimova M. "Progressive multiple sequence alignment with indel evolution" BMC Bioinformatics. 2018. 19(1):331. doi: 10.1186/s12859-018-2357-1.
  2. Pečerska J, Gil M, Anisimova M. "Joint alignment and tree inference" bioRxiv. 2021. doi: 10.1101/2021.09.28.462230.
  3. Jowkar G, Pečerska J, Maiolo M, Gil M, Anisimova M. "ARPIP: Ancestral sequence Reconstruction with insertions and deletions under the Poisson Indel Process" Systematic Biology. 2022. syac050. doi: 10.1093/sysbio/syac050.
  4. Iglhaut C, Pečerska J, Gil M, Anisimova M. "Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment Including Long Indels" Molecular Biology and Evolution. 2024. 41(7):msae109. doi: 10.1093/molbev/msae109.

Your profile

You should have an MSc in Computer Science, Computational Science, Computational Biology, Statistics / Applied Mathematics, or a related quantitative field, with a strong background in:

  • Algorithms, particularly combinatorial optimization
  • Stochastic modelling
  • Computational inferential statistics
  • Programming, ideally in Rust and/or C++

Knowledge of phylogenetics and/or an understanding of neural networks is an advantage.

What you can expect

We offer working conditions and terms of employment commensurate with higher education institutions and actively promote personal development for staff in leadership and non-leadership positions.

A detailed description of advantages and benefits can be found at Working at the ZHAW.

The main points are listed below:

  • Workplace Culture
  • Work Life Balance
  • Diversity and Inclusion
  • Personal Development
  • Environmental, Economic and Social Sustainability at the ZHAW
  • Occupational Health Management
  • Salary and Pension Provision