Scandit Industry · Research Internship

Computer Vision Research Internship: Image to Sequence Modeling (e.g. Transformers)

CHF 30'000 – 50'000 / year

ZÜRICH

MACHINE LEARNING · DEEP LEARNING · NEURAL NETWORK · COMPUTER VISION · PYTORCH

Description

We are offering a research-focused internship aimed at advancing machine learning methods for complex visual understanding tasks. The project centers on deep learning architectures for image-to-sequence modeling, such as Transformers, attention mechanisms, and modern sequence and representation-learning frameworks, to address challenging and highly structured computer vision problems. It contributes to long-term research efforts aimed at achieving higher performance, robustness, and generalization in large-scale visual applications. The position is ideal for experienced master’s students, PhD collaborations, or candidates preparing for a research career in industry or academia.

Responsibilities

You will work closely with experienced ML researchers and engineers on cutting-edge research at the intersection of computer vision and sequence modeling. Your work will include:

Designing and experimenting with new ML architectures for structured visual data.
Evaluating alternative modeling paradigms (e.g., encoder–decoder, hybrid Transformer models, sequence-based representations).
Investigating techniques for improving robustness, generalization, and multi-view reasoning.
Running systematic experiments, ablations, and error analyses to validate research hypotheses.

This project provides opportunities for novel model design, extensive experimentation, and scholarly research. You will contribute to long-term innovation in our technology, with potential real-world impact for millions of users.

Qualifications

MSc or PhD student in Computer Science, Machine Learning, Artificial Intelligence, or a related field with a strong research focus. Candidates should have a solid foundation in machine learning theory, neural networks, and computer vision.

Essential Skills:

Proficiency in Python and deep learning frameworks such as PyTorch.
Practical experience designing, training, and evaluating neural networks, including CNNs and Transformer-based architectures.
Strong analytical and problem-solving abilities, with the capability to interpret experimental results and iterate effectively.
Familiarity with research best practices, including reproducibility, controlled experiments, and ablation studies.

Desirable Skills:

Prior research experience in computer vision, pattern recognition, sequence modeling, or image-to-sequence architectures.
Experience training large-scale models or working with foundation-style architectures.
Contributions to publications, preprints, or open-source machine learning projects.
Strong communication skills and the ability to work independently in a research-oriented environment.

Benefits

A highly skilled team and a fun environment where you can put your enthusiasm for computer vision challenges and cutting-edge technologies to use
Hackathons, summer parties, company outings and other regular events
Office in the city center of Zurich
