Data Scraping Engineer
Description
At Omnilex, we’re on a mission to transform the way lawyers work. Our AI-native platform lets legal professionals enhance their productivity in legal research and automate workflows. We collaborate closely with our clients and iterate at a market-leading pace.
You’ll be joining a young, passionate, and dynamic team of 15, with roots at ETH Zurich.
Are you excited about turning messy, multi-jurisdiction legal content into clean, structured, AI-ready data? Do you enjoy building reliable pipelines for extraction, normalization, chunking, citation handling, tagging, structuring, summarizing, and indexing, and then measuring their quality and cost? Do you thrive in a fast-paced startup where your work directly powers search, AI answer quality, and analytics?
Responsibilities
- As a Data Engineer focused on AI data processing and integration, you will build and own the data flows that make our AI features accurate, explainable, and scalable
- Design and maintain ingestion for legal sources (APIs, scraping, bulk data) across jurisdictions with strong reliability and compliance
- Normalize and model heterogeneous sources into pragmatic, typed schemas (statutes, decisions, commentaries, citations, metadata)
- Implement citation-aware chunking, sectioning, and cross-referencing so RAG is precise, traceable, and cost-efficient
- Build enrichment pipelines for tagging, classification, summarization, embeddings, entity extraction, and graph relationships, using AI where it helps
- Improve search quality via better indexing strategies, analyzers, synonyms, ranking, and relevance evaluation
- Establish data quality, lineage, and observability (QA checks, coverage metrics, regression tests, versioning)
- Optimize performance, runtime complexity, DB query times, token usage, and overall pipeline cost
- Collaborate closely with users and customers to translate user problems and company requirements into robust data pipelines and SLAs
- Communicate your work and findings to the team for continuous feedback and improvement (in English)
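To give a flavor of the chunking and citation-handling work described above, here is a minimal, hypothetical TypeScript sketch (all names and the article-heading format are illustrative assumptions, not our actual pipeline): it splits a statute into article-level chunks and keeps each chunk's citation attached as metadata, so passages retrieved for RAG stay traceable to their source.

```typescript
// Hypothetical sketch, not Omnilex's actual pipeline: split a statute into
// article-level chunks and keep each chunk's citation attached as metadata,
// so passages retrieved for RAG can be traced back to their source article.

interface Chunk {
  citation: string; // e.g. "Art. 1"
  text: string;
}

// Split before each article heading like "Art. 1", "Art. 2", ...
function chunkByArticle(statute: string): Chunk[] {
  return statute
    .split(/(?=Art\.\s*\d+)/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const heading = part.match(/^Art\.\s*\d+/);
      return { citation: heading ? heading[0] : "unknown", text: part };
    });
}

const statute =
  "Art. 1 Every person must act in good faith. " +
  "Art. 2 The law applies to all legal relationships.";

const chunks = chunkByArticle(statute);
console.log(chunks.map((c) => c.citation)); // ["Art. 1", "Art. 2"]
```

In production, chunks like these would carry richer metadata (jurisdiction, source URL, version) and feed the embedding and indexing stages.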
Qualifications
MINIMUM QUALIFICATIONS
- Degree in Computer Science, Data Science, or a related field; or equivalent practical experience
- Strong hands-on experience in data engineering with TypeScript
- Solid grasp of data structures, algorithms, regexes, and SQL (PostgreSQL)
- Experience using LLMs/embeddings for practical data tasks (chunking, tagging, summarization, RAG-ready pipelines)
- Ability to learn quickly and adapt to a dynamic startup environment, with strong ownership and product mindset
- Available full-time; on-site in Zurich at least two days per week (hybrid)
PREFERRED QUALIFICATIONS
- You have a Swiss work permit or EU/EFTA citizenship
- Working proficiency in German (much of our legal data is in German) and proficiency in English
- Experience with Azure (incl. Azure AI/Cognitive Search), Docker, and CI/CD
- Familiarity with modern scraping/parsing stacks (Playwright/Puppeteer, PDF tooling, OCR)
- Experience with vector indexing, relevance evaluation, and search ranking
- Familiarity with our stack: Azure / NestJS / Next.js
- Knowledge of and experience with legal systems, in particular those of Switzerland, Germany, and the USA
Benefits
- Direct impact: your pipelines immediately improve search, answers, and user trust, transforming legal research
- Autonomy & ownership: Own the full pipeline across ingestion, processing, enrichment, and indexing
- Team: Professional growth at the intersection of legal, data, and AI with an interdisciplinary team
- Compensation: CHF 8’000–12’000 per month + ESOP (employee stock options), depending on experience and skills