maincode

Signal Engineer

Melbourne, full-time, mid
Other, Engineer

Technical Tools
python, etl

About the Role

Matilda is Australia's LLM. What ends up in the corpus is what the model learns, so the quality of the data sets the ceiling on the quality of the model.

We're hiring a Signal Engineer to own that ceiling. You will build the pipelines that turn massive, messy, raw data into the dataset Matilda trains on. The work is part engineering, part editorial judgment, done in code.

A lot of the real gains in frontier models come from the data, yet the field as a whole underinvests in that work. It is one of the highest-leverage places you can spend your time as an engineer.

- Pipelines that ingest, clean, dedupe, filter, and score training data at TB to PB scale

- Quality classifiers and heuristics that separate useful data from the rest

- Dataset mixture design and experiments on what actually improves the model

- Tools to explore, sample, and audit what's in the corpus

- Close work with researchers and training engineers so data choices connect to model behaviour

Requirements

- Strong engineer. Python, data tooling, distributed processing, clean pipelines.

- High attention to detail. Small errors compound fast at this scale.

- Taste and judgment about what good training data looks like.

- Comfort working with very large, very messy datasets.

- Curiosity about how data shapes model behaviour.

- High learning velocity. You don't need a PhD or prior LLM experience.

Nice to Have

- Experience with web-scale corpora or pretraining data pipelines

- Experience working with unstructured text data

- Familiarity with distributed data frameworks (Spark, Ray, or similar)

- Exposure to deduplication, quality classification, or tokenisation

Full-time role based in Melbourne, working closely with our in-person team. At this time we are not able to offer visa sponsorship, so applicants must have existing and unrestricted work rights in Australia.

Location & Eligibility

Where is the job: Melbourne, on-site at the office
Who can apply: Same as job location

Listing Details

Posted: April 23, 2026
First seen: May 8, 2026
Last seen: May 8, 2026

Posting Health

Days active: 0
Repost count: 0
Trust Level: 17%
Scored at: May 8, 2026

Signal breakdown

freshness, source trust, content trust, employer trust