Synthetic Data Engineer (AI Data/Training)
Quick Summary
Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting. Implement automated quality scoring and de-duplication systems.
Proven experience building large-scale data pipelines (Airflow, Spark, Ray). Deep knowledge of prompt engineering for data generation. Familiarity with dataset distillation and bias mitigation.
We are seeking a talented and innovative Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and keep the data that feeds training loops high quality. Your expertise will drive the success of data processing and model training within the organization.
Responsibilities
- Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
- Implement automated quality scoring and de-duplication systems.
- Manage data pipelines that feed directly into SFT and DPO training loops.
Requirements
- Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
- Deep knowledge of prompt engineering for data generation.
- Familiarity with dataset distillation and bias mitigation.
Listing Details
- Posted: April 24, 2026
- First seen: April 24, 2026
- Last seen: May 2, 2026
Posting Health
- Days active: 8
- Repost count: 0
- Trust Level: 35%
- Scored at: May 3, 2026
Hyphenconnect is a Web3 and AI talent recruitment agency based in Hong Kong with 700+ placements globally.