Senior Data Engineer
We're empowering small teams with technology that makes it easier to market and grow businesses. Our current focus is to help consumer brands shift from "workflow automation" to "agent management" within their marketing operations. Shadow is the AI coordination layer — providing shared AI memory, centralized agent control, and model orchestration for marketing teams.
What We Offer
Shadow is built alongside Darkroom, a performance marketing agency that's been operating for 10 years, employs 100+ people, runs 100+ clients at a time, and has worked with over 1,000 consumer brands. The agency is both our proving ground and our first user, which means the data you build with is real marketing data at real volume from day one, not a synthetic demo.
You own the pipelines that bring the world's marketing data into Shadow — and keep them fast, accurate, and reliable as we scale to thousands of users. Every brand connects its full stack (ad platforms, ecommerce, analytics, email/SMS), and you make that data land cleanly, normalize into shared schemas, and stay in sync. The agent is only as good as the data underneath it; that layer is yours.
This is a hands-on, build-heavy engineering role for someone who has run large data systems before and wants to do it again in a smaller, faster environment.
Build and scale the ingestion layer across third-party marketing APIs (Meta, Google, TikTok, GA4, Shopify, Klaviyo, and more) — auth, extraction, rate-limit handling, backfill, and incremental sync.
Design normalization and transformation pipelines that map messy, platform-specific data into shared, queryable schemas (e.g. a unified creative/campaign/order model).
Own data reliability at scale — sync accuracy, freshness, coverage, and observability. Build the systems that detect when a connection breaks or a number looks wrong before a user does.
Engineer for multi-tenant scale and security: pipelines and storage that stay performant and cost-efficient across 1,000+ users and hundreds of connected brands — with strict data isolation, privacy, and compliance built in, not bolted on.
Partner with the AI and data-science teams to expose clean, well-modeled data the agent can retrieve and reason over.
Requirements
Experience building and operating large enterprise data pipelines engineered for scale: systems serving 1,000+ users (or equivalent data volume / tenancy), where reliability, isolation, and cost at scale were real constraints you solved.
Strong SQL and Python, with production experience in a modern data warehouse (BigQuery, Snowflake, Redshift, or similar).
Deep familiarity with ETL/ELT patterns, incremental sync, schema design, and data modeling for analytics.
Built and maintained integrations against third-party APIs — OAuth flows, pagination, rate limits, schema drift, and the operational reality of connectors that break.
A bias toward observability and data quality: you instrument your pipelines and you don't ship data you can't trust.
Experience building or operating within SOC 2-compliant systems with enterprise-grade security and privacy — you've handled sensitive customer data under real compliance constraints (access controls, encryption, data isolation, auditability) and treat it as a first-class engineering requirement.
Nice to Have
Experience in martech, adtech, or an adjacent data-heavy marketing domain: you've worked with ad platform or ecommerce data before and know where the bodies are buried (attribution windows, currency/timezone messes, deduping across platforms).
Familiarity with our stack: GCP (BigQuery, Cloud Run), PostgreSQL + pgvector, and orchestration/transformation tooling (dbt, Airflow, Dagster, or similar).
Experience with pipeline observability and tracing in an AI/LLM context (e.g. Langfuse).
Comfort supporting data that feeds AI agents and retrieval systems, not just dashboards.
Obsessive about data organization at scale. We're hiring for someone who lives in the data layer and wants to own it end to end.
You’re a power AI user. You've embedded AI into every workflow you touch and you think in systems — not one-off prompts, but repeatable structures that compound.
Entrepreneurial. You don't need much direction to move fast, you pivot when the situation demands it, and what you ship is production-grade, not a prototype you hand off for someone else to finish.
Listing Details
- Posted
- June 5, 2026