darkroom · 1mo ago

Senior Data Engineer

Pakistan, India, Brazil · Remote · full-time · senior

Data Engineer · Data

We're empowering small teams with technology that makes it easier to market and grow businesses. Our current focus is to help consumer brands shift from "workflow automation" to "agent management" within their marketing operations. Shadow is the AI coordination layer — providing shared AI memory, centralized agent control, and model orchestration for marketing teams.

What We Offer

✓ Unlimited PTO + local holidays (relevant to your hub): Rebooting is part of the work. Take the time you need to stay sharp.

✓ Remote-First Culture: Many roles are fully remote. Employees based in or near our New York or Lisbon HQs are expected to work hybrid with weekly in-office time. Hub locations include Brazil and Spain.

✓ Parental Leave: Flexible parental leave to support new parents during this important transition.

✓ Growth: Our interdisciplinary model gives every employee exposure far beyond their core role. Grow your skills, expand your influence, and stay at the forefront of the industry.

You own the pipelines that bring the world's marketing data into Shadow — and keep them fast, accurate, and reliable as we scale to thousands of users. Every brand connects its full stack (ad platforms, ecommerce, analytics, email/SMS), and you make that data land cleanly, normalize into shared schemas, and stay in sync. The agent is only as good as the data underneath it; that layer is yours.

This is a hands-on, build-heavy engineering role for someone who has run large data systems before and wants to do it again in a smaller, faster environment.

Key Responsibilities

Build and scale the ingestion layer across third-party marketing APIs (Meta, Google, TikTok, GA4, Shopify, Klaviyo, and more): auth, extraction, rate-limit handling, backfill, and incremental sync.
Design normalization and transformation pipelines that map messy, platform-specific data into shared, queryable schemas (e.g. a unified creative/campaign/order model).
Own data reliability at scale — sync accuracy, freshness, coverage, and observability. Build the systems that detect when a connection breaks or a number looks wrong before a user does.
Engineer for multi-tenant scale and security: pipelines and storage that stay performant and cost-efficient across 1,000+ users and hundreds of connected brands — with strict data isolation, privacy, and compliance built in, not bolted on.
Partner with the AI and data-science teams to expose clean, well-modeled data the agent can retrieve and reason over.

Requirements

Experience building and operating large enterprise data pipelines engineered for scale — systems serving 1,000+ users (or equivalent data volume / tenancy), where reliability, isolation, and cost at scale were real constraints you solved.
Strong SQL and Python, with production experience in a modern data warehouse (BigQuery, Snowflake, Redshift, or similar).
Deep familiarity with ETL/ELT patterns, incremental sync, schema design, and data modeling for analytics.
Built and maintained integrations against third-party APIs — OAuth flows, pagination, rate limits, schema drift, and the operational reality of connectors that break.
A bias toward observability and data quality: you instrument your pipelines and you don't ship data you can't trust.
Experience building or operating within SOC 2-compliant systems with enterprise-grade security and privacy — you've handled sensitive customer data under real compliance constraints (access controls, encryption, data isolation, auditability) and treat it as a first-class engineering requirement.

Nice to Have

Experience in martech, adtech, or an adjacent data-heavy marketing domain — you've worked with ad platform or ecommerce data before and know where the bodies are buried (attribution windows, currency/timezone messes, deduping across platforms).
Familiarity with our stack: GCP (BigQuery, Cloud Run), PostgreSQL + pgvector, and orchestration/transformation tooling (dbt, Airflow, Dagster, or similar).
Experience with pipeline observability and tracing in an AI/LLM context (e.g. Langfuse).
Comfort supporting data that feeds AI agents and retrieval systems, not just dashboards.

Obsessive about data organization at scale. We're hiring someone who lives in the data layer and wants to own it end to end.
You’re a power AI user. You've embedded AI into every workflow you touch and you think in systems — not one-off prompts, but repeatable structures that compound.
Entrepreneurial. You don't need much direction to move fast, you pivot when the situation demands it, and what you ship is production-grade, not a prototype you hand off for someone else to finish.

We are an equal opportunity workplace. We are dedicated to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. We also consider qualified applicants regardless of criminal history, consistent with legal requirements.