Augury · 1mo ago

Software Data Engineer, Data Platform

Israel · Haifa · mid

Data Engineer · Data

Our mission is to transform how people and machines work together to push the boundaries of human productivity. A leader in Industrial AI, Augury helps the world’s manufacturers leverage real-time production insights to drive new levels of efficiency. By combining predictive and prescriptive AI technology with industry expertise, Augury enables production teams to proactively address alerts, minimize downtime, reduce asset costs, and maximize yield and capacity. Our customers achieve payback in six months or less, enabling global scale. We're looking for team members excited to partner with the world's manufacturers and build the future of production together.

You are a Software Data Engineer with deep experience building data-intensive systems, not a traditional ETL or BI-focused Data Engineer. In this role, you will design and build production-grade data services, platforms, and pipelines that power DIH and our AI-driven products. You will combine strong software engineering fundamentals with modern data engineering practices, with a focus on clean architecture, reliability, scalability, observability, and testing.

As a Software Data Engineer, Data Platform, you will:

Build and evolve Python-based services and pipelines that ingest raw industrial events, store them reliably, and expose clean, well-modeled tables and APIs for downstream consumers, including Digital Twin, Smart Canvas, AI agents, and analytics.
Design systems that handle duplicates, invalid data, late-arriving events, and reprocessing in a principled, incremental, and reproducible manner.
Collaborate with platform, machine learning, and product teams across Israel and globally to transform complex data challenges into robust, observable, and scalable software solutions.

Design and implement end-to-end data flows, from raw event ingestion into durable storage to modeled datasets and aggregates that power products, Digital Twin capabilities, analytics, and AI agents.
Build idempotent pipelines that can safely re-run without corrupting data, using deterministic keys and clearly defined contracts between raw, curated, and modeled datasets.
Implement incremental aggregations (e.g., machine signal summaries, production metrics, and operational KPIs) that correctly account for late-arriving data, watermarking strategies, and reproducibility requirements.
Model relationships and context across machines, lines, factories, sensors, work orders, and operational events to support context-aware applications, knowledge graphs, and AI agents.
Partner with platform teams to define how datasets are stored within our lakehouse, Digital Twin, and context graph architectures and exposed through well-defined APIs and tools.

Write clean, maintainable Python services with clear separation of concerns across ingestion, validation, transformation, persistence, aggregation, and orchestration layers.
Apply strong data modeling and SQL fundamentals, including schema design, indexing strategies, event-time semantics, and scalable aggregation patterns.
Drive testing discipline across the data platform, including unit tests, data-quality tests, integration tests, and validation frameworks.
Design for observability through metrics, logging, tracing, and monitoring that simplify debugging, improve data quality visibility, and support production operations.
Troubleshoot and resolve production data issues, including incorrect aggregations, missing data, duplicate records, schema evolution challenges, and backfill operations.

Build and evolve systems that scale from local development environments to cloud-scale lakehouse architectures using technologies such as Databricks, Delta Lake, and Spark.
Design and implement data pipelines following modern lakehouse patterns, including Bronze, Silver, and Gold layers, partitioning strategies, and cost-efficient compute utilization.
Work with streaming and messaging platforms (Kafka, Pub/Sub, or similar) to build reliable, idempotent consumers, replay capabilities, and reprocessing workflows.
Contribute to multi-tenant data architectures, data contracts, and governance practices that enable secure and efficient access to customer data at scale.

Work closely with DIH, Smart Canvas, and AI teams to define how agents interact with structured data, context graphs, APIs, and tools in deterministic and reliable ways.
Translate product requirements and user needs into technical designs that balance correctness, performance, latency, cost, and long-term maintainability.
Participate in architecture reviews, design discussions, code reviews, and collaborative development practices that raise the overall engineering bar across the organization.
Help shape the future of AI-native experiences by building the data foundations that power intelligent applications and agentic workflows.

Bachelor's degree in Computer Science, Software Engineering, Data Engineering, Information Systems, or a related engineering discipline, or equivalent practical experience.

5+ years of professional software engineering experience, including substantial experience building backend systems, distributed systems, or data-intensive applications in production environments.
Strong Python engineering skills, including modular architecture, dependency management, testing practices, observability, and production-grade code quality.
Strong SQL and data modeling expertise, including schema design, indexing strategies, event-driven data models, and scalable analytical aggregations.
Hands-on experience building incremental and idempotent data pipelines that handle duplicate, invalid, and late-arriving events without impacting downstream consumers.
Experience with at least one major cloud platform (Azure, GCP, or AWS) and modern lakehouse technologies such as Databricks, Delta Lake, Spark, or equivalent architectures.
Experience with streaming or messaging technologies such as Kafka, Pub/Sub, Event Hubs, or similar event-driven systems.
Proven ability to diagnose and resolve production data issues, including data quality problems, schema evolution, backfills, replay scenarios, and performance bottlenecks.
Strong written and verbal communication skills in English and experience collaborating effectively with globally distributed teams.

Nice to Have

Experience building industrial, IoT, manufacturing, or operational data platforms.
Familiarity with Digital Twin architectures and industrial data models.
Experience with graph databases, context graphs, knowledge graphs, or relationship-centric data modeling.
Exposure to AI/LLM-powered applications, including retrieval-augmented generation (RAG), agents, tool calling, or evaluation frameworks.
Experience working with Databricks or similar lakehouse platforms from both application and platform perspectives.
Experience building data products that directly support AI agents, intelligent applications, or machine learning workflows.