As a Senior Data Reliability Engineer, you will be responsible for architecting, scaling, and optimizing enterprise-grade data platforms, including large-scale data lakes and data warehouses built from disparate data sources. This role requires deep expertise in cloud databases, data infrastructure reliability, observability, and automation, with a strong focus on operational excellence, performance, and resilience.
Own the reliability, availability, scalability, and performance of PostgreSQL RDS environments across production and non-production systems.
Lead proactive monitoring and observability initiatives for PostgreSQL RDS instances, leveraging tools such as CloudWatch, Prometheus, Grafana, and other enterprise monitoring platforms.
Drive advanced PostgreSQL performance tuning, including query optimization, indexing strategies, parameter tuning, and capacity planning.
Architect and optimize database backup, disaster recovery, and failover strategies to ensure business continuity and minimal downtime.
Own the reliability and operational excellence of Debezium and Kafka Connect ecosystems, ensuring robust real-time data ingestion and delivery.
Lead troubleshooting and optimization of ETL workflows and data pipelines, ensuring scalability, reliability, and fault tolerance across data platforms.
Oversee Apache Airflow workflow orchestration, ensuring high reliability, SLA adherence, and operational efficiency of production DAGs.
Design and implement Infrastructure as Code (IaC) solutions using tools such as Terraform, Crossplane, and automation frameworks to streamline deployments and operational tasks.
Lead incident response, root cause analysis, and post-incident reviews for critical production issues.
Define and enforce database security standards, including access controls, encryption policies, compliance adherence, and periodic security audits.
Partner closely with engineering, DevOps, and data platform teams to optimize data architecture and improve overall platform reliability.
Mentor junior engineers and drive best practices across database reliability engineering and cloud data operations.
Identify and lead continuous improvement initiatives focused on reliability, automation, scalability, and operational maturity.
Deep expertise in PostgreSQL administration and performance tuning, preferably in AWS RDS environments.
Strong experience with Debezium, Kafka Connect, ETL frameworks/tools, and enterprise-grade data pipeline architectures.
Strong hands-on experience with Amazon Redshift, S3, and cloud-native data platforms.
Expertise in Apache Airflow workflow orchestration and operational management.
Experience with Apache Spark and large-scale distributed data processing.
Strong scripting and automation experience using Python, Bash, or similar languages.
Strong experience in Infrastructure as Code (IaC) using Terraform, Crossplane, or equivalent tools.
Hands-on experience with monitoring and observability tools such as CloudWatch, Prometheus, and Grafana.
Strong understanding of cloud database security, compliance, and governance frameworks (e.g., GDPR, HIPAA).
Experience designing highly available, fault-tolerant, and scalable cloud database systems.
Bachelor’s degree in Computer Science, Information Technology, or a related field (Master’s preferred).
10–12 years of overall experience in database engineering, cloud data infrastructure, or reliability engineering.
At least 5 years of hands-on experience with PostgreSQL, including AWS RDS administration.
Strong experience in cloud-native data platforms and enterprise-scale production environments.
AWS Certified Database - Specialty or other relevant cloud certifications preferred.