Senior Cloud Site Reliability Engineer AP

Nagar, India · Full Time · Senior

Compensation may vary based on your specific skills and experience, geographic location, or other relevant factors. The salary range for this position may be adjusted higher or lower in different talent markets.

Lighthouse is built on a foundation of unique, compassionate, highly driven individuals. We elevate the strengths and talents of those around us while leveraging opportunities for growth. We offer the experience of solving complex problems while continuing to grow multiple facets of your career. Lighthouse is where innovation meets support and where collaboration is the key ingredient to success. We grow together and are stronger together. 

About the Role

The Senior Cloud Site Reliability Engineer (Senior Cloud SRE) is responsible for ensuring the reliability, scalability, availability, performance, security, and operational excellence of Lighthouse’s cloud platforms and critical product infrastructure.

This role combines software engineering, cloud engineering, automation, observability, and operational governance practices to build highly resilient and self-healing platforms across hybrid and cloud-native environments. The ideal candidate will drive SRE best practices, improve service reliability through automation, establish observability standards, and partner closely with Engineering, Product, Security, DBA, and DevEx teams to improve operational maturity across the organization.

The role requires deep expertise in cloud infrastructure, Kubernetes, DevOps/SRE principles, telemetry, incident management, monitoring, and automation, along with strong collaboration and communication skills.

Responsibilities

  • Drive and implement Site Reliability Engineering (SRE) best practices across cloud platforms and services.
  • Define, maintain, and improve:
    • Service Level Indicators (SLIs)
    • Service Level Objectives (SLOs)
    • Service Level Agreements (SLAs)
    • Error Budgets
  • Improve service reliability, resiliency, scalability, and operational efficiency.
  • Establish operational standards, reliability governance, and production readiness practices.
  • Conduct Root Cause Analysis (RCA), postmortems, and reliability improvement initiatives.
  • Participate in on-call rotations, incident management, and major incident resolution activities.
  • Continuously improve operational processes to reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
  • Design, implement, and maintain enterprise observability and telemetry platforms.
  • Build operational dashboards, reliability scorecards, and service health monitoring solutions.
  • Configure proactive alerting, anomaly detection, and incident correlation mechanisms.
  • Implement centralized monitoring and telemetry using:
    • Grafana
    • Prometheus
    • Azure Monitor
    • Log Analytics
    • ELK Stack / Elasticsearch
    • Power BI dashboards
  • Develop actionable operational metrics and telemetry reporting for engineering and leadership teams.
  • Enhance visibility into infrastructure, application, Kubernetes, and platform health.
  • Drive automation-first operational practices across infrastructure and platform services.
  • Develop Infrastructure-as-Code (IaC) solutions using:
    • Terraform
    • ARM/Bicep
    • Ansible
  • Build operational automation scripts using:
    • Python
    • Bash
    • PowerShell
  • Develop self-healing and auto-remediation capabilities for recurring operational incidents.
  • Automate infrastructure provisioning, monitoring, scaling, backup, recovery, and deployment workflows.
  • Reduce manual operational effort and improve engineering productivity through intelligent automation.
  • Collaborate closely with:
    • Cloud Engineering teams
    • Product Engineering teams
    • DevEx teams
    • Security teams
    • DBA teams
    • Operations teams
  • Support engineering teams in improving production readiness and operational maturity.
  • Contribute to continuous improvement initiatives, reliability reviews, and operational excellence programs.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience/certification).
  • Knowledge of Python, scripting, or Infrastructure-as-Code tools (e.g., Terraform, Ansible, ARM/Bicep).
  • Experience managing cloud platforms (e.g., Azure, AKS, Pivotal Cloud Foundry, or equivalent).
  • Strong understanding of Kubernetes and containerization concepts.
  • Experience with application packaging, deployment automation, and release management.
  • Solid knowledge of relational databases (MS SQL Server) and exposure to NoSQL technologies (e.g., Redis, Elasticsearch, MongoDB).
  • Experience with CI/CD tools (Azure DevOps, Jenkins, GitHub Actions, or similar).
  • Familiarity with monitoring and logging tools (Grafana, ELK Stack, Prometheus, Power BI, etc.).
  • Proficiency with Git and modern branching/merging workflows.
  • Strong Linux administration and troubleshooting skills.
  • Excellent problem-solving, communication, and teamwork skills.

Working Conditions

  • Duties are performed in a typical office environment while at a desk or computer table.
  • Duties require the ability to use a computer, communicate over the telephone, and read printed material in a quiet, professional setting.
  • Duties may require being on call periodically and working outside normal working hours (evenings and weekends).

Requirements

This position will work for and be employed by Lighthouse's India subsidiary, which is an independent company located in India.

Location & Eligibility

Location: Nagar, India (on-site at the office)
Who can apply: candidates in India (IN)

Listing Details

Posted: June 9, 2026
First seen: July 1, 2026
Last seen: July 1, 2026
