Site Reliability Engineer | CloudRaft | Remote (India)

Remote India
Application ends: May 2, 2025

Job Description

About Us:

CloudRaft is a dynamic and innovative company focused on delivering cutting-edge cloud-native solutions. We thrive on collaboration, creativity, and excellence, aiming to provide top-tier services to our clients. We are looking for a talented and experienced Site Reliability Engineer to join our team immediately and help us scale our operations to new heights.

Role:

  • You have a good understanding of, and professional experience with, running Kubernetes on-premises and in the cloud (OpenShift, EKS, AKS, and GKE).

  • You are comfortable with programmable infrastructure and can program in Golang or Python.

  • You are experienced with production-grade CI/CD using tools such as GitHub Actions, Argo CD, and GitLab, and have explored advanced deployment strategies.

  • You can set up observability pipelines and backends using popular products such as Vector, Fluentd, OpenTelemetry, Prometheus, and Grafana.

  • You can take observability to the next level with products such as VictoriaMetrics, Thanos, and SigNoz.

  • You have production experience in troubleshooting and resolving system issues.

  • You have a good understanding of, and implementation experience with, SRE concepts such as SLIs and SLOs.

  • You can represent the organization and collaborate with and coach customer teams.

  • You have the curiosity to learn and develop skills in upcoming fields such as AI, MLOps, and Edge Computing.

  • You like sharing your work through technical writing and speaking sessions in the community and at conferences.

Qualifications:

  • Bachelor’s degree in Computer Science, IT, or a related field
  • 2-5 years of experience in DevOps/SRE
  • Strong understanding of at least two of AWS, OpenShift, Azure, and Google Cloud
  • Hands-on production experience in designing and managing Kubernetes clusters
  • Hands-on experience in CI/CD and setting up developer tooling
  • Programming skills in any modern programming language (such as Python, Golang, or Node.js)
  • Experience with Infrastructure as Code (Terraform, CDK, Pulumi, etc.)
  • Understanding of security concepts and tooling
  • Excellent problem-solving and troubleshooting skills
  • Strong communication and teamwork skills
  • Ability to write well, as we prefer async communication
  • A product mindset and customer empathy are a big plus