Site Reliability Engineer | CloudRaft | Remote (India)
Job Description
About Us:
CloudRaft is a dynamic and innovative company focused on delivering cutting-edge cloud-native solutions. We thrive on collaboration, creativity, and excellence, aiming to provide top-tier services to our clients. We are looking for a talented and experienced Site Reliability Engineer to join our team immediately and help us scale our operations to new heights.
Role:
You have a good understanding of, and professional experience with, running Kubernetes on-premises and in the cloud (OpenShift, EKS, AKS, and GKE).
You are comfortable with programmable infrastructure and can program in Golang or Python.
You are experienced with production-grade CI/CD using tools such as GitHub Actions, Argo CD, and GitLab, and have explored advanced deployment strategies.
You can set up observability pipelines and backends using popular products such as Vector, Fluentd, OpenTelemetry, Prometheus, and Grafana.
You can take observability to the next level with products such as VictoriaMetrics, Thanos, and SigNoz.
You have production experience in troubleshooting and resolving system issues.
You have a good understanding of, and implementation experience with, SRE concepts such as SLIs and SLOs.
You can represent the organization and collaborate with and coach customer teams.
You have the curiosity to learn and develop skills in emerging fields such as AI, MLOps, and edge computing.
You like sharing your work through technical writing and speaking sessions in the community and at conferences.
Qualifications:
- Bachelor’s degree in Computer Science, IT, or a related field
- 2-5 years of experience in DevOps/SRE
- Strong understanding of at least two of AWS, OpenShift, Azure, and Google Cloud
- Hands-on production experience in designing and managing Kubernetes clusters
- Hands-on experience in CI/CD and setting up developer tooling
- Programming skills in any modern programming language (Python, Golang, or Node)
- Experience with Infrastructure as Code (Terraform, CDK, Pulumi, etc.)
- Understanding of security concepts and tooling
- Excellent problem-solving and troubleshooting skills
- Strong communication and teamwork skills
- Ability to write well, as we prefer async communication
- A product mindset and customer empathy are a big plus