Site Reliability Engineer | Solvd

Application ends: April 19, 2025

Job Description

Solvd Inc. is a premier software engineering company. We have 8 offices across the globe and over 800 international employees on staff. With over 12 years of experience, highly skilled teams around the world and deep industry knowledge, we help clients create software that improves their operations and opens new markets. We have built an impressive roster of digital-native enterprise clients including some of the biggest brands in retail and social media.

We are looking for a seasoned Site Reliability Engineer to join our dynamic team.

Project Details: A management platform designed to simplify the deployment, scaling, and operation of Kubernetes clusters across multiple environments. It provides tools and automation for managing Kubernetes in cloud, on-premises, or hybrid infrastructures, making it easier for organizations to run containerized workloads.

Requirements:

  • A minimum of 5 years of hands-on experience in backend software engineering.
  • Strong analytical and problem-solving skills.
  • Proficiency in Go and TypeScript.
  • Strong experience with containers and container orchestration technologies (e.g., Kubernetes, Docker, containerd).
  • Experience architecting and developing highly available systems in production environments.
  • Extensive hands-on experience in Linux environments (software packaging, distribution, configuration, scripting, networking, namespaces, and cgroups).
  • Solid understanding of major cloud platforms (AWS, Azure, GCP) and infrastructure provisioning tools such as CloudFormation, ARM Templates, GCP Resource Manager, and Terraform.
  • Understanding of application monitoring and observability; familiarity with open-source log and metrics collection tools.
  • Working knowledge of Git, SSH, and Linux shell scripting.
  • Strong communication and interpersonal skills, with the ability to work effectively in a distributed team and serve a customer base across different time zones.
  • Willingness to learn new technologies.
  • Hands-on experience with JavaScript and Node.js is a plus.

Responsibilities:

  • Develop new features, including support for new Kubernetes versions, and fix bugs.
  • Perform maintenance and troubleshooting tasks to ensure system reliability.
  • Provide reliability support and ensure consistent system performance.
  • Perform release management activities, including the coordination and deployment of software releases.
  • Collaborate with colleagues in customer-facing roles, such as Solution Architects, to assist in troubleshooting customers’ enterprise infrastructure setups.