✓100% Medical, Dental & Vision Coverage for Employees
✓Paid Time Off and Paid Holidays
✓401(k) match up to 5%
✓Educational Benefits for Career Growth
✓Employee Referral Bonus
✓Flexible Spending Accounts:
✓Healthcare (FSA)
✓Parking Reimbursement Account (PRK)
✓Dependent Care Assistance Program (DCAP)
✓Transportation Reimbursement Account (TRN)
✓Design and implement enterprise-grade monitoring and observability frameworks (metrics, logs, traces) across distributed systems using enterprise Splunk, Grafana, and OpenTelemetry tools
✓Establish and manage SLIs, SLOs, and error budgets to drive reliability improvements
✓Develop and maintain real-time asset inventory systems across cloud, on-prem, and hybrid environments
✓Automate workload onboarding and offboarding processes, ensuring standardization and governance
✓Track system ownership, dependencies, and lifecycle states for operational transparency
✓Build proactive detection mechanisms using AIOps and intelligent alerting to minimize incident impact
✓Design and operate scalable, resilient, and secure infrastructure platforms across cloud and hybrid environments
✓Implement automated compliance tracking and enforcement aligned with organizational and regulatory standards (e.g., NIST, FISMA, FedRAMP)
✓Embed ITIL processes (incident, change, problem, configuration management) into SRE workflows
✓Build and maintain automated deployment environments and pipelines that enforce security, compliance, and operational standards
✓Develop “golden paths” and standardized platform templates for consistent workload deployment
✓Automate provisioning, patching, configuration management, and environment lifecycle
✓Leverage AI/ML coding assistants and vibe coding practices to rapidly develop automation scripts, tools, and internal platforms
✓Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights
✓Lead adoption of AI-enhanced SRE practices, including intelligent remediation and predictive operations
✓Champion DevOps and SRE practices including Infrastructure as Code, CI/CD, observability, and reliability engineering
✓Build developer-friendly platforms (“golden paths”) that simplify deployments, reduce friction, and improve velocity
✓Enable and optimize infrastructure for AI/ML workloads, including data pipelines, storage systems, inference environments, and GPU-enabled and high-performance compute workloads
✓Build and manage containerized and orchestrated platforms (Docker, Kubernetes)
✓Support cloud migration, modernization, and platform standardization initiatives
✓Ensure systems meet security, compliance, backup, and disaster recovery requirements
✓Evangelize and promote best practices in DevOps, SRE, and platform engineering to developer communities
✓Stay abreast of new technologies in your areas, including but not limited to AIOps, MLOps, cloud computing and deployment, site reliability engineering, infrastructure automation, security best practices, and data engineering
✓Must have a total of 6+ years of experience in DevOps/SRE roles working with monitoring and observability tools (Prometheus, Grafana, ELK, or cloud-native equivalents) for on-prem and cloud-hosted workloads
✓Must have 4+ years of hands-on Linux experience, including Ubuntu/CentOS/Red Hat operating systems, containers, dependency management, and administration support
✓Must have 4+ years of experience automating Infrastructure-as-Code (IaC) deployments to at least one of the following cloud platforms: Amazon AWS, Google GCP, or Microsoft Azure
✓Must have 4+ years of experience with CI/CD and automation tools such as Terraform, Ansible, Chef, Puppet, Jenkins, and GitHub Actions
✓Strong scripting skills (Python, Bash, PowerShell, or similar)
✓Must be proficient in using vibe coding and coding assistants to develop scripts, tools, and applications for DevOps and SRE use cases
✓Must be proficient in debugging, troubleshooting, and/or deploying SQL and/or NoSQL databases, object storage, and web servers; experience with open-source programming stacks (Node.js, R, Python, .NET Core, Java) is desired but not mandatory
✓Must be willing to learn and adopt new technologies and to adapt to emerging needs from project to project
✓Cloud certifications are preferred
✓Certifications in Grafana, Splunk, Docker, or Kubernetes are preferred but optional