✓100% Medical, Dental & Vision Coverage for Employees
✓Paid Time Off and Paid Holidays
✓401(k) match up to 5%
✓Educational Benefits for Career Growth
✓Employee Referral Bonus
✓Flexible Spending Accounts:
✓Healthcare (FSA)
✓Parking Reimbursement Account (PRK)
✓Dependent Care Assistance Program (DCAP)
✓Transportation Reimbursement Account (TRN)
✓Lead and manage IT operations aligned with ITIL processes including Incident, Problem, Change, and Release Management
✓Provide hands-on leadership in managing Linux and Windows environments across cloud and on-premises infrastructure
✓Own and drive incident response, root cause analysis, and service restoration for mission-critical systems
✓Design, build, and maintain golden images, patching strategies, and system hardening standards
✓Lead patch management and vulnerability remediation programs ensuring compliance and system integrity
✓Develop and implement automation solutions using modern approaches including Vibe Coding (AI-assisted development) to accelerate operational efficiency and reduce toil
✓Support and optimize infrastructure for AI/ML workloads, including provisioning, scaling, and performance tuning
✓Manage and maintain GPU-enabled environments and instances for high-performance computing and machine learning use cases
✓Oversee and optimize infrastructure monitoring, logging, alerting, and observability frameworks
✓Manage and mentor a team of systems engineers; provide technical guidance and performance oversight
✓Collaborate with architecture, security, and development teams to improve reliability, scalability, and operational efficiency
✓Support hybrid environments including cloud platforms and on-premises data centers
✓Ensure proper documentation, runbooks, SOPs, and operational readiness
✓Stay abreast of new technologies in your areas, including but not limited to US Federal Standards, NIST Publications, cloud computing and deployment, site reliability engineering, security standards, and compliance best practices
✓Must have 5+ years of experience leading an operations team, with hands-on experience driving operational process improvements and technological advancements
✓Proven experience implementing and operating within ITIL frameworks
✓Must have 10+ years of hands-on Unix/Linux experience, including CentOS / Red Hat systems administration support for large-scale distributed environments
✓Hands-on experience with incident management, patching, system hardening, and production support
✓Experience building and maintaining golden images and standardized environments
✓Strong scripting/automation skills (e.g., Python, Bash, PowerShell or similar)
✓Experience with configuration management and automation tools (Ansible, Terraform, Puppet, Chef, or similar)
✓Strong understanding of networking fundamentals (DNS, TCP/IP, firewalls, load balancing)
✓Experience with monitoring and logging tools (e.g., Nagios, Splunk, ELK, Prometheus, Grafana)
✓Must have Cloud Build-Out or Migration experience with at least one of the following providers: Amazon AWS, Google GCP, or Microsoft Azure
✓Must have 2+ years with CI/CD and automation tools such as Terraform, Ansible, Chef, Puppet, Jenkins, and GitHub
✓Experience supporting AI/ML workloads or data-intensive platforms
✓Familiarity with GPU-based compute environments (e.g., NVIDIA GPU instances)
✓Must be willing to learn new technologies and adapt to emerging technologies and needs from project to project
✓Knowledge of security best practices and compliance frameworks such as NIST 800-53, FedRAMP, and FISMA is preferred
✓Certifications such as ITIL, Linux, AWS, Azure, or Kubernetes (CKA/CKAD) are preferred
✓Networking certifications (CCNA/CCNP) are a plus