Posted by Nahc

Site Reliability Engineer (SRE)

Taipei City, Taiwan · Full-time · Mid-level
Engineering · DevOps Engineer

Overview

Our client is an innovative technology company operating large-scale cloud and edge infrastructure supporting AI-driven products and services. As the platform continues to expand, they are looking for a Site Reliability Engineer to help build highly reliable, observable, and secure systems that power mission-critical applications.

This role offers the opportunity to work across cloud infrastructure, Kubernetes, observability, security, automation, and emerging AI operational platforms in a fast-growing environment.

Responsibilities

  • Design and maintain monitoring, alerting, and dashboarding systems across cloud and edge environments.
  • Build visibility into system health through metrics, logs, traces, and performance analytics.
  • Define and manage service level indicators (SLIs), service level objectives (SLOs), and other service reliability targets.
  • Develop proactive monitoring and anomaly detection capabilities to identify issues before they impact users.
  • Deploy, manage, and optimize containerized workloads running on Kubernetes.
  • Maintain scalable cloud infrastructure across production environments.
  • Improve system performance, availability, and operational efficiency.
  • Support infrastructure provisioning through Infrastructure-as-Code practices.
  • Implement secure access controls and audit mechanisms across infrastructure environments.
  • Monitor for cybersecurity threats, unauthorized access attempts, and service disruptions.
  • Develop alerting and response procedures for security-related incidents.
  • Contribute to operational security best practices and governance initiatives.
  • Automate repetitive operational tasks to reduce manual effort and improve reliability.
  • Build tooling and scripts to streamline infrastructure operations.
  • Support CI/CD workflows and deployment automation.
  • Promote documentation, operational standards, and continuous improvement.
  • Participate in on-call rotations and incident management.
  • Lead troubleshooting efforts during production incidents.
  • Conduct root-cause analysis and post-mortem reviews.
  • Drive long-term improvements that enhance system resilience.
  • Work closely with software, AI, machine learning, hardware, and product teams.
  • Ensure new services are production-ready with appropriate monitoring, security, and reliability measures.
  • Support the operational needs of both cloud-based and distributed edge computing environments.

Requirements

  • 3+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Production Operations.
  • Hands-on experience with AWS or other major cloud platforms.
  • Strong understanding of observability and monitoring tools such as Grafana, Prometheus, or similar platforms.
  • Solid Linux administration and troubleshooting skills.
  • Experience with Docker, Kubernetes, and containerized workloads.
  • Experience with Infrastructure as Code tools such as Terraform.
  • Proficiency in at least one scripting or programming language (Python, Bash, etc.).
  • Understanding of networking fundamentals and infrastructure security concepts.
  • Experience supporting production systems and participating in incident response.
  • Strong automation mindset and commitment to operational excellence.

Preferred Qualifications

  • Experience operating large-scale edge computing or IoT deployments.
  • Familiarity with zero-trust access management platforms.
  • Experience in security operations, threat detection, or infrastructure security.
  • Exposure to AI infrastructure, LLM-based applications, or workflow automation platforms.
  • Knowledge of AIOps, anomaly detection, or intelligent monitoring solutions.
  • Familiarity with compliance and security frameworks such as ISO 27001.

Location & Eligibility

Location: Taipei City, Taiwan (on-site at the office)
Who can apply: Candidates in Taiwan (TW)

Listing Details

Posted: June 4, 2026
First seen: June 4, 2026
Last seen: June 6, 2026
