Kpler

Senior DevOps Engineer (Cloud & ML Infrastructure)

Greece | Remote | Full-time | Senior
DevOps & Infrastructure | DevOps Engineer | Infrastructure & Cloud

At Kpler, we are dedicated to helping our clients navigate complex markets with ease. By simplifying global trade information and providing valuable insights, we empower organisations to make informed decisions in commodities, energy, and maritime sectors.

Since our founding in 2014, we have focused on delivering top-tier intelligence through user-friendly platforms. Our team of over 700 experts from 35+ countries works tirelessly to transform intricate data into actionable strategies, ensuring our clients stay ahead in a dynamic market landscape. Join us to leverage cutting-edge innovation for impactful results and experience unparalleled support on your journey to success.


Your future position
 

As a Senior Platform Engineer, you will join the Cloud Platform team to design, operate, and evolve Kpler’s cloud-native infrastructure supporting backend, data, and ML workloads. You will operate within the existing platform engineering framework and contribute to the overall reliability, scalability, and cost efficiency of the platform. In addition, you will bring hands-on experience running ML/AI and GPU-based workloads in production, helping the team standardize and strengthen this scope as it grows. This is a senior+ individual contributor role combining operational excellence, architectural input, and hands-on execution in a 24/7 production environment.


  • Design, operate, and improve Kpler’s cloud-native infrastructure (Kubernetes, networking, compute, storage).

  • Contribute to Infrastructure as Code, CI/CD pipelines, and platform automation.

  • Ensure high availability, reliability, and security of production systems.

  • Improve observability, monitoring, alerting, and incident response processes.

  • Reduce MTTR and failure rates through structured reliability improvements.

  • Optimize infrastructure cost and performance, including compute-intensive workloads.

  • Support and help standardize ML/GPU-based workloads within the existing platform model.

  • Collaborate closely with ML engineers, data engineers, and backend teams to ensure production-grade deployments.

  • Contribute to architectural decisions shaping the evolution of the platform.

  • 5+ years of experience in cloud/platform engineering in production environments.

  • Strong hands-on experience with Kubernetes in production.

  • Experience with Infrastructure as Code (Terraform preferred).

  • Strong knowledge of AWS (or equivalent cloud provider).

  • Experience operating distributed systems in 24/7 environments.

  • Strong operational mindset (SLOs, monitoring, incident management).

  • Proven experience running ML/AI workloads in production.

  • Experience with GPU-based workloads.

  • Exposure to LLM-based or compute-intensive systems.

  • Experience optimizing cost and performance of high-compute infrastructure.

Technical / Functional Skills:
  • Strong cloud platform engineering expertise (AWS preferred).

  • Advanced Kubernetes operations in production (scaling, upgrades, workload isolation, troubleshooting).

  • Solid Infrastructure as Code experience (Terraform or equivalent).

  • Strong understanding of distributed systems and cloud-native architectures.

  • Experience designing and operating CI/CD pipelines.

  • Strong observability practices (monitoring, logging, alerting, SLO definition).

  • Incident management and root cause analysis in 24/7 systems.

  • Infrastructure cost optimization and performance tuning.

  • Solid programming skills (Python or Go preferred).

  • Practical experience supporting ML/AI or GPU-based workloads in production (highly valued).

  • Ownership & Accountability – Takes end-to-end responsibility for production systems and reliability outcomes.

  • Systems Thinking – Understands architectural trade-offs and long-term impact of technical decisions.

  • Structured Problem Solving Under Pressure – Maintains clarity and effectiveness during incidents and high-stakes situations.

  • Collaboration & Autonomy – Communicates clearly in distributed teams, documents decisions effectively, and works autonomously while maintaining strong cross-team alignment.

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent practical experience.

  • Strong programming skills (Python or Go preferred).

  • Solid understanding of cloud-native architecture and reliability engineering principles.

Posted: March 23, 2026