Platform Support Engineer
Quick Summary
At Braze, we have found our people. We’re a genuinely approachable, exceptionally kind, and intensely passionate crew.
We seek to ignite that passion by setting high standards, championing teamwork, and creating work-life harmony as we collectively navigate rapid growth on a global scale while striving for greater equity and opportunity – inside and outside our organization.
To flourish here, you must be prepared to set a high bar for yourself and those around you. There is always a way to contribute: acting with autonomy, having accountability, and being open to new perspectives are essential to our continued success.
Our deep curiosity to learn and our eagerness to share diverse passions with others give us balance and inject a one-of-a-kind vibrancy into our culture.
If you are driven to solve exhilarating challenges and have a bias toward action in the face of change, you will be empowered to make a real impact here, with a sharp and passionate team at your back. If Braze sounds like a place where you can thrive, we can’t wait to meet you.
Responsibilities
Platform Support Engineers (PSEs) are the first line of defense in ensuring the health and availability of Braze’s platform and systems. As part of a global triage team, they actively monitor system performance, respond to alerts, and execute runbooks, SOPs (Standard Operating Procedures), and MOPs (Maintenance Operating Procedures) to address operational issues.
Braze operates at massive scale, with over 3.3 billion monthly active users across our customers, collecting hundreds of billions of data points each month and sending billions of messages to end-users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. The Braze Operations Team optimizes our response mechanisms by centralizing triage and monitoring responsibilities. This allows our other engineering teams to focus on what they do best while we do what we do best. As a Platform Support Engineer at Braze, you will focus on maintaining uptime and reliability, collaborating with engineers to escalate complex issues, and contributing to the continuous improvement of operational processes.
Main responsibilities:
- Active System Monitoring:
- Use monitoring tools (e.g., Datadog, Prometheus, or similar) to observe the health of platform systems and services continuously
- Proactively identify and respond to performance anomalies, outages, or unusual system behavior
- Maintain awareness of ongoing incidents and collaborate with relevant teams to ensure timely resolution
- Incident Response and Triage:
- Act as the first responder to system alerts, determining the severity and scope of issues
- Execute predefined runbooks, SOPs, and MOPs to mitigate incidents and restore services
- When incidents exceed the scope of triage procedures, escalate issues to appropriate engineering teams (e.g., SREs or Platform Engineers)
- Operational Procedures:
- Follow and improve operational processes for incident management, system health checks, and routine maintenance tasks
- Maintain and update runbooks, ensuring accuracy and relevance to current systems and practices
- Participate in post-incident reviews to improve documentation and operational readiness
- Collaboration and Communication:
- Provide clear, concise communication during incidents, ensuring stakeholders know the status and progress of the resolution
- Collaborate with SREs, Platform Engineers, and other teams to enhance monitoring, alerting, and operational tools
- Actively participate in training sessions to stay current on new systems and tools introduced by engineering teams
- Continuous Improvement:
- Identify monitoring, documentation, and procedure gaps and suggest improvements to enhance efficiency and effectiveness
- Assist in testing new runbooks, tools, and processes to improve incident response times
- Contribute to the automation of routine tasks to reduce manual toil
- Experience:
- 1-3 years of experience in technical operations, system administration, or entry-level cloud engineering roles
- Familiarity with cloud platforms (AWS, GCP, Azure), Kubernetes, and basic computing, storage, and networking concepts
- Experience with monitoring and alerting tools (e.g., Datadog, Prometheus, Grafana) is a plus
- Skills:
- Strong troubleshooting and problem-solving skills, with the ability to follow processes and escalate appropriately
- Proficiency in scripting or automation tools (e.g., Python, Bash) is a bonus
- Familiarity with incident management processes and ITIL best practices
- Mindset:
- Detail-oriented and committed to maintaining system health and uptime
- Eager to learn and grow, with a passion for operational excellence
- Collaborative and communicative, able to work effectively in a global, distributed team
#LI-Hybrid
What We Offer
Listing Details
- Posted: February 24, 2026
- First seen: March 23, 2026
- Last seen: April 14, 2026
Posting Health
- Days active: 21
- Repost count: 0
- Trust Level: 45%
- Scored at: April 14, 2026
Braze is a comprehensive customer engagement platform that powers relevant and memorable experiences between consumers and the brands they love.