IT Infrastructure Engineer – RMA & Hardware Diagnostics
Quick Summary
Perform advanced firmware and hardware diagnostics on enterprise server platforms, including CPU, memory, PCIe devices, GPUs, storage subsystems, and power components.
Responsibilities
- Perform advanced firmware and hardware diagnostics on enterprise server platforms, including CPU, memory, PCIe devices, GPUs, storage subsystems, and power components
- Troubleshoot complex hardware failures using system logs, BMC/IPMI interfaces, BIOS diagnostics, and vendor-specific tooling
- Act as the primary escalation point for L1 and L2 technicians on high-impact hardware incidents
- Conduct structured root cause analysis and document findings to prevent repeat failures
- Own the full RMA lifecycle, including validation of failed components, warranty claim creation, vendor coordination, tracking, and resolution
- Interface directly with OEM vendors to escalate recurring hardware defects and drive corrective action
- Analyze hardware failure trends and report metrics such as repeat RMA rates and component reliability
- Develop and standardize diagnostic playbooks, troubleshooting workflows, and hardware validation procedures
- Validate replacement components prior to redeployment into production environments
- Collaborate cross-functionally with data center operations, procurement, and engineering teams to improve hardware lifecycle processes
- Contribute to reducing MTTR and improving fleet-wide reliability through process improvements and knowledge sharing
Requirements
- 5+ years of hands-on experience working with enterprise server hardware in a production data center environment
- Deep understanding of x86 server architecture, including CPUs, memory, PCIe devices, storage controllers, GPUs, and power subsystems
- Strong experience performing firmware and BIOS/BMC diagnostics and upgrades
- Advanced Linux command-line troubleshooting skills, including log analysis and hardware-level diagnostics
- Experience working with remote management interfaces such as IPMI, iDRAC, iLO, or equivalent
- Proven experience managing hardware RMA processes and working directly with OEM vendors
- Ability to conduct structured root cause analysis and document technical findings clearly
- Familiarity with hardware monitoring systems and failure trend analysis
- Strong ownership mindset and ability to operate independently in mission-critical environments
- High proficiency in spoken and written English
Nice to Have
- Experience performing board-level diagnostics and component-level repair (SMD rework)
- Familiarity with data center networking equipment and basic network troubleshooting
- Experience supporting GPU-dense or high-performance compute environments
- Valid driver’s license
What We Offer
We offer competitive salaries ranging from $76,800 to $184,300 OTE (on-target earnings), based on your experience and skills.
Listing Details
- First seen: April 3, 2026
- Last seen: April 26, 2026
Posting Health
- Days active: 23
- Repost count: 0
- Trust level: 31%
- Scored at: April 26, 2026
Nebius is a cutting-edge AI cloud platform that offers scalable infrastructure for developing and deploying AI solutions.