Operations Engineer, HPC Networking
Quick Summary
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence.
About the Role
At CoreWeave, we are seeking a dedicated and detail-oriented Operations Engineer to join our HPC Networking Team. HPC Networking at CoreWeave is tasked with developing and operating some of the largest InfiniBand fabrics, powering industry-leading AI workloads.
Responsibilities
In this role, you will support the deployment, monitoring, troubleshooting, and maintenance of large-scale InfiniBand fabrics, ensuring their stability and performance. The ideal candidate will have a strong operations mindset, effective collaboration skills, and the ability to solve complex issues in a dynamic environment.
- Regularly monitor the performance and health of InfiniBand fabrics, including switches, host adapters, and nodes.
- Investigate and resolve operational issues within InfiniBand fabrics, such as network connectivity problems and performance bottlenecks.
- Assist with the installation and operational bring-up of large InfiniBand fabrics in collaboration with onsite personnel and customer teams.
- Perform routine maintenance and upgrades on InfiniBand switches and control plane components.
- Collaborate with HPC cluster operations teams to provide troubleshooting and operational expertise.
Investing in our people is one of our top priorities, and we value candidates who can bring their diverse experiences to our teams. Here are some qualities we’ve found compatible with our team. We’d love to talk about whether this aligns with your experience and interests and what you’re excited to work on next.
Requirements
- At least 1 year of experience with InfiniBand or similar networking technologies.
- Solid understanding of networking concepts, including architectures, topologies, operational best practices, and troubleshooting.
- Experience with Linux system administration and maintenance.
- Proficiency in at least one scripting language (e.g., Python) and hands-on experience with Ansible.
- Applicants must have work authorization that does not require sponsorship from the company now or in the future.
- Experience with monitoring and visualization platforms such as Grafana or Prometheus.
Preferred Qualifications
- Hands-on experience with Nvidia UFM or similar fabric management tools.
- Experience with operational tooling and automation frameworks like Ansible.
- Knowledge of data center operations, including server racks and cabling.
- Python or Bash scripting.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $90,000-$110,000. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.
What We Offer
While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.
California Consumer Privacy Act - California applicants only
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: careers@coreweave.com.
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.
Listing Details
- First seen: April 15, 2026
- Last seen: May 5, 2026
Posting Health
- Days active: 19
- Repost count: 0
- Trust Level: 47%
- Scored at: May 5, 2026
