Senior Staff AI Data Infrastructure Engineer
Quick Summary
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its products, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics.
- Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
- Modern Lakehouse Architecture: Evolve our data storage solutions based on Apache Iceberg and Lance to implement efficient semantic indexing, metadata management, and data versioning.
- Training Throughput Optimization: Deeply optimize data loading and pre-fetching strategies to ensure maximum throughput for large-scale training on 10,000+ GPU clusters.
- Infrastructure Evolution: Support the seamless transition of foundation model data into actionable training sets, bridging the gap between raw vehicle logs and model-ready tokens.
- Engineering Excellence: BS/MS/PhD in Computer Science or a related field, with a proven track record of building large-scale distributed systems.
- Work Experience: 5-8+ years of industry experience.
- Programming Mastery: Proficient in Python, C++, or Java, with a deep understanding of high-performance concurrent programming and systems design.
- Distributed Frameworks: Hands-on experience with at least one distributed processing framework, such as Ray or Spark.
- Lakehouse Expertise: Familiarity with data lakehouse concepts and practical experience with technologies like Iceberg and Lance.
- Experience building data warehouses for trillion-token datasets or PB-scale multi-modal data.
- Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
- Practical experience with vector databases, automated labeling toolchains, or data-centric AI workflows.
- Knowledge of storage formats optimized for AI (e.g., Parquet, Lance) and high-performance file systems.
- A fun, supportive and engaging environment.
- Infrastructure and computational resources to support your work.
- Opportunity to work on cutting-edge technologies with top talent in the field.
- Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.
- Competitive compensation package.
- Snacks, lunches, dinners, and fun activities.
Listing Details
- Posted: April 26, 2026
- First seen: April 27, 2026
- Last seen: May 31, 2026
Posting Health
- Days active: 34
- Repost count: 0
- Trust Level: 34%
- Scored at: May 31, 2026