Senior AI Data Infrastructure Engineer
Quick Summary
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics.
Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
Experience building data warehouses for trillion-token datasets or PB-scale multi-modal data. Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
- Scalable Data Pipelines: Architect and build scalable, end-to-end pipelines to automate the ingestion, cleaning, and processing of PB-scale raw data for both production autonomy and multi-modal LLMs.
- Modern Lakehouse Architecture: Evolve our data storage solutions based on Apache Iceberg and Lance to implement efficient semantic indexing, metadata management, and data versioning.
- Training Throughput Optimization: Deeply optimize data loading and pre-fetching strategies to ensure maximum throughput for large-scale training on 10,000+ GPU clusters.
- Infrastructure Evolution: Support the seamless transition of foundation model data into actionable training sets, bridging the gap between raw vehicle logs and model-ready tokens.
- Engineering Excellence: BS/MS/PhD in Computer Science or a related field, with a proven track record of building large-scale distributed systems.
- Work Experience: 3-5 years of industry experience.
- Programming Mastery: Proficient in Python, C++, or Java, with a deep understanding of high-performance concurrent programming and systems design.
- Distributed Frameworks: Hands-on experience with at least one distributed processing framework, such as Ray or Spark.
- Lakehouse Expertise: Familiarity with Data Lakehouse concepts and practical experience with technologies like Iceberg and Lance.
- Experience building data warehouses for trillion-token datasets or PB-scale multi-modal data.
- Deep understanding of data access patterns in deep learning frameworks like PyTorch, DeepSpeed, or Megatron.
- Practical experience with Vector Databases, automated labeling toolchains, or data-centric AI workflows.
- Knowledge of storage formats optimized for AI (e.g., Parquet, Lance) and high-performance file systems.
Listing Details
- First seen: March 26, 2026
- Last seen: May 17, 2026
Posting Health
- Days active: 52
- Repost count: 0
- Trust Level: 34%
- Scored at: May 18, 2026