Multimodal AI Systems Architect (AI Engineering)
We are seeking a talented Multimodal AI Systems Architect to develop and optimize AI systems that seamlessly integrate vision and audio models. This role focuses on enhancing our voice-to-voice interactions and multimodal retrieval capabilities, ensuring our systems are efficient and innovative.
Responsibilities
- Integrate vision encoders and audio-native models into core agent reasoning loops.
- Optimize streaming latency for voice-to-voice AI interactions.
- Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.
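The multimodal RAG responsibility above reduces, at its core, to ranking pre-computed embeddings of video, audio, and PDF chunks against a query embedding. The following is a minimal, hypothetical sketch of that retrieval step; the chunk IDs, toy 3-dimensional embeddings, and function names are illustrative assumptions, not anything specified in this posting:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query, index, k=2):
    # Rank chunk IDs by cosine similarity to the query embedding,
    # returning the top-k most similar chunks.
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]

# Toy 3-d embeddings standing in for real encoder outputs (hypothetical data).
index = {
    "video_frame_12": [0.9, 0.1, 0.0],
    "pdf_page_3":     [0.1, 0.9, 0.1],
    "audio_clip_7":   [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # closest in direction to video_frame_12

print(retrieve(query, index, k=1))  # ['video_frame_12']
```

In a production system the embeddings would come from encoders such as CLIP (for video frames) and an audio model, and the brute-force cosine scan would be replaced by an approximate-nearest-neighbor index.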
Requirements
- Experience with Whisper, CLIP, and multimodal LLM integration.
- Knowledge of streaming architectures and WebRTC.
- Expertise in cross-modal alignment.
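For candidates wondering why the requirements stress streaming architectures, a toy back-of-envelope comparison of time-to-first-output makes the point: processing audio in small chunks as it arrives, rather than waiting for the full utterance, dominates perceived voice-to-voice latency. All numbers (chunk size, real-time factor) and function names below are illustrative assumptions, not figures from this posting:

```python
# Hypothetical timings; the numbers are illustrative, not benchmarks.
CLIP_MS = 5000   # length of the spoken utterance, in milliseconds
CHUNK_MS = 250   # streaming frame size
RTF = 0.3        # real-time factor: model compute time per unit of audio time

def first_output_latency_batch(clip_ms, rtf):
    # Batch mode: wait for the whole clip to be captured,
    # then run the model once over all of it.
    return clip_ms + clip_ms * rtf

def first_output_latency_streaming(chunk_ms, rtf):
    # Streaming mode: the first partial result arrives after a single
    # chunk has been captured and transcribed.
    return chunk_ms + chunk_ms * rtf

print(first_output_latency_batch(CLIP_MS, RTF))        # 6500.0 ms
print(first_output_latency_streaming(CHUNK_MS, RTF))   # 325.0 ms
```

The same chunked-processing principle is why WebRTC pipelines carry audio in frames of tens of milliseconds rather than whole recordings.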
Listing Details
- Posted: April 24, 2026
- First seen: April 24, 2026
- Last seen: May 5, 2026
Posting Health
- Days active: 10
- Repost count: 0
- Trust level: 35%
- Scored at: May 5, 2026
About Hyphenconnect
Hyphenconnect is a Web3 and AI talent recruitment agency based in Hong Kong with 700+ placements globally.