Technical Staff, Audio Omni
Job Description
About Anuttacon
Anuttacon is an independent research lab pursuing humanistic general intelligence that you can experience in every real-time interaction—seamlessly understanding and expressing through text, voice, visuals and beyond.
We see AI and humans as equal partners in virtual world creation and discovery. Our mission is to build multimodal AI with genuine emotional understanding and expressive communication: technology that not only thinks but feels, connecting with you authentically through rich, nuanced interactions that enhance your experience.
Key Responsibilities:
- Design and develop a unified Any-to-Any multimodal architecture, with a primary focus on native Audio-in/Audio-out modeling.
- Develop high-performance Neural Audio Codecs, exploring the optimal balance between continuous representations and discrete tokens.
- Leverage large-scale multimodal data (speech, music, environmental audio, video) to lead distributed pre-training of ultra-large-scale models.
- Explore instruction fine-tuning and reinforcement learning algorithms tailored to the audio modality, optimizing emotional expression, interruption handling, paralinguistic features (e.g., laughter, pauses), and perceptual audio quality in speech interaction.
Qualifications:
- PhD in Computer Science, Artificial Intelligence, Electronic Engineering, or a related field;
- Hands-on experience training large-scale models (LLM or multimodal), with deep understanding of Transformer architectures and distributed training frameworks (Megatron-LM, DeepSpeed, TorchTitan, etc.);
- Deep expertise in at least one of the following areas:
  - Audio/Text Interleaved Pretraining
  - Multimodal Alignment & RL
  - End-to-End Speech Dialogue Modeling
- Proficiency in PyTorch with extensive experience in large-scale data processing;
- Ability to collaborate across time zones with strong communication skills; results-driven and highly accountable.
Preferred Qualifications:
- Publications at top-tier venues such as NeurIPS, ICML, ICASSP, or ISMIR;
- Experience managing heterogeneous audio datasets at the scale of one million hours or more;
- Core contributor to an industry-scale Omni or Multimodal Foundation Model.