Audiobox (AI research by Meta)

What it can do:

Audiobox is a new AI research model developed by Meta for audio generation. It generates voices and sound effects from voice inputs and natural language text prompts, which lets users create custom audio for a wide range of uses. The Audiobox family also includes the specialized models Audiobox Speech and Audiobox Sound, both of which are built on a shared self-supervised model, Audiobox SSL.

Features of Audiobox

  1. Audio Generation: Audiobox can generate both voices and sound effects, which makes it suitable for a variety of tasks and applications.
  2. Use of Voice Inputs and Natural Language Text: The model combines voice inputs with natural language text prompts to create custom audio.
  3. Audiobox Speech and Audiobox Sound: These are specialized models that cater to specific needs and applications.
  4. Audiobox SSL: All Audiobox models are built on a shared self-supervised model, Audiobox SSL.
  5. Interactive Audio Demos: Audiobox provides a series of interactive audio demos that help users understand its unique capabilities.

Use Cases of Audiobox

  1. Audiobox Maker: Lets users express their creativity by creating original audio stories, which they can download and share with friends.
  2. Technical Research: Useful for those who want to understand the intricacies of AI models and Meta's commitment to making AI safe.
  3. Custom Sound Effects: The natural language text prompts and voice inputs allow the creation of custom sound effects for a range of applications.
  4. Specialized Applications: The specialized models, Audiobox Speech and Audiobox Sound, cater to specific audio needs, expanding the range of potential use cases.
  5. Audio Education and Experimentation: Through the interactive demos, users can learn more about the capabilities of Audiobox and experiment with the tool.

Prompt type:

Text to audio, Create audio


Audiobox is Meta’s foundation research model for audio generation. It uses voice inputs and natural language text prompts to generate voices and sound effects, enabling users to create custom audio.