Introducing Code Llama, a state-of-the-art large language model for coding Takeaways: Code Llama is a state-of-the-art LLM capable of generating code, and natural language about...
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation Abstract: We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and ro...
Multi-Head State Space Model for Speech Recognition Abstract: State space models (SSMs) have recently shown promising results on small-scale sequence and ...
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding Abstract: End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse ...
Open sourcing AudioCraft: Generative AI for audio made simple and available to all Imagine a professional musician being able to explore new compositions without having to play a sing...
Community-driven AI innovation comes alive with Llama 2 Last week, we took an important step toward advancing access and opportunity in the creation of AI-p...
Llama 2: Open Foundation and Fine-Tuned Chat Models Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned larg...
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Abstract: We present CM3Leon (pronounced “Chameleon”), a retrieval-augmented, token-based, decoder-only...
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild Abstract: Recognizing scenes and objects in 3D from a single image is a longstanding goal of computer ...
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second Abstract: We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for ...