Cohere said the model achieved an average word error rate (WER) of 5.42% on the Hugging Face Open ASR leaderboard, outperforming competing systems such as Zoom Scribe v1, IBM Granite 4.0 1B, and ElevenLabs Scribe v2. The company also reported a 61% win rate in human evaluations measuring accuracy, coherence, and usability, though the model trailed rivals in Portuguese, German, and Spanish.
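Word error rate counts the word-level substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code. It assumes plain whitespace tokenization with no text normalization, whereas public leaderboards typically normalize casing and punctuation before scoring, and the leaderboard figure is an average across several benchmark datasets.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (substitutions + deletions + insertions) / reference words * 100.

    Assumption: words are split on whitespace only; no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion of a reference word
                dp[i][j - 1] + 1,         # insertion of an extra hypothesis word
                dp[i - 1][j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)


# One substituted word out of five reference words gives a WER of 20%.
print(word_error_rate("the meeting starts at noon", "the meeting start at noon"))  # 20.0
```

On this scale, a 5.42% average WER corresponds to roughly one word error for every 18 reference words.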
In terms of speed, Transcribe can process 525 minutes of audio per minute, positioning it as a high-throughput option within its class. Its relatively compact size of 2 billion parameters is intended to balance accuracy against the ability to run on accessible hardware.
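To put the reported throughput in concrete terms, the sketch below converts it into wall-clock time for a given recording length. It assumes the 525-minutes-per-minute rate holds constant regardless of batch size, hardware, or audio content, which the reported figure does not guarantee; the function name is hypothetical.

```python
def seconds_to_transcribe(audio_minutes: float, audio_minutes_per_minute: float = 525.0) -> float:
    """Estimated wall-clock seconds to transcribe `audio_minutes` of audio.

    Assumption: throughput is a fixed rate (default: the reported 525 minutes of
    audio per minute of processing) with no startup or batching overhead.
    """
    if audio_minutes_per_minute <= 0:
        raise ValueError("throughput must be positive")
    return audio_minutes / audio_minutes_per_minute * 60.0


# One hour of audio at the reported rate takes roughly seven seconds.
print(round(seconds_to_transcribe(60), 2))  # 6.86
```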
Cohere plans to integrate the model into North, its enterprise platform, and is making it available for free through its API as well as through Model Vault, its managed inference service.
The release comes as demand for speech-to-text tools grows, driven by applications such as meeting transcription and dictation software. Cohere’s move expands its product portfolio beyond language models as the company builds out capabilities for enterprise AI workflows.
This analysis is based on reporting from TechCrunch.
This article was generated with AI assistance and reviewed for accuracy and quality.