
🎤 whisper.cpp: High-Performance Speech to Text in C/C++

C/C++ implementation of Whisper for high-performance speech recognition




In the field of speech recognition, OpenAI's Whisper has emerged as a game-changer. However, running large AI models typically requires significant computational resources. Enter whisper.cpp, a groundbreaking C/C++ implementation of Whisper that delivers high-performance speech recognition with minimal dependencies and maximum efficiency.


1. Introduction to whisper.cpp

whisper.cpp is a lightweight, high-performance port of OpenAI's Whisper automatic speech recognition (ASR) model written entirely in C/C++. This remarkable project brings the power of state-of-the-art speech recognition to virtually any platform, from embedded systems to high-end servers, without the overhead of Python dependencies.

Key Features:

  • Pure C/C++ Implementation: No external dependencies, completely self-contained
  • Cross-Platform Support: Runs on macOS, iOS, Android, Linux, Windows, and more
  • Hardware Acceleration: Optimized for Apple Silicon, NVIDIA GPUs, Intel GPUs, and more
  • Memory Efficient: Zero memory allocations at runtime, quantization support
  • Multiple Model Sizes: From tiny (75MB) to large (2.9GB) models
  • Real-Time Performance: Capable of faster-than-realtime transcription on modern hardware
  • Voice Activity Detection: Built-in VAD for efficient processing

2. Getting Started: Installation and Setup

System Requirements:

  • Modern CPU (x86, ARM, or POWER architectures)
  • 2GB RAM minimum (4GB+ recommended for larger models)
  • 1GB+ free disk space (for models)
  • CMake build system
  • Optional: CUDA for NVIDIA GPU support

Installation Steps:

Step 1: Clone the Repository

git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp

Step 2: Download a Model

# Download the base English model (recommended for starters)
./models/download-ggml-model.sh base.en

# Available models:
# tiny.en (75MB), base.en (142MB), small.en (466MB)
# medium.en (1.5GB), large-v3 (2.9GB)

Step 3: Build the Project

# Standard build
cmake -B build
cmake --build build -j --config Release

# Quick demo (downloads model and runs samples)
make base.en

Step 4: Test with Sample Audio

# Transcribe the included JFK sample
./build/bin/whisper-cli -f samples/jfk.wav

# Or download and convert additional sample audio
make samples

3. Basic Usage and Examples

Command-Line Interface:

Basic Transcription

# Transcribe a WAV file
./build/bin/whisper-cli -f audio.wav

# Specify a different model
./build/bin/whisper-cli -m models/ggml-tiny.en.bin -f audio.wav

# Get help with all options
./build/bin/whisper-cli -h

Common Options

# Translate to English
./build/bin/whisper-cli -f audio.wav --translate

# Set language (e.g., Spanish)
./build/bin/whisper-cli -f audio.wav -l es

# Use multiple threads
./build/bin/whisper-cli -f audio.wav -t 8

# Enable word timestamps
./build/bin/whisper-cli -f audio.wav --word-timestamps
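
Output Formats

whisper-cli can also write transcripts directly to files; output names are derived from the input path. A quick sketch of the output flags:

# Write a plain-text transcript (audio.wav.txt)
./build/bin/whisper-cli -f audio.wav -otxt

# Write SubRip and WebVTT subtitle files
./build/bin/whisper-cli -f audio.wav -osrt -ovtt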

Audio Format Conversion:
whisper.cpp expects 16-bit mono WAV files at a 16 kHz sample rate:

# Convert MP3 to WAV using ffmpeg
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

# Convert other formats
ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
ffmpeg -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le output.wav
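
To convert a whole directory, a small shell loop does the job. This sketch assumes the MP3s sit in the current directory (adjust the paths to taste):

# Convert every MP3 in the current directory to 16 kHz mono WAV
for f in *.mp3; do
    ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "${f%.mp3}.wav"
done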

4. Model Sizes and Performance

Available Models:

Model      Disk Size   Memory Usage   Speed       Accuracy
tiny.en    75 MiB      ~273 MB        Very Fast   Good
base.en    142 MiB     ~388 MB        Fast        Very Good
small.en   466 MiB     ~852 MB        Medium      Excellent
medium.en  1.5 GiB     ~2.1 GB        Slow        Outstanding
large-v3   2.9 GiB     ~3.9 GB        Very Slow   Best

Model Selection Guide:

  • tiny.en: Best for real-time applications, embedded systems
  • base.en: Good balance of speed and accuracy for most use cases
  • small.en: High accuracy for production applications
  • medium.en: Maximum accuracy when quality is critical
  • large-v3: Best possible accuracy, requires significant resources

Performance Benchmarks:

  • Real-time transcription achievable on modern CPUs with base.en model
  • Apple Silicon: 3x+ encoder speedup with Core ML on the Apple Neural Engine
  • NVIDIA GPUs: 5-10x faster with CUDA support
  • Intel GPUs: Significant speedup with OpenVINO
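
To see what these numbers look like on your own machine, the project ships a benchmark tool that is built alongside whisper-cli:

# Benchmark a model with a given thread count
./build/bin/whisper-bench -m models/ggml-base.en.bin -t 4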

5. Hardware Acceleration Options

whisper.cpp supports multiple hardware acceleration methods to maximize performance:

Apple Silicon Optimization:

# Metal is enabled by default on Apple Silicon builds;
# to request it explicitly:
cmake -B build -DGGML_METAL=1
cmake --build build -j --config Release

# Core ML support (3x+ speedup)
cmake -B build -DWHISPER_COREML=1
cmake --build build -j --config Release

# Generate Core ML model first
./models/generate-coreml-model.sh base.en
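
After generating the Core ML encoder, no extra flags are needed: whisper-cli loads the .mlmodelc automatically when it sits next to the matching ggml model:

# ggml-base.en-encoder.mlmodelc is picked up automatically if present
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav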

NVIDIA GPU Support:

# Build with CUDA support
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release

# Optionally pin the compute capability for your GPU
# (e.g. "86" targets RTX 30-series Ampere cards)
cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="86"
cmake --build build -j --config Release

Intel GPU/OpenVINO Support:

# Build with OpenVINO support
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release

# Generate OpenVINO model
python models/convert-whisper-to-openvino.py --model base.en

Vulkan GPU Support:

# Cross-vendor GPU acceleration
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release

ARM NEON Optimization:

# ARM processors (mobile, Raspberry Pi): NEON is detected
# and enabled automatically by the standard build
cmake -B build
cmake --build build -j --config Release

6. Quantization and Memory Optimization

whisper.cpp supports model quantization to reduce memory usage and disk space:

Quantization Methods:

# Build quantization tool
cmake -B build
cmake --build build -j --config Release

# Quantize a model (Q5_0 method)
./build/bin/quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0

# Available quantization methods:
# q4_0, q4_1, q5_0, q5_1, q8_0
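
A quantized model is a drop-in replacement; just point whisper-cli at the new file:

# Run inference with the quantized model
./build/bin/whisper-cli -m models/ggml-base.en-q5_0.bin -f samples/jfk.wav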

Quantization Benefits:

  • Reduced Memory: Up to 4x reduction in memory usage
  • Smaller Disk Size: Quantized models take less storage space
  • Faster Loading: Smaller models load quicker
  • Better Cache Performance: Improved CPU cache utilization

Memory Usage Comparison:

  • Non-quantized base.en: ~388 MB
  • Q5_0 quantized base.en: ~150 MB
  • Q4_0 quantized base.en: ~120 MB

7. Platform-Specific Support

whisper.cpp runs on virtually any platform:

Desktop Platforms:

  • Linux: Full support with all acceleration options
  • Windows: MSVC and MinGW support
  • macOS: Intel and Apple Silicon with Metal/Core ML

Mobile Platforms:

  • iOS: Full support with Core ML acceleration
  • Android: Native Android support

Embedded Systems:

  • Raspberry Pi: ARM NEON optimization
  • Other ARM boards: Generic ARM support

WebAssembly:

# Build for WebAssembly (requires the Emscripten SDK)
emcmake cmake -B build
cmake --build build -j --config Release

Docker Support:

# Using Docker
docker pull ghcr.io/ggml-org/whisper.cpp:main
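
To actually transcribe inside the container, mount your models and audio into it. A sketch assuming local ./models and ./audio directories:

# Transcribe a mounted file with a mounted model
docker run -it --rm \
  -v "$(pwd)/models:/models" \
  -v "$(pwd)/audio:/audio" \
  ghcr.io/ggml-org/whisper.cpp:main \
  "whisper-cli -m /models/ggml-base.en.bin -f /audio/sample.wav"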

Special Architectures:

  • POWER9/10: VSX intrinsics support for IBM POWER systems
  • RISC-V: Experimental support for RISC-V processors

8. Advanced Features and Tips

Voice Activity Detection (VAD):

# Enable VAD to skip silent portions (requires a ggml VAD model, e.g. Silero)
./build/bin/whisper-cli -f audio.wav --vad --vad-model models/ggml-silero-v5.1.2.bin

# Adjust the VAD threshold (0.0 to 1.0; higher filters more aggressively)
./build/bin/whisper-cli -f audio.wav --vad --vad-model models/ggml-silero-v5.1.2.bin --vad-threshold 0.8

Language Detection:

# Auto-detect language
./build/bin/whisper-cli -f audio.wav

# Force specific language
./build/bin/whisper-cli -f audio.wav -l fr

Translation Support:

# Translate non-English speech to English
./build/bin/whisper-cli -f audio.wav --translate

# Translate with specific source language
./build/bin/whisper-cli -f audio.wav --translate -l es

Performance Optimization Tips:

CPU Optimization

# Use multiple threads
./build/bin/whisper-cli -f audio.wav -t 8

# AVX2/FMA are enabled automatically on CPUs that support them;
# the default native build detects instruction sets at configure time
cmake -B build -DGGML_NATIVE=ON

Memory Management

# Limit the text context to bound memory use on long files
./build/bin/whisper-cli -f long_audio.wav --max-context 512

# Reduce the context further for memory-constrained systems
./build/bin/whisper-cli -f audio.wav --max-context 256

Audio Quality Tips

  • Use 16kHz sample rate for best results
  • Ensure audio is mono (single channel)
  • Remove background noise when possible (see the ffmpeg sketch after this list)
  • Avoid lossy compressed formats (use WAV when possible)
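
ffmpeg can do light cleanup during conversion. A minimal sketch, assuming the speech sits in the normal voice band; the filter cutoffs here are illustrative, not tuned values:

# Resample to 16 kHz mono and trim rumble and hiss outside the voice band
ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le \
  -af "highpass=f=100,lowpass=f=7500" cleaned.wav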

Common Issues and Solutions:

Memory Issues

  • Use quantized models for memory-constrained systems
  • Reduce context size with --max-context
  • Use smaller models (tiny.en or base.en)

Performance Issues

  • Enable hardware acceleration (Metal, CUDA, OpenVINO)
  • Use appropriate model size for your use case
  • Increase the thread count with the -t option

Accuracy Issues

  • Use larger models for better accuracy
  • Ensure audio quality is good
  • Specify the correct language with the -l option

9. Integration and Applications

C/C++ Integration:

#include "whisper.h"

// Initialize context
struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");

// Process audio
whisper_full(ctx, params, audio_data, audio_length);

// Get results
int n_segments = whisper_full_n_segments(ctx);
for (int i = 0; i < n_segments; ++i) {
    const char * text = whisper_full_get_segment_text(ctx, i);
    printf("%s\\n", text);
}

// Cleanup
whisper_free(ctx);
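
One way to compile a small program against the library, assuming you install it from the build tree first (the install prefix and linker flags vary by platform):

# Install headers and the library, then link (illustrative prefix)
cmake --install build --prefix /usr/local
cc -o app app.c -lwhisper -lm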

Python Bindings:

# Several community Python bindings wrap the C API; names and calls
# vary by package. Illustrative usage with a hypothetical `whispercpp` wrapper:
import whispercpp

# Initialize with a ggml model file
whisper = whispercpp.Whisper('models/ggml-base.en.bin')

# Transcribe an audio file
result = whisper.transcribe('audio.wav')
print(result)

Real-World Applications:

  • Voice Assistants: Offline voice command processing
  • Meeting Transcription: Real-time meeting notes
  • Content Creation: Automatic caption generation
  • Accessibility: Speech-to-text for accessibility tools
  • Language Learning: Pronunciation practice and feedback
  • Call Centers: Automatic call transcription and analysis
  • Media Production: Subtitle generation for videos

Mobile Applications:

  • iOS Apps: Fully offline speech recognition
  • Android Apps: Voice-enabled features without internet
  • Cross-Platform: Consistent behavior across platforms

Server Applications:

  • API Services: High-performance speech recognition API (see the server sketch after this list)
  • Batch Processing: Large-scale audio file processing
  • Real-Time Streaming: Live audio transcription services
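
For the API-service case, the project already ships an HTTP server example; a minimal sketch using its default endpoint (flags as of recent releases):

# Start the bundled HTTP server with a model
./build/bin/whisper-server -m models/ggml-base.en.bin --port 8080

# Send a 16 kHz WAV file for transcription
curl 127.0.0.1:8080/inference -F file="@audio.wav"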

Conclusion

whisper.cpp represents a significant milestone in the democratization of advanced speech recognition technology. By bringing OpenAI's powerful Whisper model to the efficient world of C/C++, it opens up possibilities that were previously out of reach for many developers and applications.


Key Takeaways:

  • Performance: Delivers state-of-the-art speech recognition with minimal overhead
  • Portability: Runs on virtually any platform from embedded systems to servers
  • Flexibility: Supports multiple hardware acceleration methods
  • Efficiency: Optimized for both memory usage and processing speed
  • Accessibility: Makes advanced AI available without complex dependencies


The most important fact is that high-quality speech recognition is now available to anyone, anywhere, on virtually any device. Whether you're building a mobile app, a desktop application, or a server-side service, whisper.cpp provides the tools you need to integrate powerful speech recognition capabilities.

From real-time voice assistants to large-scale transcription services, whisper.cpp enables developers to create innovative applications that understand and process human speech with unprecedented accuracy and efficiency. The combination of cutting-edge AI performance and lightweight implementation makes it an essential tool for the modern developer's toolkit.



Crepi il lupo! 🐺