The easiest way to use Whisper in Swift
Easily add transcription to your app or package. Powered by whisper.cpp.
Add SwiftWhisper as a dependency in your Package.swift file:
let package = Package(
  ...
  dependencies: [
    // Add the package to your dependencies
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", branch: "master"),
  ],
  ...
  targets: [
    // Add SwiftWhisper as a dependency on any target you want to use it in
    .target(name: "MyTarget",
            dependencies: [.byName(name: "SwiftWhisper")])
  ]
  ...
)
Alternatively, in Xcode, add https://github.com/exPHAT/SwiftWhisper.git in the "Swift Package Manager" tab.
import SwiftWhisper
let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)
print("Transcribed audio:", segments.map(\.text).joined())
You can subscribe to segments, transcription progress, and errors by implementing WhisperDelegate and setting whisper.delegate = ...
protocol WhisperDelegate {
  // Progress updates as a fraction from 0 to 1
  func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

  // Called any time new segments of text have been transcribed
  func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)

  // Finished transcribing, includes all transcribed segments of text
  func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

  // Error with transcription
  func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}
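As an illustration of the delegate callbacks above, here is a minimal sketch of a conforming class. The `TranscriptionLog` type and its methods are hypothetical names, not part of SwiftWhisper. The sketch assumes that `atIndex` is the position of the first new segment in the overall segment list, and it is not thread-safe. The conformance is wrapped in `canImport` so the bookkeeping part compiles on its own.

```swift
import Foundation
#if canImport(SwiftWhisper)
import SwiftWhisper
#endif

/// Hypothetical helper that records transcription state (not part of SwiftWhisper).
final class TranscriptionLog {
    private(set) var progress: Double = 0
    private(set) var texts: [String] = []
    private(set) var lastError: Error?

    /// The text of every segment received so far, joined in order.
    var transcript: String { texts.joined() }

    func update(progress: Double) {
        // Clamp defensively to the documented 0 to 1 range
        self.progress = min(max(progress, 0), 1)
    }

    func append(texts newTexts: [String], atIndex index: Int) {
        // Assumption: `index` is where the first new segment belongs.
        // Anything already stored from `index` onward is replaced.
        texts = Array(texts.prefix(max(index, 0))) + newTexts
    }

    func replace(texts allTexts: [String]) {
        texts = allTexts
        progress = 1
    }

    func record(error: Error) {
        lastError = error
    }
}

#if canImport(SwiftWhisper)
// Forward each delegate callback to the plain bookkeeping methods above.
extension TranscriptionLog: WhisperDelegate {
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double) {
        update(progress: progress)
    }

    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int) {
        append(texts: segments.map(\.text), atIndex: index)
    }

    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment]) {
        replace(texts: segments.map(\.text))
    }

    func whisper(_ aWhisper: Whisper, didErrorWith error: Error) {
        record(error: error)
    }
}
#endif
```

To use it, create a `TranscriptionLog`, assign it to `whisper.delegate`, and keep your own strong reference to it, since delegates are often held weakly (whether SwiftWhisper does so is not stated here).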
You can find the pre-trained models here for download.
To use CoreML, include a CoreML model file with the suffix -encoder.mlmodelc under the same name as the whisper model (for example, tiny.bin would sit beside a tiny-encoder.mlmodelc file). In addition to the extra model file, you will also need to use the Whisper(fromFileURL:) initializer. You can verify CoreML is active by checking the console output during transcription.
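The naming rule above can be checked before loading a model. The following is a small sketch using only Foundation; `expectedCoreMLEncoderURL(forModelAt:)` and `hasCoreMLEncoder(forModelAt:)` are hypothetical helper names, not SwiftWhisper API, and the example path is made up.

```swift
import Foundation

/// Hypothetical helper: where the CoreML encoder is expected to sit
/// for a given whisper model file, following the "-encoder.mlmodelc" rule.
func expectedCoreMLEncoderURL(forModelAt modelURL: URL) -> URL {
    // "tiny.bin" -> "tiny"
    let baseName = modelURL.deletingPathExtension().lastPathComponent
    return modelURL
        .deletingLastPathComponent()
        .appendingPathComponent("\(baseName)-encoder.mlmodelc")
}

/// Hypothetical helper: true when the encoder exists beside the model.
func hasCoreMLEncoder(forModelAt modelURL: URL) -> Bool {
    let encoderURL = expectedCoreMLEncoderURL(forModelAt: modelURL)
    return FileManager.default.fileExists(atPath: encoderURL.path)
}

// Example with a made-up path
let modelURL = URL(fileURLWithPath: "/models/tiny.bin")
print(expectedCoreMLEncoderURL(forModelAt: modelURL).lastPathComponent)
```

A check like this only confirms the file layout. The console output during transcription remains the way to confirm that CoreML is actually in use.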
The easiest way to get audio frames into SwiftWhisper is to use AudioKit. The following example takes an input audio file, converts and resamples it, and returns an array of 16kHz PCM floats.
import AudioKit
import Foundation

func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    // Convert to 16 kHz, 16-bit, mono WAV
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)

    converter.start { error in
        // Remove the temporary file on every exit path
        defer { try? FileManager.default.removeItem(at: tempURL) }

        if let error {
            completionHandler(.failure(error))
            return
        }

        let data: Data
        do {
            data = try Data(contentsOf: tempURL)
        } catch {
            completionHandler(.failure(error))
            return
        }

        // Skip the 44-byte WAV header, then read 16-bit little-endian samples
        let floats = stride(from: 44, to: data.count - 1, by: 2).map {
            return data[$0..<$0 + 2].withUnsafeBytes {
                let short = Int16(littleEndian: $0.load(as: Int16.self))
                return max(-1.0, min(Float(short) / 32767.0, 1.0))
            }
        }

        completionHandler(.success(floats))
    }
}
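The example above skips a fixed 44 bytes, which is the size of the simplest WAV header. Some WAV files carry extra chunks before the audio data. Whether the converter ever writes such files is not something this README states, so the following is only a defensive sketch: it walks the RIFF chunks to find the "data" payload and decodes it. `readUInt32LE`, `wavDataRange(in:)`, and `floats(fromPCM16:in:)` are hypothetical helper names that use Foundation only.

```swift
import Foundation

/// Reads a little-endian UInt32 at `offset`, byte by byte (no alignment assumptions).
func readUInt32LE(_ data: Data, at offset: Int) -> UInt32 {
    let base = data.startIndex + offset
    let b0 = UInt32(data[base])
    let b1 = UInt32(data[base + 1]) << 8
    let b2 = UInt32(data[base + 2]) << 16
    let b3 = UInt32(data[base + 3]) << 24
    return b0 | b1 | b2 | b3
}

/// Returns the byte range of the "data" chunk payload,
/// or nil if the bytes are not a RIFF/WAVE file with a data chunk.
func wavDataRange(in data: Data) -> Range<Int>? {
    guard data.count >= 12,
          String(decoding: data.prefix(4), as: UTF8.self) == "RIFF",
          String(decoding: data.dropFirst(8).prefix(4), as: UTF8.self) == "WAVE"
    else { return nil }

    var offset = 12
    while offset + 8 <= data.count {
        let id = String(decoding: data.dropFirst(offset).prefix(4), as: UTF8.self)
        let size = Int(readUInt32LE(data, at: offset + 4))
        let start = offset + 8
        if id == "data" {
            // Clamp in case the declared size overruns the file
            return start..<min(start + size, data.count)
        }
        // Chunk payloads are padded to an even number of bytes
        offset = start + size + (size % 2)
    }
    return nil
}

/// Decodes 16-bit little-endian PCM samples into floats clamped to [-1, 1].
func floats(fromPCM16 data: Data, in range: Range<Int>) -> [Float] {
    stride(from: range.lowerBound, to: range.upperBound - 1, by: 2).map { i in
        let lo = UInt16(data[data.startIndex + i])
        let hi = UInt16(data[data.startIndex + i + 1]) << 8
        let sample = Int16(bitPattern: lo | hi)
        return max(-1.0, min(Float(sample) / 32767.0, 1.0))
    }
}
```

In the converter callback, `wavDataRange(in: data)` could replace the hard-coded offset, with a `nil` result reported through the completion handler as a failure.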
You may find transcription slow when your app is compiled in the Debug build configuration. This is because the compiler doesn't fully optimize SwiftWhisper unless the build configuration is set to Release.
You can get around this by installing a version of SwiftWhisper that uses .unsafeFlags(["-O3"]) to force maximum optimization. The easiest way to do this is to use the latest commit on the fast branch. Alternatively, you can configure your scheme to build in the Release configuration.
  ...
  dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", revision: "deb1cb6a27256c7b01f5d3d2e7dc1dcc330b5d01"),
  ],
  ...