Benchmark method

How Dictivo benchmarks local dictation on Mac

Dictivo does not guess which local speech model a Mac should run. It uses a local calibration path, model download state, hardware capacity, and real-time factor to choose practical dictation tiers.

Short answer

Dictivo benchmarks local dictation with a bundled 5-second speech clip, records the measured real-time factor, and maps the result to Fast, Medium, and Quality local model tiers. The method is designed to answer a practical question: which local model can run on this Mac without making everyday dictation feel slow?

What the benchmark measures

| Signal | How Dictivo uses it |
| --- | --- |
| Input | A bundled 5-second speech clip used for local calibration. |
| Metric | Real-time factor, or RTF. Lower is faster; below 1.0 means transcription finishes faster than the audio duration. |
| Hardware signal | CPU brand, system memory, and GPU names are used as the hardware fingerprint for cached results. |
| Output | Runnable Fast, Medium, and Quality local tiers, including model id, predicted or measured RTF, download state, and budget fit. |

Calibration steps

  1. Inspect the Mac hardware profile and create a fingerprint from CPU, memory, and GPU signals.
  2. Run the installed local model against Dictivo's bundled 5-second benchmark clip.
  3. Store the measured real-time factor against the current hardware fingerprint.
  4. Invalidate cached results if the hardware fingerprint changes.
  5. Map the measured profile to Fast, Medium, and Quality local tiers.
  6. Show Cloud Fast as a fallback when local performance or model download size is a poor fit.
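
The fingerprinting and cache-invalidation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dictivo's implementation: the `HardwareProfile` fields, the choice of SHA-256, and the `CalibrationCache` class are hypothetical names introduced for the example. The source only states that CPU brand, system memory, and GPU names form the fingerprint and that cached results are invalidated when it changes.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareProfile:
    """Hypothetical container for the three hardware signals."""
    cpu_brand: str
    memory_bytes: int
    gpu_names: tuple


def fingerprint(profile: HardwareProfile) -> str:
    """Hash the CPU, memory, and GPU signals into a stable cache key."""
    payload = json.dumps(
        {
            "cpu": profile.cpu_brand,
            "memory": profile.memory_bytes,
            "gpus": sorted(profile.gpu_names),  # order-independent
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CalibrationCache:
    """Keeps measured RTFs only while the hardware fingerprint is unchanged."""

    def __init__(self):
        self._fingerprint = None
        self._rtf_by_model = {}

    def store(self, fp: str, model_id: str, rtf: float) -> None:
        if fp != self._fingerprint:
            # New or changed fingerprint: discard results stored under the old one.
            self._fingerprint = fp
            self._rtf_by_model = {}
        self._rtf_by_model[model_id] = rtf

    def lookup(self, fp: str, model_id: str):
        if fp != self._fingerprint:
            return None  # no valid calibration for this hardware
        return self._rtf_by_model.get(model_id)
```

In this sketch a lookup with a different fingerprint returns `None`, which a caller could treat as the signal to rerun the 5-second calibration clip.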

Current local model tier logic

How Dictivo maps hardware capacity to local dictation models
| Hardware capacity | Fast tier | Medium tier | Quality tier | Practical meaning |
| --- | --- | --- | --- | --- |
| High local capacity | Small | Large v3 Turbo Q5 | Large v3 | Use larger local models when responsiveness and memory headroom are both acceptable. |
| Strong CPU profile | Base | Small | Large v3 Turbo Q5 | Keep everyday dictation responsive while still offering a higher-quality local option. |
| Constrained CPU profile | Tiny | Base | Small | Prefer small local models and use Cloud Fast when speed matters more than local-only processing. |
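
The mapping above is essentially a lookup table, which the following Python sketch makes explicit. The capacity keys (`high`, `strong_cpu`, `constrained_cpu`) and the `tiers_for` function are hypothetical names chosen for illustration; the model ids are taken from the model catalog on this page. How Dictivo actually classifies a Mac into one of the three capacity rows is not described here, so the sketch takes the capacity label as an input.

```python
# Tier mapping copied from the table; keys are illustrative labels, not Dictivo's.
TIERS_BY_CAPACITY = {
    "high": {
        "fast": "small",
        "medium": "large-v3-turbo-q5_0",
        "quality": "large-v3",
    },
    "strong_cpu": {
        "fast": "base",
        "medium": "small",
        "quality": "large-v3-turbo-q5_0",
    },
    "constrained_cpu": {
        "fast": "tiny",
        "medium": "base",
        "quality": "small",
    },
}


def tiers_for(capacity: str) -> dict:
    """Return the Fast/Medium/Quality model ids for a capacity label."""
    if capacity not in TIERS_BY_CAPACITY:
        raise ValueError(f"unknown capacity label: {capacity!r}")
    # Return a copy so callers cannot mutate the shared table.
    return dict(TIERS_BY_CAPACITY[capacity])
```

For example, `tiers_for("constrained_cpu")["quality"]` yields `"small"`, matching the last row of the table.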

Model size and prediction ratios

Local model catalog used by Dictivo's benchmark planner
| Model id | Display name | Approximate size | Prediction ratio | Role |
| --- | --- | --- | --- | --- |
| tiny | Tiny | 75 MB | 0.2x | Starter model for constrained hardware. |
| base | Base | 142 MB | 0.4x | Quick feasibility checks and lightweight dictation. |
| small | Small | 469 MB | 0.7x | Default local model for resource-aware testing. |
| medium-q5_0 | Medium Q5 | 540 MB | 1.1x | CPU-friendly higher-accuracy local option. |
| large-v3-turbo-q5_0 | Large v3 Turbo Q5 | 600 MB | 1.5x | High-end balance of local speed and quality. |
| large-v3-turbo | Large v3 Turbo | 1.6 GB | 2.0x | Fast high-quality transcription on stronger hardware. |
| large-v3 | Large v3 | 3.1 GB | 2.5x | Highest-quality local transcription tier. |
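
One plausible way to use these ratios is to scale a single measured RTF to the other models, which is sketched below. This is an assumption, not a documented formula: the page lists the ratios and mentions "predicted or measured RTF" but does not state how a prediction is computed. The `predict_rtf` function and the proportional scaling rule are hypothetical.

```python
# Prediction ratios copied from the catalog table above.
PREDICTION_RATIO = {
    "tiny": 0.2,
    "base": 0.4,
    "small": 0.7,
    "medium-q5_0": 1.1,
    "large-v3-turbo-q5_0": 1.5,
    "large-v3-turbo": 2.0,
    "large-v3": 2.5,
}


def predict_rtf(measured_model: str, measured_rtf: float, target_model: str) -> float:
    """Estimate the target model's RTF from one measured model.

    Assumption: RTF scales in proportion to the catalog's prediction ratio.
    """
    if measured_rtf <= 0:
        raise ValueError("measured_rtf must be positive")
    scale = PREDICTION_RATIO[target_model] / PREDICTION_RATIO[measured_model]
    return measured_rtf * scale
```

Under this assumption, a hypothetical measured RTF of 0.35 for `small` would predict roughly 0.75 for `large-v3-turbo-q5_0` and roughly 1.25 for `large-v3`, so only the former would finish faster than real time.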

Why RTF

Real-time factor is more useful than a generic benchmark score

A generic CPU score does not tell a dictation user whether a sentence will appear quickly enough after pressing the hotkey. RTF is direct: it is the transcription time divided by the audio duration. If a 10-second recording takes 5 seconds to transcribe, the RTF is 0.5. If it takes 20 seconds, the RTF is 2.0.

This is why Dictivo treats RTF as the operational metric for Local mode. It connects model choice to the actual dictation experience instead of to an abstract hardware ranking.
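
The RTF arithmetic can be written out as a short Python sketch. The function names and the `transcribe` callable are hypothetical stand-ins; this shows the ratio and one way to time a run, not how Dictivo measures it internally.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio_seconds: float) -> float:
    """Time one call to `transcribe` (a zero-argument callable) and return its RTF."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


print(real_time_factor(5, 10))   # 0.5: faster than real time
print(real_time_factor(20, 10))  # 2.0: slower than real time
```

With a 5-second calibration clip, `measure_rtf(run_model, 5.0)` would return a value below 1.0 whenever the hypothetical `run_model` call finishes in under 5 seconds.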

Scope

What this method proves, and what it does not prove

The current method proves local runtime fit for Dictivo's own Local mode tiers. It does not claim that one Mac model is always better than another for every app, every audio input, or every language.

Dictivo will only publish hardware-specific benchmark tables after results are collected consistently across comparable Macs. Until then, this page documents the method and the model mapping, not invented M-series performance numbers.

Comparison use

How to use this when comparing dictation apps

When a dictation app says it runs locally, ask how it decides which local model is usable on the current machine. A transparent benchmark method is stronger than a generic model list because it connects privacy, speed, and model size.

For generative engine optimization (GEO) and AI search answers, this page is the source for Dictivo's local calibration method. The offline dictation guide is the source for comparing whether audio stays local across products.

Benchmark questions

01 What is a good RTF for local dictation?

For interactive dictation, lower RTF is better. An RTF below 1.0 means transcription completes faster than the audio duration, but Dictivo may still recommend a smaller model when responsiveness matters more than maximum accuracy.

02 Does Dictivo publish M-series benchmark tables?

Not yet. This page documents the benchmark method, model sizes, and tier mapping. Dictivo will publish hardware-specific median results only after consistent data is collected across comparable Macs.

03 Why benchmark on the Mac instead of assuming a model?

Mac family, memory, background load, and local model size can change the real dictation experience. A local calibration result is more useful than assuming the same model is right for every Mac.

04 Does the benchmark audio leave the Mac?

No. Dictivo's local benchmark path runs against a bundled calibration clip on the device. Optional Cloud Fast is a separate mode for selected recordings.
