Engineering · April 30, 2026 · 6 min read

Flip the read: when one audio file has two right answers

The BPM detector says 70. The vocalist hears 140. Neither is wrong; they're locked to different pulses in the same audio. Once you see that pattern, it shows up everywhere — half-time vs straight, triad vs 7th, swung vs straight. Four toggles in the audio toolkit are all the same idea: make the flip a single tap.

The 70/140 problem

A modern trap beat counted at 70 BPM and the same beat counted at 140 BPM are the same audio. Same kick on 1, same snare on 3, same 32nd-note hat rolls. The producer reads it as 140 because that's the BPM tag the file ships with; the vocalist counts it at 70 because the snare comes around only once per bar at 140 (roughly every 1.7 seconds), slow enough to feel like the backbeat of a 70 BPM groove. Both are right.

Our BPM detector locks to whichever pulse is loudest in the energy envelope. Sometimes that's the kick-and-hat grid (140); sometimes it's the snare-pulse cycle (70). Neither read is the "true" tempo because there isn't one. So we shipped a toggle:

// On a fresh detection: bpmMultiplier = 1
// User taps ÷2 → bpmMultiplier = 0.5 → display Math.round(rawBpm * 0.5)
// User taps ×2 → bpmMultiplier = 2.0 → display Math.round(rawBpm * 2.0)
// Same audio analysis, three reads.

Switching reads costs nothing because nothing is re-analyzed: the audio doesn't change, so we just paint a different number. The half-time and double-time facts we already showed in the result row track the displayed value, so the share text stays internally consistent.
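
A minimal sketch of how that toggle might sit in the rendering layer. Only rawBpm, bpmMultiplier and lastResult are names from this post; the state object and the onDetection, setMultiplier and resultRow functions are hypothetical, and the shape of the result row is an assumption.

```javascript
// Hypothetical rendering-layer state. Only rawBpm, bpmMultiplier and
// lastResult are names the post itself uses.
const state = { lastResult: null, bpmMultiplier: 1 };

// Called once per detection: cache the raw result and reset the read to ×1.
function onDetection(result) {
  state.lastResult = result;
  state.bpmMultiplier = 1;
}

// Called when the user taps ÷2, ×1 or ×2. No audio is re-analyzed.
function setMultiplier(multiplier) {
  if (![0.5, 1, 2].includes(multiplier)) {
    throw new RangeError("multiplier must be 0.5, 1 or 2");
  }
  state.bpmMultiplier = multiplier;
}

// Derive everything in the result row from the displayed value, so the
// half-time and double-time facts stay consistent with it.
function resultRow() {
  if (state.lastResult === null) return null;
  const bpm = Math.round(state.lastResult.rawBpm * state.bpmMultiplier);
  return { bpm, halfTime: Math.round(bpm / 2), doubleTime: bpm * 2 };
}
```

Because resultRow reads only the cached result and the multiplier, a tap on any pill is a repaint and nothing more.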

The straight/swung problem

A 16-step drum pattern at 90 BPM lands its snare hits at the same wall-clock times whether you read it as straight 16ths or swung 16ths — but only the off-beat 16ths actually move when you "flip the read." Straight: each off-beat lands halfway between on-beats. Swung at 66%: each off-beat lands two-thirds of the way through the pair, the triplet shuffle.

The same pattern. The same notes. Two different feels. The drum sandbox's Swing row is a single number — 50, 54, 58, 66 — that flips between them. The scheduler reshapes the gap between consecutive steps:

// After an on-beat 16th, gap to the next off-beat = (1 + s) * stepDur
// After an off-beat 16th, gap to the next on-beat  = (1 - s) * stepDur
// where s = (swingPct - 50) / 48 ∈ [0, 1/3]
// Pairs sum to 2 * stepDur so bar length is invariant.

Bar length stays fixed, so the chord progression overlay and bassline overlay (which anchor on steps 0 and 8) don't drift. The user moves a slider; everything else stays where it was.
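
The gap rule can be checked with a short sketch that lays out one bar of step onsets. The function name stepOnsets, its parameters, and the assumption of four steps per beat are illustrative; only the formula for s and the two gap sizes come from the comments above.

```javascript
// Onset time (in seconds) of every step in one bar, using the gap rule above.
// Assumes 16 steps per 4/4 bar, i.e. each step is one 16th note.
function stepOnsets(bpm, swingPct, stepsPerBar = 16) {
  const stepDur = 60 / bpm / 4;     // duration of one straight 16th
  const s = (swingPct - 50) / 48;   // 0 at 50%, 1/3 at 66%
  const onsets = [];
  let t = 0;
  for (let step = 0; step < stepsPerBar; step++) {
    onsets.push(t);
    // Even steps are on-beat 16ths (long gap), odd steps are off-beat (short gap).
    t += (step % 2 === 0 ? 1 + s : 1 - s) * stepDur;
  }
  return { onsets, barDur: t };
}
```

Comparing stepOnsets(90, 50) with stepOnsets(90, 66), every even-numbered step (including steps 0 and 8) lands at the same time and barDur is unchanged; only the odd-numbered steps move later.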

The triad/7th problem

Cmaj is three notes (C-E-G). Cmaj7 is four (C-E-G-B). Same tonal center. Same root. The B is the question — it's one semitone below the next octave's C, a gentle dissonance that pulls toward resolution. Add it and the chord sounds dreamy and unresolved; remove it and the chord sounds plain and stable.

For a writer in C major, the question "should this verse be on C or Cmaj7?" is the same question as "should the BPM read 70 or 140?" — pick the read that gives the song the feel you want. The piano roll's diatonic palette has a Triads / 7ths toggle. Same key, same buttons, different flavor.
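
As an illustration of why the toggle can be a table swap, here is a sketch of a diatonic palette for a major key. The table names, chord-type labels and the diatonicPalette function are hypothetical; the chord qualities themselves are the standard diatonic triads and seventh chords of the major scale.

```javascript
// Hypothetical chord-type tables for a major key, one entry per scale degree.
const TRIADS = ["maj", "min", "min", "maj", "maj", "min", "dim"];
const SEVENTHS = ["maj7", "min7", "min7", "maj7", "7", "min7", "m7b5"];

// Semitone intervals above the chord root for each chord type.
const INTERVALS = {
  maj: [0, 4, 7],
  min: [0, 3, 7],
  dim: [0, 3, 6],
  maj7: [0, 4, 7, 11],
  min7: [0, 3, 7, 10],
  "7": [0, 4, 7, 10],
  m7b5: [0, 3, 6, 10],
};

const MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]; // semitones above the tonic

// Build the seven palette buttons. The toggle only changes which table is passed in.
function diatonicPalette(tonicPc, chordTypes) {
  return MAJOR_SCALE.map((offset, degree) => {
    const rootPc = (tonicPc + offset) % 12;
    const type = chordTypes[degree];
    const pitchClasses = INTERVALS[type].map((i) => (rootPc + i) % 12);
    return { rootPc, type, pitchClasses };
  });
}
```

With pitch class 0 as C, diatonicPalette(0, TRIADS) puts C-E-G on the first button and diatonicPalette(0, SEVENTHS) puts C-E-G-B there: same key, same buttons, one extra note each.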

Each toggle is the same idea: same audio, two right answers. The UI's job is to let you switch in one tap.

The half-time/straight problem

Half-time and straight-time at the same BPM share kick and hi-hat patterns; only the snare moves. In straight-time the snare lands on beats 2 and 4 — the classic backbeat. In half-time it lands only on beat 3, doubling the perceived bar length. A vocalist gets twice the real estate per measure in half-time, which is why modern trap, drill, and DnB all live there.

The drum sandbox doesn't have an explicit half-time toggle; it has snare placement on the grid, which is the same lever. Counting steps from 1, move the snare from steps 5 and 13 (beats 2 and 4) to step 9 (beat 3) and you've flipped the read.
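
A small sketch of that flip on a 16-step grid. The pattern shape, the stepOfBeat helper and the toHalfTime function are assumptions made for illustration; step numbers are counted from 1 to match the text.

```javascript
const STEPS_PER_BEAT = 4; // 16 steps per 4/4 bar

// 1-indexed step number of a beat, so beat 2 is step 5 and beat 3 is step 9.
function stepOfBeat(beat) {
  return (beat - 1) * STEPS_PER_BEAT + 1;
}

// Hypothetical pattern shape: each lane lists the 1-indexed steps that fire.
const straightTime = {
  kick: [stepOfBeat(1)],
  hat: [1, 3, 5, 7, 9, 11, 13, 15],
  snare: [stepOfBeat(2), stepOfBeat(4)], // backbeat on 2 and 4
};

// Flip the read: keep kick and hats untouched, put the snare on beat 3 only.
function toHalfTime(pattern) {
  return { ...pattern, snare: [stepOfBeat(3)] };
}
```

The kick and hat lanes are passed through unchanged, which is the point: only the snare lane differs between the two reads.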

Four toggles, one idea

BPM read: ÷2
Swing: 66%
Diatonic: 7ths
Snare: 3↔5

The pattern keeps showing up because the same sound supports more than one valid description. A 4-bar loop at 140 BPM lasts exactly as long as a 2-bar loop at 70 BPM: 16 beats at 140 and 8 beats at 70 both take about 6.86 seconds. C-E-G and G-C-E are the same chord, voiced differently. A 16-step grid at swing 50% holds the same notes as the same grid at swing 66%; only the timing of the off-beats differs.

Each toggle in the audio toolkit picks one of these ambiguities and lets the user resolve it explicitly. Without the toggle, the UI is making the choice for you — locking to the loudest pulse, defaulting to triads, assuming straight 16ths. That's fine 90% of the time. The 10% where it's wrong used to require redoing the analysis. Now it's a tap.

The build cost

Each of these toggles took fewer than 50 lines because the underlying analysis didn't change; we only changed how its result is rendered or scheduled:

BPM ÷2/×1/×2: cache lastResult, paint round(rawBpm × multiplier). ~30 lines.
Swing 50/54/58/66: split scheduler step into pair halves with asymmetric gaps. ~25 lines.
Triads / 7ths: swap the chord-type lookup table at the diatonic-palette layer. ~20 lines.
Subdivision Quarter / 8ths / 16ths / Triplet on the metronome: same pair-pattern as swing but symmetric. ~30 lines.
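
To make the last item concrete, here is a sketch of a symmetric subdivision scheduler for the metronome. The SUBDIVISIONS table, the clickTimes function and the accent flag are hypothetical; the post only says the metronome uses the same pair pattern as swing with equal gaps.

```javascript
// Clicks per beat for each metronome subdivision named in the list above.
const SUBDIVISIONS = { quarter: 1, eighths: 2, sixteenths: 4, triplet: 3 };

// Click times (in seconds) for `beats` beats. Every gap within a beat is
// equal, unlike swing, where the two halves of a pair differ.
function clickTimes(bpm, subdivision, beats = 4) {
  const perBeat = SUBDIVISIONS[subdivision];
  if (perBeat === undefined) {
    throw new RangeError(`unknown subdivision: ${subdivision}`);
  }
  const beatDur = 60 / bpm;
  const clicks = [];
  for (let i = 0; i < beats * perBeat; i++) {
    // accent marks the click that falls on the beat itself
    clicks.push({ time: (i * beatDur) / perBeat, accent: i % perBeat === 0 });
  }
  return clicks;
}
```

Changing the subdivision changes only how many clicks fall inside each beat; the accented clicks stay at the same times.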

Compounding observation: the more toggles you add, the cheaper the next one gets. Each one teaches the codebase one more way to think about its own data. By the time triads/7ths shipped, the diatonic palette already accepted a chord-type table; the toggle was a one-line table swap.

What this isn't

"Flip the read" isn't a substitute for analysis — the detector still has to lock to a pulse to give you a starting BPM, the chord matcher still has to identify the chord. The toggle is what comes next: now that you have the answer, here's a one-tap way to ask "what about the other read?"

And it isn't a fix for genuinely ambiguous audio. If the source track is so dense that the detector can't find a clear pulse, the toggle gives you three wrong answers instead of one. We surface confidence on the BPM detector for that reason — when confidence is low, none of ÷2, ×1, ×2 is going to be right.

What's next

The toggle pattern's hidden ceiling is roughly four options — more than that and pills become a dropdown, which is a worse UI. So the next surface to flip is the chord identifier's enharmonic spelling: the same pitch class is C# or Db depending on key. We don't toggle that yet; you get C# in sharp keys and Db in flat keys. Probably the next ~20 lines.
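
A rough sketch of what that spelling toggle could look like. Everything here is hypothetical: the name tables, the set of flat-key tonics (a simplification that covers major keys only and treats pitch class 6 as Gb), and the optional override argument that a future toggle might pass.

```javascript
const SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
const FLAT_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"];

// Tonic pitch classes of the major keys conventionally written with flats:
// F, Bb, Eb, Ab, Db, Gb.
const FLAT_KEY_TONICS = new Set([5, 10, 3, 8, 1, 6]);

// Spell a pitch class for a given key. `override` ("sharp" or "flat") is the
// hypothetical toggle; null keeps the key-based default described above.
function spell(pitchClass, keyTonicPc, override = null) {
  const useFlats =
    override !== null ? override === "flat" : FLAT_KEY_TONICS.has(keyTonicPc);
  const index = ((pitchClass % 12) + 12) % 12; // tolerate negative input
  return (useFlats ? FLAT_NAMES : SHARP_NAMES)[index];
}
```

With the default, spell(1, 2) in D major gives "C#" and spell(1, 8) in Ab major gives "Db"; passing "flat" as the third argument flips the first to "Db" without touching the analysis.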