Engineering · April 30, 2026 · 11 min read

Building the StudioMode scraper: 30 days, 3,000 beats, 10,000 quota units

StudioMode's catalog isn't licensed. We don't host audio. Every beat surfaces from a YouTube scrape that runs twice a day on a 10,000-unit daily quota. This is the story of getting from "10 beats indexed" to "3,000 beats deduplicated and tagged" — including the night the catalog ceiling became a real ceiling.

3,300+
Beats indexed
1,700+
Producers
2×
Daily runs
50×
v3 efficiency

The constraint: 10,000 quota units a day

The YouTube Data API v3 gives every project 10,000 quota units per day. A search query costs 100. A video lookup costs 1. A playlist query costs 1. Channel lookup, 1. Search is the killer. Every "free type beat" search burns 1% of your daily budget.

v1 of the scraper was naive: 50 search queries per run, 100 quota units each = 5,000 units a run. We could only run twice and we were already at the limit. Worse, half the results were duplicates from previous runs because YouTube reorders results endlessly.
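
To make that concrete, here is the v1 math as a few lines of Python. The per-call costs are the documented Data API charges and the 50-queries figure is from above; the rest is just arithmetic.

    # Back-of-the-envelope quota math for the v1 scraper.
    QUOTA_COST = {"search": 100, "videos": 1, "playlistItems": 1, "channels": 1}
    DAILY_QUOTA = 10_000

    SEARCHES_PER_RUN = 50
    run_cost = SEARCHES_PER_RUN * QUOTA_COST["search"]   # 5,000 units, before any follow-ups
    runs_per_day = DAILY_QUOTA // run_cost               # = 2; the entire budget goes to search

    print(f"{run_cost} units per run, {runs_per_day} runs/day before a single video lookup")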

The 3.3k ceiling

For about a week, the catalog was stuck at 3,300 beats no matter how many times we ran the scraper. The graphs were flat. A trickle of new beats was still landing, but never enough to move the count. That's when we realized the dedup logic ran after the search, so every duplicate result still cost a quota unit on the follow-up video lookup.

We were spending 10,000 units a day to insert maybe 20 net new beats. The other 99% of the budget was burned on confirming "yep, this is the same beat we saved yesterday." The scraper was running but the catalog was a closed system.

v3: bulk-dedup-first

The fix was structural. v3 inverted the order:

  1. Search returns video IDs only — the IDs come free with the 100-unit search call we're already paying for.
  2. Bulk-dedup against the existing catalog — a single Postgres query like select youtube_id from beats where youtube_id = any($1).
  3. Only video-lookup the truly-new IDs — saving 1 quota unit per duplicate skipped.

This was the 50× efficiency unlock. From "every search costs 100 + 50 follow-ups" to "every search costs 100 + maybe 5 follow-ups." The catalog growth chart finally bent the right way.
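
In outline, the v3 loop looks something like the sketch below. It assumes asyncpg on the Postgres side and plain REST calls to the Data API; apart from the beats.youtube_id column, the names are illustrative.

    import asyncpg
    import httpx

    YT = "https://www.googleapis.com/youtube/v3"

    async def new_video_ids(conn: asyncpg.Connection, query: str, api_key: str) -> list[str]:
        # Step 1: search returns video IDs only (one 100-unit call).
        async with httpx.AsyncClient() as client:
            resp = await client.get(f"{YT}/search", params={
                "part": "id", "type": "video", "q": query,
                "maxResults": 50, "key": api_key,
            })
        resp.raise_for_status()
        candidate_ids = [item["id"]["videoId"] for item in resp.json().get("items", [])]

        # Step 2: bulk-dedup against the catalog in one Postgres round trip (zero quota).
        rows = await conn.fetch(
            "select youtube_id from beats where youtube_id = any($1::text[])",
            candidate_ids,
        )
        seen = {r["youtube_id"] for r in rows}

        # Step 3: only the truly-new IDs get the follow-up video lookup.
        return [v for v in candidate_ids if v not in seen]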

Title-mining: the second multiplier

A separate signal that paid off: producers title their YouTube uploads formulaically. "Free Lil Tjay type beat 'Heaven' [PROD. RIO]" is the canonical shape: an artist name, the phrase "type beat", and often a song name in quotes.

We added a title-mining step that extracts artist names from existing catalog titles and queues them as future search terms. Net effect: the scraper teaches itself which artists are getting tagged. After a week of mining, our search-term list grew from 60 hand-picked artists to 280+ self-discovered ones, and the catalog growth rate doubled again.
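
The extraction itself is mostly a regex over that title shape. A minimal sketch; the production pattern is fuzzier and the normalization here is simplified:

    import re

    # Matches the "<artist> type beat" shape, e.g.
    #   Free Lil Tjay type beat "Heaven" [PROD. RIO]  ->  lil tjay
    TYPE_BEAT = re.compile(r"(?i)(?:\[?free\]?\s+)?(.+?)\s+type\s*beat")

    def mine_artist(title: str) -> str | None:
        m = TYPE_BEAT.search(title)
        if not m:
            return None
        artist = m.group(1).strip(" -|[]()\"'").lower()
        return artist or None

    # Any artist we have not seen before gets queued as a future search term.
    print(mine_artist('Free Lil Tjay type beat "Heaven" [PROD. RIO]'))  # lil tjay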

Two API keys, alternating

Each Google Cloud project gets one quota allocation. Two projects = two quotas. We added a second YouTube API key and a key-rotation module that randomly picks one per request. Daily ceiling jumped from 10,000 to 20,000 units. Twice the scrape, same shape.
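
The rotation module is about as small as it sounds; a sketch, with illustrative environment-variable names:

    import os
    import random

    # One quota pool per Google Cloud project, so a second key doubles the ceiling.
    API_KEYS = [
        os.environ["YT_API_KEY_PRIMARY"],
        os.environ["YT_API_KEY_SECONDARY"],
    ]

    def pick_key() -> str:
        # A random pick per request spreads spend roughly 50/50 across both quotas.
        return random.choice(API_KEYS)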

This is technically against the spirit of the rate limit (it's also against nothing in the ToS — multiple projects per dev account is explicitly allowed). The right answer for production scale is to apply for a quota increase, but for an early-stage indie project the two-key trick is the path of least resistance.

The followed-channel switch

The biggest single optimization came from realizing that for channels we already follow — the producers we know we want to track — the playlistItems endpoint is dramatically cheaper than search: 1 unit per page of 50 videos, vs. 100 for one search query that returns the same data.

Switching followed-producer scraping from search to playlistItems(uploads_playlist) was a 50× efficiency win for that workload. We went from "follow 5 producers per scrape" to "follow 250 producers per scrape" overnight, with the same quota cost.
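
Concretely, a followed producer's whole upload history pages through at 1 unit per 50 videos. A sketch, assuming we already store each followed channel's uploads playlist ID (it comes back once from a 1-unit channels.list call):

    import httpx

    YT = "https://www.googleapis.com/youtube/v3"

    def uploads_video_ids(uploads_playlist_id: str, api_key: str) -> list[str]:
        # Page through a channel's uploads playlist at 1 quota unit per page of 50.
        ids, page_token = [], None
        while True:
            params = {
                "part": "contentDetails",
                "playlistId": uploads_playlist_id,
                "maxResults": 50,
                "key": api_key,
            }
            if page_token:
                params["pageToken"] = page_token
            resp = httpx.get(f"{YT}/playlistItems", params=params)
            resp.raise_for_status()
            data = resp.json()
            ids += [item["contentDetails"]["videoId"] for item in data.get("items", [])]
            page_token = data.get("nextPageToken")
            if not page_token:
                return ids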

What we'd do differently

Three things, in order of "most regret":

  1. Build the dedup table from day one. We had a unique constraint on youtube_id, but no first-class "we already saw this" lookup. Adding it later cost us a week of stuck-catalog frustration.
  2. Track quota-spent-per-result-inserted. The metric "beats per quota unit" is the only one that matters for a scraper. We didn't have it for the first three weeks. Once we did, every optimization was obvious.
  3. Mine titles earlier. Hand-picking 60 artists felt thorough. It wasn't. Producers tag against far more artists than we curated, and the title-mining step would have caught all of them on day one.

The catalog now

As of today the scraper runs at 14:00 and 02:00 UTC, hits both API keys in rotation, processes ~5,000 candidate video IDs per run, dedups them against the catalog, and adds ~150-300 net new beats per day. Quota usage sits comfortably below the daily cap, leaving headroom for ad-hoc workflow_dispatch runs when we want to backfill a producer or genre.

The catalog is no longer the bottleneck. The next problem to solve is enrichment — getting BPM and key on every beat without scraping individual marketplace pages.

What's next

v4 will be trigram-backed full-text search over YouTube descriptions instead of titles, which should surface beats whose titles are too generic to match our search terms but whose descriptions are rich. After that, a Whisper-based audio-transcription step for beats that have no metadata at all; it's expensive to run, but the transcript comes back as structured output at no extra cost.
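
None of this is built yet, but the rough shape we're sketching is a pg_trgm index over a description column and substring matching against it:

    import asyncpg

    # One-time setup: pg_trgm plus a trigram GIN index makes ILIKE '%term%' cheap.
    #   create extension if not exists pg_trgm;
    #   create index beats_description_trgm on beats using gin (description gin_trgm_ops);

    async def beats_mentioning(conn: asyncpg.Connection, term: str) -> list[asyncpg.Record]:
        # Substring match over descriptions, served by the trigram index.
        return await conn.fetch(
            "select youtube_id, title from beats"
            " where description ilike '%' || $1 || '%' limit 50",
            term,
        )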

Try StudioMode

14-day Pro trial — 3,300+ beats, growing daily.
