Building a Reliable CAPTCHA-Solving Pipeline: Error Handling, Retries, and Circuit Breakers

Source: DEV Community
Your scraper solves CAPTCHAs fine in dev. Then you deploy it, scale to 1000 pages/hour, and everything falls apart: timeouts, expired tokens, wrong CAPTCHA types, rate limits. The difference between a hobby scraper and a production one isn't the happy path. It's how you handle failures. Let's build a resilient CAPTCHA-solving pipeline step by step.

The Problem: Naive Solve-and-Submit

Most tutorials show you this:

    token = solve_captcha(sitekey, url)
    submit_form(token)

This works until it doesn't. In production, you'll hit:

- Timeouts: solver takes too long
- Expired tokens: you solved it but submitted too late
- Wrong type detection: you sent "recaptcha_v2" but it's actually "recaptcha_v3"
- Rate limits: too many concurrent solves
- Service outages: the solving API is temporarily down

Step 1: Classify Your Errors

Not all errors deserve the same response. Group them:

    from enum import Enum

    class CaptchaErrorType(Enum):
        TRANSIENT = "transient"  # Retry immediately
        RATE_LIMITED = "rate_limited"