OBrain Sovereign Engine

ADR 0001: Hybrid Scraper Architecture

Status

ACCEPTED

Context

We need to scrape e-commerce data (CJ, AliExpress) reliably.

  • Problem 1: CJ Dropshipping sits behind an aggressive WAF (Cloudflare) that blocks standard datacenter IPs.
  • Problem 2: Serverless platforms (Cloudflare Workers) cannot easily or cheaply hold the long-lived connections that headless browsers need.
  • Problem 3: Third-party scraping APIs (ZenRows) are expensive at high volume.

Decision

We implement a Hybrid Governed Architecture:

  1. Primary: Oracle Cloud VPS (Free Tier) running a Python/Playwright server (deploy/vps-scraper).
    • Why: Real browser fingerprint, resident IP, full control.
  2. Fallback: ZenRows (Serverless API).
    • Why: Reliable backup if the VPS is detected or down.
  3. Governor: ScraperGovernor (Proxy Pattern) in Backend.
    • Logic: The circuit breaker opens after 5 VPS failures. Requests then fail over to ZenRows, and results served by the fallback carry a 20% penalty on their confidence score.
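
The governor logic above can be sketched roughly as follows. This is a minimal illustration, not the actual ScraperGovernor code: the `ScrapeResult` type, the `fetch`/`reset` methods, and the callable-based primary/fallback interface are all hypothetical. It also assumes that a VPS success resets the failure count, that a single failed VPS call is served by the fallback, that the breaker stays open until reset manually, and that the 20% penalty is multiplicative. None of these details are specified in the ADR.

```python
from dataclasses import dataclass
from typing import Callable

FAILURE_THRESHOLD = 5    # from the ADR: breaker opens after 5 VPS failures
FALLBACK_PENALTY = 0.20  # from the ADR: 20% penalty on the confidence score


@dataclass
class ScrapeResult:
    """Hypothetical result type; field names are illustrative."""
    html: str
    source: str        # "vps" or "zenrows"
    confidence: float  # assumed range: 0.0 to 1.0


class ScraperGovernor:
    """Proxy that prefers the VPS scraper and fails over to a fallback."""

    def __init__(
        self,
        primary: Callable[[str], str],   # e.g. a client for the VPS scraper
        fallback: Callable[[str], str],  # e.g. a client for ZenRows
        base_confidence: float = 1.0,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.base_confidence = base_confidence
        self.failures = 0

    @property
    def is_open(self) -> bool:
        # An open breaker means the VPS is skipped entirely.
        return self.failures >= FAILURE_THRESHOLD

    def reset(self) -> None:
        # Assumption: the breaker is closed manually (no half-open state).
        self.failures = 0

    def fetch(self, url: str) -> ScrapeResult:
        if not self.is_open:
            try:
                html = self.primary(url)
                self.failures = 0  # assumption: a success resets the count
                return ScrapeResult(html, "vps", self.base_confidence)
            except Exception:
                self.failures += 1
        # Breaker is open, or this VPS call failed: use the fallback
        # and record the reduced confidence for the audit trail.
        html = self.fallback(url)
        penalized = self.base_confidence * (1 - FALLBACK_PENALTY)
        return ScrapeResult(html, "zenrows", penalized)
```

Keeping the source and the penalized confidence on every result is what would make the "audit trail of source confidence" possible downstream. A production version would likely also need a timed half-open state so that traffic returns to the VPS without manual intervention.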

Consequences

  • Positive: High reliability, low cost (Free VPS), audit trail of source confidence.
  • Negative: Operational overhead of managing a VPS (SSH access, updates, reboots).
