Best n8n Nodes for Proxy-Based Data Collection in 2026

The complete guide to the best n8n nodes for proxy-based data collection in 2026 — HTTP Request, Code, Browserless, plus proxies, workflows, and tips.

Lokesh Kapoor
May 26, 2026
11 min read

n8n crossed 60,000 GitHub stars in 2025 and now powers more than 100,000 active automation workflows running daily — a meaningful chunk of which are proxy-routed scraping pipelines that quietly replace custom Python jobs. The visual editor, native HTTP proxy support, and self-hostable footprint make n8n a near-perfect fit for teams that need reliable data collection without a full engineering build.

The catch is knowing which nodes actually matter. n8n ships with 400+ integrations, but only a handful of them are load-bearing for proxy-based scraping. Pick the wrong ones and your workflow stalls at scale; pick the right combination and you have a production-grade ingestion pipeline you can hand off to a non-engineer to maintain.

This guide walks through the 7 best n8n nodes for proxy-based data collection in 2026, the proxy providers that pair cleanest with them, code-level setup examples, and the mistakes that cost beginners weeks of debugging. If you are new to n8n entirely, start with our n8n review for 2026 first.

Why Use n8n for Proxy-Based Data Collection?

Three reasons n8n keeps winning the proxy-scraping use case in 2026: it speaks HTTP natively, it stores credentials securely, and it composes complex flows visually without sacrificing code-level control. Most scraping projects start as a single script and end as a tangled web of cron jobs, error logs, and Slack alerts. n8n collapses that mess into one canvas.

You also get free production-grade infrastructure when you self-host: encrypted credential storage, automatic retries, queue-mode for high throughput, and a built-in execution log that beats most homemade observability stacks. For teams running 10–500 scraping workflows, n8n delivers the maintenance savings normally reserved for managed scraper APIs at zero per-execution cost.

The trade-off is throughput — for million-page-per-day workloads, a custom Python pipeline still wins on raw concurrency. But for the vast middle ground of business-critical scraping (price monitoring, lead enrichment, SEO research, competitor tracking), n8n is the right tool.

How n8n Handles HTTP Proxies (Setup Reference)

Every proxy-based n8n workflow starts with the same configuration: a proxy URL formatted as http://USER:PASS@host:port, dropped into the HTTP Request node Options panel. n8n forwards every outbound request through that URL, including any nested redirects, with timeouts and retries you control.

For production setups, store the proxy credentials in n8n's encrypted credential store rather than hard-coding them. Reference them in the HTTP Request node's proxy URL via expressions, which keeps secrets out of exported workflow JSON.

{
  "url": "https://target.com/products",
  "method": "GET",
  "options": {
    "proxy": "http://USER:PASS@gate.smartproxy.com:7000",
    "timeout": 15000,
    "redirect": { "redirect": { "followRedirects": true } }
  }
}

This single configuration unlocks 90% of proxy-based scraping in n8n. The remaining 10% — rotation, sticky sessions, hybrid keyword targeting — comes from pairing the HTTP Request node with the Code node, covered below.
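A practical detail with the http://USER:PASS@host:port format is that passwords containing characters such as "@" or ":" need percent-encoding, otherwise the URL parses incorrectly. The sketch below shows one way to assemble the proxy URL from separate parts; the function and parameter names are illustrative and not part of n8n.

```javascript
// Build an http://USER:PASS@host:port proxy URL from separate parts.
// buildProxyUrl and its parameter names are hypothetical helpers, not an n8n API.
function buildProxyUrl({ user, pass, host, port, protocol = "http" }) {
  if (!host || !port) {
    throw new Error("host and port are required");
  }
  // Percent-encode credentials so "@" or ":" inside a password
  // cannot be mistaken for URL delimiters.
  const auth = user
    ? `${encodeURIComponent(user)}:${encodeURIComponent(pass ?? "")}@`
    : "";
  return `${protocol}://${auth}${host}:${port}`;
}

const proxyUrl = buildProxyUrl({
  user: "USER",
  pass: "p@ss:word", // placeholder password with awkward characters
  host: "gate.smartproxy.com",
  port: 7000,
});
console.log(proxyUrl);
```

The resulting string can be pasted into the Proxy option, or produced by a Code node and referenced through an expression.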

The 7 Best n8n Nodes for Proxy-Based Data Collection

Below are the seven nodes that show up in virtually every production-grade n8n scraping pipeline. Master these and you can build almost any data collection workflow without leaving the visual editor.

1. HTTP Request Node

The workhorse. The HTTP Request node sends authenticated requests with full proxy support, custom headers, query strings, body payloads, and built-in retry policies. It is the first node in 95% of scraping workflows and the only node many beginners need for static-HTML targets.

Best practice: always set a timeout (15s is sensible), enable followRedirects, and configure "Continue On Fail" so a single 503 does not blow up the whole batch. For dynamic header rotation, pass headers from a preceding Set or Code node via expressions.
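As a rough illustration of header rotation from a preceding Code node, the sketch below attaches one header set to each item in round-robin order. The header values and the pickHeaders helper are assumptions made for this example; inside n8n the input would come from $input.all() rather than a hard-coded array.

```javascript
// Hypothetical header sets; swap in values captured from browsers you actually target.
const HEADER_POOL = [
  {
    "User-Agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
  },
  {
    "User-Agent":
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Accept-Language": "en-GB,en;q=0.8",
  },
];

// Round-robin: item 0 gets set 0, item 1 gets set 1, item 2 wraps back to set 0.
function pickHeaders(index, pool = HEADER_POOL) {
  return pool[index % pool.length];
}

// Stand-in for the items a Code node would receive from $input.all().
const items = [
  { json: { url: "https://target.com/products?page=1" } },
  { json: { url: "https://target.com/products?page=2" } },
  { json: { url: "https://target.com/products?page=3" } },
];

const withHeaders = items.map((item, i) => ({
  json: { ...item.json, headers: pickHeaders(i) },
}));
console.log(withHeaders.map((item) => item.json.headers["Accept-Language"]));
```

The HTTP Request node could then read a value with an expression such as {{ $json.headers["User-Agent"] }} in its header fields.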

2. HTML Extract Node

Sits directly after HTTP Request and parses the response HTML using CSS selectors. You define a list of fields (title, price, rating) and the selector for each, and the node outputs a clean JSON object per item. No regex, no DOM walking — just declarative selectors that ship in under a minute.

For pages with repeating elements (search results, product grids), enable "Return Array" mode so the node emits one item per matched element. Pair with Split In Batches downstream for clean per-item processing.

3. Code (JavaScript) Node

The escape hatch. When built-in nodes cannot express the logic you need (proxy rotation, deterministic session keys, custom retry algorithms), the Code node lets you drop into JavaScript with full access to incoming items and workflow context.

// Deterministic proxy rotation by execution ID
const proxies = [
  "http://user:pass@gate.smartproxy.com:7001",
  "http://user:pass@gate.smartproxy.com:7002",
  "http://user:pass@gate.smartproxy.com:7003",
];
const seed = $execution.id.split("").reduce((a, c) => a + c.charCodeAt(0), 0);
const proxy = proxies[seed % proxies.length];
return [{ json: { proxy } }];

4. Browserless Community Node

For JavaScript-heavy sites (SPAs, React apps, lazy-loaded grids), the Browserless community node renders pages in real headless Chrome and returns fully hydrated HTML. It accepts a proxyServer parameter so your residential or ISP proxies route through the headless browser cleanly — equivalent to Playwright-with-proxy in five clicks.

Self-host the Browserless container alongside n8n on Docker for zero per-render cost, or use Browserless.io cloud when you need elastic scale during traffic spikes.

5. Schedule Trigger Node

Cron-style scheduling that fires a workflow on intervals (every 5 minutes), specific times (3 AM daily), or complex cron expressions. The right node for periodic price refreshes, daily SERP scrapes, hourly stock checks, and any recurring data pull.

Pair it with the Wait node to space out requests inside a single execution — critical for politeness and avoiding rate limits on target sites.
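One way to drive the Wait node is to compute a slightly randomized delay per request, so calls do not arrive at a fixed rhythm. The sketch below is a minimal version under assumed bounds; the function name and the default range are illustrative placeholders, and the random source is injectable so the logic can be checked deterministically.

```javascript
// Return a delay in seconds between minSeconds and maxSeconds (inclusive bounds).
// politeDelaySeconds is a hypothetical helper; the defaults are placeholders.
function politeDelaySeconds(minSeconds = 3, maxSeconds = 10, random = Math.random) {
  if (minSeconds > maxSeconds) {
    throw new RangeError("minSeconds must not exceed maxSeconds");
  }
  const delay = minSeconds + random() * (maxSeconds - minSeconds);
  // Round to one decimal place to keep the value readable in execution logs.
  return Math.round(delay * 10) / 10;
}

// Example: attach a delay to each URL before it reaches a Wait node.
const queued = ["https://target.com/a", "https://target.com/b"].map((url) => ({
  json: { url, waitSeconds: politeDelaySeconds() },
}));
console.log(queued);
```

The Wait node's amount field could then reference {{ $json.waitSeconds }} through an expression.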

6. Split In Batches Node

When your workflow processes thousands of URLs, Split In Batches breaks them into manageable chunks (typically 50–200 per batch) and loops the rest of the workflow over each chunk. Crucial for controlling concurrency and respecting proxy provider rate limits without writing manual loop logic.

For controlled parallelism, set batch size to match your proxy concurrency limit (most residential gateways cap at 100–500 connections). Anything more triggers 429 responses and burns credits.
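To make the batching idea concrete, here is a small sketch of the chunking that Split In Batches performs, written as plain JavaScript. The concurrency limit of 100 and the URL pattern are assumptions for the example; the real cap depends on your proxy plan.

```javascript
// Split an array into consecutive chunks of at most batchSize items.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Assumed gateway cap; check your provider's documented limit.
const PROXY_CONCURRENCY_LIMIT = 100;

const urls = Array.from(
  { length: 450 },
  (_, i) => `https://target.com/products?page=${i + 1}`
);
const batches = toBatches(urls, PROXY_CONCURRENCY_LIMIT);

// 450 URLs at 100 per batch gives 5 batches, the last holding 50.
console.log(batches.length, batches[batches.length - 1].length);
```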

7. Postgres / Google Sheets Output Node

The terminal node. Postgres for production pipelines (typed columns, indexes, full SQL queries downstream), Google Sheets for quick prototypes and stakeholder-facing dashboards. n8n upserts records cleanly with deterministic IDs so re-running the workflow replaces rather than duplicates.
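The deterministic IDs mentioned above have to come from somewhere. A minimal sketch, assuming the product URL is the natural key, is to normalize the URL and hash it. The hash below is a hand-written 32-bit FNV-1a style function over UTF-16 code units, chosen only because it needs no imports; for large tables a longer hash such as SHA-256 would be a safer choice against collisions.

```javascript
// 32-bit FNV-1a style hash over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, modulo 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// recordId is a hypothetical helper: same page in, same ID out.
function recordId(productUrl) {
  const url = new URL(productUrl);
  url.hash = ""; // fragments never reach the server
  url.searchParams.sort(); // make query-parameter order irrelevant
  return fnv1a(url.toString());
}

const idA = recordId("https://target.com/p?id=1&ref=a#reviews");
const idB = recordId("https://target.com/p?ref=a&id=1");
console.log(idA === idB); // both normalize to the same URL
```

Using this value as the primary key (or as the matching column in an upsert) means a re-run overwrites the earlier row for the same page.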

For high-volume scrapes, prefer Postgres. Google Sheets caps a spreadsheet at 10 million cells, which sounds like a lot until you scrape 100,000 products with 100 fields each.

Best Proxy Providers to Pair with n8n

n8n is proxy-agnostic — any HTTP/HTTPS proxy URL works in the HTTP Request node. But not every provider plays equally well with workflow automation. The five below are the cleanest fits for n8n in 2026, chosen for ease of authentication, sticky session support, and Python/Node-friendly proxy URLs.

1. BrightData

BrightData's 72M+ residential IPs across 195 countries and its Web Unlocker API are the gold-standard pairing for n8n when your targets sit behind Cloudflare or PerimeterX. The unlocker handles fingerprinting and CAPTCHA bypass server-side, returning clean HTML you can pipe straight into the HTML Extract node.

Setup is a single proxy URL in the HTTP Request node, or a Bearer token plus URL for the Web Unlocker variant. Audit logs and SOC 2 compliance make it the safe choice for enterprise data pipelines orchestrated in n8n.

2. Decodo

Decodo (formerly Smartproxy) is the developer-friendly value pick for n8n users in 2026. With 115M+ IPs across 195 countries and 99.99% uptime, it pairs enterprise-grade infrastructure with plans starting around $30/month — a rare combination that lets indie teams ship serious workflows without enterprise commitments.

The proxy URL format is the simplest of any provider — one URL with embedded auth works across every n8n node. Sticky session support is configured via the username (e.g. user-sticky-session-abc), which n8n passes cleanly through expressions. Great for multi-step workflows that need a stable exit IP across login, search, and scrape steps.
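As a sketch of the username-based sticky session pattern, the helper below appends a session ID to the base username, following the user-sticky-session-abc shape shown above. The exact username syntax is an assumption here and differs between providers, so check the provider's documentation; stickyProxyUrl is a hypothetical name.

```javascript
// Build a proxy URL whose username carries a session ID, so every step that
// reuses the same sessionId is routed through the same exit IP.
// The "-sticky-session-" infix mirrors the article's example and is an assumption.
function stickyProxyUrl({ baseUser, pass, host, port, sessionId }) {
  // Keep only alphanumerics so the session ID cannot break the username.
  const safeSession = String(sessionId).replace(/[^A-Za-z0-9]/g, "");
  if (!safeSession) {
    throw new Error("sessionId must contain at least one alphanumeric character");
  }
  const user = `${baseUser}-sticky-session-${safeSession}`;
  return `http://${encodeURIComponent(user)}:${encodeURIComponent(pass)}@${host}:${port}`;
}

// In a Code node the session ID could come from $execution.id,
// which keeps login, search, and scrape steps of one run on one IP.
const sticky = stickyProxyUrl({
  baseUser: "user",
  pass: "pass",
  host: "gate.smartproxy.com",
  port: 7000,
  sessionId: "run-1842",
});
console.log(sticky);
```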

3. NodeMaven

NodeMaven offers the longest sticky sessions on the market — up to 24 hours of the same exit IP. For n8n workflows that walk multi-step funnels (log in → search → paginate → scrape detail pages) under a single execution, that stability eliminates the half-completed runs that pollute downstream data.

The filter-first network screens out flagged IPs before serving them, so success rates on tough targets (social media, ticketing, sneaker sites) are noticeably higher than rotating-only peers. Pricing is mid-market and pairs well with workflow-heavy use cases.

4. Webshare

Webshare runs 10M+ rotating proxies at the lowest per-GB price on this list, making it the right choice for high-volume scraping where datacenter or lightly-rotated residential IPs suffice. Free tier (10 proxies, 1GB/month) is enough to validate a full n8n workflow end-to-end before paying.

The provider supplies a downloadable proxy list compatible with n8n's Code node — drop the list into a Set node, rotate via expression, done. No bespoke authentication or session tokens to wrangle in workflow JSON.

5. Geonode

Geonode is the unlimited-bandwidth pick for n8n teams running heavy concurrent scraping. With 30M+ residential IPs across 190 countries and pricing structured around concurrent threads instead of per-GB metering, it removes the cost anxiety that limits how aggressively you can crank up Split In Batches concurrency.

Geonode also ships a clean dashboard for monitoring per-target success rates, which surfaces n8n workflow regressions before they pollute your warehouse. The unlimited-bandwidth model is particularly well suited to high-volume e-commerce, classifieds, and SERP scrapes where you would otherwise burn through residential GB quotas in days.

Common Mistakes Beginners Make with n8n Scraping Workflows

Forgetting to Set HTTP Request Timeouts

The HTTP Request node defaults to no timeout, so a hung target site stalls your entire workflow indefinitely — sometimes for hours before n8n marks the execution as failed. Always set a 15-second timeout in the Options panel, and pair it with "Continue On Fail" so the workflow moves on instead of dying. This single change prevents the most common production incident: a single slow target taking down an entire nightly batch.
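Once "Continue On Fail" is enabled, failed requests flow downstream alongside successful ones, so a later step usually needs to separate them. The sketch below assumes a failed item carries an error field in its json; the exact shape of failed items can vary by n8n version, so treat the field name as an assumption to verify against your own execution data.

```javascript
// Separate successful items from failed ones.
// Assumption: a failed item has a truthy json.error field.
function partitionResults(items) {
  const ok = [];
  const failed = [];
  for (const item of items) {
    if (item.json && item.json.error) {
      failed.push(item);
    } else {
      ok.push(item);
    }
  }
  return { ok, failed };
}

// Stand-in for the output of an HTTP Request node with "Continue On Fail" on.
const results = [
  { json: { url: "https://target.com/p/1", statusCode: 200 } },
  { json: { url: "https://target.com/p/2", error: "timeout of 15000ms exceeded" } },
  { json: { url: "https://target.com/p/3", statusCode: 200 } },
];

const { ok, failed } = partitionResults(results);
console.log(`${ok.length} succeeded, ${failed.length} failed`);
```

The failed list can be routed to a retry branch or an alert, while the ok list continues to parsing.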

Using One Proxy URL Across All Workflow Runs

A static proxy in the HTTP Request node means every execution exits through the same IP. Target sites flag this pattern within hours, even when the IP comes from a premium residential pool. Either use a rotating-gateway proxy (BrightData, Decodo) that rotates per request automatically, or build proxy rotation in a Code node and pass the selected proxy into the HTTP Request node via an expression. Static IPs only make sense for whitelisted private endpoints.

Ignoring Rate Limits on the Schedule Trigger

Setting a Schedule Trigger to run every minute against a single target site is the fastest way to get blocked, regardless of proxy quality. Most sites tolerate one request every 3–10 seconds per IP before flagging it. Use Split In Batches plus a Wait node to space requests inside a single execution, and stagger Schedule Triggers across workflows so they do not all fire at exactly :00 of every hour.
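Staggering can be made repeatable by deriving each workflow's minute offset from its name, so the offset stays stable across deployments. The sketch below emits a standard five-field cron expression; the helper name is hypothetical, and you should confirm the cron format your n8n version expects before pasting the result into a Schedule Trigger.

```javascript
// Derive a stable minute offset (0-59) from a workflow name and return an
// hourly cron expression that fires at that minute instead of at :00.
function staggeredHourlyCron(workflowName) {
  let sum = 0;
  for (const ch of workflowName) {
    sum += ch.codePointAt(0);
  }
  const minute = sum % 60;
  return `${minute} * * * *`;
}

// Hypothetical workflow names, used only for illustration.
for (const name of ["price-monitor", "serp-daily", "stock-check"]) {
  console.log(name, "->", staggeredHourlyCron(name));
}
```

A character sum is deliberately simple and two names can land on the same minute; with many workflows, assigning offsets from an explicit list gives more even spacing.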

Storing Proxy Credentials in Workflow JSON

Pasting USER:PASS directly into a Proxy field exports the credentials in plain text whenever you back up or share a workflow. Always use n8n's encrypted credential store: create a credential of type "Generic Credential", reference it in the proxy URL via expressions, and the secret never appears in workflow JSON. This also makes credential rotation trivial: update once in the credential store and every workflow picks it up instantly.
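A quick pre-commit check can catch hard-coded proxy credentials before a workflow export is shared. The sketch below assumes the export has a nodes array and that the proxy lives at parameters.options.proxy, matching the configuration shown earlier in this guide; both the shape and the helper name are assumptions to adapt to your own exports.

```javascript
// Return the names of nodes whose proxy option contains literal credentials.
function findEmbeddedProxyCredentials(workflow) {
  const findings = [];
  for (const node of workflow.nodes ?? []) {
    const proxy = node.parameters?.options?.proxy;
    if (typeof proxy !== "string") continue;
    // Exported expressions start with "="; they reference values rather than embed them.
    if (proxy.startsWith("=")) continue;
    let parsed;
    try {
      parsed = new URL(proxy);
    } catch {
      continue; // not a parseable URL, nothing to inspect
    }
    if (parsed.username || parsed.password) {
      findings.push(node.name);
    }
  }
  return findings;
}

// Minimal stand-in for an exported workflow.
const exported = {
  nodes: [
    { name: "Fetch products", parameters: { options: { proxy: "http://USER:PASS@gate.smartproxy.com:7000" } } },
    { name: "Fetch reviews", parameters: { options: { proxy: "={{ $json.proxy }}" } } },
    { name: "Parse HTML", parameters: {} },
  ],
};
console.log(findEmbeddedProxyCredentials(exported));
```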

Tips for Production-Grade n8n Scraping Workflows

  • Tag executions with metadata. Pass workflow_id and run_id into your output rows so you can trace any bad data back to the exact execution that produced it. Invaluable when debugging silent failures weeks later.
  • Run n8n in queue mode. For workflows that exceed five minutes or 100 concurrent executions, switch to queue mode with Redis. Single-process n8n is a bottleneck above that threshold and silently drops executions under load.
  • Self-host behind a VPN. For sensitive scraping (regulated industries, brand-protection), pin your n8n instance behind a Tailscale or WireGuard network so the only egress is through your proxy provider.
  • Version workflows in git. Export n8n workflow JSON and commit it. The credential store keeps secrets safe, and git history lets you roll back a broken selector change in seconds.
  • Monitor execution failure rates. Pipe n8n metrics into Grafana or a simple Slack alert. Block rate creeping above 5% is an early signal your proxy needs a refresh.
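Two of the tips above, tagging rows and watching the block rate, reduce to a few lines of JavaScript. The sketch below is one possible shape: the set of status codes treated as "blocked" is an assumption, and inside a Code node the IDs would typically come from $workflow.id and $execution.id rather than literals.

```javascript
// Attach workflow_id and run_id to every output row for later tracing.
function tagRows(rows, { workflowId, runId }) {
  return rows.map((row) => ({ ...row, workflow_id: workflowId, run_id: runId }));
}

// Status codes counted as blocks; this list is an assumption, adjust per target.
const BLOCK_STATUSES = new Set([403, 429]);

// Fraction of responses that look like blocks (0 when there are no responses).
function blockRate(statusCodes) {
  if (statusCodes.length === 0) return 0;
  const blocked = statusCodes.filter((code) => BLOCK_STATUSES.has(code)).length;
  return blocked / statusCodes.length;
}

const tagged = tagRows([{ sku: "A1", price: 19.99 }], { workflowId: "wf_1", runId: "42" });
const rate = blockRate([200, 200, 403, 200, 429, 200, 200, 200, 200, 200]);

console.log(tagged[0]);
if (rate > 0.05) {
  console.log(`Block rate ${(rate * 100).toFixed(1)}% exceeds the 5% threshold`);
}
```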

Frequently Asked Questions

What is n8n, and why use it for web scraping?

n8n is a source-available workflow automation tool that lets you connect 400+ services with a visual editor: think Zapier, but self-hostable and code-friendly. For web scraping, n8n shines because you can chain HTTP requests, HTML parsing, custom JavaScript, and database writes without writing a full app. Self-hosting is free, and the visual editor makes complex pipelines maintainable by non-developers, which is why it crossed 60K GitHub stars in 2025.

Does n8n support proxy rotation?

Yes. n8n’s HTTP Request node supports proxies natively. Set the proxy URL in the Options panel and rotation happens automatically when your provider rotates IPs per request (e.g. BrightData's rotating residential gateway). For per-execution control, use a Code node to pick a proxy from an array and pass it into the HTTP Request node via expressions. Both approaches work cleanly inside the standard n8n workflow editor without any plugin installs.

Is n8n free to use?

Yes. n8n is licensed under the Sustainable Use License and is free to self-host for personal and internal-business use. You only pay for the n8n Cloud hosting tier (starting at $20/month) or commercial embedding. For developers running scraping workflows on their own VPS or Docker host, n8n is fully free with unlimited workflows, executions, and integrations, a strong reason it has become the go-to alternative to Zapier and Make for technical teams.

How do I configure a proxy in n8n?

Open the HTTP Request node, expand Options, and paste your proxy URL in the format http://USER:PASS@host:port directly into the Proxy field. For better security, store the credentials in n8n’s encrypted credential store and reference them via expressions like =http://{{ $credentials.smartproxyUser }}:{{ $credentials.smartproxyPass }}@gate.smartproxy.com:7000. This keeps secrets out of workflow JSON when you export or version your flows in git.

Can n8n scrape JavaScript-rendered pages?

Yes. Install the Browserless community node and point it at a self-hosted Browserless container or Browserless.io cloud. The node sends a target URL and returns the fully rendered HTML, which you then pipe into HTML Extract or your Code node. Pass your proxy URL into Browserless via the proxyServer parameter so all browser traffic exits through your residential or ISP proxy pool, the same pattern as Playwright with a proxy.

Is n8n better than Python for web scraping?

It depends on volume and maintenance. For under 10–20 distinct scraping targets with simple HTML extraction, n8n is dramatically faster to build and ship, because visual debugging and built-in retries beat writing a full async script. For million-page scrapes or complex parsing logic, raw Python with libraries like httpx or Scrapy gives you finer control over concurrency and memory. Many teams use both: n8n for orchestration, Python for hot-path scrapers.

How do I rotate proxies in n8n?

Two patterns work well. First, use a rotating-gateway proxy (BrightData, Decodo, Webshare) where the provider rotates IPs per request automatically, so you only need one URL. Second, store a list of static proxies in a Set or Code node, hash the workflow execution ID, and pick a proxy deterministically. The deterministic approach keeps the same proxy for a multi-step workflow run, which matters when the target site uses session cookies.

Can I use residential proxies with n8n?

Yes. n8n’s HTTP Request node is agnostic to proxy type. Residential, datacenter, ISP, and mobile proxies all work as long as the provider gives you an HTTP/HTTPS proxy URL. For tough anti-bot targets like Amazon or Cloudflare-protected sites, residential proxies from BrightData, Decodo, or NodeMaven dramatically outperform datacenter IPs. Pair them with sticky sessions in n8n by passing a session ID parameter in the proxy URL or username.

Is n8n cloud-hosted or self-hosted?

Both. n8n Cloud is the official managed hosting tier starting at $20/month, with zero setup and automatic updates. Self-hosted n8n runs anywhere Docker runs (DigitalOcean droplets, AWS EC2, your home server) and is completely free. For proxy-based scraping with sensitive data, self-hosting is usually the better choice: workflows never leave your network, and you can attach the n8n instance to a dedicated VPN for additional egress control.

Conclusion: Build Your First Proxy-Based n8n Workflow Today

n8n has earned its place as the default workflow tool for proxy-based data collection in 2026. The combination of native HTTP proxy support, visual debugging, encrypted credential storage, and the seven nodes covered above is enough to replace 90% of custom scraping scripts — at a fraction of the maintenance cost. Start with HTTP Request, layer in HTML Extract for parsing, and reach for the Code node when you need rotation or custom logic.

Pair your workflows with the right proxy provider — BrightData and NodeMaven for tough targets, Decodo for value, Geonode for unlimited bandwidth, Webshare for high-volume datacenter work — and you have a production-grade ingestion pipeline you can hand off to non-engineers. Add monitoring, version-controlled workflows, and self-hosting behind a VPN, and you are running enterprise-grade data collection at indie-friendly cost.

Ready to ship your first workflow? Browse our residential proxy directory for the perfect pairing, or read our guide to scaling web scraping in 2026 for the next layer of the stack.