Web Scraping with Selenium: Best Practices & Tutorial

A complete Selenium web scraping tutorial plus the best practices that make scrapers reliable: explicit waits, retries, stealth, proxies, and scaling with Selenium Grid.

ProxyHorizon Team
May 30, 2026
18 min read

If a website renders its content with JavaScript, hides data behind a login, or only reveals prices after you click a button, plain HTTP requests will leave you staring at an empty page. This is exactly where Selenium earns its keep: it drives a real browser, so it sees the web the same way a human does.

Selenium has been the workhorse of browser automation since 2004. It powers QA suites at most of the Fortune 500, and for web scraping it remains a dependable choice when you need to render JavaScript, interact with pages, and handle complex flows. But Selenium is also famous for being flaky in the wrong hands — brittle selectors, bad waits, and clumsy bot patterns that get you blocked within minutes.

This guide fixes that. We will go from a clean install to a complete, working scraper, then layer on the best practices that separate a hobby script from a production-grade pipeline: proper waits, retries, stealth, proxies, and scaling. Every concept comes with runnable Python code. Let us build something reliable.

When Should You Actually Use Selenium?

Selenium is powerful but heavy — it launches a full browser, which costs memory and time. Reach for it only when you genuinely need a browser:

  • Use Selenium when the page renders content with JavaScript, requires clicks, scrolling, logins, or form submissions, or when you need to screenshot the rendered result.
  • Skip Selenium when the data is in the raw HTML or behind a clean API. For static pages, requests plus BeautifulSoup is far faster and lighter, because no browser has to start up or render anything.

A quick comparison of the common Python options:

Tool                     | Renders JavaScript?  | Speed     | Best for
requests + BeautifulSoup | No                   | Very fast | Static HTML, APIs
Selenium                 | Yes                  | Slower    | Interactive, JS-heavy sites
Playwright               | Yes                  | Fast      | Modern JS sites, async at scale
Scrapy                   | No (without add-ons) | Very fast | Large structured crawls

The rule of thumb: try the page with requests first. If the data you want is missing from the raw response, it is being rendered client-side, and that is your cue to bring in Selenium.
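
One way to run that check is sketched below. It uses the standard library's urllib in place of requests so it has no dependencies, and the function names fetch_raw_html and needs_browser are illustrative, not part of any library. The idea is to pick a snippet of text you can see in the browser and test whether it appears in the raw response.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Download the page exactly as the server sends it (no JavaScript runs)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def needs_browser(raw_html: str, marker: str) -> bool:
    """Return True when the marker text is absent from the raw HTML,
    which suggests the content is rendered client-side."""
    return marker not in raw_html
```

Typical use would be needs_browser(fetch_raw_html(url), "some text visible in the browser"). Treat the answer as a hint only: some pages embed their data as JSON inside a script tag, so the marker can appear in the raw response even though the visible markup is built by JavaScript.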

Setting Up Your Environment

Modern Selenium (4.6 and later) ships with Selenium Manager, which automatically downloads and matches the right browser driver for you. The days of manually juggling ChromeDriver versions are over.

Bash
# Install Selenium into a virtual environment
python -m venv venv
source venv/bin/activate      # on Windows: venv\Scripts\activate
pip install selenium

Here is the smallest scraper that proves everything works. It opens a page and prints every quote on it:

Python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()        # Selenium Manager resolves the driver
driver.get("https://quotes.toscrape.com")

for quote in driver.find_elements(By.CSS_SELECTOR, "span.text"):
    print(quote.text)

driver.quit()                      # always close the browser

Run it. If you see a list of quotes, you are ready to build something real.

Locating Elements the Right Way

Everything in Selenium scraping comes down to finding elements reliably. Selenium offers several locator strategies through the By class:

Strategy        | Example                                 | When to use
By.ID           | By.ID, "search"                         | Fastest and most stable when an ID exists
By.CSS_SELECTOR | By.CSS_SELECTOR, "div.quote span.text"  | The everyday workhorse; flexible and fast
By.XPATH        | By.XPATH, "//div[@class='quote']"       | Complex relationships or selecting by text
By.CLASS_NAME   | By.CLASS_NAME, "author"                 | Simple single-class lookups

Best practice: prefer IDs and CSS selectors over long, brittle XPath chains. An absolute XPath like /html/body/div[2]/div[3]/span breaks the instant the layout shifts. A CSS selector tied to a meaningful class survives redesigns far better. Use find_element for a single match; it raises NoSuchElementException when nothing matches. Use find_elements when you expect many; it returns a list, which is simply empty when nothing matches.
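
Stable selectors are easier to maintain when they live in one place. The sketch below is one possible layout, not a Selenium feature: the names SELECTORS and locator are made up for illustration, and the selectors assume a quotes-style page. It uses plain strings so it runs without a browser; they are the same values that By.CSS_SELECTOR and By.ID hold.

```python
# Locator strategies are plain strings under the hood. These two values are
# the same strings that selenium's By.CSS_SELECTOR and By.ID hold.
CSS = "css selector"
ID = "id"

# Hypothetical selectors for a quotes page, all defined in one place.
SELECTORS = {
    "search_box": (ID, "search"),
    "quote_card": (CSS, "div.quote"),
    "quote_text": (CSS, "span.text"),
    "quote_author": (CSS, "small.author"),
}


def locator(name: str) -> tuple:
    """Look up a (strategy, value) pair, failing loudly on unknown names."""
    try:
        return SELECTORS[name]
    except KeyError:
        raise KeyError(f"no selector registered for {name!r}") from None
```

With a live driver, a lookup would read driver.find_elements(*locator("quote_card")). When the site changes a class name, only the dictionary entry needs updating.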

Waiting Properly: The Number One Skill in Selenium

If you remember one thing from this guide, make it this. The overwhelming majority of flaky Selenium scrapers fail because they try to read an element before the page has finished rendering it. The fix is explicit waits.

Implicit vs explicit waits

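An implicit wait sets a global timeout for finding elements. It is a blunt instrument. An explicit wait pauses until a specific condition is true, then continues immediately, which makes it both precise and efficient. Always prefer explicit waits, and avoid combining the two on the same driver: the Selenium documentation warns that mixing implicit and explicit waits can produce unpredictable wait times.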

Python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

wait = WebDriverWait(driver, 10)   # wait up to 10 seconds

# Pause until the element is actually present, then grab it
first_quote = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.quote span.text"))
)
print(first_quote.text)

Common expected conditions you will reach for constantly: presence_of_element_located, visibility_of_element_located, element_to_be_clickable, and text_to_be_present_in_element.
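
To see why an explicit wait returns early rather than burning the whole timeout, it helps to look at the polling loop behind the idea. The function below is a simplified stand-in written for this guide, not Selenium's actual implementation; wait_until and its parameters are illustrative names. WebDriverWait works along similar lines, polling every 0.5 seconds by default.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears; raises TimeoutError if the
    deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

The timeout is only a ceiling. If the condition is already true on the first check, the call returns without sleeping at all.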

Why you should almost never use time.sleep

A hard-coded time.sleep(5) is the mark of a beginner. It either wastes time (the page loaded in 0.4 seconds) or fails anyway (the page took 6). Explicit waits adapt to reality: they return the moment the condition is met and only time out when something is genuinely wrong. The main place sleep is acceptable is pacing your requests politely, which we will cover under best practices. A short pause while infinite-scroll content loads is a pragmatic exception.

A Complete Tutorial: Scrape a Dynamic Site Step by Step

Let us scrape a JavaScript-rendered site end to end. We will target the JS version of Quotes to Scrape, which builds its content client-side — the perfect Selenium use case. Our goal: collect every quote, its author, and its tags across all pages, then save the lot to CSV.

Python
import csv
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Step 1 - configure a headless browser
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)

results = []
driver.get("https://quotes.toscrape.com/js/")

# Step 2 - loop through every page
try:
    while True:
        # wait until the JS-rendered quotes actually appear
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.quote")))

        # Step 3 - extract structured data from each card
        cards = driver.find_elements(By.CSS_SELECTOR, "div.quote")
        for card in cards:
            results.append({
                "text": card.find_element(By.CSS_SELECTOR, "span.text").text,
                "author": card.find_element(By.CSS_SELECTOR, "small.author").text,
                "tags": ", ".join(t.text for t in card.find_elements(By.CSS_SELECTOR, "a.tag")),
            })

        # Step 4 - follow the "Next" button, or stop if there is none
        next_link = driver.find_elements(By.CSS_SELECTOR, "li.next a")
        if not next_link:
            break
        next_link[0].click()
        # make sure the old page is gone before reading the next one
        wait.until(EC.staleness_of(cards[0]))
finally:
    driver.quit()   # close the browser even if a step above fails

# Step 5 - persist the data
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    writer.writerows(results)

print(f"Scraped {len(results)} quotes across all pages")

Notice the shape of this scraper: configure once, wait for content, extract into clean dictionaries, paginate by following the real Next button rather than guessing URLs, and write structured output at the end. That structure scales to far more complex sites with very few changes.

Handling JavaScript, Infinite Scroll, and Pop-ups

Real sites throw curveballs. Three patterns cover most of them.

Infinite scroll — many feeds load more content as you scroll. Trigger it with JavaScript and stop when the page height stops growing:

Python
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)   # give new content a moment to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break       # reached the bottom
    last_height = new_height
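
A variant of the same loop can poll for growth instead of sleeping a fixed two seconds, and cap the number of rounds so an endless feed cannot trap the scraper. This is a sketch under those assumptions: scroll_until_stable and its parameters are illustrative names, and driver is any object with an execute_script method, such as a Selenium WebDriver.

```python
import time


def scroll_until_stable(driver, timeout: float = 5.0, poll: float = 0.25,
                        max_rounds: int = 50) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        deadline = time.monotonic() + timeout
        new_height = last_height
        # Poll for growth instead of sleeping a fixed amount.
        while time.monotonic() < deadline:
            new_height = driver.execute_script(get_height)
            if new_height > last_height:
                break
            time.sleep(poll)
        if new_height <= last_height:
            break  # nothing new arrived within the timeout
        last_height = new_height
        loaded += 1
    return loaded
```

On a fast connection each round finishes as soon as the page grows, so the full timeout is only spent once, on the final round that confirms the bottom.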

Cookie banners and pop-ups — find and dismiss them, but do not crash if they are absent. Using find_elements (plural) returns an empty list instead of raising, which keeps your scraper resilient:

Python
accept = driver.find_elements(By.CSS_SELECTOR, "button#accept-cookies")
if accept:
    accept[0].click()

Reading values from JavaScript — when data lives in a JS variable or a tricky widget, you can execute script directly and return the result into Python with driver.execute_script("return window.someData").
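
A small helper can make that pattern reusable. The sketch below is one possible approach, with read_js_value as an illustrative name: it serializes the value to JSON in the browser and parses it in Python, which keeps nested structures predictable. Selenium can also convert simple values directly, so the JSON round trip is a design choice, not a requirement.

```python
import json


def read_js_value(driver, name: str):
    """Return window[name] as Python data, or None if it is undefined."""
    raw = driver.execute_script(
        "const v = window[arguments[0]];"
        "return v === undefined ? null : JSON.stringify(v);",
        name,  # passed to the script as arguments[0]
    )
    return None if raw is None else json.loads(raw)
```

Calling read_js_value(driver, "someData") would return a dict or list when the page defines window.someData, and None when it does not.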

Best Practices for Reliable, Polite Scraping

This is the section that turns a script you babysit into a pipeline you trust. Adopt these and your success rate climbs dramatically.

  • Always use explicit waits. Never assume an element is ready. This single habit eliminates most flakiness.
  • Wrap risky actions in retries with backoff. Networks blip and elements go stale. A retry layer turns transient failures into non-events.
  • Run headless and trim the fat. Disable images and unnecessary features when you only need data — it is faster and cheaper.
  • Rate-limit yourself. Add a short, slightly randomized delay between requests. Hammering a server is rude, suspicious, and gets you blocked.
  • Respect robots.txt and the terms of service. Scrape public data responsibly and never overload a site.
  • Centralize selectors. Keep your CSS selectors in one place (a Page Object) so a site redesign is a one-line fix, not a rewrite.
  • Always quit the driver. Orphaned browser processes leak memory fast. Use try/finally or a context manager.
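
For the rate-limiting habit, a randomized pause between page loads is usually enough. The helper below is a minimal sketch; polite_pause and its default range of one to three seconds are assumptions to tune per site.

```python
import random
import time


def polite_pause(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

Call it once after each navigation or click that triggers a request. The randomness matters as much as the delay, because perfectly regular intervals are themselves a bot signal.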

A simple, reusable retry helper handles the two most common transient exceptions:

Python
import time
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

def with_retries(action, attempts=3, base_delay=2):
    for i in range(attempts):
        try:
            return action()
        except (TimeoutException, StaleElementReferenceException):
            if i == attempts - 1:
                raise                      # out of retries, surface the error
            time.sleep(base_delay * 2 ** i)    # exponential backoff: 2s, then 4s, ...

Avoiding Bot Detection

Default Selenium is easy to spot. It announces itself through the navigator.webdriver flag, uses an automation-flavored Chrome profile, and behaves with inhuman precision. If your scraper suddenly hits CAPTCHAs or blocks, detection is why. Your toolkit:

  • Hide the automation flags. A few launch arguments remove the most obvious tells.
  • Use undetected-chromedriver. A drop-in replacement patched to bypass common anti-bot systems like Cloudflare.
  • Behave like a human. Randomize delays, scroll before clicking, and avoid scraping a hundred pages in ten seconds.
  • Rotate proxies and user agents. The single biggest factor in staying unblocked at scale, covered next.

Python
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
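
It is worth confirming that the arguments actually took effect. The check below is a small sketch; automation_flag_visible is an illustrative name, and driver is assumed to be a browser started with the options above.

```python
def automation_flag_visible(driver) -> bool:
    """Return True if page scripts can still see navigator.webdriver as true."""
    return bool(driver.execute_script("return navigator.webdriver === true"))
```

If this returns True after launch, the flag is still exposed. A False result only rules out this one signal; detection systems look at many others.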

When that is not enough, switch to undetected-chromedriver:

Python
# pip install undetected-chromedriver
import undetected_chromedriver as uc

driver = uc.Chrome(headless=False)
driver.get("https://example.com")
print(driver.title)
driver.quit()

Using Proxies with Selenium

At any real volume, scraping from a single IP gets that IP rate-limited or banned. Rotating residential or datacenter proxies spread your requests across many IPs so no single address looks suspicious. Because authenticated proxies (username and password) are awkward in vanilla Selenium, the cleanest approach is selenium-wire, which supports proxy auth directly. The selenium-wire repository has been archived by its maintainer, so pin a version you have tested against your Selenium release:

Python
# pip install selenium-wire
from seleniumwire import webdriver

proxy = "http://USER:PASS@gate.your-proxy.com:7000"
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy,
        "no_proxy": "localhost,127.0.0.1",
    }
}

driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
driver.get("https://httpbin.org/ip")
print(driver.find_element("tag name", "body").text)   # confirms the proxy IP
driver.quit()
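
To rotate across several proxies, one simple approach is to hand each new browser session the next proxy in a round-robin cycle. The generator below is a sketch of that idea; proxy_options is an illustrative name and the proxy URLs in the usage note are placeholders. It only builds the options dictionaries in the shape used above, so it runs without selenium-wire installed.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator


def proxy_options(proxies: Iterable[str]) -> Iterator[Dict[str, Dict[str, str]]]:
    """Yield one selenium-wire options dict per proxy URL, round-robin, forever."""
    for proxy in cycle(proxies):
        yield {
            "proxy": {
                "http": proxy,
                "https": proxy,
                "no_proxy": "localhost,127.0.0.1",
            }
        }
```

Create the generator once, for example rotation = proxy_options(["http://USER:PASS@host-a:7000", "http://USER:PASS@host-b:7000"]), then pass next(rotation) as seleniumwire_options each time a new driver is started. Many providers also rotate IPs behind a single gateway address, in which case no client-side rotation is needed.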

Proxy quality matters more than almost anything else for staying unblocked. These are the providers we rate most highly for scraping — compare pool size, locations, and pricing:

Pool  | Uptime | Latency | Countries | Highlights
115M+ | 99.99% | 0.6s    | 195+      | Huge IP Pool; User Friendly; Pay As You Go
102M+ | 99.99% | 0.6s    | 195+      | Massive 102M+ IP Pool; Ethically Sourced & Compliant; AI-Powered Web Unblocker; Dedicated Account Manager; Advanced ASN & City Targeting
32M+  | 99.9%  | 0.8s    | 195+      | Traffic Never Expires; Pay As You Go; Ethical Sourcing
10M+  | 99.97% | 1.0s    | 50+       | Extremely Cheap; Free Tier Available; Customizable

For the full lineup, browse our proxy provider directory and filter by use case, network type, and budget.

Scaling Up with Selenium Grid

One browser scrapes one page at a time. When you need throughput, Selenium Grid lets you run many browsers in parallel across one or more machines. You point a webdriver.Remote at the Grid hub instead of launching locally:

Python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=options,
)
driver.get("https://example.com")
print(driver.title)
driver.quit()

Spin up a Grid in seconds with the official Docker images, then distribute your scraping jobs across the available nodes. Combine this with proxy rotation and you have a setup that scrapes thousands of pages reliably.
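
Distributing jobs can be as simple as a thread pool in which each task opens its own remote session. The sketch below shows only the fan-out logic: scrape_in_parallel is an illustrative name, and scrape_one is a hypothetical function you supply that creates a webdriver.Remote, scrapes one URL, and quits the driver. Each task needs its own driver because WebDriver instances should not be shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def scrape_in_parallel(urls: Iterable[str], scrape_one: Callable[[str], object],
                       max_workers: int = 4) -> List[Tuple[str, object]]:
    """Run scrape_one(url) for every URL on a thread pool, preserving order.

    A failure is captured as the exception object in place of the result,
    so one bad page does not abort the whole batch.
    """
    def safe(url: str) -> Tuple[str, object]:
        try:
            return url, scrape_one(url)
        except Exception as exc:  # keep going; report the error per URL
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, urls))
```

Set max_workers no higher than the number of browser slots your Grid exposes; extra tasks would only queue at the hub.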

Common Errors and How to Fix Them

Error | Cause | Fix
NoSuchElementException | Element not in the DOM yet, or wrong selector | Add an explicit wait; verify the selector in DevTools
StaleElementReferenceException | The DOM changed after you grabbed the element | Re-find the element right before using it
TimeoutException | Condition never became true within the wait | Increase the timeout or fix the selector or condition
ElementClickInterceptedException | An overlay or banner is covering the element | Dismiss the overlay, or scroll the element into view first
WebDriverException (session) | Browser and driver mismatch or crash | Update Selenium so Selenium Manager resolves the driver

Frequently Asked Questions

Is Selenium good for web scraping?
Selenium is an excellent choice when you need to scrape JavaScript-rendered pages or interact with a site by clicking, scrolling, logging in, or submitting forms, because it drives a real browser. For static HTML pages it is overkill, and a lighter stack like requests with BeautifulSoup will be much faster. Choose Selenium when the data only appears after the browser runs the page's JavaScript.

What is the difference between Selenium, BeautifulSoup, and Playwright?
BeautifulSoup parses static HTML and does not run JavaScript, so it is fastest for simple pages and APIs. Selenium and Playwright both drive real browsers and can handle dynamic, interactive sites. Playwright is newer, generally faster, and has excellent async support, while Selenium has the largest ecosystem and the longest track record. Use BeautifulSoup for static data, and Selenium or Playwright for JavaScript-heavy sites.

Can websites detect Selenium?
Yes. Default Selenium exposes signals such as the navigator.webdriver flag and automation switches, and its perfectly precise behavior is itself a giveaway. You can reduce detection by removing the automation flags, using undetected-chromedriver, behaving more like a human with randomized delays, and rotating proxies and user agents. No method is guaranteed, since detection is an ongoing arms race.

Do I need proxies to scrape with Selenium?
For small jobs, no. For anything at volume, yes. Scraping many pages from one IP address quickly leads to rate limiting or bans. Rotating residential or datacenter proxies spread your requests across many addresses so no single IP looks suspicious. Proxy quality is one of the biggest factors in staying unblocked when scraping at scale.

How do I scrape JavaScript-rendered pages with Selenium?
Navigate to the page, then use an explicit WebDriverWait to pause until the JavaScript-rendered elements are present before extracting them. For content that loads on scroll, use execute_script to scroll and wait for new elements. For data stored in JavaScript variables, you can run execute_script with a return statement to pull the value directly into Python.

Is web scraping with Selenium legal?
Scraping publicly available data is broadly accepted in many places, but legality depends on your jurisdiction, the website's terms of service, and what data you collect. Avoid scraping personal or copyrighted data without permission, respect robots.txt, do not overload servers, and consult the specific rules that apply to you. The tool is legal; responsible use is your obligation.

The Bottom Line

Selenium remains one of the most reliable ways to scrape the modern, JavaScript-driven web — provided you use it well. The difference between a fragile script and a production pipeline comes down to a handful of habits: pick the right tool for the page, locate elements with stable selectors, wait explicitly instead of sleeping, retry transient failures, behave politely, and route serious volume through quality proxies.

Start with the complete tutorial above, adopt the best practices one at a time, and you will have a scraper that is fast, resilient, and hard to block. Pair it with a strong proxy provider and good manners, and Selenium will serve you for years.