Web Scraping with Selenium: Best Practices & Tutorial
A complete Selenium web scraping tutorial plus the best practices that make scrapers reliable: explicit waits, retries, stealth, proxies, and scaling with Selenium Grid.
If a website renders its content with JavaScript, hides data behind a login, or only reveals prices after you click a button, plain HTTP requests will leave you staring at an empty page. This is exactly where Selenium earns its keep: it drives a real browser, so it sees the web the same way a human does.
Selenium has been a workhorse of browser automation since 2004. It is widely used in QA test suites, and for web scraping it remains a dependable choice when you need to render JavaScript, interact with pages, and handle complex flows. But Selenium is also famous for being flaky in the wrong hands: brittle selectors, bad waits, and clumsy bot patterns that get you blocked within minutes.
This guide fixes that. We will go from a clean install to a complete, working scraper, then layer on the best practices that separate a hobby script from a production-grade pipeline: proper waits, retries, stealth, proxies, and scaling. Every concept comes with runnable Python code. Let us build something reliable.
When Should You Actually Use Selenium?
Selenium is powerful but heavy — it launches a full browser, which costs memory and time. Reach for it only when you genuinely need a browser:
- Use Selenium when the page renders content with JavaScript, requires clicks, scrolling, logins, or form submissions, or when you need to screenshot the rendered result.
- Skip Selenium when the data is in the raw HTML or behind a clean API. For static pages, requests plus BeautifulSoup is often around ten times faster and far lighter.
A quick comparison of the common Python options:
| Tool | Renders JavaScript? | Speed | Best for |
|---|---|---|---|
| requests + BeautifulSoup | No | Very fast | Static HTML, APIs |
| Selenium | Yes | Slower | Interactive, JS-heavy sites |
| Playwright | Yes | Fast | Modern JS sites, async at scale |
| Scrapy | No (without add-ons) | Very fast | Large structured crawls |
The rule of thumb: try the page with requests first. If the data you want is missing from the raw response, it is being rendered client-side, and that is your cue to bring in Selenium.
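One way to apply this rule of thumb is a quick check like the sketch below. It assumes the requests package is installed. The helper names `needs_browser` and `fetch_raw_html`, the URL, and the marker text are illustrative choices, not part of any library. The marker is a snippet of text you can see in the rendered page.

```python
def needs_browser(raw_html: str, marker: str) -> bool:
    """Return True when `marker` is missing from the raw HTML.

    A missing marker suggests the content is rendered client-side.
    """
    return marker.lower() not in raw_html.lower()


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page without running any JavaScript (assumes `requests` is installed)."""
    import requests  # imported lazily so needs_browser() has no dependencies

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Illustrative URL and marker; substitute the page and text you care about
    html = fetch_raw_html("https://example.com")
    print("Needs a browser:", needs_browser(html, "Example Domain"))
```

A plain substring check is deliberately crude: it also matches data that is embedded in a script tag but never rendered as HTML, so treat the result as a hint rather than a verdict.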
Setting Up Your Environment
Modern Selenium (4.6 and later) ships with Selenium Manager, which automatically downloads and matches the right browser driver for you. The days of manually juggling ChromeDriver versions are over.
# Install Selenium into a virtual environment
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install selenium

Here is the smallest scraper that proves everything works. It opens a page and prints every quote on it:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # Selenium Manager resolves the driver
driver.get("https://quotes.toscrape.com")
for quote in driver.find_elements(By.CSS_SELECTOR, "span.text"):
    print(quote.text)
driver.quit()  # always close the browser

Run it. If you see a list of quotes, you are ready to build something real.
Locating Elements the Right Way
Everything in Selenium scraping comes down to finding elements reliably. Selenium offers several locator strategies through the By class:
| Strategy | Example | When to use |
|---|---|---|
| By.ID | By.ID, "search" | Fastest and most stable when an ID exists |
| By.CSS_SELECTOR | By.CSS_SELECTOR, "div.quote span.text" | The everyday workhorse; flexible and fast |
| By.XPATH | By.XPATH, "//div[@class='quote']" | Complex relationships or selecting by text |
| By.CLASS_NAME | By.CLASS_NAME, "author" | Simple single-class lookups |
Best practice: prefer IDs and CSS selectors over long, brittle XPath chains. An absolute XPath like /html/body/div[2]/div[3]/span breaks the instant the layout shifts. A CSS selector tied to a meaningful class survives redesigns far better. Use find_element for a single match (it raises NoSuchElementException when nothing matches) and find_elements when you expect many (it returns a list, which is simply empty when nothing matches).
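The difference between the two lookup methods is handy for optional fields. The sketch below is one possible helper, with `text_or_none` being an illustrative name rather than a Selenium API. It relies only on find_elements returning an empty list when nothing matches.

```python
from typing import Optional


def text_or_none(parent, by: str, value: str) -> Optional[str]:
    """Return the text of the first match under `parent`, or None if nothing matches.

    `parent` can be a WebDriver or a WebElement; both expose find_elements().
    """
    matches = parent.find_elements(by, value)
    return matches[0].text if matches else None


# Usage sketch (requires selenium and a running driver):
# from selenium.webdriver.common.by import By
# author = text_or_none(card, By.CSS_SELECTOR, "small.author")
```

Returning None instead of raising lets the caller decide whether a missing field is an error or just an empty column.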
Waiting Properly: The Number One Skill in Selenium
If you remember one thing from this guide, make it this. The overwhelming majority of flaky Selenium scrapers fail because they try to read an element before the page has finished rendering it. The fix is explicit waits.
Implicit vs explicit waits
An implicit wait sets a global timeout for finding elements. It is a blunt instrument. An explicit wait pauses until a specific condition is true, then continues immediately — precise and efficient. Always prefer explicit waits.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

wait = WebDriverWait(driver, 10)  # wait up to 10 seconds
# Pause until the element is actually present, then grab it
first_quote = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.quote span.text"))
)
print(first_quote.text)

Common expected conditions you will reach for constantly: presence_of_element_located, visibility_of_element_located, element_to_be_clickable, and text_to_be_present_in_element.
Why you should almost never use time.sleep
A hard-coded time.sleep(5) is the mark of a beginner. It either wastes time (the page loaded in 0.4 seconds) or fails anyway (the page took 6). Explicit waits adapt to reality: they return the moment the condition is met and only time out when something is genuinely wrong. Sleep is acceptable in a few narrow cases, chiefly pacing your requests politely, which we will cover under best practices.
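Conceptually, an explicit wait is a polling loop with a deadline. The sketch below is a simplified illustration of that idea, not Selenium's actual implementation. The function name `wait_until` is hypothetical, and the real WebDriverWait additionally ignores NoSuchElementException by default while it polls.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # return immediately once the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)  # short nap between checks, not one long blind sleep
```

The short sleep here is only the gap between checks, so a page that is ready after 0.4 seconds costs roughly one poll interval, and a page that never becomes ready fails with a clear error.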
A Complete Tutorial: Scrape a Dynamic Site Step by Step
Let us scrape a JavaScript-rendered site end to end. We will target the JS version of Quotes to Scrape, which builds its content client-side — the perfect Selenium use case. Our goal: collect every quote, its author, and its tags across all pages, then save the lot to CSV.
import csv
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Step 1 - configure a headless browser
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)
results = []

try:
    driver.get("https://quotes.toscrape.com/js/")
    # Step 2 - loop through every page
    while True:
        # wait until the JS-rendered quotes actually appear
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.quote")))
        # Step 3 - extract structured data from each card
        cards = driver.find_elements(By.CSS_SELECTOR, "div.quote")
        for card in cards:
            results.append({
                "text": card.find_element(By.CSS_SELECTOR, "span.text").text,
                "author": card.find_element(By.CSS_SELECTOR, "small.author").text,
                "tags": ", ".join(t.text for t in card.find_elements(By.CSS_SELECTOR, "a.tag")),
            })
        # Step 4 - follow the "Next" button, or stop if there is none
        next_link = driver.find_elements(By.CSS_SELECTOR, "li.next a")
        if not next_link:
            break
        next_link[0].click()
        # wait for the old page to go away so we never re-read its cards
        wait.until(EC.staleness_of(cards[0]))
finally:
    driver.quit()  # always release the browser, even after an error

# Step 5 - persist the data
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    writer.writerows(results)
print(f"Scraped {len(results)} quotes across all pages")

Notice the shape of this scraper: configure once, wait for content, extract into clean dictionaries, paginate by following the real Next button rather than guessing URLs, and write structured output at the end. That structure scales to far more complex sites with very few changes.
Handling JavaScript, Infinite Scroll, and Pop-ups
Real sites throw curveballs. Three patterns cover most of them.
Infinite scroll — many feeds load more content as you scroll. Trigger it with JavaScript and stop when the page height stops growing:
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give new content a moment to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # reached the bottom
    last_height = new_height

Cookie banners and pop-ups: find and dismiss them, but do not crash if they are absent. Using find_elements (plural) returns an empty list instead of raising, which keeps your scraper resilient:

accept = driver.find_elements(By.CSS_SELECTOR, "button#accept-cookies")
if accept:
    accept[0].click()

Reading values from JavaScript: when data lives in a JS variable or a tricky widget, you can execute script directly and return the result into Python with driver.execute_script("return window.someData").
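For the infinite-scroll pattern in particular, a feed that never ends would keep the loop running forever. One option, sketched here with a hypothetical `scroll_to_bottom` helper and an assumed cap of 30 rounds, is to wrap the same logic in a function with a safety limit.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=30):
    """Scroll until the page height stops growing or `max_rounds` is reached.

    Returns the number of scrolls that loaded new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new content a moment to load
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # height unchanged: reached the bottom
        last_height = new_height
        loaded += 1
    return loaded
```

Because the function only calls execute_script on whatever driver it receives, it works unchanged with a local Chrome, a remote session, or a stub in tests.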
Best Practices for Reliable, Polite Scraping
This is the section that turns a script you babysit into a pipeline you trust. Adopt these and your success rate climbs dramatically.
- Always use explicit waits. Never assume an element is ready. This single habit eliminates most flakiness.
- Wrap risky actions in retries with backoff. Networks blip and elements go stale. A retry layer turns transient failures into non-events.
- Run headless and trim the fat. Disable images and unnecessary features when you only need data — it is faster and cheaper.
- Rate-limit yourself. Add a short, slightly randomized delay between requests. Hammering a server is rude, suspicious, and gets you blocked.
- Respect robots.txt and the terms of service. Scrape public data responsibly and never overload a site.
- Centralize selectors. Keep your CSS selectors in one place (a Page Object) so a site redesign is a one-line fix, not a rewrite.
- Always quit the driver. Orphaned browser processes leak memory fast. Use try/finally or a context manager.
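Two of the habits above, rate limiting and always quitting the driver, fit in a few lines. This is a minimal sketch: the names `polite_pause` and `managed_driver` are illustrative, and the default delay range of 1 to 3 seconds is an assumption to tune per site.

```python
import random
import time
from contextlib import contextmanager


def polite_pause(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random duration in [min_seconds, max_seconds]; return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


@contextmanager
def managed_driver(make_driver):
    """Create a driver with `make_driver()` and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the scraping code raises


# Usage sketch (requires selenium and a local Chrome):
# from selenium import webdriver
# with managed_driver(webdriver.Chrome) as driver:
#     driver.get("https://quotes.toscrape.com")
#     polite_pause()
```

Passing a factory rather than a driver instance keeps the helper independent of the browser, so the same wrapper works for Chrome, Firefox, or a remote session.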
A simple, reusable retry helper handles the two most common transient exceptions:
import time
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

def with_retries(action, attempts=3, base_delay=2):
    """Call action(), retrying when a transient Selenium error occurs."""
    for i in range(attempts):
        try:
            return action()
        except (TimeoutException, StaleElementReferenceException):
            if i == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * 2 ** i)  # exponential backoff: 2s, 4s, 8s, ...

Avoiding Bot Detection
Default Selenium is easy to spot. It announces itself through the navigator.webdriver flag, uses an automation-flavored Chrome profile, and behaves with inhuman precision. If your scraper suddenly hits CAPTCHAs or blocks, detection is why. Your toolkit:
- Hide the automation flags. A few launch arguments remove the most obvious tells.
- Use undetected-chromedriver. A drop-in replacement patched to bypass common anti-bot systems like Cloudflare.
- Behave like a human. Randomize delays, scroll before clicking, and avoid scraping a hundred pages in ten seconds.
- Rotate proxies and user agents. The single biggest factor in staying unblocked at scale, covered next.
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

When that is not enough, switch to undetected-chromedriver:
# pip install undetected-chromedriver
import undetected_chromedriver as uc

driver = uc.Chrome(headless=False)
driver.get("https://example.com")
print(driver.title)
driver.quit()

Using Proxies with Selenium
At any real volume, scraping from a single IP gets that IP rate-limited or banned. Rotating residential or datacenter proxies spread your requests across many IPs so no single address looks suspicious. Because authenticated proxies (username and password) are awkward in vanilla Selenium, the cleanest approach is selenium-wire, which supports proxy auth directly:
# pip install selenium-wire
from seleniumwire import webdriver

proxy = "http://USER:PASS@gate.your-proxy.com:7000"
seleniumwire_options = {
    "proxy": {
        "http": proxy,
        "https": proxy,
        "no_proxy": "localhost,127.0.0.1",
    }
}
driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
driver.get("https://httpbin.org/ip")
print(driver.find_element("tag name", "body").text)  # confirms the proxy IP
driver.quit()

Proxy quality matters more than almost anything else for staying unblocked. When comparing providers, weigh pool size, locations, and pricing.
For the full lineup, browse our proxy provider directory and filter by use case, network type, and budget.
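Rotation itself can be as simple as cycling through a list of proxy URLs and starting each new browser session with the next one. The sketch below builds the same selenium-wire options dictionary shown above. The helper names and the proxy URLs are placeholders, not real endpoints.

```python
from itertools import cycle


def seleniumwire_options_for(proxy_url):
    """Build the selenium-wire options dict for one proxy URL."""
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def proxy_rotation(proxy_urls):
    """Yield selenium-wire option dicts, cycling through `proxy_urls` forever."""
    for url in cycle(proxy_urls):
        yield seleniumwire_options_for(url)


# Usage sketch (requires selenium-wire; the URLs are hypothetical):
# from seleniumwire import webdriver
# rotation = proxy_rotation([
#     "http://USER:PASS@proxy-1.example.com:7000",
#     "http://USER:PASS@proxy-2.example.com:7000",
# ])
# driver = webdriver.Chrome(seleniumwire_options=next(rotation))
```

Many providers also expose a single gateway address that rotates IPs on their side, in which case one proxy URL is enough and this helper is unnecessary.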
Scaling Up with Selenium Grid
One browser scrapes one page at a time. When you need throughput, Selenium Grid lets you run many browsers in parallel across one or more machines. You point a webdriver.Remote at the Grid hub instead of launching locally:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=options,
)
driver.get("https://example.com")
print(driver.title)
driver.quit()

Spin up a Grid in seconds with the official Docker images, then distribute your scraping jobs across the available nodes. Combine this with proxy rotation and you have a setup that scrapes thousands of pages reliably.
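Distributing jobs can be sketched with a thread pool, where each worker opens its own remote session. This is one possible arrangement, not the only one. The function names are illustrative, the worker count of 4 is an assumption, and `make_driver` is any callable that returns a new driver.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_title(url, make_driver):
    """Open `url` in a fresh browser session and return the page title."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # free the Grid slot for the next job


def scrape_titles(urls, make_driver, max_workers=4):
    """Fetch titles for `urls` in parallel; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fetch_title(u, make_driver), urls))


# Usage sketch (requires selenium and a Grid listening on localhost:4444):
# from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
# def make_remote():
#     return webdriver.Remote(
#         command_executor="http://localhost:4444/wd/hub",
#         options=Options(),
#     )
# titles = scrape_titles(["https://example.com"], make_remote)
```

Keeping max_workers at or below the number of browser slots the Grid offers avoids sessions queueing up on the hub.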
Common Errors and How to Fix Them
| Error | Cause | Fix |
|---|---|---|
| NoSuchElementException | Element not in the DOM yet, or wrong selector | Add an explicit wait; verify the selector in DevTools |
| StaleElementReferenceException | The DOM changed after you grabbed the element | Re-find the element right before using it |
| TimeoutException | Condition never became true within the wait | Increase the timeout or fix the selector or condition |
| ElementClickInterceptedException | An overlay or banner is covering the element | Dismiss the overlay, or scroll the element into view first |
| WebDriverException (session) | Browser and driver mismatch or crash | Update Selenium so Selenium Manager resolves the driver |
The Bottom Line
Selenium remains one of the most reliable ways to scrape the modern, JavaScript-driven web — provided you use it well. The difference between a fragile script and a production pipeline comes down to a handful of habits: pick the right tool for the page, locate elements with stable selectors, wait explicitly instead of sleeping, retry transient failures, behave politely, and route serious volume through quality proxies.
Start with the complete tutorial above, adopt the best practices one at a time, and you will have a scraper that is fast, resilient, and hard to block. Pair it with a strong proxy provider and good manners, and Selenium will serve you for years.