
Playwright Scraper


Scrape data from dynamic websites that rely on JavaScript rendering and have anti-bot protections. Uses a real browser engine to simulate genuine user visits, bypassing simple bot detection.

Tags: scraping, playwright, web, dynamic, automation, anti-bot

Playwright Scraper Skill

Scrape content from dynamic websites that require JavaScript rendering.

Why Playwright

  • Renders JavaScript-heavy pages (SPAs, React, Vue apps)
  • Handles anti-bot protections by simulating real browser behavior
  • Can wait for dynamic content to load before extracting
  • Supports intercepting network requests
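Request interception is also a practical speed-up: blocking heavy assets (images, fonts, media) makes scrapes faster and cheaper. Below is a sketch; `shouldBlock` is a hypothetical helper name, while `page.route`, `route.abort`, and `route.continue` are real Playwright APIs.

```javascript
// Decide which requests to block. Blocking heavy assets speeds up scraping.
// `shouldBlock` is a hypothetical helper, not part of Playwright.
function shouldBlock(resourceType) {
  return ['image', 'font', 'media'].includes(resourceType);
}

// Hooking it into Playwright (sketch; needs a live page object):
// await page.route('**/*', route =>
//   shouldBlock(route.request().resourceType()) ? route.abort() : route.continue()
// );
```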

Basic Scraping

node -e "
const { chromium } = require('playwright');
(async () => {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto('{url}', { waitUntil: 'networkidle' });

    // Wait for specific content to load
    await page.waitForSelector('{selector}', { timeout: 10000 });

    // Extract data in the page context
    const data = await page.evaluate(() => {
      const items = document.querySelectorAll('{item_selector}');
      return Array.from(items).map(el => ({
        text: el.textContent.trim(),
        href: el.href || null
      }));
    });

    console.log(JSON.stringify(data, null, 2));
  } finally {
    await browser.close(); // release the browser even if a step above throws
  }
})();
"

Advanced Techniques

Wait for dynamic content

await page.waitForSelector('.results-loaded');
await page.waitForTimeout(2000); // Additional buffer
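Fixed buffers like the 2-second wait above are brittle: too short on slow pages, wasted time on fast ones. A generic polling helper is one alternative; this is a framework-agnostic sketch (inside a page, Playwright's own page.waitForFunction covers the same need).

```javascript
// Poll an async predicate until it returns truthy or the timeout elapses.
// Generic sketch; for in-page conditions, prefer page.waitForFunction.
async function pollUntil(predicate, { timeout = 10000, interval = 250 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (await predicate()) return true;
    await new Promise(r => setTimeout(r, interval));
  }
  throw new Error(`Condition not met within ${timeout}ms`);
}
```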

Handle infinite scroll

for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
  await page.waitForTimeout(1500);
}
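A fixed five-iteration loop can stop before a long feed is exhausted. A height-stability loop is more robust: keep scrolling until the page stops growing. This sketch takes getHeight and scrollDown as injected callbacks so it can be tested outside a browser; with Playwright they would wrap page.evaluate calls as in the loop above.

```javascript
// Scroll until the page height stops changing, or maxRounds is reached.
// getHeight/scrollDown are injected callbacks; with Playwright, getHeight
// would be () => page.evaluate(() => document.body.scrollHeight).
async function scrollUntilStable(getHeight, scrollDown, maxRounds = 20) {
  let last = await getHeight();
  for (let i = 0; i < maxRounds; i++) {
    await scrollDown();
    const next = await getHeight();
    if (next === last) return i + 1; // rounds used before height stabilized
    last = next;
  }
  return maxRounds;
}
```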

Extract structured data

const data = await page.evaluate(() => ({
  title: document.querySelector('h1')?.textContent,
  price: document.querySelector('.price')?.textContent,
  description: document.querySelector('.desc')?.textContent
}));
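Fields extracted this way often carry stray whitespace or come back undefined when a selector misses. A small normalizer (a hypothetical helper, not part of Playwright) keeps the output consistent with the "clean, structured data" guideline below.

```javascript
// Trim string fields; convert empty strings and undefined values to null.
function cleanRecord(record) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    const v = typeof value === 'string' ? value.trim() : value;
    out[key] = v === '' || v === undefined ? null : v;
  }
  return out;
}
```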

Guidelines

  • Always use waitUntil: 'networkidle' for JS-heavy pages
  • Set reasonable timeouts (the examples above use 10 seconds)
  • Close the browser when done
  • Respect rate limits — add delays between requests
  • Handle pagination if the user needs multiple pages of data
  • Return clean, structured data (not raw HTML)