# Playwright Scraper Skill
Scrape content from dynamic websites that require JavaScript rendering.
## Why Playwright
- Renders JavaScript-heavy pages (SPAs, React, Vue apps)
- Can pass basic anti-bot checks by running a real browser engine
- Can wait for dynamic content to load before extracting
- Supports intercepting network requests
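Request interception in particular is useful for speeding up scrapes. A minimal sketch using Playwright's `page.route` API; the helper names (`shouldBlock`, `blockHeavyResources`) and the choice of blocked resource types are assumptions for illustration:

```javascript
// Block heavy resource types (images, fonts) so pages load faster.
function shouldBlock(resourceType) {
  // Pure predicate so the filtering rule is easy to test in isolation
  return resourceType === 'image' || resourceType === 'font';
}

async function blockHeavyResources(page) {
  // '**/*' matches every request; each route is either aborted or continued
  await page.route('**/*', (route) => {
    const type = route.request().resourceType();
    return shouldBlock(type) ? route.abort() : route.continue();
  });
}
```

Call `blockHeavyResources(page)` once, right after `browser.newPage()`, before navigating.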
## Basic Scraping
```bash
node -e "
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('{url}', { waitUntil: 'networkidle' });

  // Wait for specific content to load
  await page.waitForSelector('{selector}', { timeout: 10000 });

  // Extract data
  const data = await page.evaluate(() => {
    const items = document.querySelectorAll('{item_selector}');
    return Array.from(items).map(el => ({
      text: el.textContent.trim(),
      href: el.href || null
    }));
  });

  console.log(JSON.stringify(data, null, 2));
  await browser.close();
})();
"
```
## Advanced Techniques
### Wait for dynamic content

```javascript
await page.waitForSelector('.results-loaded');
await page.waitForTimeout(2000); // Additional buffer
```
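When a selector plus a fixed sleep is too blunt, Playwright's `page.waitForFunction(pageFunction, arg, options)` can wait on an arbitrary in-page condition. A sketch; the helper name and the `.result` selector are placeholders:

```javascript
// Wait until at least `min` elements match `selector` in the page.
async function waitForMinResults(page, selector, min, timeout = 10000) {
  await page.waitForFunction(
    // Runs inside the browser; arguments are passed as one serializable object
    ({ sel, n }) => document.querySelectorAll(sel).length >= n,
    { sel: selector, n: min },
    { timeout }
  );
}
```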
### Handle infinite scroll

```javascript
for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
  await page.waitForTimeout(1500);
}
```
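A fixed five-iteration loop can stop too early or waste time. A variant that stops once the page height stops growing; the function name, `maxRounds` cap, and defaults are my own choices:

```javascript
// Scroll until document height stabilizes or maxRounds is reached.
async function scrollToBottom(page, { maxRounds = 20, pauseMs = 1500 } = {}) {
  let lastHeight = 0;
  for (let i = 0; i < maxRounds; i++) {
    // Scroll and report the new height in one round trip to the page
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
      return document.body.scrollHeight;
    });
    if (height === lastHeight) break; // no new content loaded
    lastHeight = height;
    await page.waitForTimeout(pauseMs);
  }
  return lastHeight;
}
```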
### Extract structured data

```javascript
const data = await page.evaluate(() => ({
  title: document.querySelector('h1')?.textContent,
  price: document.querySelector('.price')?.textContent,
  description: document.querySelector('.desc')?.textContent
}));
```
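The fields extracted above come back as raw text, or `undefined` when a selector misses. A small, hypothetical cleanup step before returning them; the price parsing assumes a format like `$19.99`:

```javascript
// Normalize raw extracted fields into clean, typed values.
function normalize(raw) {
  return {
    title: raw.title?.trim() ?? null,
    // Strip currency symbols and separators, then parse as a number
    price: raw.price ? Number(raw.price.replace(/[^0-9.]/g, '')) : null,
    description: raw.description?.trim() ?? null
  };
}
```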
## Guidelines
- Always use `waitUntil: 'networkidle'` for JS-heavy pages
- Set reasonable timeouts (around 10 seconds)
- Close the browser when done
- Respect rate limits — add delays between requests
- Handle pagination if the user needs multiple pages of data
- Return clean, structured data (not raw HTML)
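The pagination and rate-limit guidelines can be combined into one loop. A sketch, where `buildUrl`, the `.item` selector, and the default delay are placeholders:

```javascript
// Scrape `pages` pages with a polite delay between requests.
async function scrapePages(page, buildUrl, pages, delayMs = 1000) {
  const results = [];
  for (let n = 1; n <= pages; n++) {
    await page.goto(buildUrl(n), { waitUntil: 'networkidle' });
    const items = await page.evaluate(() =>
      Array.from(document.querySelectorAll('.item')).map((el) =>
        el.textContent.trim()
      )
    );
    results.push(...items);
    if (n < pages) await page.waitForTimeout(delayMs); // rate-limit between pages
  }
  return results;
}
```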