To extract emails from websites in bulk, feed a list of domains into a crawler that visits each site's pages, parses HTML and mailto: links, and outputs structured records. The fastest path is a pay-per-result Apify actor like Contact Details Scraper — you upload 1,000 URLs, get back emails, phones, and social profiles for $4.50 total, with no servers to manage.
Quick Answer
A bulk email extractor for websites is a crawler that takes a list of URLs, visits each page (and usually internal contact/about pages), and pulls out email addresses, phone numbers, and social media handles into a CSV or JSON file. The cheapest reliable approach is using a hosted scraping actor that charges per successful result instead of per request, which keeps costs predictable for lists of 10,000+ domains. Expect to pay around $4.50 per 1,000 sites with a tool like Contact Details Scraper. Self-built scrapers using Python + requests + regex work for under 500 URLs but break on JavaScript-heavy sites and Cloudflare-protected pages, and they leave proxy rotation for you to build. For lead-gen at scale, hosted actors win on cost, speed, and maintenance.
What is a bulk email extractor?
A bulk email extractor is software that automates three steps you'd otherwise do by hand:
- Fetch the HTML of a website (often crawling 5–20 internal pages, not just the homepage).
- Parse for patterns: mailto: anchors, regex matches for [a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}, obfuscated emails like name [at] domain [dot] com, and JSON-LD contact blocks.
- Deduplicate and export to CSV, JSON, Google Sheets, or a CRM.
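The parse, deduplicate, and export steps can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not how any particular extractor works: the names `MailtoParser`, `extract_contacts`, and `to_csv` are hypothetical, the fetch step is left out (the HTML is passed in as a string), and the regex is the simple pattern quoted above, which will miss some valid addresses and accept some invalid ones.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.parse import unquote

EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}", re.IGNORECASE)


class MailtoParser(HTMLParser):
    """Collects addresses from <a href="mailto:..."> anchors."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query string.
            address = unquote(href[len("mailto:"):].split("?", 1)[0])
            if address:
                self.found.append(address)


def extract_contacts(html):
    """Return a sorted, case-insensitively deduplicated list of emails."""
    parser = MailtoParser()
    parser.feed(html)
    candidates = parser.found + EMAIL_RE.findall(html)
    return sorted({c.strip().lower() for c in candidates})


def to_csv(source_url, emails):
    """Serialize one row per email as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source_url", "email"])
    for email in emails:
        writer.writerow([source_url, email])
    return buffer.getvalue()
```

Lowercasing before deduplication treats Sales@Example.com and sales@example.com as one address, which is usually what a lead list wants.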
Good extractors also pull phone numbers and social profiles from the same crawl — that's where the 80/20 lives, since a single visit gets you the whole contact graph.
How do I extract emails from a list of URLs?
Three approaches, ranked by hours-to-result:
Option 1: Hosted actor (5 minutes setup)
Drop a list of URLs into a hosted scraper. Here's the actual API call to run Contact Details Scraper against 1,000 domains:
curl -X POST "https://api.apify.com/v2/acts/vdrmota~contact-info-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [
      {"url": "https://example.com"},
      {"url": "https://acme.io"}
    ],
    "maxDepth": 2,
    "maxRequestsPerStartUrl": 10
  }'
Output structure per site:
{
  "url": "https://example.com",
  "emails": ["sales@example.com", "info@example.com"],
  "phones": ["+1-415-555-0142"],
  "linkedIns": ["https://linkedin.com/company/example"],
  "twitters": ["https://twitter.com/example"]
}
Cost: $0.0045 per result. A 10,000-URL run costs at most $45 (10,000 × $0.0045) and finishes in ~30 minutes with concurrency.
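For a list of 1,000 domains, typing the request body by hand is impractical. The sketch below builds it from a plain list of domains. The field names (startUrls, maxDepth, maxRequestsPerStartUrl) are taken from the curl example above; the `build_run_input` helper and its handling of bare domains are assumptions made for illustration.

```python
import json


def build_run_input(urls, max_depth=2, max_requests_per_start_url=10):
    """Build the JSON body shown in the curl example from a plain URL list."""
    seen = set()
    start_urls = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines from the input file
        # Bare domains such as "acme.io" get an https:// prefix.
        if not url.startswith(("http://", "https://")):
            url = "https://" + url
        if url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    return {
        "startUrls": start_urls,
        "maxDepth": max_depth,
        "maxRequestsPerStartUrl": max_requests_per_start_url,
    }


payload = build_run_input(["example.com", "https://acme.io", "example.com", ""])
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and passed to curl with -d @file.json in place of the inline body.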
Option 2: Python + requests (a weekend of work)
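import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
HEADERS = {"User-Agent": "Mozilla/5.0"}

def extract_emails(url):
    """Return the set of emails found on url and on its contact/about pages."""
    try:
        r = requests.get(url, timeout=10, headers=HEADERS)
        emails = set(EMAIL_RE.findall(r.text))
        # Also follow /contact and /about links found on the page.
        soup = BeautifulSoup(r.text, "html.parser")
        subpages = set()
        for link in soup.find_all("a", href=True):
            # Resolve relative hrefs such as "/contact" against the page URL.
            target = urljoin(url, link["href"])
            # Skip mailto:, tel:, and other non-HTTP links that requests cannot fetch.
            if target.startswith("http") and any(k in link["href"].lower() for k in ("contact", "about")):
                subpages.add(target)
        for sub_url in subpages:
            sub = requests.get(sub_url, timeout=10, headers=HEADERS).text
            emails.update(EMAIL_RE.findall(sub))
        return emails
    except Exception:
        # Any network or parsing error yields an empty result for this URL.
        return set()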
This works for ~60% of small business sites. It fails on: JavaScript-rendered emails, Cloudflare email protection, obfuscated text, sites that require headless browsers, and rate-limited targets.
Option 3: Browser automation (Playwright)
Required when sites render emails via JS. A real browser also resolves data-cfemail Cloudflare obfuscation without a custom decoder. Adds 5–10x the runtime and infrastructure overhead: proxies, browser pools, retry logic.
How much does bulk email extraction cost?
Real numbers across common providers for a 10,000-URL job:
| Tool | Cost per 1K | 10K cost | Model |
|---|---|---|---|
| Contact Details Scraper | $4.50 | $45 | Pay per result |
| Outscraper Email Finder | ~$15 | $150 | Per query |
| Hunter.io Domain Search | ~$49 | $490 | Per credit |
| Self-hosted (AWS + proxies) | ~$8–12 | $80–120 | Compute + proxies |
| ScrapeHero managed | $200+ | quote-based | Service |
The pay-per-result model matters because empty results don't bill. If 30% of your list has no findable email, you only pay for the 7,000 that returned data.
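The arithmetic is easy to check. The sketch below assumes each URL that returns data bills as exactly one result at the $0.0045 rate quoted above; `estimate_cost` is a hypothetical helper, and actual billing rules may count results differently.

```python
from decimal import Decimal


def estimate_cost(total_urls, hit_rate, price_per_result="0.0045"):
    """Return (billable_results, cost_in_dollars) for a pay-per-result run.

    Assumes each URL that returns data bills as exactly one result.
    """
    billable = round(total_urls * hit_rate)
    return billable, Decimal(price_per_result) * billable


for rate in (1.0, 0.7, 0.5):
    results, cost = estimate_cost(10_000, rate)
    print(f"hit rate {rate:.0%}: {results} results, ${cost:.2f}")
```

Under that assumption, a 70% hit rate on 10,000 URLs bills 7,000 results, or $31.50 rather than the $45 ceiling.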
Is it legal to scrape emails from websites?
Scraping publicly displayed emails is legal in most jurisdictions, but using them is regulated. The compliance lines:
- Public business emails (info@, sales@, contact@): generally fair game for B2B outreach in the US (CAN-SPAM) if you include opt-out and a physical address.
- EU/UK addresses: GDPR applies. You need legitimate interest documented and must honor right-to-erasure requests.
- Personal emails (firstname.lastname@): higher risk. CCPA and GDPR treat these as personal data.
- Sending unsolicited mail: different rules from collecting. CASL (Canada) requires express consent before sending.
Rule of thumb: scraping is the easy part, deliverability and consent are where lead-gen ops actually break. Verify emails before sending — bouncing 10,000 cold emails will destroy your sender reputation in a week.
How do I crawl multiple pages per website for emails?
Most contact emails sit on /contact, /about, /team, or footer text — not the homepage. A bulk extractor needs depth crawling:
- maxDepth: 2: follow links up to two hops from the start URL (in most crawlers, depth 0 is the start URL itself).
- URL filters: prioritize URLs containing contact, about, team, support, help.
- Page limit per domain: cap at 10–20 pages so a blog with 5,000 posts doesn't eat your budget.
Contact Details Scraper handles this with maxDepth and maxRequestsPerStartUrl parameters. For a self-built crawler, use Scrapy with a LinkExtractor rule:
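from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
# Class attribute of a CrawlSpider subclass; parse_emails is a method you define on the spider.
rules = (
    Rule(LinkExtractor(allow=r"(contact|about|team)"), callback="parse_emails"),
)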
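For a crawler built without Scrapy, the same prioritization and page cap can be sketched with the standard library. This is an illustration only: `select_links` and `PRIORITY_KEYWORDS` are hypothetical names, and the list of hrefs is assumed to come from whatever HTML parser you already use.

```python
from urllib.parse import urljoin, urlparse

PRIORITY_KEYWORDS = ("contact", "about", "team", "support", "help")


def select_links(start_url, hrefs, page_limit=10):
    """Pick same-domain links to crawl, keyword matches first, capped at page_limit."""
    domain = urlparse(start_url).netloc.lower()
    priority, other, seen = [], [], set()
    for href in hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute = urljoin(start_url, href).split("#", 1)[0]
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips mailto:, tel:, javascript:
        if parsed.netloc.lower() != domain or absolute in seen:
            continue  # stay on the same host and visit each page once
        seen.add(absolute)
        if any(k in parsed.path.lower() for k in PRIORITY_KEYWORDS):
            priority.append(absolute)
        else:
            other.append(absolute)
    return (priority + other)[:page_limit]
```

Putting keyword matches first means the cap cuts off blog posts and product pages before it cuts off /contact or /team.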
How do I handle obfuscated and JavaScript emails?
Three common obfuscation patterns and how to defeat each:
- Text substitution (name [at] domain [dot] com): regex replace \s*\[at\]\s* → @ before parsing.
- Cloudflare email protection (<a class="__cf_email__" data-cfemail="...">): decode the hex string by XORing with the first byte. Twenty lines of code, well-documented.
- JavaScript rendering: requires a headless browser. Pre-built actors handle this transparently; DIY means Playwright or Puppeteer + 4x the RAM.
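The first two patterns can be handled without a browser. The sketch below is a minimal version under stated assumptions: the function names are hypothetical, the text patterns cover only bracketed or parenthesized "at" and "dot", and the Cloudflare decoder assumes the data-cfemail value is a hex string whose first byte is the XOR key for the remaining bytes.

```python
import re

AT_RE = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def deobfuscate_text(text):
    """Rewrite 'name [at] domain [dot] com' style text to 'name@domain.com'."""
    return DOT_RE.sub(".", AT_RE.sub("@", text))


def decode_cfemail(hex_string):
    """Decode a data-cfemail value: byte 0 is the key, the rest is XORed with it."""
    data = bytes.fromhex(hex_string)  # raises ValueError on malformed input
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8", errors="replace")


def cfemails_in(html):
    """Return every decoded data-cfemail address found in an HTML string."""
    return [decode_cfemail(value) for value in CFEMAIL_RE.findall(html)]
```

Run `deobfuscate_text` on the page text before applying the email regex. The parenthesized form can occasionally rewrite ordinary prose such as "meet us (at) noon", so the output still needs the regex pass to filter out non-addresses.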
A managed actor abstracts all three. That's typically the deciding factor for teams who tried the DIY path first and burned a week on edge cases.
What output format works best for lead-gen?
CSV for spreadsheet workflows, JSON for CRM imports. Useful columns to capture beyond the email itself:
- source_url: which page the email came from (proves provenance for compliance).
- domain: for deduplication.
- confidence: first-class email (e.g., on /contact) vs. scraped from a footer.
- social_handles: LinkedIn URL is gold for B2B enrichment.
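A sketch of how those columns might be assembled for one site follows. The helper names and the two-level confidence rule (a /contact path ranks "high", everything else "low") are assumptions for illustration; a real pipeline would likely use a finer score.

```python
import csv
import io
from urllib.parse import urlparse

COLUMNS = ["email", "source_url", "domain", "confidence", "social_handles"]


def to_lead_rows(findings, social_handles=()):
    """Build one row per unique email from (email, source_url) pairs for one site."""
    rows = {}
    for email, source_url in findings:
        parsed = urlparse(source_url)
        # Hypothetical scoring rule: contact pages rank above everything else.
        confidence = "high" if "contact" in parsed.path.lower() else "low"
        row = {
            "email": email.lower(),
            "source_url": source_url,
            "domain": parsed.netloc.lower(),
            "confidence": confidence,
            "social_handles": ";".join(social_handles),
        }
        key = (row["domain"], row["email"])
        # Keep the first sighting unless a later one has higher confidence.
        if key not in rows or (confidence == "high" and rows[key]["confidence"] == "low"):
            rows[key] = row
    return list(rows.values())


def rows_to_csv(rows):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Deduplicating on (domain, email) while keeping the highest-confidence source_url preserves the strongest provenance for each address.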
Pipe the JSON into Clearbit, Apollo, or Hunter Verifier to enrich and validate before any outreach.
FAQ
Q: How fast can I extract emails from 10,000 websites? With a hosted actor running 50 parallel browsers, expect 20–40 minutes. A single-threaded Python script averages 1–2 seconds per URL, so the same job runs 3–6 hours sequentially — and that's before you hit rate limits.
Q: What's the typical hit rate for finding emails? Across mixed B2B lists, 50–70% of domains return at least one email. Higher for SMB local businesses (often 80%+), lower for enterprise sites that hide contact behind forms. Pay-per-result pricing means you only pay for the successful 50–70%.
Q: Can I extract emails from LinkedIn or Facebook? LinkedIn actively blocks scrapers and most emails aren't displayed publicly anyway. Facebook business pages occasionally show contact emails, which Contact Details Scraper picks up. For LinkedIn-specific email finding, you need a different tool category (email finders that match name + company).
Q: How do I avoid getting IP-banned during bulk extraction? Use a hosted actor with built-in proxy rotation, or rotate residential proxies yourself (Bright Data, Smartproxy ~$5–15/GB). Throttle to under 1 request per second per domain, randomize user agents, and respect 429 responses with exponential backoff.
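A minimal sketch of the backoff part, assuming a `fetch` callable you supply that returns a (status_code, body) pair. The function name, retry count, and jitter range are illustrative choices; the sketch does not read Retry-After headers or enforce the per-domain throttle.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429 with exponential backoff.

    `fetch` is any callable returning (status_code, body); it is injected so
    the retry logic stays independent of the HTTP library.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; return the last 429 to the caller
        # Wait 1s, 2s, 4s, ... plus up to 25% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.25))
    return status, body
```

The jitter keeps parallel workers from retrying against the same host at the same instant.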
Q: Should I verify emails before using them? Yes — always. Cold-emailing unverified addresses produces 15–30% bounce rates, which Gmail and Outlook penalize aggressively. Run extracted emails through ZeroBounce, NeverBounce, or Hunter Verifier ($1–4 per 1,000) before any send. The verification cost is lower than the cost of a damaged sender domain.