To scrape California business registry data, you need either direct access to the California Secretary of State's bizfile portal or a pre-built actor that handles the search, pagination, and detail-page extraction for you. The fastest path is to run the California Fresh Business Leads actor on Apify, which pulls newly registered LLCs and corporations daily and returns structured records with the business name, address, and registered agent. Building your own scraper is possible, but the bizfile site relies on dynamic POST requests and session tokens, so homegrown scrapers break frequently.
Quick Answer
California business registry data lives in the Secretary of State's bizfile online system at bizfileonline.sos.ca.gov, which lists every LLC, corporation, and limited partnership registered in the state. You can extract it by running a maintained scraper like California Fresh Business Leads on Apify, which outputs JSON or CSV with company name, mailing address, principal address, agent of service, entity type, and filing date. Pricing is pay-per-use, so you only pay for the records you pull. Manual exports from the bizfile UI are limited to small batches and CAPTCHA-gated, making automation the only viable option at scale. Expect 2,000–5,000 fresh entities per business day across California.
What data is in the California Secretary of State business registry?
Every entity filing with the California SoS produces a public record with these fields:
- Entity name (e.g., "Acme Logistics LLC")
- Entity number (12 digits for LLCs; older corporations carry a shorter number with a "C" prefix)
- Registration date and status (Active, Suspended, Terminated)
- Entity type (Domestic LLC, Foreign Corp, Limited Partnership, etc.)
- Principal office address
- Mailing address
- Agent for service of process — name and California address
- CEO/manager name (on Statement of Information filings)
- Jurisdiction (California or out-of-state for foreign entities)
For B2B prospecting, the agent and principal addresses are the most valuable fields. Roughly 70% of new LLCs use a registered agent service (CT Corporation, Northwest, LegalZoom), so the agent address often points to that service, but the principal address is almost always a real business location.
How do I scrape bizfile online from the California SoS?
The bizfile site is a JavaScript-heavy SPA that talks to a JSON backend over POST requests. Three things make raw scraping painful:
- Search returns 200 results max per query — you have to chunk by date or letter.
- Each detail page requires a separate POST with a session-bound token.
- Rate limiting kicks in around 60 requests/minute per IP, with hard CAPTCHAs after.
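The 200-result cap means any query that comes back full may have been truncated. One way to work around it, sketched here, is to search one filing date at a time and split a full day further by name prefix. `search` is a hypothetical callable standing in for whatever issues the real request, and the prefix alphabet is an assumption that would need to cover every character that can appear in an entity name.

```python
from datetime import date, timedelta
from typing import Callable, Iterator

RESULT_CAP = 200  # per-query cap described in the list above
# Assumed prefix characters; extend to cover punctuation if names use it.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "

# Hypothetical search signature: (filed_date, name_prefix) -> list of results.
SearchFn = Callable[[date, str], list]


def daterange(start: date, end: date) -> Iterator[date]:
    """Yield each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def fetch_day(search: SearchFn, day: date, prefix: str = "") -> list:
    """Return all results for one day, splitting by name prefix when capped."""
    results = search(day, prefix)
    if len(results) < RESULT_CAP:
        return results  # under the cap, so nothing was truncated
    # A full page may be truncated: narrow the query one character at a time.
    collected = []
    for ch in ALPHABET:
        collected.extend(fetch_day(search, day, prefix + ch))
    return collected
```

A caller would loop `for day in daterange(start, end)` and pass each day to `fetch_day`. The trade-off is extra requests on busy days, which matters given the rate limit in the third bullet.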
A working DIY approach looks like this:
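```python
# Pseudo-flow, not production code. init_session_with_token, post_search,
# post_detail, save, and date_range are placeholders you implement yourself.
import random
import time

session = init_session_with_token()
for date in date_range:
    # One search per filing date keeps each query under the result cap.
    results = post_search(session, filed_date=date, entity_type="LLC")
    for entity in results:
        detail = post_detail(session, entity_id=entity["id"])
        save(detail)
        # Random pause between detail requests to stay under the rate limit.
        time.sleep(random.uniform(1.5, 3.0))
```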
You'll need residential proxies (datacenter IPs get blocked within an hour), a headless browser fallback for the token refresh, and a retry queue. Most teams spend 2–3 weeks getting this stable, then another week per quarter maintaining it when the site changes.
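The retry piece can be sketched independently of the site. The helper below is a minimal illustration, not the actor's implementation: it retries a callable with exponential backoff and jitter. `TransientError` is a hypothetical exception you would raise from your own request code on timeouts, blocks, or expired tokens, and the delay values are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, blocks, stale tokens)."""


def with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller park it in a retry queue
            # Delay doubles each attempt (2s, 4s, 8s, ...) plus up to 1s of jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

Wrapping each detail request as `with_retries(lambda: post_detail(session, entity_id=...))` keeps the main loop readable. Passing `sleep` as a parameter makes the helper testable without real waiting.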
The pre-built California Fresh Business Leads actor handles all of this and runs on a schedule, so you wake up to a fresh CSV.
How can I get a list of newly registered businesses in California?
For prospecting, you only care about entities filed in the last 1–7 days. Run the actor with a date filter set to "yesterday" and you get every LLC and corporation that registered in the previous 24 hours. That is typically 2,000 to 5,000 records on a normal weekday, fewer on a Monday run (weekend submissions queue until the office processes them), and more around quarter-ends.
A typical record looks like:
```json
{
  "entity_name": "Sunset Ridge Ventures LLC",
  "entity_number": "202612345678",
  "filing_date": "2026-04-29",
  "entity_type": "Domestic LLC",
  "status": "Active",
  "principal_address": "1234 Market St, San Francisco, CA 94103",
  "mailing_address": "PO Box 567, San Francisco, CA 94104",
  "agent_name": "John Doe",
  "agent_address": "1234 Market St, San Francisco, CA 94103"
}
```
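Downstream code is easier to write against a typed record than a raw dict. The sketch below assumes records carry exactly the keys shown in the sample above. The `Entity` class and the `agent_at_principal_address` check are illustrative names, not part of the actor's output.

```python
from dataclasses import dataclass
from datetime import date


def _normalize(address: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(address.lower().split())


@dataclass(frozen=True)
class Entity:
    entity_name: str
    entity_number: str
    filing_date: date
    entity_type: str
    status: str
    principal_address: str
    mailing_address: str
    agent_name: str
    agent_address: str

    @property
    def agent_at_principal_address(self) -> bool:
        """True when the agent's address matches the principal office."""
        return _normalize(self.agent_address) == _normalize(self.principal_address)


def parse_entity(raw: dict) -> Entity:
    """Build an Entity from one record, converting filing_date to a date."""
    fields = dict(raw)
    fields["filing_date"] = date.fromisoformat(raw["filing_date"])
    return Entity(**fields)  # raises TypeError on missing or unexpected keys


record = {
    "entity_name": "Sunset Ridge Ventures LLC",
    "entity_number": "202612345678",
    "filing_date": "2026-04-29",
    "entity_type": "Domestic LLC",
    "status": "Active",
    "principal_address": "1234 Market St, San Francisco, CA 94103",
    "mailing_address": "PO Box 567, San Francisco, CA 94104",
    "agent_name": "John Doe",
    "agent_address": "1234 Market St, San Francisco, CA 94103",
}
entity = parse_entity(record)
```

An agent address that matches the principal office, as in this sample, suggests the owner is acting as their own agent rather than using a registered agent service.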
Set the actor on a daily Apify schedule, push results to Google Sheets or a webhook, and you have a continuously refreshing lead list with zero maintenance.
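If scheduled runs ever overlap or a run is repeated, the same filing can arrive twice. One way to keep the list clean is to key records on `entity_number`, sketched here with an in-memory set. A real pipeline would persist the set between runs, which is left out of this example.

```python
def merge_new_records(seen: set, batch: list) -> list:
    """Return records whose entity_number is not yet in seen, and remember them."""
    fresh = []
    for record in batch:
        number = record["entity_number"]
        if number not in seen:
            seen.add(number)
            fresh.append(record)
    return fresh


seen_numbers = set()
monday = [{"entity_number": "202612345678"}, {"entity_number": "202612345679"}]
tuesday = [{"entity_number": "202612345679"}, {"entity_number": "202612345680"}]

first = merge_new_records(seen_numbers, monday)
second = merge_new_records(seen_numbers, tuesday)  # drops the repeated ...679
```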
Is the California business registry public and free to scrape?
Yes. The California Public Records Act (Government Code §7920.000 et seq., recodified in 2023 from the former §6250 et seq.) and the SoS's own data policy make business filings explicitly public records. The bizfile site has no terms of service prohibiting automated access, and the data is published precisely so the public can search it. That said, you should:
- Throttle requests to avoid degrading the service (the actor stays under 30 req/min).
- Think through CCPA implications before redistributing agents' personal home addresses in bulk consumer products.
- Honor the site's robots.txt, which currently allows crawling of the search endpoints.
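For a DIY scraper, the throttling point can be enforced with a small rate limiter. This is a minimal sketch, not how the actor does it: it spaces calls by a fixed minimum interval derived from a requests-per-minute cap. The injectable `clock` and `sleep` parameters are there only to make it testable.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between calls so the rate stays under a cap."""

    def __init__(
        self,
        max_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Creating `throttle = Throttle(30)` and calling `throttle.wait()` before every request spaces requests at least two seconds apart, which matches the 30 req/min ceiling mentioned above.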
The SoS does sell bulk data dumps directly — but the cheapest tier is $5,000/year and the file is updated weekly, not daily. For most use cases, scraping fresh records pay-per-use is 10–50x cheaper.
What are the use cases for California business registry data?
The teams using this data typically fall into one of these buckets:
- B2B SaaS sales: new LLCs need accounting software, payroll, banking, insurance. Reach out within 7 days of formation, before competitors do.
- Business banking and fintech: Mercury, Brex, and Relay all built early growth on new-entity outreach.
- Insurance brokers: California requires workers' compensation coverage once a business has employees, and leases and client contracts often call for general liability, so new businesses are typically uncovered prospects.
- Commercial real estate: principal addresses reveal where new companies are operating.
- Bookkeeping and CPA firms: first-year businesses need tax setup help.
- Market research: track formation rates by city, industry signal, or entity type for economic reports.
The data freshness matters because conversion rates on new-business outreach drop sharply after the first 14 days. A list of 1-day-old LLCs converts roughly 3–5x better than a 30-day-old list, based on standard outbound benchmarks.
How much does scraping California business data cost?
Three options, ranked by total cost of ownership:
| Option | Setup time | Monthly cost (10k records) | Maintenance |
|---|---|---|---|
| Build your own scraper | 2–3 weeks | $50 proxies + dev time | Ongoing |
| Buy SoS bulk data | 1 day | ~$420 ($5k/yr) | None, but stale |
| California Fresh Business Leads on Apify | 5 minutes | Pay-per-use, typically $20–60 | None |
The actor is pay-per-use on Apify's compute units, so you pay only for the records actually scraped. For a daily run pulling ~3,000 fresh records, expect around $1–2 per day. Compare that to a residential proxy bill alone, which starts at around $50/month.
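To compare the options on the same footing, the per-day and per-year figures quoted in this section can be converted to monthly amounts. The calculation below is a rough sketch that assumes one run every calendar day. A weekday-only schedule would use about 22 runs instead of 30.

```python
RUNS_PER_MONTH = 30  # assumption: one run every calendar day


def monthly_range(daily_low: float, daily_high: float, runs: int = RUNS_PER_MONTH) -> tuple:
    """Scale a per-run cost range to a monthly range."""
    return daily_low * runs, daily_high * runs


actor_monthly = monthly_range(1.0, 2.0)  # $1-2 per daily run, from the text
bulk_monthly = round(5000 / 12, 2)       # $5,000/year SoS bulk tier
proxy_monthly = 50.0                     # residential proxy floor, from the text

print(f"Actor:    ${actor_monthly[0]:.0f}-{actor_monthly[1]:.0f}/month")
print(f"SoS bulk: ${bulk_monthly:.2f}/month")
print(f"Proxies:  ${proxy_monthly:.0f}/month before dev time")
```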
How do I export California business leads to my CRM?
The actor outputs standard JSON, CSV, or Excel through Apify's dataset API. From there:
- HubSpot/Salesforce direct: use Apify's HubSpot integration or a Zap to push each new record as a Company.
- Google Sheets: built-in Apify integration writes new rows on each schedule run.
- Webhook to your backend: when the run finishes, Apify POSTs a notification that identifies the run and its dataset; ingest from there.
- Clay or Instantly: pull the dataset over their HTTP fetcher, then enrich with email finders.
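For the webhook route, a backend handler mainly needs to turn the notification into a dataset download. The sketch below assumes Apify's default webhook payload, where the run object sits under `resource` and carries `defaultDatasetId`. If you configured a custom payload template, the lookup would differ. `fetch_items` makes a real network call and needs your API token.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(payload: dict, fmt: str = "json") -> str:
    """Build the dataset items URL from a run-finished webhook payload."""
    dataset_id = payload["resource"]["defaultDatasetId"]
    query = urlencode({"format": fmt, "clean": "true"})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def fetch_items(url: str, token: str) -> list:
    """Download dataset items as a list of records (performs an HTTP request)."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example payload trimmed to the one field this sketch relies on.
example_payload = {"resource": {"defaultDatasetId": "abc123"}}
items_url = dataset_items_url(example_payload)
```

Passing `fmt="csv"` to `dataset_items_url` yields a link suitable for tools that prefer CSV, though `fetch_items` as written only parses JSON.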
A common stack: actor runs at 6am PT → results land in a Google Sheet → Clay enriches with emails and LinkedIn → Instantly sends a sequence by 9am. End-to-end automated, ~$0.10 per lead delivered.
FAQ
Q: How fresh is the data from the California Fresh Business Leads actor? The actor pulls filings as they appear in the SoS bizfile system, which updates throughout the business day. Running daily gives you records filed in the previous 24 hours, typically 2,000–5,000 entities depending on the day of week.
Q: Can I filter by city, county, or business type? Yes — the dataset includes principal address and entity type, so you can filter post-extraction in Google Sheets, SQL, or your CRM. For example, filter for "Domestic LLC" with addresses in Los Angeles County to get just LA-area startups.
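As a rough illustration of that post-extraction filter in Python: the registry record has no county field, so the sketch derives the city from the address string and matches it against a city list. The `LA_COUNTY_CITIES` set is a small illustrative subset, not a complete list, and `city_of` assumes addresses end in "city, ST ZIP" as in the sample record.

```python
def city_of(address: str) -> str:
    """Extract the city from an address ending in '..., city, ST ZIP'."""
    parts = [part.strip() for part in address.split(",")]
    return parts[-2] if len(parts) >= 3 else ""


# Illustrative subset of Los Angeles County cities; extend as needed.
LA_COUNTY_CITIES = {"Los Angeles", "Long Beach", "Glendale", "Santa Monica", "Pasadena"}


def filter_leads(records: list, entity_type: str, cities: set) -> list:
    """Keep records of one entity type whose principal address is in the given cities."""
    wanted = {city.lower() for city in cities}
    return [
        record
        for record in records
        if record["entity_type"] == entity_type
        and city_of(record["principal_address"]).lower() in wanted
    ]


records = [
    {"entity_type": "Domestic LLC", "principal_address": "100 Main St, Suite 5, Pasadena, CA 91101"},
    {"entity_type": "Domestic LLC", "principal_address": "1234 Market St, San Francisco, CA 94103"},
    {"entity_type": "Foreign Corp", "principal_address": "200 Ocean Ave, Santa Monica, CA 90401"},
]
la_llcs = filter_leads(records, "Domestic LLC", LA_COUNTY_CITIES)
```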
Q: Does this include corporations, LLCs, and partnerships? The actor pulls all entity types registered with the California SoS, including domestic LLCs, foreign LLCs, domestic and foreign corporations (stock and nonprofit), and limited partnerships. Sole proprietorships are not in the SoS registry — those file at the county level.
Q: Will I get email addresses or phone numbers? No — the California SoS registry does not contain emails or phone numbers, only physical addresses and agent information. You'll need an enrichment tool like Clay, Apollo, or Hunter to add contact data on top of the registry records.
Q: What happens if the bizfile site changes its layout? The actor is maintained and updated when the SoS site changes, so your scheduled runs keep working. This is the main reason to use a hosted actor instead of a homegrown scraper — when bizfile pushed a major update in 2024, DIY scrapers broke for weeks while maintained actors were patched within 48 hours.