To scrape Google News articles automatically, send keyword queries to a hosted scraper that returns structured JSON with titles, publishers, dates, and snippets. The fastest path is a pay-per-use Apify actor — no proxy management, no headless browser babysitting, no broken CSS selectors when Google ships a layout change. Set a cron job, point it at your keywords, and you have a continuous news feed in under 10 minutes.
Quick Answer
To scrape Google News, you need three things: a query (keywords, topic ID, or exact URL), a runner that handles Google's anti-bot defenses, and a storage destination. Building it yourself means rotating residential proxies, parsing AMP redirects, and rewriting selectors every few months. A managed actor like Google News Scraper, Robust and Affordable does all of that for ~$0.30 per 1,000 articles. Trigger it via API, schedule, or webhook and pipe results to a database, Slack, or spreadsheet.
Why not just use Google News RSS feeds?
Google News still exposes RSS at news.google.com/rss/search?q=YOUR_QUERY, but the feed has hard limits that make it unusable for production:
- 100-article cap per feed, regardless of result volume
- No publish date filtering: you can't ask for "last 6 hours"
- Redirect URLs only: every link points to `news.google.com/articles/...` and must be resolved
- Frequent throttling: anything over ~1 request per minute from a single IP returns empty XML
- No snippet text in many feeds; titles only
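Here is a minimal standard-library sketch that makes the redirect-link limitation concrete. It builds a Google News RSS search URL and parses a feed you have already downloaded. The `hl`, `gl`, and `ceid` locale parameters and the `<source>` element reflect the feed format as commonly observed, not a documented contract. `build_rss_url` and `parse_rss` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

RSS_SEARCH = "https://news.google.com/rss/search"


def build_rss_url(query: str, lang: str = "en-US", country: str = "US") -> str:
    # Locale parameters are an assumption: hl = UI language, gl = country,
    # ceid = "COUNTRY:language" edition identifier.
    params = {
        "q": query,
        "hl": lang,
        "gl": country,
        "ceid": f"{country}:{lang.split('-')[0]}",
    }
    return f"{RSS_SEARCH}?{urlencode(params)}"


def parse_rss(xml_text: str) -> list[dict]:
    """Parse an already-downloaded feed into plain dicts (no network access)."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default=""),
            # A news.google.com redirect, not the publisher's own URL.
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            "source": item.findtext("source", default=""),
        })
    return articles
```

Every `link` this returns points at `news.google.com`, so you still need a separate resolution step before you have the publisher URL.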
If you need 500 articles per query, historical depth, or reliable scheduling, RSS breaks. SerpAPI charges $50/month for 5,000 searches. ScrapingBee charges credits per request, and you still have to parse HTML. A dedicated Google News actor addresses both problems with structured output and predictable pricing.
What data can you extract from Google News?
A typical scrape returns the following fields per article:
| Field | Example or description |
|---|---|
| `title` | "Fed Holds Rates Steady Amid Inflation Cooling" |
| `link` | Resolved publisher URL (not a Google redirect) |
| `publisher` | "Reuters" |
| `publishedAt` | ISO 8601 timestamp |
| `snippet` | First 150–200 chars of article body |
| `thumbnail` | Image URL when available |
| `relatedCoverage` | Array of similar articles from other outlets |
For broader sentiment analysis or competitive monitoring, the relatedCoverage array is the most underrated field — it gives you 5–10 alternate sources covering the same story without running additional queries.
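If you consume these records in Python, a small typed wrapper keeps downstream code honest about optional fields. This sketch assumes the actor emits the camelCase keys shown in the table and an ISO 8601 `publishedAt` string. `NewsArticle` and `from_item` are illustrative names, so check the key names against your actor's actual output.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class NewsArticle:
    title: str
    link: str
    publisher: str
    published_at: Optional[datetime]
    snippet: str = ""
    thumbnail: Optional[str] = None
    related_coverage: list = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict) -> "NewsArticle":
        """Build a record from one dataset item (key names are assumed)."""
        raw_ts = item.get("publishedAt")
        published = None
        if raw_ts:
            # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
            published = datetime.fromisoformat(raw_ts.replace("Z", "+00:00"))
        return cls(
            title=item.get("title", ""),
            link=item.get("link", ""),
            publisher=item.get("publisher", ""),
            published_at=published,
            snippet=item.get("snippet", ""),
            thumbnail=item.get("thumbnail"),
            related_coverage=list(item.get("relatedCoverage") or []),
        )
```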
How to scrape Google News step-by-step
1. Define your query strategy
You have three input modes:
- Keyword search: queries like `"openai" OR "anthropic"` use Google's normal search operators
- Topic URLs: paste a Google News topic page URL (e.g., the Technology section)
- Exact article URL: pull full data for a single known article
Use boolean operators (AND, OR, -exclude) and quotes for exact phrases. Limit each query to one logical subject — splitting "AI regulation" and "AI hardware" into separate runs gives cleaner data than one mega-query.
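To keep one logical subject per query, you can generate query strings from term lists instead of writing them by hand. `build_query` is a hypothetical helper. It quotes every term, joins alternatives with `OR`, and prefixes exclusions with `-`. It assumes the actor passes the string through to Google unchanged.

```python
from collections.abc import Sequence


def build_query(any_of: Sequence[str], exclude: Sequence[str] = ()) -> str:
    """Hypothetical helper: build '"a" OR "b" -"c"' from term lists."""
    # Quote every term so multi-word phrases match exactly.
    include = " OR ".join(f'"{term}"' for term in any_of)
    excluded = " ".join(f'-"{term}"' for term in exclude)
    return f"{include} {excluded}".strip()


# One logical subject per query, as recommended above.
queries = [
    build_query(["AI regulation", "AI Act"]),
    build_query(["AI hardware", "AI chips"], exclude=["gaming"]),
]
```

The resulting list can go straight into the `queries` array of the run input.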
2. Run the actor
Using the Apify API directly:
```bash
curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["climate policy", "carbon tax"],
    "language": "en",
    "country": "US",
    "maxItems": 200
  }'
```
Or in Node.js with the Apify client:
```javascript
// Run as an ES module (e.g. a .mjs file) so top-level await works.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// .call() starts the run and waits for it to finish.
const run = await client.actor('practical-tools/google-news-scraper').call({
    queries: ['site:bloomberg.com fintech'],
    maxItems: 100,
    language: 'en',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} articles`);
```
In Python:
```python
from apify_client import ApifyClient

client = ApifyClient(token="YOUR_TOKEN")

run_input = {
    "queries": ["tesla earnings"],
    "maxItems": 50,
    "country": "US",
}

# .call() starts the run and waits for it to finish.
run = client.actor("practical-tools/google-news-scraper").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "—", item["publisher"])
```
3. Schedule it
Inside Apify, set a schedule in cron syntax, such as `0 */2 * * *` to run every 2 hours. If you enable deduplication by `link`, each run stores only new items. Combine this with a webhook to push results to your backend the moment a run finishes, so no polling is required.
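On the receiving end, the webhook handler only needs to check the event type and pull out the dataset ID. This sketch assumes Apify's default webhook payload template, which sends `eventType` plus the finished run object under `resource`. If you customize the payload template, adjust the lookups. `dataset_id_from_webhook` and `WebhookHandler` are illustrative names. The handler uses the standard library's `http.server` only to stay dependency-free.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def dataset_id_from_webhook(payload: dict) -> Optional[str]:
    """Return the run's dataset ID for successful runs, else None.

    Assumes the default payload template, where the run object is under "resource".
    """
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        dataset_id = dataset_id_from_webhook(payload)
        if dataset_id:
            # Hand off to your own ingestion code (e.g. fetch items and upsert them).
            print(f"run finished, dataset {dataset_id}")
        self.send_response(204)
        self.end_headers()
```

To try it locally, start it with `HTTPServer(("", 8080), WebhookHandler).serve_forever()` and point the Apify webhook at that address. In production you would typically put the same logic behind your existing web framework.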
4. Store and deduplicate
Cheapest pipeline:
- Actor writes to an Apify Dataset (free, 7-day retention on lower plans)
- Webhook fires on `ACTOR.RUN.SUCCEEDED`
- Your endpoint upserts rows into Postgres/Supabase using `link` as the unique key
- Old items expire automatically from the dataset
For a more durable setup, use the Apify integration with Google Sheets, Airtable, or BigQuery — all configurable in the actor's UI without writing glue code.
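The upsert step is plain SQL. The sketch below uses SQLite from Python's standard library as a stand-in for Postgres/Supabase. The `INSERT ... ON CONFLICT (link) DO UPDATE` form works the same way in both (SQLite needs version 3.24 or later). The table layout and `upsert_articles` are assumptions, not part of any actor's output contract.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    link         TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher    TEXT,
    published_at TEXT
)
"""

UPSERT = """
INSERT INTO articles (link, title, publisher, published_at)
VALUES (:link, :title, :publisher, :publishedAt)
ON CONFLICT (link) DO UPDATE SET
    title = excluded.title,
    publisher = excluded.publisher,
    published_at = excluded.published_at
"""


def upsert_articles(conn: sqlite3.Connection, items: list[dict]) -> None:
    """Insert new articles and update existing ones, keyed on link."""
    rows = [
        {
            "link": it["link"],
            "title": it.get("title", ""),
            "publisher": it.get("publisher"),
            "publishedAt": it.get("publishedAt"),
        }
        for it in items
        if it.get("link")  # skip items without a usable key
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
```

Re-running the same batch is harmless. Rows keyed on an existing `link` are updated in place instead of duplicated.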
Is it legal to scrape Google News?
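U.S. courts have been receptive to scraping publicly accessible data. In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping public web pages likely does not violate the Computer Fraud and Abuse Act (CFAA). Van Buren v. United States narrowed the CFAA's "exceeds authorized access" provision. Google News headlines and snippets are designed to be indexed and shared. That said: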
- Don't republish full article text — that's the publisher's copyrighted content, not Google's
- Respect rate limits — a managed actor handles this automatically
- Check terms if you're in the EU, where database rights and the Digital Services Act add nuance
- Attribute publishers when displaying their headlines
For internal monitoring, alerting, or aggregation that links back to original sources, you're on solid ground.
How much does it cost to scrape Google News at scale?
Compare a typical workload — 10,000 articles per day across 50 keywords:
| Tool | Monthly cost | Notes |
|---|---|---|
| SerpAPI | $150+ | 15,000 searches/month plan |
| ScrapingBee | $99+ | You still write the parser |
| Build yourself | $50 proxies + dev time | Selectors break monthly |
| Google News Scraper | ~$30–60 | Pay-per-use, no minimums |
Pay-per-use means you pay nothing on days you don't run it. For bursty workloads (election coverage, earnings season, crisis monitoring), this is dramatically cheaper than monthly seat-based SaaS.
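Pay-per-use billing is linear in the number of articles returned, so you can estimate a monthly bill from your daily volume, the per-1,000 rate, and the number of days you run. This sketch assumes a flat per-1,000-article rate, such as the ~$0.30 quoted above, with no minimums or per-run charges. Check your actor's pricing page for the real terms.

```python
def monthly_cost(articles_per_day: int, price_per_1k: float, run_days: int = 30) -> float:
    """Estimate a pay-per-use bill, assuming a flat per-1,000-article rate and no minimums."""
    articles = articles_per_day * run_days
    # Round to cents to hide floating-point noise.
    return round(articles / 1000 * price_per_1k, 2)
```

Cutting active days from 30 to 10 cuts the bill by two thirds, which is why bursty workloads favor this model.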
Common pitfalls when scraping Google News
Ignoring locale: Google News personalizes by country and language. Searching "election" from a US IP vs. an Indian IP returns completely different results. Always pin country and language parameters.
Trusting the timestamp: Google's publishedAt is the publisher's claimed time, not when Google indexed it. For freshness-critical use cases, also track first-seen-by-you timestamps.
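Tracking first-seen time only takes a map from each link to the earliest timestamp you observed it. This in-memory `FirstSeenTracker` is a hypothetical sketch. In a real pipeline you would store the same value in a `first_seen_at` column that your upsert never overwrites.

```python
from datetime import datetime, timezone
from typing import Optional


class FirstSeenTracker:
    """Records when *you* first saw each link, independent of publishedAt."""

    def __init__(self) -> None:
        self._first_seen: dict[str, datetime] = {}

    def observe(self, link: str, now: Optional[datetime] = None) -> datetime:
        # setdefault keeps the earliest timestamp; later sightings don't overwrite it.
        return self._first_seen.setdefault(link, now or datetime.now(timezone.utc))

    def lag_seconds(self, link: str, published_at: datetime) -> float:
        # Positive values mean you saw the story after its claimed publish time.
        return (self._first_seen[link] - published_at).total_seconds()
```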
Not handling duplicates: The same story appears under multiple syndication URLs. Deduplicate by article title fingerprint (lowercased, stripped of publisher suffixes) rather than URL alone.
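A title fingerprint can be as simple as a hash of the normalized headline. The sketch below does four things:

1. Lowercases the title.
2. Strips a trailing ` - Publisher` or ` | Publisher` suffix when the publisher field is known.
3. Removes punctuation.
4. Hashes the result.

`title_fingerprint` and `dedupe` are illustrative helpers, and the suffix separators are assumptions about how headlines are commonly formatted.

```python
import hashlib
import re

# Assumed separators between headline and publisher name.
SUFFIX_SEPARATORS = (" - ", " | ", " \u2014 ")


def title_fingerprint(title: str, publisher: str = "") -> str:
    t = title.strip().lower()
    pub = publisher.strip().lower()
    # Drop a trailing "<sep><publisher>" suffix when the publisher is known.
    for sep in SUFFIX_SEPARATORS:
        if pub and t.endswith(sep + pub):
            t = t[: -len(sep + pub)]
            break
    # Remove punctuation and collapse whitespace so formatting differences don't matter.
    t = re.sub(r"[^\w\s]", "", t)
    t = re.sub(r"\s+", " ", t).strip()
    return hashlib.sha1(t.encode("utf-8")).hexdigest()


def dedupe(items: list[dict]) -> list[dict]:
    """Keep the first item for each title fingerprint."""
    seen: set[str] = set()
    unique = []
    for item in items:
        fp = title_fingerprint(item.get("title", ""), item.get("publisher", ""))
        if fp not in seen:
            seen.add(fp)
            unique.append(item)
    return unique
```

This only catches headlines that match after normalization. Syndicated copies that were re-headlined need fuzzier matching.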
Skipping pagination: Google News results past page 1 require different request patterns. A good actor handles this transparently with a maxItems parameter — verify yours does.
Hammering the source: Even with proxies, running 1,000 parallel queries triggers reCAPTCHA. Use a managed runner that queues and throttles for you.
FAQ
Q: Can I scrape Google News in real time?
Near real time, yes: schedule the actor every 5–15 minutes per keyword cluster. True streaming isn't possible because Google News itself only re-crawls publishers on a delay, so sub-5-minute polling rarely surfaces new content.
Q: Does Google News scraping work for non-English content?
Yes. Set the language parameter (e.g., de, ja, pt-BR) and country (e.g., DE, JP, BR) to get localized results. The actor handles UTF-8 encoding and right-to-left scripts like Arabic and Hebrew correctly.
Q: How do I avoid getting blocked when scraping Google News?
Don't scrape directly from your own IP. A managed actor rotates residential and datacenter proxies, randomizes headers, and respects rate budgets automatically. Building this yourself requires a proxy pool ($50–200/month) plus ongoing maintenance.
Q: Can I get the full article body, not just snippets?
Google News only exposes snippets — full text lives on the publisher's site. After scraping headlines, pass the link field to a separate article extractor (Mercury, Readability, or a dedicated Apify actor) to pull body text where the publisher allows it.
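Here is a deliberately crude stand-in for Readability that shows what the extraction step does. It is built on the standard library's `html.parser`, keeps the text of `<p>` elements, and ignores `<script>` and `<style>`. This is not how Readability or Mercury work internally, and real article pages need a proper extractor. `extract_paragraphs` is a hypothetical helper. Fetching the page and checking the publisher's terms are left to you.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects text inside <p> tags; a crude stand-in for Readability-style extraction."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = 0        # > 0 while inside a <p> element
        self._skip = 0        # > 0 while inside <script> or <style>
        self._buf: list[str] = []
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p -= 1
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = []
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```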
Q: What's the difference between scraping Google News and using a news API like NewsAPI?
News APIs aggregate from a curated list of ~80,000 sources with their own delays and gaps. Google News indexes 50,000+ publishers and surfaces stories within minutes of publication. Scraping Google News gives you Google's ranking signal (what Google considers the most relevant coverage right now), which no third-party API replicates.