Practical Tools

How to Scrape Google News Articles Automatically

Scrape Google News at scale using a hosted actor. Step-by-step setup, code examples, and why RSS feeds aren't enough for production use.

To scrape Google News articles automatically, send keyword queries to a hosted scraper that returns structured JSON with titles, publishers, dates, and snippets. The fastest path is a pay-per-use Apify actor — no proxy management, no headless browser babysitting, no broken CSS selectors when Google ships a layout change. Set a cron job, point it at your keywords, and you have a continuous news feed in under 10 minutes.

Quick Answer

To scrape Google News, you need three things: a query (keywords, topic ID, or exact URL), a runner that handles Google's anti-bot defenses, and a storage destination. Building it yourself means rotating residential proxies, parsing AMP redirects, and rewriting selectors every few months. A managed actor like Google News Scraper, Robust and Affordable does all of that for ~$0.30 per 1,000 articles. Trigger it via API, schedule, or webhook and pipe results to a database, Slack, or spreadsheet.

Why not just use Google News RSS feeds?

Google News still exposes RSS at news.google.com/rss/search?q=YOUR_QUERY, but the feed has hard limits that make it unusable for production:

  • 100-article cap per feed, regardless of result volume
  • No publish date filtering — you can't ask for "last 6 hours"
  • Redirect URLs only — every link points to news.google.com/articles/... and must be resolved
  • Frequent throttling — anything over ~1 request per minute from a single IP returns empty XML
  • No snippet text in many feeds; titles only

If you need 500 articles per query, historical depth, or reliable scheduling, RSS breaks. The general-purpose alternatives have their own costs: SerpAPI charges $50/month for 5,000 searches, and ScrapingBee charges credits per request while leaving the HTML parsing to you. A dedicated Google News actor addresses both problems with structured output and predictable pricing.
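To see what the RSS feed actually gives you, here is a minimal sketch that builds the search URL quoted above and parses feed items with the standard library. The function names and the sample XML are hypothetical and hand-written for illustration; they are not captured from a live feed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

RSS_BASE = "https://news.google.com/rss/search"


def rss_search_url(query: str) -> str:
    """Build the RSS search URL for a keyword query."""
    return f"{RSS_BASE}?{urlencode({'q': query})}"


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract title, link, and pubDate from every <item> in an RSS body."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter("item"):
        items.append({
            "title": node.findtext("title", default=""),
            "link": node.findtext("link", default=""),  # a Google redirect, not the publisher URL
            "pubDate": node.findtext("pubDate", default=""),
        })
    return items


# Hypothetical sample body, shaped like a standard RSS 2.0 feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Example headline - Example Publisher</title>
    <link>https://news.google.com/articles/EXAMPLE_ID</link>
    <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

if __name__ == "__main__":
    print(rss_search_url("climate policy"))
    print(parse_rss_items(SAMPLE_FEED))
```

To try it against the live feed, fetch `rss_search_url(...)` with `urllib.request.urlopen` and pass the decoded body to `parse_rss_items`. Every link still has to be resolved to the publisher URL separately.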

What data can you extract from Google News?

A typical scrape returns the following fields per article:

Field            Example
title            "Fed Holds Rates Steady Amid Inflation Cooling"
link             Resolved publisher URL (not Google redirect)
publisher        "Reuters"
publishedAt      ISO 8601 timestamp
snippet          First 150–200 chars of article body
thumbnail        Image URL when available
relatedCoverage  Array of similar articles from other outlets

For broader sentiment analysis or competitive monitoring, the relatedCoverage array is the most underrated field — it gives you 5–10 alternate sources covering the same story without running additional queries.
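As a rough illustration of using relatedCoverage, the sketch below collects every outlet covering a story from a single scraped item. It assumes each entry in relatedCoverage carries its own publisher and link keys. That shape is not confirmed by the field list above, so check it against real output before relying on it.

```python
from typing import Any


def alternate_sources(article: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (publisher, link) pairs for an article and its related coverage."""
    pairs = [(article.get("publisher", ""), article.get("link", ""))]
    # Assumption: related entries mirror the top-level publisher/link keys.
    for related in article.get("relatedCoverage") or []:
        pairs.append((related.get("publisher", ""), related.get("link", "")))
    return pairs


if __name__ == "__main__":
    # Hypothetical item for illustration only.
    sample = {
        "title": "Fed Holds Rates Steady Amid Inflation Cooling",
        "publisher": "Reuters",
        "link": "https://example.com/reuters-story",
        "relatedCoverage": [
            {"publisher": "Example Daily", "link": "https://example.com/daily-story"},
        ],
    }
    for publisher, link in alternate_sources(sample):
        print(publisher, link)
```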

How to scrape Google News step-by-step

1. Define your query strategy

You have three input modes:

  • Keyword search: "openai" OR "anthropic" — works like Google's normal search operators
  • Topic URLs: paste a Google News topic page URL (e.g., the Technology section)
  • Exact article URL: pull full data for a single known article

Use boolean operators (AND, OR, -exclude) and quotes for exact phrases. Limit each query to one logical subject — splitting "AI regulation" and "AI hardware" into separate runs gives cleaner data than one mega-query.
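A small helper can keep query strings consistent across runs. This is a sketch, and `build_query` and its parameter names are hypothetical. It quotes multi-word phrases and joins terms with the operators described above. How the search backend interprets a query that mixes AND and OR groups is not something this code controls.

```python
def _quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes for exact matching."""
    return f'"{term}"' if " " in term else term


def build_query(all_of=(), any_of=(), exclude=()) -> str:
    """Compose a search string from required, alternative, and excluded terms."""
    parts = []
    if all_of:
        parts.append(" AND ".join(_quote(t) for t in all_of))
    if any_of:
        parts.append(" OR ".join(_quote(t) for t in any_of))
    parts.extend(f"-{_quote(t)}" for t in exclude)
    return " ".join(parts)


if __name__ == "__main__":
    # One logical subject per query, as recommended above.
    print(build_query(any_of=["openai", "anthropic"]))
    print(build_query(all_of=["AI regulation", "europe"], exclude=["opinion"]))
```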

2. Run the actor

Using the Apify API directly:

curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["climate policy", "carbon tax"],
    "language": "en",
    "country": "US",
    "maxItems": 200
  }'

Or in Node.js with the Apify client:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('practical-tools/google-news-scraper').call({
    queries: ['site:bloomberg.com fintech'],
    maxItems: 100,
    language: 'en',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} articles`);

In Python:

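import os

from apify_client import ApifyClient

# Read the token from the environment instead of hard-coding it
client = ApifyClient(token=os.environ["APIFY_TOKEN"])

run_input = {
    "queries": ["tesla earnings"],
    "maxItems": 50,
    "country": "US"
}

# call() starts the run and waits for it to finish
run = client.actor("practical-tools/google-news-scraper").call(run_input=run_input)

# Iterate over the items in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], "-", item["publisher"])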

3. Schedule it

Inside Apify, create a schedule with a cron expression such as 0 */2 * * * to run every 2 hours. If you enable deduplication by link, each run stores only items it has not seen before. Combine the schedule with a webhook to push results to your backend the moment a run finishes, so no polling is required.
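To make the cron expression concrete, the sketch below expands the minute and hour fields into the times of day a schedule would fire. It is a deliberately tiny illustration, not a cron parser: it handles only `*`, `*/n`, and a single number, and it ignores the day, month, and weekday fields. The function names are hypothetical.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field; supports '*', '*/n', and a single number only."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


def daily_run_times(expr: str) -> list[str]:
    """List HH:MM run times implied by the minute and hour fields."""
    minute, hour, *_rest = expr.split()
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


if __name__ == "__main__":
    times = daily_run_times("0 */2 * * *")
    print(len(times), "runs per day:", ", ".join(times))
```

For `0 */2 * * *` this yields 12 runs per day, on the hour at every even hour.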

4. Store and deduplicate

Cheapest pipeline:

  1. Actor writes to Apify Dataset (free, 7-day retention on lower plans)
  2. Webhook fires on ACTOR.RUN.SUCCEEDED
  3. Your endpoint upserts rows into Postgres/Supabase using link as the unique key
  4. Old items expire automatically from the dataset
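Step 3 of the pipeline above can be sketched as an upsert keyed on link. This example uses Python's built-in sqlite3 as a stand-in for Postgres/Supabase so that it runs anywhere; the table and column names are assumptions. Postgres accepts the same `ON CONFLICT ... DO UPDATE` form, though parameter placeholders differ by driver.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    link         TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher    TEXT,
    published_at TEXT
)
"""

UPSERT = """
INSERT INTO articles (link, title, publisher, published_at)
VALUES (:link, :title, :publisher, :published_at)
ON CONFLICT(link) DO UPDATE SET
    title = excluded.title,
    publisher = excluded.publisher,
    published_at = excluded.published_at
"""


def upsert_articles(conn: sqlite3.Connection, items: list[dict]) -> int:
    """Insert or update scraped items by link; return the total row count."""
    conn.execute(SCHEMA)
    rows = [
        {
            "link": item["link"],
            "title": item["title"],
            "publisher": item.get("publisher"),
            "published_at": item.get("publishedAt"),
        }
        for item in items
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    batch = [
        {"link": "https://example.com/a", "title": "First version"},
        {"link": "https://example.com/a", "title": "Updated headline"},
        {"link": "https://example.com/b", "title": "Another story"},
    ]
    print(upsert_articles(db, batch), "unique articles stored")
```

Because link is the primary key, replaying the same webhook payload does not create duplicate rows.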

For a more durable setup, use the Apify integration with Google Sheets, Airtable, or BigQuery — all configurable in the actor's UI without writing glue code.

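Is it legal to scrape Google News?

U.S. courts have been broadly favorable to scraping publicly accessible data. In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping public pages likely does not violate the Computer Fraud and Abuse Act, and Van Buren v. United States narrowed that statute's reach. Neither ruling settles contract or copyright claims, though. Google News headlines and snippets are designed to be indexed and shared. That said: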

  • Don't republish full article text — that's the publisher's copyrighted content, not Google's
  • Respect rate limits — a managed actor handles this automatically
  • Check terms if you're in the EU, where database rights and the Digital Services Act add nuance
  • Attribute publishers when displaying their headlines

For internal monitoring, alerting, or aggregation that links back to original sources, you're on solid ground.

How much does it cost to scrape Google News at scale?

Compare a typical workload — 10,000 articles per day across 50 keywords:

Tool                 Monthly cost            Notes
SerpAPI              $150+                   15,000 searches/month plan
ScrapingBee          $99+                    You still write the parser
Build yourself       $50 proxies + dev time  Selectors break monthly
Google News Scraper  ~$90                    Pay-per-use at ~$0.30 per 1,000 articles, no minimums

Pay-per-use means you pay nothing on days you don't run it. For bursty workloads (election coverage, earnings season, crisis monitoring), this is dramatically cheaper than monthly seat-based SaaS.
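The pay-per-use arithmetic is easy to check yourself. The sketch below uses the ~$0.30 per 1,000 articles figure quoted earlier; the 30-day month and the function name are assumptions for illustration, and actual pricing may differ.

```python
def monthly_cost(articles_per_day: int, active_days: int = 30,
                 price_per_1000: float = 0.30) -> float:
    """Estimate monthly spend for a pay-per-use scraper, in dollars."""
    total_articles = articles_per_day * active_days
    return round(total_articles / 1000 * price_per_1000, 2)


if __name__ == "__main__":
    # Steady workload: 10,000 articles every day of the month.
    print(monthly_cost(10_000))                  # 90.0
    # Bursty workload: the same volume, but only on 10 days.
    print(monthly_cost(10_000, active_days=10))  # 30.0
```

The second call shows why bursty workloads favor this model: cost scales with the days you actually run.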

Common pitfalls when scraping Google News

Ignoring locale: Google News personalizes by country and language. Searching "election" from a US IP vs. an Indian IP returns completely different results. Always pin country and language parameters.

Trusting the timestamp: Google's publishedAt is the publisher's claimed time, not when Google indexed it. For freshness-critical use cases, also track first-seen-by-you timestamps.

Not handling duplicates: The same story appears under multiple syndication URLs. Deduplicate by article title fingerprint (lowercased, stripped of publisher suffixes) rather than URL alone.
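The title fingerprint described above might look like the following sketch. It assumes titles may end with a " - Publisher" or " | Publisher" suffix and that each item has title and publisher keys. The separator list and the function names are illustrative choices, not part of any actor's output contract.

```python
import re


def title_fingerprint(title: str, publisher: str = "") -> str:
    """Normalize a headline: drop the publisher suffix, case, and punctuation."""
    text = title.strip()
    if publisher:
        for sep in (" - ", " | "):
            suffix = f"{sep}{publisher}"
            if text.lower().endswith(suffix.lower()):
                text = text[: -len(suffix)]
                break
    text = re.sub(r"[^\w\s]", "", text.lower())  # strip punctuation, keep letters and digits
    return " ".join(text.split())  # collapse runs of whitespace


def dedupe_by_title(items: list[dict]) -> list[dict]:
    """Keep the first item for each distinct title fingerprint."""
    seen = set()
    unique = []
    for item in items:
        fingerprint = title_fingerprint(item["title"], item.get("publisher", ""))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(item)
    return unique


if __name__ == "__main__":
    batch = [
        {"title": "Fed Holds Rates Steady Amid Inflation Cooling - Reuters", "publisher": "Reuters"},
        {"title": "Fed holds rates steady, amid inflation cooling | Reuters", "publisher": "Reuters"},
        {"title": "Carbon tax vote delayed", "publisher": "Example Daily"},
    ]
    print(len(dedupe_by_title(batch)), "unique stories")
```

Exact-match fingerprints miss headlines that outlets have reworded, so treat this as a first pass, not full story clustering.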

Skipping pagination: Google News results past page 1 require different request patterns. A good actor handles this transparently with a maxItems parameter — verify yours does.

Hammering the source: Even with proxies, running 1,000 parallel queries triggers reCAPTCHA. Use a managed runner that queues and throttles for you.

FAQ

Q: Can I scrape Google News in real time?
A: Near real time, yes: schedule the actor every 5–15 minutes per keyword cluster. True streaming isn't possible because Google News itself only re-crawls publishers on a delay, so sub-5-minute polling rarely surfaces new content.

Q: Does Google News scraping work for non-English content?
A: Yes. Set the language parameter (e.g., de, ja, pt-BR) and country (e.g., DE, JP, BR) to get localized results. The actor handles UTF-8 encoding and right-to-left scripts like Arabic and Hebrew correctly.

Q: How do I avoid getting blocked when scraping Google News?
A: Don't scrape directly from your own IP. A managed actor rotates residential and datacenter proxies, randomizes headers, and respects rate budgets automatically. Building this yourself requires a proxy pool ($50–200/month) plus ongoing maintenance.

Q: Can I get the full article body, not just snippets?
A: Google News only exposes snippets; the full text lives on the publisher's site. After scraping headlines, pass the link field to a separate article extractor (Mercury, Readability, or a dedicated Apify actor) to pull body text where the publisher allows it.

Q: What's the difference between scraping Google News and using a news API like NewsAPI?
A: News APIs aggregate from a curated list of ~80,000 sources with their own delays and gaps. Google News indexes 50,000+ publishers and surfaces stories within minutes of publication. Scraping Google News gives you Google's ranking signal (what Google considers the most relevant coverage right now), which no third-party API replicates.