Google Search API Leak: Top 5 Key Findings for Marketers (2024)

04 June 2024 Digital Media

Google may have accidentally leaked SEO secrets, and experts Rand Fishkin & Mike King have revealed key findings. We share the top 5 takeaways for marketers to maximise their website's ranking and SEO performance.

Last week, two prominent voices in the SEO industry, Rand Fishkin of SparkToro (and formerly Moz) and Mike King of iPullRank, began covering a large leak of API documentation called GoogleApiContentWarehouseV1. It is believed Google mistakenly uploaded the documentation to GitHub, where it was obtained by a non-Googler and shared with Rand. Google has since confirmed the leak is authentic but notes that the data is incomplete and some of it is outdated, as it covers only a small part of the overall system.

Google API Content Warehouse
Screenshot from Google Search API leak

The packet was full of API documentation like the screenshot above and was uploaded in early March 2024. Rand and Mike believe it to be current, as it mentions LLMs but not AI Overviews. The documents have since been removed.

After obtaining the documents, Rand and Mike analysed some of the data and last week shared their early findings, cross-referencing API findings with patents and statements from Google Search Analysts like Matt Cutts, John Mueller, and Gary Illyes.

As Rand and Mike have noted, the leaked documentation does not include any weighting for the factors it lists. The leak is a list of attributes that Google collects from websites for use in ranking search results; SEOs call these "ranking factors." Google engineers would use this API in combination with other systems to run queries and build new systems or tests, as they all appear to be linked in what Mike describes as a monolithic repository: a shared environment where all code and systems are connected. However, the presence of an attribute does not confirm that it is a current ranking factor. Rand and Mike do note that black-and-white statements by Googlers, such as "we do not use user click data," are clearly contradicted by the documentation.

This is an evolving story, and Rand & Mike have posted their initial assessments over the past week. No doubt, more will come from them and others as the documentation is more thoroughly reviewed by SEOs worldwide.

You can go right to the source and read Rand’s post and Mike’s post (and his follow-up on Search Engine Land). These guys are very thorough, and they’re reviewing API documentation, so the content is technical and a bit dense. 

I’ve written this for marketing generalists and managers who know SEO is important and could use some direction and context. 

I’ve picked the top 5 findings from the Google leak, provided high-level context, summarised findings in the leak and highlighted learnings to take forward that validate the SEO tactics you should be focused on to maximise your SEO performance. 

If you’d like the TLDR version, you can skip to the Summary at the end.

Top 5 SEO Findings from the Google Leak  

Click Data from User Searches is Among the Most Important Signals for Ranking

Context

It has long been suspected that Google uses click data from Chrome and Android to reorder search results.  

Ten years ago, Rand Fishkin ran an experiment by asking his Twitter followers to run a search, click on the #1 result, go back to the search results, then click #7 and stay active on that site. This moved the result from the bottom of page 1 to #1. He repeated it a number of times with similar results.

Image source: SparkToro
Screenshot of people who replied to Rand Fishkin's tweet

Findings

The US Department of Justice (DOJ) antitrust case against Google highlighted a ranking system called NavBoost, which takes into account clickstream data (huge amounts of click data from Google plugins, Chrome, and Android browsing). Attributes like GoodClicks and BadClicks are found within the leaked API documentation. This suggests, as many suspected, that Google uses signals like short clicks (bounces) and pogo-sticking (going back and forth between search results until you are satisfied) to gauge user satisfaction, and may reorder search results depending on the signals it receives.

The DOJ testimony goes on to say that while NavBoost brings web search results together, another system called Glue handles everything else (images, maps, YouTube) and pulls it into the search results pages we know, ranked by where users click.
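
As a rough illustration of the good-click and bad-click idea described above, the Python sketch below labels the clicks in a search session by dwell time and by whether the user went back to the results page. The `Click` fields, the 30-second threshold and the "two bounces" rule are all assumptions made for this example. The leaked documentation names attributes such as GoodClicks and BadClicks but does not say how they are calculated, so treat this as a way to think about your own analytics data rather than a description of NavBoost.

```python
from dataclasses import dataclass

# Hypothetical threshold: the leak does not define what counts as a "short" click.
SHORT_CLICK_SECONDS = 30


@dataclass
class Click:
    url: str
    dwell_seconds: float        # time spent on the page before leaving
    returned_to_results: bool   # True if the user went back to the results page


def classify_click(click: Click) -> str:
    """Label a click 'bad' if it was short and followed by a return, else 'good'."""
    if click.returned_to_results and click.dwell_seconds < SHORT_CLICK_SECONDS:
        return "bad"
    return "good"


def is_pogo_sticking(session: list[Click]) -> bool:
    """Assumed rule: two or more bad clicks in one session counts as pogo-sticking."""
    bad_clicks = [c for c in session if classify_click(c) == "bad"]
    return len(bad_clicks) >= 2


# Example session: two quick bounces, then a long visit on the third result.
session = [
    Click("https://example.com/a", 5, True),
    Click("https://example.com/b", 8, True),
    Click("https://example.com/c", 240, False),
]
labels = [classify_click(c) for c in session]
print(labels)
print(is_pogo_sticking(session))
```

In this example the first two results would be labelled bad clicks and the third a good click, which is the pattern you want your own pages to end rather than start.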

Learnings

SEO has been focused on what it takes to rank and get the click, but what happens after the click may matter even more. 

Content that answers the query, loads quickly, offers good UX/UI and gets to the solution is more likely to keep users on your website and lead to deeper engagement.

Google is paying attention to the engagement on your page vs others and may rank your content better if it gets good signals. 

There’s something to be said here about brand and reputation; if your brand is well-recognised and trusted, even if it's not #1, you may get more clicks and, in turn, rank higher if you've invested in a good onsite experience.

Site Signals are Important for SEO

Context

Google has said multiple times that it has no domain authority metric. Moz, the company Rand co-founded, became famous for its Domain Authority (DA) score, which gave SEOs and website owners a way to measure a site's strength once Google's PageRank toolbar was retired.

Findings

A Site Authority metric was found in the leaked API documentation. It was also noted that a homepage's authority impacts the whole site's authority. 

Surprisingly, page title structure appears to benefit the whole site, which suggests we are on the right path in optimising all title tags.

Exact keywords in links are important. Link using the exact phrase you want that page to rank for (ensure the anchor text contains your target keyword).
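
One way to act on the anchor text point is to audit your own internal links. The sketch below, using only Python's standard library, collects the links in a page's HTML and lists the anchor texts that point at a target page without containing the target keyword. The function name `links_missing_keyword`, the sample HTML, the URL and the keyword are all made up for illustration, and a simple "contains the phrase" check is an assumption about what counts as a match.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor_text = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor_text))
            self._href = None


def links_missing_keyword(html, target_url, keyword):
    """Return anchor texts that link to target_url but do not contain the keyword."""
    parser = AnchorCollector()
    parser.feed(html)
    return [
        text
        for href, text in parser.links
        if href == target_url and keyword.lower() not in text.lower()
    ]


# Hypothetical page with one weak and one keyword-rich internal link.
page = (
    '<p><a href="/running-shoes">Click here</a> or browse '
    '<a href="/running-shoes">women\'s running shoes</a>.</p>'
)
missing = links_missing_keyword(page, "/running-shoes", "running shoes")
print(missing)
```

Here the "Click here" link would be flagged as a candidate for rewriting, while the second link already carries the target phrase.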

Learnings

It's nice to see the value of the SEO fundamentals validated. Effort spent uncovering the right keywords, putting them into page titles and headings, and using them prominently in internal link anchor text helps those pages rank. Pleasingly, optimising individual pages can also have a macro impact on the site overall.

What are Twiddlers?

Context

We had never heard of Twiddlers before, and Google searches turn up little about them from before the leak.

Findings

Twiddlers are mentioned in the documentation as systems that run after the main ranking system and allow Google to re-rank results based on additional signals. This is what NavBoost and Glue do, alongside many other systems like QualityBoost, RealtimeBoost and, most notably, the Panda algorithm, which made waves across the SEO world over the last decade as Google strove to clean up low-quality content and spam.

Mike found further documentation on Panda and tied this to references he's made in the past from Google patents, postulating Panda to be largely about building a scoring modifier based on distributed signals related to user behaviour and external links. 

Effectively, once a query or topic was on Panda’s watch list, each periodic refresh would look for improved, diverse link signals to the ranking pages. It would then check for good clicks and engaged user behaviour observed by NavBoost. If the score turned out positive, rankings would improve; if not, they would decline.
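
To make the idea of a re-ranking step with a scoring modifier more concrete, here is a toy Python sketch. It is not Google's implementation: the `Result` fields, the equal weighting of link and click signals, and the 20% maximum boost or demotion are all invented for this example. It only shows how a modifier applied after the initial ranking could move a page with weak links and poor engagement below a page with strong signals.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float    # score from the initial ranking stage (assumed)
    link_signal: float   # -1.0 to 1.0, quality and diversity of links (assumed)
    click_signal: float  # -1.0 to 1.0, balance of good vs bad clicks (assumed)


def panda_like_modifier(result: Result) -> float:
    """Hypothetical modifier: above 1.0 boosts a result, below 1.0 demotes it."""
    combined = 0.5 * result.link_signal + 0.5 * result.click_signal
    return 1.0 + 0.2 * combined


def rerank(results: list[Result]) -> list[Result]:
    """Re-order results by base score multiplied by the modifier."""
    return sorted(
        results,
        key=lambda r: r.base_score * panda_like_modifier(r),
        reverse=True,
    )


results = [
    Result("https://example.com/thin-page", 0.80, -0.6, -0.8),
    Result("https://example.com/helpful-guide", 0.75, 0.7, 0.9),
]
reranked = [r.url for r in rerank(results)]
print(reranked)
```

In this made-up case the thin page starts with the higher base score, but the helpful guide ends up first once the modifier is applied.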

Learnings

To give good signals to NavBoost, Panda (and possibly Helpful Content Updates):

  • Focus on delivering a good user experience that keeps users engaged and exploring other areas of your site
  • Ensure your content remains up-to-date and look for natural paths to expand upon topics (such as guides and buying journeys)
  • Find ways to capture the attention of influential people within your industry as a way to earn links and mentions to your content

The existence of Twiddlers and the focus on measuring user behaviour mean SEO's job keeps going after the click. That means analysing behaviour to determine whether a click was good or bad, and working collaboratively with the business to ensure the bad becomes good and the good becomes great. If you don't pay attention after the click, you may lose to competitors who are truly customer-obsessed.

Being an Expert Entity is Important for High-Risk Industries

Context

You may have heard about the principles of Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) and Your Money or Your Life (YMYL) put forward by Google.

E-E-A-T has been noted as particularly important for queries that could put a person's well-being (health, finance, safety) at risk if they obtained the wrong information from an untrustworthy source. 

Google does a lot of work with its quality raters and systems to identify whether a query is YMYL. The more strongly a topic is YMYL (think mental health), the greater the E-E-A-T requirements for websites that want to rank for that query.

You can read the Google E-E-A-T guidelines here. 

Findings

Authorship is noted as a value, so content authors are likely recognised and included in Google's entity graph.

The documentation has a number of metrics for YMYL, likely measuring a query's YMYL on a scale of high to low. There's also a system to predict whether a never-before-seen query should be classified as YMYL. 

It was also noted that Google is vectorising content to identify how different one piece of content is from the overall website’s topical affinity. In other words, are you talking outside your lane and expecting to rank for another topic, and does this need to be addressed (site authority abuse*)? There is also a mention of Gold Standard documents, so research and white papers are worthwhile.
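
The vectorising idea above can be illustrated with a small Python sketch that compares one page against the rest of a site. Word counts stand in for the embeddings a search engine would really use, and cosine similarity stands in for whatever distance measure Google applies; both are simplifying assumptions, and the function names and sample page text are hypothetical. A low score suggests a page sits outside the site's usual topics.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Very rough stand-in for an embedding: lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topical_affinity(page_text, site_pages):
    """Similarity between one page and the combined vocabulary of the site."""
    site_vector = Counter()
    for text in site_pages:
        site_vector.update(vectorise(text))
    return cosine_similarity(vectorise(page_text), site_vector)


# Hypothetical finance site with one on-topic and one off-topic candidate page.
site_pages = [
    "home loan rates and mortgage repayments",
    "how to compare mortgage rates",
    "refinancing your home loan",
]
on_topic = topical_affinity("fixed versus variable mortgage rates", site_pages)
off_topic = topical_affinity("best running shoes for marathon training", site_pages)
print(on_topic > off_topic)
```

On this sample data the mortgage page shares vocabulary with the site and scores above zero, while the running shoes page shares none and scores zero.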

There is also a score for content effort, where an LLM (a Large Language Model, like those behind Gemini, ChatGPT or Perplexity) estimates the effort that went into article pages, likely to measure GenAI usage for content creation.

Learnings

We know Google has invested significantly in building its entity graph since the advent of semantic search. For this reason, it’s in your best interest to identify the people within your business who are the experts and have them be your brand ambassadors on the website to strengthen your expertise and authority in Google's eyes.

If you don’t have an expert, you should find one and align them to your brand, having them author content on your website and share it with their audience over time to tie them to your brand (think Kochie and Allianz).

Take on the user experience feedback noted earlier and bring out your ambassador’s Expertise. You’ll earn mentions and links (likely signals of Authority), and adding Trust signals such as customer reviews will put you on your way to being an E-E-A-T evangelist.

What if your industry isn't covered by YMYL, like B2C retail? Should you still do E-E-A-T? Just because you're not covered now doesn't mean Google has no plans to expand. That said, a good user experience, highlighting the expert figures within your business, and showing customer reviews to demonstrate trust will, in the end, make your brand more successful, so they're worth doing.

Summary

There were a few surprises, but we already knew or suspected many of these things. However, it is nice to have validation that:

  • You can't go past the basics - keyword research, targeting in page titles, and strategic internal linking are crucial to success and well within your control.
  • User experience is important - we can't skimp on it, and we need to focus as much on what happens after the click as we do on securing the click. Look carefully at where in our site and journey we may be losing people and try to turn bad clicks into good clicks by improving satisfaction, utility and usefulness.
  • Link quality is important - we need to earn the best links, ones that will also be a source of referral traffic (and that's done with great content people will want to link to, not by paying $100 for a link no one will see).
  • We need our business and industry experts to be the champions of the brand and the authors behind our YMYL content.

If this struck a chord with you, and you need a partner to help you evolve your search approach, get in touch with our team of SEO evolutionists.

*site authority abuse = using your strength to rank for things outside your topical bounds
