What We Learned About Links and Link Building from the Google Leak

Deep analysis of over 1,600 pages of leaked Google documentation reveals the inner workings of PageRank, link quality, and ranking factors

Henrik Bondtofte
June 17, 2024
14 min read
Link Building, SEO
Google leak analysis

AI Summary – Key Findings

Written by Henrik Bondtofte, SEO expert with 20 years of experience. Updated June 17, 2024.

  • PageRank is still alive and well, central to link and ranking assessment
  • Source quality is crucial – links from pages with traffic and high credibility weigh heavier
  • Fresh links in new publications get higher weight than older links – especially from high-quality documents
  • Anchor texts and context are analyzed in detail – Google looks at position, word choice, and placement in content
  • Link spam is identified automatically – even from otherwise trustworthy pages if text and destination seem spammy
  • External links are valued higher than internal when it comes to authority and credibility

You've undoubtedly heard about it: a massive amount of information has been leaked from Google, including details from organic Google search. This gives us insight into the mechanisms behind PageRank and other link-related factors that affect a website's ranking in search results. In this article, I'll take a closer look at the leaked details, focusing on links and link building.

It has taken time to make sense of this leak, which consists of over 1,600 pages filled with technical documents and countless rules and parameters. I have closely studied everything related to links and link building, and will address other topics from the leak at a later time.

1. PageRank: Weight and Quality

Key Insight

As I've said many times before, PageRank is still an active part of Google, so it comes as no surprise that it is mentioned numerous times in the documents. PageRank is the link-popularity algorithm that made Google's name; it's the foundation of their entire search engine.

You can read much more about the PageRank concept in this article, where I go in-depth on the subject.

PagerankWeight

This parameter indicates the weight stored in linkmaps for PageRank. This weight affects the ranking of pages based on the quality and relevance of links. The higher the PageRank, the more influence the link has on a page's placement in search results.

Linkmap

A linkmap is a detailed mapping that stores various attributes and metrics for links between pages. In addition to PageRank weight, a linkmap also contains link attributes (e.g., nofollow), anchor texts, and much more. The linkmap is used to calculate the concrete PageRank score (the strength) of a given page. It should not be confused with Google's Link Graph, which represents the overall structure of the internet's link patterns.
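To make the idea of calculating a score from a map of links concrete, here is a minimal sketch of the classic, publicly documented PageRank calculation by power iteration. It is not taken from the leak and says nothing about how Google computes PageRank today. The `pagerank` function, the toy `toy_links` map, and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank via power iteration.

    `links` maps each page to the list of pages it links to.
    Illustrative only: not Google's production algorithm.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, c links back to a.
toy_links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
scores = pagerank(toy_links)
```

In this toy map, c.example ends up with the highest score because it receives two links, and b.example with the lowest because nothing links to it.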

SourceType

Indicates the quality of the anchor's source page (the page that links). The sourceType attribute registers the quality of a link's source in relation to the indexing tier its content is in. In short, the higher the indexing tier of a page, the more value links from that page are expected to carry. Links are accordingly classified as TYPE_HIGH_QUALITY, TYPE_MEDIUM_QUALITY, or TYPE_LOW_QUALITY, based on the indexing tier of the source page.

Quality and Traffic

Perhaps somewhat surprisingly, this quality classification is based on the number of clicks a given page receives from organic results. This means that a link from a page that receives many clicks from organic results has a greater effect than a link from a page with few or none.

The question is whether PageRank even flows through links that come from pages categorized as low quality. My bet is that it doesn't. Either way, the classification supports the thesis that links from pages that themselves have traffic are better links than links from pages without traffic. Perhaps not particularly surprising, but still a topic that has been widely debated over the last 5-10 years.
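As a rough illustration of how a click-based classification could work, here is a small sketch. Only the three TYPE_* names come from the leaked documentation. The `classify_source` function and both click thresholds are hypothetical; the leak does not reveal any actual cut-off values.

```python
from enum import Enum


class SourceType(Enum):
    # The three names are the values mentioned in the leaked documentation.
    TYPE_HIGH_QUALITY = "TYPE_HIGH_QUALITY"
    TYPE_MEDIUM_QUALITY = "TYPE_MEDIUM_QUALITY"
    TYPE_LOW_QUALITY = "TYPE_LOW_QUALITY"


def classify_source(organic_clicks, high_threshold=1000, medium_threshold=50):
    """Bucket a linking page by its organic clicks.

    Both thresholds are made-up numbers for illustration only.
    """
    if organic_clicks >= high_threshold:
        return SourceType.TYPE_HIGH_QUALITY
    if organic_clicks >= medium_threshold:
        return SourceType.TYPE_MEDIUM_QUALITY
    return SourceType.TYPE_LOW_QUALITY
```

With these assumed thresholds, a page with 5,000 organic clicks would land in the high tier, while a page with no clicks would land in the low tier.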

3. Internal vs. External Links (isLocal)

This bit indicates whether the anchor's source and target pages are on the same domain, i.e., whether it's an internal or external link.

Internal Links

Used for cohesive structure, PageRank value distribution, and user navigation within the same website.

External Links

A stronger indicator of a page's authority and credibility, as recommendations come from other websites.
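The internal/external distinction can be sketched as a simple host comparison. This is a simplified assumption: the `is_local` helper below is hypothetical, it only ignores a leading "www.", and a more faithful version would compare registrable domains using the Public Suffix List. How Google actually sets the bit is not described in the leak.

```python
from urllib.parse import urlsplit


def is_local(source_url, target_url):
    """Return True when both URLs share the same host (ignoring 'www.').

    Simplified assumption: subdomains other than 'www' count as different hosts.
    """
    def host(url):
        # urlsplit().hostname is already lowercased; it is None for relative URLs.
        hostname = urlsplit(url).hostname or ""
        return hostname[4:] if hostname.startswith("www.") else hostname

    return host(source_url) == host(target_url)
```

For example, a link from https://www.example.com/a to https://example.com/b counts as internal here, while a link to https://other.example.org/ counts as external.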

4. Scoring and Ranking

AggregatedScore

This is a score aggregated from all sources, which likely includes various signals such as PageRank, relevance, timeliness, and other factors that affect the overall score of a link. This aggregated score provides a comprehensive picture of the link's value as a whole. The score will therefore differ depending on what is being linked to and on which website.

TopicalityWeight

The topical weighting assigned to each link is influenced by both the PageRank from the linking page and the relevance of the anchor text. This means that a link from a relevant page with high PageRank has greater value.

This also confirms that anchor text relevance affects weighting – so yes, anchor texts still have great significance.
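To show how two such signals could interact, here is a purely hypothetical sketch. The leak names the attribute but not the formula, so multiplying the source's PageRank by a word-overlap relevance score is my own assumption, as are the function names `anchor_relevance` and `topicality_weight`.

```python
def anchor_relevance(anchor_text, target_terms):
    """Jaccard overlap between anchor words and the target page's topic terms.

    A stand-in relevance measure, chosen only because it is simple.
    """
    anchor_tokens = set(anchor_text.lower().split())
    terms = {term.lower() for term in target_terms}
    if not anchor_tokens or not terms:
        return 0.0
    return len(anchor_tokens & terms) / len(anchor_tokens | terms)


def topicality_weight(source_pagerank, anchor_text, target_terms):
    """Hypothetical combination: source PageRank scaled by anchor relevance."""
    return source_pagerank * anchor_relevance(anchor_text, target_terms)
```

Under this assumed formula, a "click here" anchor contributes nothing topically regardless of how strong the linking page is, while a descriptive anchor scales with the source's PageRank.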

5. Anchor and Hyperlink Data

Google uses data about anchor texts and links to uncover how links are structured and used in the content where they are found: where the link is placed, what it appears next to, whether it's relevant to the page being linked to, and so on.

In the leaked information, we find these factors:

ByteEnd and ByteStart

Indices of the first and last byte covered by the hyperlink (ByteStart and ByteEnd, respectively). Together they delimit the exact span of the content that the hyperlink covers. This can be useful for analyzing how the link fits into the text's structure and readability.

Phrase

Index for the first and last token covered by the hyperlink. Tokens refer to words or word parts in the text, and by knowing the indices for these, one can get a detailed understanding of which specific words or phrases are connected via the hyperlink. This makes it possible to analyze the link's context in a more granular way.
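Byte and token positions are easier to picture with a small example. The sketch below locates an anchor inside a piece of plain text and reports both kinds of indices. The `locate_anchor` helper is hypothetical, and so are the choices of UTF-8 encoding, whitespace tokenization, zero-based counting, and an inclusive end index; the leak does not specify any of these.

```python
def locate_anchor(text, anchor):
    """Find an anchor in plain text and return byte and token positions.

    Assumes UTF-8 bytes, whitespace tokens, zero-based inclusive indices,
    and that the anchor starts at a word boundary.
    """
    char_start = text.find(anchor)
    if char_start < 0:
        raise ValueError("anchor text not found")

    prefix = text[:char_start]
    # Byte offsets differ from character offsets once non-ASCII text appears.
    byte_start = len(prefix.encode("utf-8"))
    byte_end = byte_start + len(anchor.encode("utf-8")) - 1

    # Count whole words before the anchor to get the first token index.
    token_start = len(prefix.split())
    token_end = token_start + len(anchor.split()) - 1

    return {
        "byte_start": byte_start,
        "byte_end": byte_end,
        "token_start": token_start,
        "token_end": token_end,
    }


position = locate_anchor(
    "Read our café guide to link building today", "link building"
)
```

Note that the "é" in the sample sentence takes two bytes in UTF-8, which is why the byte offset of the anchor is one higher than its character offset.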

URL

The absolute URL that the link points to, i.e., the full web address of the hyperlink's destination. The exact URL is needed to evaluate the link's target and its relevance to the content. Beyond naming a destination, the URL can also reveal information about the domain's authority and topical relevance, and it contributes to understanding the value of the overall link profile.

6. Homepage Trustworthiness (homePageInfo)

The homePageInfo attribute in the AnchorsAnchorSource module is crucial for assessing the value of a link based on the trustworthiness of its source, especially the homepage of the source website.

What is homePageInfo?

The attribute indicates whether the source page for a link is a homepage and, if so, how trustworthy that homepage is. The possible values for homePageInfo are:

NOT_HOMEPAGE

The source page is not a homepage

NOT_TRUSTED

The homepage is not considered trustworthy

PARTIALLY_TRUSTED

The homepage has a moderate level of trustworthiness

FULLY_TRUSTED

The homepage is fully trustworthy

Role in Link Evaluation

Trustworthiness Evaluation

If the source page is the homepage, homePageInfo directly assigns a trustworthiness value (NOT_TRUSTED, PARTIALLY_TRUSTED, FULLY_TRUSTED).

Weighting Mechanism
  • Full trustworthiness: Links from fully trustworthy homepages likely receive higher weight
  • Partially trustworthy: Links from partially trustworthy homepages receive moderate weight
  • Not trustworthy: Links from untrustworthy homepages receive lower weight
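
The weighting described above can be sketched as a simple lookup. The four enum names are the values from the leaked documentation. The numeric weights in `ASSUMED_WEIGHTS` and the `homepage_trust_weight` helper are hypothetical; the leak does not contain any actual numbers, only the ordering implied above.

```python
from enum import Enum


class HomePageInfo(Enum):
    # The four names are the values listed in the leaked documentation.
    NOT_HOMEPAGE = "NOT_HOMEPAGE"
    NOT_TRUSTED = "NOT_TRUSTED"
    PARTIALLY_TRUSTED = "PARTIALLY_TRUSTED"
    FULLY_TRUSTED = "FULLY_TRUSTED"


# Made-up weights: only their relative order reflects the article.
ASSUMED_WEIGHTS = {
    HomePageInfo.FULLY_TRUSTED: 1.0,
    HomePageInfo.PARTIALLY_TRUSTED: 0.5,
    HomePageInfo.NOT_TRUSTED: 0.1,
}


def homepage_trust_weight(info):
    """Return the assumed weight, or None when the source is not a homepage."""
    return ASSUMED_WEIGHTS.get(info)
```

Returning None for NOT_HOMEPAGE is a design choice in this sketch: it signals that no homepage-based trust value applies, rather than implying a low one.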

SEO Implications

  • Earning Links: Getting links from fully trustworthy websites, especially their homepages, can significantly increase a website's credibility and ranking in Google's search results.
  • Source Trustworthiness: The trustworthiness of the source page's homepage is crucial for determining the overall quality and weight of its outgoing links.
  • Impact on Target Page: While homePageInfo concerns the link's source, it ultimately affects the target page's link profile and perceived authority based on the trustworthiness inherited from the source's homepage.

Summary

The Google leak confirms that PageRank is still a completely central part of Google's algorithm, where links with high PageRank have great influence on a page's ranking. At the same time, it documents that links from traffic-rich pages that receive many clicks from organic search results have greater value. Source quality is crucial: links from high-quality pages weigh heavier than links from low-quality pages. Furthermore, fresh links from newly published articles have more weight than links from older content.

Google uses linkmaps to assess PageRank, where details such as link weight, link attributes, and anchor texts are included. Internal links support navigation and PageRank distribution internally, while external links contribute to a website's authority and credibility. Links are evaluated based on an aggregate score that includes PageRank, relevance, and topicality, where the topical weighting of links depends on both PageRank and the relevance of the anchor text.

Google also analyzes anchor texts and hyperlink data to understand their context. Link spam is identified and demoted, which can reduce the value of links from trustworthy sources if they're deemed "spammy".

Key Takeaway

Overall, the leak emphasizes the importance of getting links from relevant, traffic-rich, and trustworthy sources, as well as ensuring that anchor texts and link context are of high quality and not manipulative in any way.
