How a Search Engine Might Analyze the Linking Structure of a Web Site

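How does a search engine understand the linking structure of a web site? Does it have a way to organize and classify the individual links and blocks of links that it sees on the pages of a site?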

Do they treat links and collections of links that they find on more than one page of a site differently than links and collections of links found on only one page? If they find more than one group of links on a page that contain many of the same links, such as at the top and bottom of the page, how might they treat those links?

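Last summer, I ran across a patent filing from Microsoft that explores the concept of link structure. It didn’t attract much attention, so I decided to take a closer look at it here.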

Segmentation and Link Blocks

In the 2002 paper SmartView: Enhanced Document Viewer for Mobile Devices (pdf), a couple of Microsoft researchers discussed how web pages might be analyzed and partitioned into smaller logical sections to be viewed on small devices, such as handheld phones. These smaller sections could be selected by a viewer and seen independently from the rest of a web page. One of the authors of that paper is listed as an inventor of the Microsoft patent, and the paper is cited within the link structure patent as an example of how web pages might be segmented in a way that benefits the viewers of a page.

Another web page segmentation process mentioned in the patent filing is one from Microsoft called VIPS: a Vision-based Page Segmentation Algorithm. The paper describing this process was published in 2003 and explores a way of looking at the HTML of a page, along with a visual inspection of white space, horizontal rules, and other visual aspects of a web page that might indicate that a page is broken down into different logical sections.

Another paper from Microsoft that isn’t mentioned in the patent filing, but which seems relevant, is one that explores how links from different blocks on a page might be treated differently based upon where they are located on that page. The paper is Block-level Link Analysis, which also introduces the concept of a block-level PageRank:

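In essence, block-level PageRank (BLPR) is similar to the original PageRank algorithm. The main difference between them is that the traditional PageRank algorithm models web structure at the page level, while BLPR models web structure at the block level.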

What that paper and the other papers from Microsoft don’t explore in much depth is how different blocks of links might be related to each other. They don’t try to explore in any depth how links on a site might be related to one another, and how the pages of a site might be organized based upon links between the pages of a site. Looking at link blocks on a site, classifying them, and organizing them may yield some useful benefits.

Once a page is broken down into different segments, such as headers and footers, sidebars, main navigation bars, main content areas, advertisement blocks, etc., the relationship between links in those segments across the site might be explored.

Classifying Links

To classify links and link blocks, a search engine would start by analyzing the layout of individual pages to identify candidate link blocks and see where they occur on pages, and how they might relate to each other. This link structure analysis is used to create what the patent refers to as a Link Structure Graph or LSG.

There are three main purposes for creating an LSG:

Locality – Identifying the global link structure of a site, as well as the local link structure around individual pages.

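Completeness – Understanding the complete link structure of a site, including both the navigational structure and the logical structure used to organize the content of the site.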

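Navigational structures are consistent, easy-to-follow arrangements of links that let visitors travel to different sections of a site. A high-level global navigation structure usually appears on all (or most) pages of a site, and secondary (or even lower-level) navigation structures may also allow visitors to browse different sections of the pages of a site.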

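In addition to navigational links, a site may also contain links to pages within structural elements, such as lists of links to “best sellers” on an ecommerce site, or “most popular posts” on a blog.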

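Scalability – The algorithm can run efficiently on both large and small web sites. It also looks at link blocks that may appear on more than one page and associates them with each other, rather than treating them as new links when it finds them on other pages.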

Some link blocks may appear more than once on the same page in different segments, with minor variations, and they may be merged together. For instance, the same or a substantially similar link menu might be shown at the top and bottom of a page in the main navigation area and a footer navigation area.
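The merging step can be sketched in a few lines of Python. This is only an illustration of the idea, not the method from the patent filing: the function names, the use of Jaccard similarity, and the 0.8 threshold are all my own assumptions.

```python
def jaccard(a, b):
    """Share of links that two blocks have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_similar_blocks(blocks, threshold=0.8):
    """Merge link blocks found on one page.

    `blocks` is a list of lists of URLs. Two blocks are merged when one is a
    subset of the other, or when their overlap reaches `threshold` (a
    hypothetical cutoff chosen for illustration).
    """
    merged = []
    for block in blocks:
        links = set(block)
        for existing in merged:
            if (links <= existing or existing <= links
                    or jaccard(links, existing) >= threshold):
                existing |= links  # fold this block into the earlier one
                break
        else:
            merged.append(links)  # nothing similar seen yet, keep as new block
    return merged


header = ["/", "/cars", "/parts", "/contact"]
footer = ["/", "/cars", "/parts"]            # a subset of the header menu
related = ["/parts/brakes", "/parts/tires"]  # unrelated content links
print(merge_similar_blocks([header, footer, related]))
```

In this made-up page, the footer menu is folded into the header menu, and the related-parts links stay as their own block.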

After substantially similar link blocks have been merged together, the remaining link blocks that are considered “unique” are classified. Classification is based upon the function of a link block, and might be described as being one of the following three types:

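S-nodes – These are organizational and navigational link blocks; they are typically repeated across pages with the same layout and show the organization of the site. They are usually lists of links that don’t contain other content elements, such as text. These blocks are structural link blocks, or s-nodes.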

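C-nodes – These are content link blocks, grouped together by some kind of content association (such as being related to the same topic or sub-topic). These blocks usually point to information resources, and aren’t likely to be repeated on more than one page.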

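I-nodes – These are isolated links, which are links on a page that aren’t part of a link group, and which may be only loosely related to each other by appearing together in the same paragraph of text. Every link appearing on a page that isn’t classified as part of an s-node or c-node might be seen as a single collection of links and given an i-node classification. Each link on a page might be considered an individual i-node, or they might be grouped together by page as an i-node.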

If you were to look at a number of pages on different web sites, you might not find it too hard to do this kind of classification for the links on those pages.
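As a rough illustration of the three types, here is a minimal Python sketch. The `LinkBlock` fields, the `classify` function, and the 50% repetition threshold are hypothetical choices of mine; the patent filing describes more signals than these two.

```python
from dataclasses import dataclass


@dataclass
class LinkBlock:
    links: frozenset     # URLs in the block
    pages_seen_on: int   # number of pages of the site showing this block
    grouped: bool        # True if the links form a list or menu


def classify(block, total_pages, repeat_threshold=0.5):
    """Return "s-node", "c-node", or "i-node" for a link block.

    Assumes total_pages > 0. The threshold is an illustrative guess.
    """
    if not block.grouped:
        return "i-node"   # isolated links, for example inside a paragraph
    if block.pages_seen_on / total_pages >= repeat_threshold:
        return "s-node"   # repeated across the site: structural navigation
    return "c-node"       # grouped but rarely repeated: content links


nav = LinkBlock(frozenset({"/", "/cars", "/parts"}), pages_seen_on=90, grouped=True)
related = LinkBlock(frozenset({"/parts/brakes", "/parts/tires"}), pages_seen_on=1, grouped=True)
inline = LinkBlock(frozenset({"/about"}), pages_seen_on=1, grouped=False)

for block in (nav, related, inline):
    print(classify(block, total_pages=100))
```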

Why Classify Links?

There are a number of reasons for the classification of links on a site. The paper on Block-level Link Analysis mentioned above tells us that links in different blocks might be given different weights for ranking purposes. Understanding the link structure of a site can also help when different parts of a site might be displayed on a handheld device with a smaller screen. But there are also other potential benefits that are described in the patent filing:

1) Links to other pages that might be related to a page shown may be more easily uncovered. This patent doesn’t mention the use of quick links, but it does tell us that it might present information about pages that are related and make it easier to navigate to those pages on a site. These might be used with a personalization approach to uncover pages that might interest a specific visitor, or be based upon an approach that increases a visitor’s ability to navigate to pages that might not be directly linked to pages offered by search results.

2) Internal linking information collected by the search engine might be offered to site owners to allow them to optimize their use of links and to see statistics about visits between pages of a site.

3) The linking information might be useful in the automatic tagging of web pages on a site.

For example, a site about cars might include category pages about specific brands of cars, then subcategories about specific models, and then specific product pages about car parts. Understanding the link structure of a site can mean that the higher level link text of parent pages might be used to help tag the lower level pages. So if a category page is pointed to with the anchor text “Ford,” and it has a subcategory linked to with the anchor text “mustang parts,” which points to a specific product page for brake pads, the brake pad page might be automatically tagged with the terms “ford” and “mustang parts.”
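The car parts example can be sketched as a walk up the link hierarchy that collects the anchor text pointing to each ancestor page. The `inherited_tags` function, the `inbound` mapping, and the URLs are hypothetical; the patent filing does not spell out a data structure for this.

```python
def inherited_tags(page, inbound):
    """Collect tags for `page` from the anchor text used to reach its ancestors.

    `inbound` maps a URL to a (parent URL, anchor text) pair, where the anchor
    text is what the parent page uses to link to that URL.
    """
    tags = []
    seen = {page}
    current = inbound.get(page, (None, None))[0]  # start at the parent page
    while current in inbound and current not in seen:
        seen.add(current)  # guard against loops in the link data
        parent, anchor = inbound[current]
        tags.append(anchor.lower())
        current = parent
    tags.reverse()  # broadest term first
    return tags


inbound = {
    "/ford": ("/", "Ford"),
    "/ford/mustang": ("/ford", "mustang parts"),
    "/ford/mustang/brake-pads": ("/ford/mustang", "brake pads"),
}
print(inherited_tags("/ford/mustang/brake-pads", inbound))
# ['ford', 'mustang parts']
```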

4) Like the automatic tagging above, internal links and anchor text between pages might also be used to create a concept hierarchy for a site, which can then be compared to other sites containing similar concepts.

Using my car part site example from the previous section on automatic tagging of pages, a hierarchy of concepts might be created about a site offering car parts. That site might be compared to other sites that may use similar terminology and which could even have a similar concept hierarchy. Those sites might then be clustered together by a search engine.

5) Anchor text in links found on a site might be presented to viewers to help them navigate through the pages of a site in a sidebar, or in a kind of sitemap reflecting the link structure of the site.

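The patent application is:

Web Site Structure Analysis
Invented by Natasa Milic-Frayling, Eduarda Mendes Rodrigues, and Shashank Pandit
Assigned to Microsoft
US Patent Application 20080134015
Published June 5, 2008
Filed: December 5, 2006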

Abstract

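A graphical representation of a web site is generated by identifying blocks of links on web pages. Each block of links is represented by a node in the graphical representation, and connections between nodes provide information on the reuse of blocks between pages.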
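To make the abstract a little more concrete, here is a minimal sketch of one way to record block reuse across pages. It assumes that link blocks have already been extracted for each page, and it treats two blocks as the same node only when they contain exactly the same links; `build_block_index` and the sample site are hypothetical, and the LSG in the patent filing is richer than this.

```python
from collections import defaultdict


def build_block_index(site):
    """Map each distinct link block to the set of pages it appears on.

    `site` maps a page URL to a list of link blocks, each a list of URLs.
    Each key of the result stands for one node; the size of its page set
    shows how widely the block is reused.
    """
    pages_by_block = defaultdict(set)
    for page, blocks in site.items():
        for block in blocks:
            pages_by_block[frozenset(block)].add(page)
    return dict(pages_by_block)


site = {
    "/": [["/cars", "/parts"], ["/news/1"]],
    "/cars": [["/cars", "/parts"]],
    "/parts": [["/cars", "/parts"], ["/parts/brakes"]],
}
index = build_block_index(site)
for block, pages in index.items():
    print(sorted(block), "appears on", len(pages), "page(s)")
```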

Conclusion

I’ve provided a high-level overview of a process described in the patent filing on how a search engine might use a segmentation process to identify link blocks on a site, possibly merge some of those blocks together, and then classify the link blocks that it has found. The patent goes into more detail on what it might look for in creating those blocks, merging them together, and then classifying them.

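The patent also presents some possible benefits of segmenting and classifying links into link blocks, but there may be other benefits that it doesn’t detail, such as whether the search engine might give different link-based ranking values to links found in different kinds of link blocks.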

The patent also describes how it might include data collected about the links on a site from monitoring how visitors to pages use those links, though it doesn’t go into much depth on the process behind that approach.

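The process in this patent filing is from Microsoft, and it’s possible that Microsoft uses a similar process when indexing web pages. It’s also possible that the other major search engines perform some kind of similar analysis of the different links on a site, and classify them based upon where they appear in the layout of the site and the functions that they serve.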

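In my last post, Creating an SEO Inventory, there’s a column for “Navigation Location” where you can list the kinds and locations of links on specific pages that point to other pages on the same site, such as logo links or main navigation links. You may want to consider listing the links on your pages under the classifications that they might fit into, within a framework like the one described in this Microsoft patent filing.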

47 thoughts on “How a Search Engine Might Analyze the Linking Structure of a Web Site”

  1. Got to tell you, it’s a nice post 😉

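    Are there any better indications of the factors used to determine how link blocks on a page are analyzed and how they are ‘perceived’? I’d be interested to know whether they give any information about the signals that make up these node types across a large number of documents (for example, on a given site).

    Rgds from Dublin
    Richard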

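  2. Thanks, Richard.

    A number of factors are discussed for analyzing and classifying link blocks, and I’m guessing that there probably are a good number that might be used that aren’t included in the patent filing. S-node blocks are used by the search engine to understand the structure of a site. C-node blocks can help the search engine learn more about links on a site that are related in some way, such as by topic.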

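    Here are some of the factors mentioned that might be helpful in classifying blocks.

    For S-Node link blocks:

    1. Only a very limited amount of text or non-alphanumeric characters, such as a pipe (|) symbol, may be allowed between links.

    2. The link blocks usually contain all internal links.

    3. The blocks appear on more than one page (above a certain threshold of the total number of pages on the site).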

    4. When there are a couple of S-Node type blocks on a page, and one is a subset or superset of the other, they might be merged together. So, for instance, a group of quick links in a footer that all lead to main category pages, which are also listed in a main navigation bar, might be merged into one link block.

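    5. Link blocks on different pages that are very similar to each other, such as link blocks that leave out the link to the page on which they appear, may be treated as the same link block.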

    6. Separate link blocks might be created for links in drop-down submenus on main navigation bars, especially if there are lots of links on a site pointed to in the main navigation and sub-menus.

    7. Where all of the links in a link block are internal references to locations on the same page (such as questions on an FAQ page that link to answers on the same page), they might be considered a C-Node block. If that block of links also appears on some other pages of the site (pointing to locations on the other page), then it might be considered an S-Node block.

    8. Some S-Node blocks may be given more weight than others as indications of the structure of the site, based upon such things as the number of links within the site being pointed to from the blocks.

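    I’m pretty sure that I’ll be reading through this patent filing a few more times. There’s a lot there, and I’m sure that I haven’t yet grasped all of the implications of using the methods involved.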

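  3. Hmm, this topic covers one of my favorite subjects: internal linking.

    I can see it now, spreading through SEO forums and blogs over the next year: “10 Tips on How to Create C-Node Links on Every Page!”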

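  4. Hi Michael,

    It’s one of my favorite topics, too.

    I like that this patent filing gives us a framework for discussing internal linking, but I hope that we don’t limit ourselves from exploring other possibilities and approaches.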

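  5. “10 Tips on How to Create C-Node Links on Every Page!” – mmm catchy!!

    Cheers Bill, in-depth, and as usual it has given me a lot to think about. I think I’ll revisit the post and give it some more thought once it has sunk in!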

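  6. Great post. For one thing, links are so important that it surprises me how few people use internal linking properly.

    But as your car parts example points out, this could be a good way for search engines to better understand a site, rather than just the overall topic of a page.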

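  7. Pingback: » 林拜特·德·贾尔 | seoFM - the German podcast for search engine optimizers and online marketers
  8. Bing seems to be segmenting content in their mobile search application. They break pages into sections rather than just displaying the web page. It looks like they have a lot of this kind of thing, and I wouldn’t be surprised if they use it in many different ways.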

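  9. Hi Robert,

    I’m surprised not to see more sites carefully considering their link structures, and the anchor text used to point to pages within those hierarchies of links. I believe that search engines were looking at this kind of structure and organization long before this patent filing was submitted.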

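  10. Hi Wes,

    I’ve seen papers and patent filings from companies such as Google and Nokia on how pages might be broken up into parts for display. It does look like what we are shown may differ considerably depending upon which device we use to view the Web. That’s an interesting challenge for web designers.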

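  11. Belting article. I can’t even begin to fathom how much data search engines consider when deciding where a page should be placed in the SERPs. It’s fascinating and, at the same time, a daunting task, which I like.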

  12. Bill,

    That means an additional layer of quality might be superimposed on the linkage data which makes a site rank well. So despite your PR bar showing a bias towards green, the actual PR is small since the links are originating from comment spamming or footer navigation or something similar.

    How would footer links going to external sites be treated under this? Is the intent of the link taken into account, or does the link just not mean that much because the block has the lowest actual strength?

    Thanks for the great post.

    Ashish

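  13. Interesting stuff. One thing this has encouraged me to do is go through all of my links and make sure that they are anchored properly. E.g., after excerpts, I have something like “read more” as a link. I’ll look to change that to a more descriptive anchor. Thanks for posting.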

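  14. Link patterns are an important part of a site’s competitiveness in the race for rankings, but internal linking is something that robots take very seriously. Pages with odd and confusing internal linking may get a site penalized by search engines such as Google.

    Thanks for addressing some questions with this post.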

  15. Hi Lee,

    Good point. While it might (and let me emphasize that word “might”) be helpful to think about things like lists of “ranking factors” that search engines might use, and attempt to come up with weights for those signals, the reality is that the algorithms that search engines use are likely more complex than we might think of when coming up with such lists. Search engines do collect very large amounts of information about the pages that they find on the web through crawling, collecting feed information, and extracting information about facts and objects, and about user behavior through search query logs and toolbars and analytics programs and other services that the search engines provide.

  16. Hi Ashish Roy,

    Good question. We’ve been told that links in different parts of pages might carry different weights, or amounts of PageRank, without any indication of how that weight might be distributed. From this patent filing, we can see that PageRank might not be a strict block level PageRank, where different blocks might carry different amounts of PageRank, since some link blocks might be merged together. You’re welcome.

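  17. Hi Ravi,

    I don’t know about any “penalties” associated with internal linking patterns, but I do think it’s a good idea to have a navigation structure that makes it more likely for people to find what they are looking for when they visit a site. If the use of internal links and anchor text can also help search engines better understand the internal structure of a site, that’s beneficial as well.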

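  18. I’ve been thinking about and testing the “only the first link in the code counts” theory recently, and this post confirms my thinking: it’s easy to mistake links on a page as being ignored, when in fact context and markup affect their “weight”. I’ve also noticed that updating spammy, poorly coded rows of footer links to semantically arranged footer link “areas” (see seomoz) can directly improve a site’s performance (though, as with everything else, other factors may also be involved).

    All in all, a great post, and very helpful for my own investigations!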

  19. Very interesting, especially the part about “block level PageRank”. It’s certainly the first time I’ve seen it mentioned. Thanks for sharing.

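  20. Bill,

    C-node internal links are my favorite of the links that I’ve observed. Overall, internal linking is the most important thing that can help you get the SERPs you want. It is very time-consuming work, but it pays off.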

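  21. Hi Matt,

    Thank you very much. There may be a number of different issues that could affect the “only the first link in the code counts” theory that haven’t been explored well. The idea of merging link blocks, as described in this patent filing, is just one of them. Another that might skew experiments is something like phrase-based indexing, where links using anchor text that isn’t related in a meaningful way to the content on a page might be given very little weight or be ignored when pages are re-ranked based upon that kind of phrase-based indexing. A third might be the use of something like block-level PageRank, where links in different visual blocks on a page might be given different weights, or might be ignored.

    I like the idea of using semantically well-coded sections on a page, and agree that those could have a positive effect.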

  22. Hi Stu,

    You’re welcome. I like the block level PageRank paper, because it explores the idea that search engines might base the rankings that we see on a level other than just pages. At Bing, I’ve recently seen links to individual blog comments in search results (for example: //www.ao-da.com/2009/08/how-a-search-engine-might-analyze-the-linking-structure-of-a-web-site/#comment-190861 ). Is that because they might be using something like block level PageRank? I don’t know, but it’s worth thinking about and exploring.

  23. Hi Ravi,

    I like that this patent filing gave us a different way of thinking about how search engines might treat the links that they find on a page. I see some potential benefits from the classifications, if they can help a search engine get a better sense of the structure and contents of a site. I’m not sure that I would cite one type of classification as a favorite, or better than the others, since they all have somewhat different purposes. But knowing that a search engine might classify blocks of links in a manner like this, and some of the possible reasons why, gives us a perspective that we didn’t have before the patent filing was published.

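  24. Hi Vedran,

    Thanks. The operation of commercial search engines is usually driven by a combination of processes that are trade secrets and processes that are protected as intellectual property through patents. A patent filing has to describe in some manner the practical potential use of the idea behind it in some process or method, but it doesn’t have to be written in such detail that anyone reading the patent could sit down and follow a series of steps to come up with the same invention.

    We are given references to some specific earlier works in the patent itself, like those described in the SmartView and VIPS papers that I linked to in the post. It’s possible that some algorithms developed for those methods may be involved, and we are also given some descriptions of processes that the search engine might use to identify link blocks, merge some blocks, and carry out other steps. In essence, the patent filing describes some algorithms that are unique to it, though not at a super fine level of detail.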

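  25. Pingback: Weekly reading lists | SEO Scientist - Applying the scientific method to SEO
  26. We’ve been reviewing our own internal link structure to determine whether we are doing this in the best way. From a search engine’s perspective, getting the most value out of a site’s content depends largely on how accessible you make that content. This post was very helpful and exactly what I was looking for. It’s useful not only for analyzing the link structure of a site of our size, but also for some of my clients with smaller sites.

    I think people focus so much on off-page and on-page (in terms of content) optimization methods that internal link structure is often an overlooked component of a solid SEO strategy.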

  27. Hi Juggling,

    Thanks. Starting with a strong internal link structure really helps when creating a site. Unfortunately, I run across many sites where that hasn’t been done, and they need some serious work. I really liked the classification that was described in this patent filing, because it provided a different way to think about the links that you might present on your pages. Glad to hear that you found it useful.

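  28. A very in-depth article with some important information. I think it’s best to nail your on-page links before thinking about off-site link building.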

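  29. Hi Neil,

    I completely agree with you. Site owners have much more control over the structure and content of their own sites. Building an intelligent information architecture for a site, improving usability and conversions, fixing broken links, and resolving problems with the same pages being found at multiple URLs are all things that you can do without relying upon others.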

  30. Good article. It’s true that you need to make sure everything on your own site is perfect first. You can get loads of tools which will investigate your site for broken links and bits. Stay on top of this and sales conversions will increase.

  31. Hi Paul,

    Thank you. It really does help to fix broken links, remove internal and external redirects, and focus upon building up the quality of your pages with an eye towards increasing conversions or other goals that you might have with your site.

  32. Bill, I think too many people are concentrating on get-rich internet schemes rather than building an information rich site that Google will love anyway. Off page optimisation is important, but it’s an effort wasted if your site is not well structured for the search engines.

  33. That means an additional layer of quality might be superimposed on the linkage data which makes a site rank well. So despite your PR bar showing a bias towards green, the actual PR is small since the links are originating from comment spamming or footer navigation or something similar.
    How would footer links going to external sites be treated under this?

  34. Hi Nigel,

    It looks like Google is targeting those “get rich” types of sites with updates like Penguin and Panda. The patent that I wrote about in this post is from Microsoft, and it really only discusses some of the efforts that a search engine might take to better understand links from other sites, but it does seem that Google has started to take more actions against paid links, manipulative linking schemes aimed solely at raising the rankings of sites, and other activities that might be against their guidelines.

  35. Hi Jackson,

    The Google PageRank toolbar is only updated 3-4 times a year and may not be an accurate reflection of how much PageRank a page might have. It’s also not necessarily an indication of how much PageRank a link from a page might pass along. You may want to look at this post, and the comments after it, for some thoughts on links from footers and sidebars and so on:

    Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data

  36. I thought PageRank itself was only updated 3-4 times a year, not just the toolbar, though I can’t remember where I read that.

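  37. Hi Rob,

    It’s just the toolbar that is updated 3-4 times a year. Google can update actual PageRank much more quickly. The Google Dance used to happen once every 4-5 weeks in the past, but it has been a number of years since they’ve updated that slowly.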

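  38. Thanks for clarifying that, Bill. I don’t know where I read it, but I suspect that I completely misread the article. More frequent updates make sense.

    On a completely unrelated note, I love this site.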