Google News Algorithm Updated

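The image above, from Google’s patent, shows a source rank that no longer appears the way it did when the patent was first published. Algorithms change over time, and this one may well have changed.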

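Neither The Nation nor Computerworld should write about patents. Period. Never. In the past few days, Computerworld published a “breaking news” story about the publication of a Google patent application from 9 months ago (not breaking news). The Nation wrote a follow-up to Computerworld’s story and made the same mistakes.

They both felt optimism when they should have felt dread.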

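They aren’t publishing information about a 9-month-old patent, but about a 10-year-old one. If they had ever written about patents before, they would know that.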

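It’s an easy mistake to make, and most journalists would make it if they don’t know much about patents and don’t ask for help from someone who does. It’s a story about how Google ranks stories in Google News, and what signals they might look at when deciding which source to feature out of a cluster of similar stories about similar topics.

The Nation and Computerworld didn’t know where to look (in the patent), and ended up stuck in 2003.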

In September of 2002, Google launched its news service, Google News. The news service ran as an automated system where the stories displayed were chosen algorithmically. In September of 2003, Google filed the patent, Systems and methods for improving the ranking of news articles. The patent describes how the automated news service might work, and how different news sources might be ranked when they publish a news story that might be substantially similar to other stories from other sources.

It took almost 6 years until the Google patent was granted, and I wrote about it the day after it was granted (breaking news, there) in the post Google News Rankings and Quality Scores for News Sources.

In my post, I wrote about how Google might create a source score for news sources that published news articles. The signals behind that score included such things as:

  • Circulation statistics of the news source
  • The size of the staff associated with the news source
  • The number of news bureaus associated with the news source
  • Original named entities appearing in articles produced by the news source

In the conclusion to the post, I questioned many of those assumptions and others from the patent (the role of traditional news agencies was already changing on the Web):

For instance, if a breaking story came out about a discovery in Physics, and a reputable and well-respected site on Physics News published an insightful and detailed article on the discovery, it’s possible that could be a better source for the topic than a news source which may have written about the discovery first, has many more reporters and much wider circulation, gets seen by a much more international audience, has a wide number of news bureaus, has been publishing since the 1800s, and was written by someone who doesn’t know much about physics at all.

In February of 2012, a new version of the Google patent was published as a pending application. (The second version was granted in 2012.) The third version has the same name as the first version, and it has substantially the same description section as the first version. What’s different is the “claims” section. The claims section of the new version of the patent starts with:

1-31. (canceled)

Gone from the claims are things like “circulation statistics of the news source,” the “number of bureaus associated with the news source,” and other things related to the kind of news that’s done in print.

That’s not how The Nation or Computerworld saw it.

The Nation published Patent Offers Clues on How Google Controls the News earlier today. It’s based on a Computerworld post from yesterday titled, An inside look at Google’s news-ranking algorithm.

As we are told in the Computerworld post:

The metrics cited in the patent application include the number of articles produced by a news organization during a given time period; the average length of an article from a news source; and the importance of coverage from the news source.

Other metrics include a breaking news score, usage patterns, human opinion, circulation statistics and the size of the staff associated with a particular news operation.

The Nation copied that text from Computerworld. It also added this section, and put part of it in italics (which I’ll reproduce):

A tenth metric may include a value representing the number of original named entities the news source produces within a cluster of related articles…[this is worthwhile because] if a news source generates a news story that contains a named entity that other articles [on the same topic] do not contain, this may be an indication that the news source is capable of original reporting (emphasis added).
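
To make the quoted “original named entities” idea concrete, here is a minimal sketch of how such a count might be computed for one cluster of related articles. It only illustrates the passage quoted above and is not Google’s implementation: the function name `original_entity_counts`, the input layout, and the sample sources and entities are all hypothetical, and the sketch assumes the named entities have already been extracted from each article by some other step.

```python
from typing import Dict, Set


def original_entity_counts(cluster: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per source, the named entities that no other source in the cluster mentions.

    `cluster` maps a (hypothetical) source name to the set of named entities
    found in that source's article on the shared topic.
    """
    counts: Dict[str, int] = {}
    for source, entities in cluster.items():
        # Gather every entity mentioned by the *other* sources in the cluster.
        mentioned_elsewhere: Set[str] = set()
        for other_source, other_entities in cluster.items():
            if other_source != source:
                mentioned_elsewhere |= other_entities
        # Entities unique to this source are treated as "original".
        counts[source] = len(entities - mentioned_elsewhere)
    return counts


# Made-up example cluster: three sources covering the same physics story.
example_cluster = {
    "source_a": {"CERN", "Higgs boson", "Fabiola Gianotti"},
    "source_b": {"CERN", "Higgs boson"},
    "source_c": {"CERN", "Higgs boson", "Peter Higgs"},
}

print(original_entity_counts(example_cluster))
```

In this made-up cluster, the two sources that each name a person nobody else mentions get a count of 1, and the source that only repeats shared entities gets 0. How a count like this would be weighted against other signals is not something the quoted text tells us.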

The claims section of this third version of the patent ignores many of the metrics listed in the claims of the first version and listed in the description section of both. The Nation and Computerworld are reporting 10-year-old news by reporting upon what was originally filed by Google in 2003.

The role of the traditional news agency has changed significantly since then in how Google ranks news articles. It may not be completely dead, but it’s much more likely that an online news source that breaks a story will stand a chance of ranking ahead of an agency with large print circulation stats, news bureaus, and large staffs of reporters.

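If you want to learn more about the differences between the versions of the patent, find someone who knows how to read patents.

Added February 23, 2013 @ 9:30 (EST): Unfortunately, I had hoped that if Computerworld and The Nation were going to write about a patent application, and they saw a version from last year that was still pending, they would check to see whether it had been granted and would link to the granted version of the patent. They didn’t do that in this case.

The latest version of Google’s patent was granted on December 11, 2012, and can be found at: Systems and methods for improving the ranking of news articles (US Patent 8,332,382). I shouldn’t have relied on them, and should have checked myself.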

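I’ve been asked about my statement above regarding the claims section of the new version, and the “canceled” language that I mentioned the section starts with. That’s been removed from the granted version of the patent, and there’s still no mention of things like circulation subscriptions and news bureaus in the claims of this new version. I’m going to download the original files from the USPTO PAIR database (one page at a time), put those pages together, and make a copy available here so that we can see them. The application initially received a non-final rejection and was amended as a result, so I may download those documents as well. Thanks.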

Added February 23, 2013 @ 11:47 (EST)

I’ve downloaded the original claims from the USPTO Public PAIR (Patent Application Information Retrieval) database and amended claims for the patent, as well as a request to amend the claims.

The original claims filed for the patent did include language that would favor a traditional news agency, such as considering circulation subscriptions, number of news agency bureaus, and so on. The amendment request came before any action at all by the USPTO, and the claims were amended by removing the sections that do appear to favor an older model of imputing more credibility and reputation to a traditional news agency model:

Original Claims for the Third Version of the Google News Patent Filing (pdf) 296 KB
Amendment Request (pdf) 26 KB
Amended Claims for the Google News Patent Filing (pdf) 121 KB

The third document is the one that includes the “1-31. (canceled)” language that I quoted above.

This amendment to the original claims filed with the patent took place at the start of the patent case and was an intentional decision on the part of the filers from Google not to include the language that would favor traditional news agencies. I’m not sure why Google didn’t originally file the claims that now appear in the patent for the first time, but these new claims weren’t a response to a rejection of claims or to any other action by the USPTO.

In other words, Google intentionally left out things like news bureaus and subscription circulations from the claims that were considered by the patent office in this third version of the Google News Patent.

19 thoughts on “Google News Algorithm Updated”

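  1. You’re always prescient when it comes to patents, and here you shine a spotlight on why traditional media is doomed. It’s not the patent, it’s the industry’s continued failure to adapt to rapid changes in media, even to the point of being unable or unwilling to do the work needed to understand and analyze the very patent that points to its eventual obsolescence.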

  2. Thanks, Jeremy.

    I read the Computerworld and The Nation posts earlier tonight after reading a Google Plus post from David Amerland titled Google News Algo, and it seemed that journalists were sharing the story about Google’s news ranking system as a very positive sign. They didn’t know that the sources they were relying upon were telling them good news from 2003. News that had unfortunately changed with the changes to the claims section of the continuation patent.

    Traditional media is doomed, and you’re right that it’s not the patent. The patent echoes a change in society.

  3. Pingback: Are These Google’s Ranking Signals For Google News? | WebProNews
  4. I’m sure the WebProNews misspelling of your name didn’t go unnoticed, and it further highlights the poor quality of news these days.

    Bill, we’ve not met, but I’ve long been a reader and loyal supporter of yours. I must get in touch. Phil

  5. I agree with Jeremy. ‘Traditional’ media is a dinosaur, and it doesn’t matter who breaks the story, as long as it’s factual and true.

    Being a cynic I would love to see a publisher of news suffering a ranking penalty for misleading and distorted news. That would set the cat among the pigeons.

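  6. Can’t help but laugh; journalism and the written word are in a state of decline these days. From “old is new” stories like the one you’ve pointed out, Bill, to news items and blog posts just oozing with the now-typical-and-expected grammatical and spelling errors, not to mention the latest fad: the “random missing word” syndrome. Fact checking, editing, proofing: all seem to be tasks of the past, sacrificed in the rush to get the story out first. And this is by no means unique to the web/digital media realm.

    What happens if, one day in the not too distant future, no one can make heads or tails of what the writer intended to write? What was the message? Did anyone understand it? More importantly, is anyone still listening?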

  7. Hi Phil,

    Typos happen. Chris misspelled my name.

    Fortunately, he got the link to this page right, and made a point of updating his article.

    That’s better than what I’ve seen at Computerworld, The Nation, Forbes, and the Guardian, who all seem to be in their own happy little worlds thinking that Google likes them, based upon not understanding that the ranking criteria Google used in 2003 (which they all are citing as something “new”) are no longer the criteria that Google uses.

    See:

    Google News: the secret sauce

    Reproduced at the author’s blog:

    Google News: The Secret Sauce

    Forbes coverage of the “Monday Note” article:

    Why Publishers Need to Stop Worrying and Learn to Love Google (Forbes)

  8. Hi Mick,

    I’m beginning to feel the same way as I watch this Google News patent information, as misguided as it is, flowing through an echo chamber of the media. Are any journalists actually reading the new claims section of the patent? I’ve left a few comments, but it looks like they are happy reporting misguided news about the future of journalism to other journalists.

    Google’s newest version of the patent isn’t saying nice things about traditional media, and it looks like that’s well deserved.

  9. Hi DDWM,

    I know that there are print publications moving their operations completely online, or hiding their online content behind paywalls. Subscriptions to print versions of newspapers are in decline in many places.

    Google’s new patent and ranking signals for news stories move substantially away from judging the credibility of news sources based upon things like how many news bureaus they might have, or how many journalists work for them, or what their print media subscriptions might be like. If they could figure out how to read a patent correctly, they would know that, and they might take this threat a little more seriously.

    As I noted in the post above, they see optimism where they should feel dread.

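  10. Hi Bill, great post as always.

    You asked, “Are any journalists actually reading the new claims section of the patent?” Sadly, today’s newsroom deadlines don’t allow for that kind of research. Reporters are busy writing 2-3 stories a day and preparing for tomorrow’s interviews. They’re wearing the hats of other positions that have since been eliminated. Broadcast reporters are turning 30-second hits into newspaper stories, and print reporters are doing stand-ups for TV.

    That’s the state of local news these days.

  11. For me, this is a humorous and interesting story. Some of the things you found ring true in how I assess which sources are more valuable to me. For me, a well-written, quality article is always more enjoyable than reading a story in a “breaking news” format.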

  12. Where is the actual text that is the final version of the criteria for ranking news? Going into the history of how these news stories were in error, along with all of the versions of the patents, only clouds the issue even more.

    So, the question is, what are the criteria for how Google ranks news? (No history, no claims added in and out, and not necessarily even referencing the patent, just a list of the actual criteria being used today).

    A reply here or by email would be great.

    PS: Some of the links to the PDFs above are empty.

  13. Journalists do this a lot, and it certainly isn’t just with patents. I see it a lot with health “news”. I see a headline, read, and it all sounds exciting. Unlike the newspapers, I dig deeper, seek the research paper to read and draw my own conclusions. So often it turns out the research paper was published a year or two ago, that people were talking about it in the news back then, but for some unknown reason it gets picked up again and re-circulated as “news”. Annoying, as it wastes my time! Nobody likes it when you point out that it is really old news either. Such is life.

  14. Hi T.,

    The actual criteria for ranking news aren’t something that Google has divulged, and it’s quite unlikely that they will.

    The criteria from the description section of the Google News patents are exactly the same for each of the three versions, and I wrote about those in detail back in 2009 at: Google News Rankings and Quality Scores for News Sources (I linked to it in the post above).

    You’re welcome to spend time reading that. They are the same signals that Computerworld and The Nation, and Forbes (twice), and CBS Marketwatch are reporting as if they were something new. It’s possible that Google is looking at other signals as well, and has stopped looking at some they were looking at in the past.

    The “claims” section of the newest version of the patent (linked to above) tells us the kind of criteria that Google may be looking at now (no absolute guarantees; I don’t work for Google, so I can only describe what I see within the patents). If you want to see those, go read the patent.

    The links to the PDF files at the bottom of the post are there, and they worked fine for me a few minutes ago. Some of those files are quite large (which is why I listed their file sizes), so you may need to wait a bit, but they do work. I’m sorry you couldn’t open them.

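  15. Hi Andrew,

    It looks like the Computerworld reporter did try to do some research by going to a journalism professor at Columbia University, but it might have helped if he had taken it to an attorney or paralegal who specializes in intellectual property, or to someone else who knows more about patents.

    The people reporting based on his story may have tried to do some fact checking as well, but they don’t appear to have dug too deeply. I do wonder how often something like this happens in the news when a story requires some specialized knowledge about things like medicine and science and the law. 🙁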

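  16. Bill, thank you very much for the reply. I do appreciate it!

    It sounds like the folks at Google have effectively muddied the waters. 🙂 Even though they won’t admit to anything, a lot of this information is hidden in plain view in the patents, if anyone can figure out which passages still apply. That being said, as you mentioned, this is an extremely complicated issue.

    Thanks again…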

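  17. Hi T.,

    You’re welcome. I think it would be a mistake to blame Google here, and they didn’t muddy the waters. The Computerworld article did.

    Google filed a continuation patent the way people have been doing for years, with a description section identical to that of the original 2003 patent. That’s how it’s done. They aren’t publishing a press release or a marketing document intended to promote Google News or even Google. The claims section is a different matter, and it is what the patent office examiners look at when deciding whether to grant a patent. The claims section is substantially different from the original.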

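  18. Correspondents do this a lot, and of course it doesn’t only happen with patents. I see it a lot in health “news”. I see a headline, look into it, and it all seems interesting. Unlike the magazines, I dig further and search for the research document to study it and draw my own conclusions. It often turns out that the research document was published a quarter or two earlier, that people were citing it in the news back then, but for some unknown reason it gets picked up again and distributed as “news”. Frustrating, because it wastes my time! Nobody likes it when you point out that it is really old information. Such is life.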
