"Only Google is allowed to scrape the web." If I'm not mistaken, the plaintiffs ... | Hacker News


1vuio0pswjnm7 15 days ago | on: Disrupting the largest residential proxy network

"Only Google is allowed to scrape the web."

If I'm not mistaken, the plaintiffs in the US v Google antitrust litigation in the US District Court for the District of Columbia tried to argue that website operators are biased toward allowing Google to crawl and against allowing other search engines to do the same.

The Court rejected this argument because the plaintiffs did not present any evidence to support it.

For someone who does not follow the web's history, how would one produce direct evidence that the bias exists?

SkiFire13 15 days ago

> For someone who does not follow the web's history, how would one produce direct evidence that the bias exists?

Take a bunch of websites, fetch their robots.txt files, and check how many allow Googlebot but not other crawlers?
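A rough sketch of that check, using Python's standard `urllib.robotparser`. It assumes you already have the robots.txt contents as strings. The list of "other" crawlers and all function names here are my own placeholders, and it only tests whether the root path `/` is fetchable, which is a simplification.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical comparison set; any other crawler tokens could be used instead.
OTHER_AGENTS = ("bingbot", "DuckDuckBot", "CCBot")


def allows(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if this robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


def favors_googlebot(robots_txt: str, others=OTHER_AGENTS) -> bool:
    """True if Googlebot may fetch "/" while at least one other agent may not."""
    if not allows(robots_txt, "Googlebot"):
        return False
    return any(not allows(robots_txt, agent) for agent in others)


def share_favoring_googlebot(robots_files) -> float:
    """Fraction of the given robots.txt texts that favor Googlebot."""
    files = list(robots_files)
    if not files:
        return 0.0
    return sum(favors_googlebot(text) for text in files) / len(files)
```

Feed `share_favoring_googlebot` one robots.txt string per site. How the sites are sampled would probably matter as much as the parsing, and a file that treats all crawlers the same (all allowed or all blocked) counts as not favoring Googlebot.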

1vuio0pswjnm7 14 days ago

Common Crawl provides gzipped robots.txt collections.
