
I have a single <a> element in my website's <head>, pointing to a route that's disallowed in robots.txt; the page is also marked noindex via both meta tags and HTTP headers.

When something grabs it, which AI crawlers regularly do, the route feeds them the text of 1984 at about a sentence per minute. Most crawlers stay on the line for about four hours.
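
Something like the following is a minimal sketch of such a tarpit, assuming a Haskell server built on wai/warp, a hypothetical /trap route, and a local novel.txt holding the book's text (the comment doesn't say how the original is actually implemented):

    {-# LANGUAGE OverloadedStrings #-}
    import Control.Concurrent (threadDelay)
    import Control.Monad (forM_)
    import Data.ByteString.Builder (stringUtf8)
    import Network.HTTP.Types (status200, status404)
    import Network.Wai (Application, Response, pathInfo, responseLBS, responseStream)
    import Network.Wai.Handler.Warp (run)

    app :: Application
    app req respond
      | pathInfo req == ["trap"] = respond tarpit   -- the robots.txt-disallowed route
      | otherwise                = respond (responseLBS status404 [] "not found")

    -- Stream the text one line at a time, sleeping a minute between writes,
    -- so a crawler that keeps the connection open stays busy for hours.
    tarpit :: Response
    tarpit = responseStream status200 [("Content-Type", "text/plain")] $ \write flush -> do
      body <- readFile "novel.txt"
      forM_ (lines body) $ \sentence -> do
        write (stringUtf8 (sentence ++ "\n"))
        flush
        threadDelay (60 * 1000000)                  -- one line per minute

    main :: IO ()
    main = run 8080 app

The route stays cheap on the server side (one idle connection and a timer) while the crawler's connection is tied up.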


It's easy to just stick Lisp macros and static types together, but to me (someone who writes too much Lisp for my own good) this is unsatisfactory.

The dream: just as a macro can be seen as a (staged) extension mechanism for the Lisp evaluator, there should be an extension mechanism for the static type system, one that lets me define new types, define new syntax (like Haskell do-notation) that makes use of the typing environment and of the expected type of the current context (return-type polymorphism), and so on.
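
For a concrete picture of what "expected type of the current context" buys you, here is plain illustrative Haskell showing return-type polymorphism -- the same expression elaborates differently depending on the type the surrounding code demands, which is exactly the information a Lisp macro never gets to see:

    -- pure and read both consult the expected type of their context.
    xs :: [Int]
    xs = pure 42          -- list Applicative: [42]

    mb :: Maybe Int
    mb = pure 42          -- Maybe Applicative: Just 42

    n :: Int
    n = read "42"         -- read picks its parser from the expected type

    d :: Double
    d = read "42"

    main :: IO ()
    main = print (xs, mb, n, d)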

The reality: very few environments have figured this out. In Coalton, Lisp macros do work, but only at the level of untyped S-expressions. A Lisp macro can't know the types of the variables in the lexical environment, or the expected type of its own context. It quite possibly works fine for the "TypeScript-like" use case you described, though.

The problem I see: the H-M type system isn't designed with extensibility in mind, and it's hopeless to make it extensible. A more technical explanation of why it's hard to integrate with Lisp macros: H-M relies on a unification-based inference stage whose control flow is very different from macro expansion.

Possible solution: there's no fundamental reason why a static type system can't have something as powerful as Lisp macros. But first of all you'd need an extensible type system, which still seems to be an open research problem. I think bidirectional typing is promising -- though it's so different from H-M at a fundamental level that I think it's hopeless to retrofit into Coalton.
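
To make "bidirectional" concrete, here is a toy checker (my own sketch, not Coalton's design) for a tiny lambda calculus; the point is that the checking mode is handed the expected type from the surrounding context, which is exactly the information a macro-style extension would want access to:

    data Ty = TInt | TArr Ty Ty deriving (Eq, Show)

    data Expr
      = Lit Int
      | Var String
      | Lam String Expr      -- checked against an expected arrow type
      | App Expr Expr
      | Ann Expr Ty          -- an annotation switches from infer to check

    type Env = [(String, Ty)]

    -- Checking mode: the expected type flows *into* the term.
    check :: Env -> Expr -> Ty -> Either String ()
    check env (Lam x body) (TArr a b) = check ((x, a) : env) body b
    check _   (Lam _ _)    t          = Left ("lambda checked against " ++ show t)
    check env e            t          = do
      t' <- infer env e
      if t' == t then Right () else Left ("expected " ++ show t ++ ", got " ++ show t')

    -- Synthesis mode: the type flows *out of* the term.
    infer :: Env -> Expr -> Either String Ty
    infer _   (Lit _)   = Right TInt
    infer env (Var x)   = maybe (Left ("unbound " ++ x)) Right (lookup x env)
    infer env (Ann e t) = check env e t >> Right t
    infer env (App f a) = do
      tf <- infer env f
      case tf of
        TArr targ tres -> check env a targ >> Right tres
        _              -> Left ("applying non-function of type " ++ show tf)
    infer _   (Lam _ _) = Left "cannot infer an unannotated lambda"

    main :: IO ()
    main = print (infer [] (App (Ann (Lam "x" (Var "x")) (TArr TInt TInt)) (Lit 1)))
    -- prints: Right TInt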


We all agree that AI crawlers are a big issue as they don't respect any established best practices, but we rarely talk about the path forward. Scraping has been around for as long as the internet, and it was mostly fine. There are many very legitimate use cases for browser automation and data extraction (I work in this space).

So what are the potential solutions? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infrastructure costs [0].

How can we enable beneficial automation while protecting against abusive AI crawlers?

[0] https://arxiv.org/abs/2311.10911

