tannhaeuser's comments | Hacker News

Guess what, you're not required to open <html>, <head>, or <body> either. It all follows from SGML tag inference rules, and the rules aren't that difficult to understand. What makes them appear magical is WHATWG's verbose ad-hoc parsing algorithm presentation, which explicitly lists e.g. the elements that close their parent; those lists were originally captured from SGML but have gone unmaintained as new elements were added. This already started to happen in the very first revision after Ian Hickson's initial procedural HTML parsing description ([1]).
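
For example, this is already a complete, conforming document; the comments sketch what the parser infers (my annotations, not spec text):

  <!DOCTYPE html>
  <title>Demo</title>   <!-- parser opens <html> and <head> for the title -->
  <p>First paragraph.   <!-- <head> is closed, <body> is opened -->
  <p>Second paragraph.  <!-- the previous <p> is closed implicitly -->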

I also wish people would stop calling every element-specific behavior HTML parsers do "liberal and tag-soup"-like. Yes, WHATWG HTML does define error recovery rules, and HTML introduced historic blunders to accommodate inline CSS and inline JS, but almost always what's being complained about is just SGML empty elements (aka HTML void elements) or tag omission (as described above) by folks not doing their homework.

[1]: https://sgmljs.sgml.net/docs/html5.html#tag-omission (see also XML Prague 2017 proceedings pp. 101ff)


HTML becomes pretty delightful for prototyping when you embrace this. You can open up an empty file and start typing tags with zero boilerplate. Drop in a script tag and forget about getElementById(); every id attribute already defines a JavaScript variable name directly, so go to town. Today the specs guarantee consistent behavior, so this doesn't introduce compatibility issues like it did in the bad old days of IE6. You can make surprisingly powerful stuff in a single-file application with no fluff.
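
A minimal sketch of what that looks like (the id and function name are just made up for illustration):

  <!DOCTYPE html>
  <title>Counter</title>
  <button id=counter onclick="bump()">Clicked 0 times</button>
  <script>
    let n = 0;
    function bump() {
      n++;
      // "counter" works as a global variable thanks to the id attribute
      counter.textContent = `Clicked ${n} times`;
    }
  </script>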

I just wish browsers weren't so anal about making you load things from http://localhost instead of file:// directly. Someone ought to look into fixing the security issues of file:// URLs so browsers can relax about that.


Welcome, kids, to how all web development was done 25-30 years ago. You typed up HTML, threw in some scripts (once JavaScript became a thing) and off you went. No CMS, no frameworks. I know a guy who wrote a fully functional client-side banking back office app in IE4 JS by posting into different frames and observing the DOM returned by the server. In 1999. Worked a treat on network speeds and workstation capabilities you literally can’t imagine today.

Things do not have to be complicated. That abstraction layer you are adding sure is elegant, but is it also necessary? Does it add more value than it consumes, not just at the time of coding but throughout the entire lifecycle of the system? People have piled abstraction on top of hardware from day one, but one has to ask, if and when did we get past the point of diminishing returns? Kubernetes was supposed to be the thing that makes managing VMs simple. Now there are things supposedly making managing Kubernetes simple. Maybe, just maybe, this computer stuff is inherently complicated and we’re just adding to it by hoping all of it can eventually be made “simple”? Just look at the messages around vibe coding…


yeh, the good old (tm) days :-))

Today you first need AI to figure out what the JS-framework-of-the-week is, then you need AI to generate all the boilerplate code, and then you use AI to debug all the stuff you created :-)


Love the single-file HTML tool paradigm! See https://simonwillison.net/2025/Dec/10/html-tools/

Opus and I have made a couple of really cool internal tools for work. It's really great.


A workaround for the file:// security restrictions is to use a JavaScript file for data (an initialized array) rather than something more natural like JSON.

Apparently JavaScript got grandfathered in as ok for direct access!
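
i.e. roughly like this (a sketch; the file and variable names are made up):

  // data.js: plain JavaScript instead of JSON
  const records = [
    { name: "Alice", score: 3 },
    { name: "Bob", score: 5 }
  ];

and in the HTML file opened directly from file://:

  <script src="data.js"></script>
  <script>
    // fetch("data.json") would be blocked here, but the <script src> include is not
    console.log(records.length);
  </script>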


Once I had to import some XML and just put it in a hidden div, since HTML allows any tag names XD

Wow, I had never heard of that ID -> variable feature

Yeah it was hard to believe when I first learned about it, but it's true. I think I first found out when I forgot to put in a getElementById call and my code still worked.

More specifically it becomes a property of window, which is the global object.

So <div id="hello"> becomes accessible as window["hello"], which means you can just directly write hello.innerText = "Hi!".

Since this may conflict with any of the hundreds of other properties on window, it's generally not something that should be used.

Historically it wasn't too uncommon to see it, but since it doesn't work well with TypeScript, it's very rare now.


You can make it work with TypeScript by declaring it as an HTMLElement without defining it.
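
e.g. something like this in a non-module .d.ts file (a sketch; assumes an element with id="hello" exists in the page):

  // globals.d.ts (hypothetical)
  declare const hello: HTMLElement;

After that, hello.innerText = "Hi!" type-checks.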

It's been there since the beginning but it has several exceptions, like it's not available in strict mode and modules. Ask your ChatGPT if implied globals are right for you.

Also, window.document.forms gets you direct access to all forms, "name" automatically attaches the named element as a property of its parent form, and "this" is rebound to the current element in inline event handlers.

The DOM API may have been very messy at creation, but it is also very handy and powerful, especially for binding to a live programming visual environment with instant remote update capabilities.
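
A rough sketch of all three (the form and field names are made up):

  <form name="login">
    <input name="username">
    <!-- "this" inside the inline handler is the button element -->
    <button type=button onclick="console.log(this.form.username.value)">Log</button>
  </form>
  <script>
    // named controls hang off their parent form, reachable via document.forms
    console.log(document.forms.login.username === document.forms[0].elements.username); // true
  </script>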


Speaking of forms: form.elements.username is my preferred way of accessing form fields. You can also use a field .form prop to access its connected form. This is fundamental when the field exists outside <form> ;)
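
For the "outside <form>" case, that's the form= attribute doing the association (a sketch; ids are made up):

  <form id="signup"></form>
  <!-- lives outside the form element but is still associated with it -->
  <input name="username" form="signup">
  <script>
    const f = document.getElementById("signup");
    console.log(f.elements.username !== undefined);  // true: shows up in form.elements
    console.log(f.elements.username.form === f);     // true: .form points back to the form
  </script>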

You mean there is bidirectional binding between form.elements.username and the UI value? Why did we need React! HTML should have IFs and FOR loops…

I liked learning this so much that I created a VS Code extension to enable go-to clicking, autocomplete, errors, and type hover for single-page HTML files, so I can properly use it when I am prototyping.

https://marketplace.visualstudio.com/items?itemName=carsho.h...


> Someone ought to look into fixing the security issues of file:// URLs

If you mean full sandboxing of applications with a usable capability system, then yeah, someone ought to do that. But I wouldn't hold my breath, there's a reason why nobody did yet.


Yes, I love quickly creating tools in a single file; if the tool gets really complex I'll switch to a SvelteKit static site. I have a default CSS file I use for all of them to make it even quicker and not look so much like AI slop.

I think every dev should have a tools.TheirDomain.zzz where they put different tools they create. You can make so many static tools, and I feel like everyone creates these from time to time when they are prototyping things. There are so many free options for static hosting, and you can write bash deploy scripts so quickly with AI, so it's literally just ./deploy.sh to deploy. (I also recommend writing some reusable logic for saving to localStorage/IndexedDB so it's even nicer.)

Mine for example is https://tools.carsho.dev (100% offline/static tools, no monetization)


What are the security issues of file:// URLs?

  fetch("file:///C:/Users/You/Documents/secrets.txt")

As long as same-origin is enforced this is probably OK? I'm going to steal my own secrets?

Congratulations, you won $1000000! In order to continue, please download and open this HTML file.

"Chrome wants to access 'secrets.txt'. Allow | Deny"

  $ python -m http.server

Imagine a very plausible situation. You have 1 HTML file at the top that wants to access hundreds of files in a subfolder. There is no way you can show Allow | Deny for every one of them. On the other hand, it's also possible for someone to take that file and put it in a folder like Documents or Downloads, so blanket allowing it access to siblings would allow access to all those files.

This could easily be solved by some simple contract like "webgame.html can only access files in a webpage/ subdirectory," but the powers that be deemed such a thing not worth the trouble.



I guess you're replying to my comment because you were triggered by my last sentence. I wasn't criticizing you specifically, but yeah, in another comment you're writing

> It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.

which unfortunately reeks of exactly the kind of roundabout HTML criticism that is not so helpful IMO. We have to face the possibility that most HTML documents have already been written at this point, at least if you value text by humans.

The CVEs you're referencing are due to said historic blunders allowing inline JS or otherwise tunneling foreign syntax in markup constructs (mutation XSSs are only triggered by serialising and reparsing HTML as part of bogus sanitizer libs anyway).

If you look at past comments of mine, you'll notice I'm staunchly criticizing inline JS and CSS (should always be placed in external "resources") and go as far as saying CSS or other ad-hoc item-value syntax should not even exist when attributes already serve this purpose.

The remaining CVE is made possible by Hickson's overly liberal rules for what's allowed or needs escaping in attributes vs SGML's much stricter rules.


Inline JS or CSS is fine if typed directly by humans. It's only a problem when generated. Generated resources should always be in separate files.

I like the flexibility of being able to make one file HTML apps with inline resources when I'm not generating code. But there should be better protections against including inline scripts in generated code unintentionally.


Omitting <body> can lead to weird surprises. I once had some JavaScript mysteriously break because document.body was null during inline execution.

Since then I always write <body> explicitly even though it is optional.
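
For anyone curious, a sketch of the failure mode:

  <!DOCTYPE html>
  <title>Demo</title>
  <script>
    // the parser is still inside the implied <head> here, so this logs null
    console.log(document.body);
  </script>

With an explicit body start tag placed before the script, document.body already exists at that point.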


Why would content farms split their content into bite-sized chunks to appease LLMs in the first place? LLMs aren't quoting/referencing the web sites they've scraped to come up with answers (hint: maybe they should be required to?), thereby destroying the idea of the "web" as linked documents. The crisis is also about Google Search not bringing page views, as a continuation of last decade's practice of showing snippets or AMP pages; or at least not bringing them to pages without Google Ads.

ChatGPT often provides links to sources in its answers after searching the web. Therefore, some people in the SEO world are saying that you need to split up your content into many small "questions" so that LLMs copy your answer to the question after searching the web and (hopefully) link to your website in the process.

I don't think that it is a good strategy, but it makes sense, especially for content that you want to be scraped (like product pages).


If this is why people are doing it, the SP isn't even addressing the actual question of effectiveness, because this isn't about manipulating the PageRank algorithm, it's about getting results cited in LLM outputs.

I'm wondering if the future meta is to write articles that don't actually target the truth, but what the AI most likely believes, as in most likely hallucinates.

None of that.

The SEO solution is to be in the list of results that the search engines return to the LLM. That list is relatively small.

You don't even get into the "LLM evaluation" stage unless you're one of the top X number of results for the LLM search. Being that the LLM search uses the search engines and not the LLM, it's fatal if you don't score high enough for the search engines. Whatever makes your results top hits for the search engine is what it will take to get the LLMs to notice you in the future.

i.e., for now, OpenAI is dependent on the search engines when doing research. So it's actually the search engines that represent the gatekeeper.


Which search engine is OpenAI using?

I would think it has to be Bing. There are some articles saying it is, but nothing official I could find. Using Google sounds like a strategic blunder.

Do we need some kind of standardized URL syntax (like # for anchor) to have browsers take you to the sub-content and highlight it?


Thank you!

> Why would content farms split their content into bite-sized chunks to appease LLMs in the first place?

SEO practices are mainly guesses and superstition. The principles of making a well structured website were known in 2000 and haven't changed.


> The principles of making a well structured website were known in 2000 and haven't changed.

But your well structured site will be ignored both by search engines and by LLMs. And that is all there is to it, really.


Almost all copyright licenses require attribution, so yes, they are required to refer to the sources.

This, plus after the Godot runtime, the game assets themselves have to be downloaded, often making use of ZIP-like archive formats that may have made sense with DLCs or physical media, but require huge downloads (GBs) to access a single sprite, when browser DOM rendering itself is pretty much about prioritizing resources as they're viewed.

Plus, WASM game runtimes need to bundle redundant 2D or 3D stacks, audio, fonts, harfbuzz, etc., yet don't expose e.g. text rendering capabilities on par with those that browsers already have natively.

The whole thing is prioritizing developer over user experience.


In other news, Tailwind had to lay off their team due to lack of spending on new web sites, as Google Search AI is answering search requests from scraped data without sending visitors to origin sites.

> But the reality is that 75% of the people on our engineering team lost their jobs here yesterday because of the brutal impact AI has had on our business.

Not a Tailwind user, but I really appreciate the honesty. Is the brutal impact of AI as a cause established, though? It appears creation of new web sites is down, but that doesn't mean the business has gone to LLMs as suggested; it could just as well mean that there are simply no sites being created at all.

Especially as

> Traffic to our docs is down about 40% from early 2023 despite Tailwind being more popular than ever.

and

> the docs are the only way people find out about our commercial products

i.e. data is lacking.


I believe a lot of this expectation is that as people replace Google searches with LLMs, or even enriched LLM results pushed at the top of Google results, far less click through to the actual sources happens.

This is happening across a lot of web verticals that previously relied on excellent SEO ranking and click through performance to drive ad revenue/conversions/sales. I have direct knowledge of some fairly catastrophic metrics coming out of knowledge base businesses; it wouldn't surprise me in the slightest that something like Tailwind is suffering a similar fate.


The syntax of Prolog is basically a subset of the language of First Order Logic (sans quantifiers and function symbols), it doesn't get any more minimal than that. What's special in Prolog compared to imperative languages including functional languages is that variables aren't "assigned" but implicitly range over potential values until satisfying the context, like in math formulas for sets. Yes you can express that awkwardly with tons of type annotations and DSL conventions so that you never have to leave your favourite programming language. But then there's the problem of a Prolog "engine" doing quite a bit more than what could be reasonably assumed behind a synchronous library call, such as working with a compact solution space representation and/or value factoring, parallel execution environment, automatic value pruning and propagation, etc.

The integration of a Prolog backend into a mainstream stack is typically achieved via Prolog code generation (and also code generation via LLMs) or as a "service" on the Prolog side, considering Prolog also has excellent support for parsing DSLs or request/responses of any type; as in, you can implement a JSON parser in a single line of code actually.

As they say, if Prolog fits your application, it fits really well, like with planning, constraint solving, theorem proving, verification/combinatoric test case enumeration, pricing models, legal/strategic case differentiation, complex configuration and the like, the latter merely leveraging the modularity of logic clauses in composing complex programs using independent units.

So I don't know how much you've worked hands on with Prolog, but I think you actually managed to pick about one of the worst rather than best examples ;)


You can implement a JSON parser in a single line of code in C, but why?

To win an entry at IOCCC.

> So I don't know how much you've worked hands on with Prolog, but I think you actually managed to pick about one of the worst rather than best examples ;)

Seems more like an interesting research project than something I'd ever deploy in an application serving millions of users


I can't speak to any sort of scalability, but I can definitely say that not everything needs to be built for millions of users. There's plenty of utility in tools you use to help even a single person (even yourself!).

> Seems more like an interesting research project

You mean like the kinds of problems digital computing was originally invented to solve?

You know that still exists, right? There are many people using computers to advance the state of Mathematics & related subjects.


> A few decades ago XML emerged from the pit. XML [...] could be used for documents, data transfer, and a bunch of other things, and people genuinely liked it [...] They liked it so much that a concerted effort was started to take HTML and rebuild it on top of XML.

XML didn't "emerge" and was repurposed for HTML; it was designed for new vocabularies on the web. The first sentence of the XML spec reads:

> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.


Any advance in JavaScript and outrageous browser complexity is cheered here on HN, but waking up to the fact that their actual purpose is unskippable ads and browser monopolies is not so funny.

What's your suggestion to deal with Apple's current fail? Wait until Tahoe's successor(s) or leave for good? When Jony Ive left, Apple managed to listen to their customers and quickly got rid of the Touch Bar thing, reintroduced a physical Esc key, etc. so there's hope left, isn't there?

> Apple managed to listen to their customers and quickly got rid of the Touch Bar thing, reintroduced a physical Esc key, etc. so

Those are all hardware upgrades that Apple profits from. What incentive does Apple have to make the App Store better, or improve the visual clarity in the iOS and macOS interface? Shouldn't we be seeing downward pressure there too, if innovation can be generalized to software?

Most users don't have a say in the matter, and Apple has exploited their ambivalence for decades. If you're the sort of person who cares, you're not Apple's target audience.


Worth noting you can't define special parsing rules using custom elements, such as for inferring omitted tags as is done for predefined elements all the time. The behavior of parsing HTML fragments with customized standard elements using the browser API is basically underspecified, since it lacks a context element, which is needed for inferring required omitted elements such as <head> and <body>, or <html> itself. What about custom elements appearing as child content of other custom elements?

For merely defining custom elements you need JS anyway, so these aren't a technique intended for text authors. Yet as another way to organize code in web apps, custom elements are competing with JS, which already has multiple module, namespace, and OO import features that are much more flexible.
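
For reference, the JS that's always needed even for a purely presentational element (a sketch; the tag name is made up):

  // without this registration, <x-note> is just an unknown element
  customElements.define("x-note", class extends HTMLElement {
    connectedCallback() {
      this.style.display = "block";
      this.style.borderLeft = "3px solid gray";
    }
  });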

So as usual, random people on github (aka WHAT working group individuals aka Google shills) reinventing SGML, poorly. Because why not? The end goal always has been to make ad blocking an infeasible arms race and gather "telemetry."

