Presumably this only applies to newcomers? The thrust of their policy is to nurture new contributors. Once one has established oneself as a meaningful contributor — which the Bun team surely must have done by now — then it doesn’t matter where the code came from.
…in theory. In reality, I’m sure a policy like this can’t be selective and fair at the same time. Pick one!
Three things matter when it comes to eating my breakfast sandwich:
1/ Was the pork in my sausage reared on a farm that meets agricultural standards?
2/ Was the food handled safely by the kitchen that cooked my food?
3/ Does the owner of the diner pay kitchen wages in accordance with labor law?
By contrast, I have no idea what went into the models I use, what system prompts have prejudiced it, and whose IP has been exploited in pursuit of my answer.
That’s being charitable, really. In practice, the open secret of the AI industry is that the vast majority of training data is, for want of a better word (though it is likely the most precise one), stolen.
Probably, yes, but the burden of proof is on us, not them.
I'm already glad some companies have the guts to open their models because proving it for open models is probably a lot easier than for a model behind a service.
The proof is the $stupid-billion infrastructure built and kept up to host mousetraps armed with free cheese made of virtue signalling about doing the right thing and sharing the code with the world for free.
That's a matter of changing a law; it's all up to the people and their representatives. We talk as if everything is set in stone, but if there really is a will, there is a way.
The media industry loves to quote ridiculous numbers on lost revenue due to piracy, etc. Maybe rough ballpark numbers would get them to do something about this theft.
Can someone put a rough estimate on the potential revenue loss (direct and incidental) from AI training, with an industry-wise breakdown?
It’s wrong to stop progress. I just want to know what data went into my model and have access to the same data. The same way we have national libraries of books but with the caveat that I don’t really know how one is supposed to browse petabytes of OpenAI .zips like I browse old books.
If the data is proprietary (e.g. Meta’s stash of FB comments) then I am satisfied to be told it’s private and I can’t see it. If, however, the works were public, then give me a URL if it’s live or a cached copy if it isn’t.
Add a black umbrella to each satellite: when they pass through the critical region where they are visible in the night sky while still being sunlit, pop the brollies up. We will fly them in the shade!
You could paint them black but they’d probably get quite hot.
Won't the shade then reflect the light instead? It's nighttime, so the sunlight is coming from below, from the Earth-based observer's point of view, and the shade would need to point down in order to shade the satellite.
It’s been decades since I could claim to know anything about this field so I’m probably completely wrong in how I read this, but the idea that one might build a theorem prover (“ML!”) for one’s non-ML programming language and have the prover itself accidentally be a really good general purpose programming language … is very funny.
To clarify: ML started out as a scripting language for Robin Milner's proof assistant, LCF. The formal system, or "logic," is implemented in a minimal, trusted kernel, and the proof data structure is protected as an abstract data type that can only be constructed through the trusted kernel. On top of the kernel, tactic scripts may be defined to manipulate proof objects and facilitate proof search/automation.
Then, ML grew into a general-purpose programming language (both OCaml and Standard ML are dialects).
Imagine a Vendor API that adds a way to link from the page straight into a device purchase workflow. As a trial of the API in Chrome you can order a new Google Pixel 9b directly from any page with the word Android in it!
Or a LocalNet API that integrates with trusted hardware devices on your local network. As a trial (Chrome beta programme — strictly limited but here’s 3x signup links to share with your friends) you can adjust your Google Next Mini underfloor heating directly from Chrome!
Or a DirectCast API that lets you stream <video> elements to a device of your choice even over a VPN. As a Chrome trial, you can use your Google Cloud account to stream directly from YouTube Premium to any linked Google Chromecast devices you own!
It feels very close to “right to repair”. The coffee grinder you bought came as a single package but it has burrs, gears, machine screws, a motor, etc. If one of those components fails, we should be able to replace it ourselves and as such they should be documented.
The laptop has various pieces of hardware in it and corresponding drivers in macOS to make them tick. Did we buy the hardware and the drivers as an inseparable package, or should we be provided with the manual to make one component work when the other breaks, be that third-party trackpads or third-party (Linux) drivers?
Apple might argue that drivers, unlike gears or motors, will never wear down and fail. They won’t need repairing so you don’t get to know how they work. Does right to repair only apply to products that could ever need repairing? Does it also extend to knowing how your purchased product is built so that you could repair it?
Maybe we’ll see a test case some day when a cosmic ray blows out /System/Trackpad.kext and a litigant applies to a court for the documentation to repair their laptop — to write their own driver!
(Or vice versa: a manufacturer of coffee grinders arguing in court that they are exempt from right-to-repair because they repair their machines for free at their Genius Espresso Bar.)
This is an interesting thought exercise. I immediately thought of the counterargument that Apple's driver quality is worse, especially for laptops nearing end of life (for the sake of argument, assume this is true).
Could I then submit a warranty claim and demand Apple replace my aging laptop with their latest model?
I think there is a strong case that "the right to repair" includes software. If that doesn't mean drivers must be open source, it should at least mean hardware is documented such that a driver can be written from it.
But the US still doesn't have the right to repair hardware, haha.
I hope the EU is listening. They won't get far with their sovereign-software push if the hardware can't be freely used. Even on the Android side, you can't write an alternative to Android because all of the hardware has locked bootloaders and undocumented drivers. Good luck reverse engineering the hardware/drivers on a Samsung Galaxy, let alone an iPhone or MacBook.
I asked ChatGPT to draw the outline of an ellipse using Unicode braille. I asked for 30x8 and it absolutely nailed it. A beautiful piece of ascii (er, Unicode) art. But I wanted to mark the origin! So I asked for a 31x7 ellipse instead. It completely flubbed it, and for 31x9 too.
When a model gives a really good answer, does that just mean it’s seen the problem before? When it gives a crappy answer, is that not simply indicating the problem is novel?
No, that simply is not the case. The whole point of deep learning - and the reason it has been successful in so many domains over the last 20 years - is that generalization does occur. Leela will kick your ass at chess whether she's seen the position before or not, even if her search depth is set at 1 ply.
In the case of LLMs, the compression ratio alone absolutely requires this.
Do you posit that there are enough examples of 30x8 ellipses encoded in braille online for ChatGPT to learn from but not 31x7 or 31x9 ellipses? That seems unlikely.
Yes, or the model got lucky: some combination of my prompt and the reasoning behind its answer lined up with something it had seen before, producing a quality of output it couldn't recreate under slightly different circumstances.
I wouldn't ask an LLM to output this directly. For an ellipse in ASCII art, I would guess that having it write a Python program to generate the drawing, then running that program, would work much better. Using Claude Sonnet 4.6 on a free account it seemed to work (sorry in advance if the Hacker News formatting is horrendous).
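A minimal sketch of that program-not-prose approach, assuming the standard braille-cell convention (each U+2800-block character packs a 2x4 grid of dots), with the dot-to-bit mapping and the parametric outline sweep as my own illustrative choices:

```python
# Render an ellipse outline as Unicode braille "pixels". Each braille
# cell covers a 2-wide x 4-tall pixel patch, so a 30x8-pixel ellipse
# becomes 15x2 characters; odd sizes like 31x7 work too.
import math

# Dot-bit values of a braille cell, indexed as DOTS[row][col]:
# dots 1-3 and 7 are the left column, dots 4-6 and 8 the right.
DOTS = [[0x01, 0x08],
        [0x02, 0x10],
        [0x04, 0x20],
        [0x40, 0x80]]

def braille_ellipse(width, height, samples=720):
    """Return a width x height pixel ellipse outline as braille text."""
    a, b = (width - 1) / 2, (height - 1) / 2     # semi-axes in pixels
    grid = [[False] * width for _ in range(height)]
    for i in range(samples):                     # parametric sweep of the outline
        t = 2 * math.pi * i / samples
        grid[round(b + b * math.sin(t))][round(a + a * math.cos(t))] = True
    lines = []
    for cy in range(0, height, 4):               # pack pixels into 2x4 cells
        row = []
        for cx in range(0, width, 2):
            bits = 0
            for dy in range(4):
                for dx in range(2):
                    y, x = cy + dy, cx + dx
                    if y < height and x < width and grid[y][x]:
                        bits |= DOTS[dy][dx]
            row.append(chr(0x2800 + bits))       # offset into the braille block
        lines.append("".join(row))
    return "\n".join(lines)

print(braille_ellipse(30, 8))
print(braille_ellipse(31, 7))   # odd dimensions give a true centre pixel to mark
```

The point isn't this particular dot mapping; it's that the LLM only has to get the geometry right once, in code, instead of juggling bit-packing across hundreds of output tokens.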
You can use two spaces at the beginning of each line to trigger the "code" mode. I tried to reconstruct your drawing, but perhaps I didn't guess correctly:
Edit: I had to delete the first two spaces of each line and replace them with newly typed spaces from my keyboard. Perhaps there is some whitespace Unicode magic character that is confusing HN.
All the whitespace appears to be a blank braille character, so it still displays correctly even without the indentation formatting: https://www.compart.com/en/unicode/U+2800
(Passed it through xxd to get the utf8 hex values)
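A quick Python equivalent of that xxd check: the blank braille cell looks like a space but is a distinct three-byte UTF-8 character, which is presumably why it survives HN's whitespace handling.

```python
# U+2800 (BRAILLE PATTERN BLANK) vs. a plain ASCII space in UTF-8.
blank = "\u2800"
print(blank.encode("utf-8").hex())   # e2a080
print(" ".encode("utf-8").hex())     # 20
```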
Ug wants to borrow ten of my best sticks in exchange for future options to buy berries from his friend Og. Og has a watertight deal with Oog to invest the sticks in a five-year mammoth-hunting expedition, but Oog first needs berries to exchange for sticks to cover his exposure on berry-puts he’s taken out against Urrrg’s remortgaged stick pile.
Well, I said no. Not getting burned that way again!
If you or anyone else reading this haven’t finished Stephen’s Sausage Roll to the very end, including reading all the story book paragraphs along the way (which increase in poignancy and frequency as the game winds to a close) then I strongly encourage you to do so. No spoilers!
What if they use someone else's device though? Or circumvent the filter? Come on, this is Hacker News, "we" circumvent guardrails because we can and because we know no security is perfect, often from a young age.
I love how a lot of the "this is the parents' responsibility" opinion-havers don't seem to remember what it was like to be a kid themselves and / or don't have kids of their own.
The metaphor still works: minors in pubs are, presumably, under the supervision of their parents; otherwise they have no business being there in the first place.