
For some reason this post links to the dev branch on GitHub, if you switch to the main branch you will see the license file is indeed Apache 2.0.

That’s funny, I’ve had the exact opposite experience. Gemini starts every answer to a coding question with, “you have hit upon a fundamental insight in zyx”. ChatGPT usually starts with, “the short answer? Xyz.”

Can someone ELI5 this for a non-mathematician?


I'll take a shot at it. Using Collatz as the specific target for investigating the underlying concepts here seems like a big red herring that's going to generate lots of confused takes. (I guess it was done partly to have access to tons of precomputed training data and partly to generate buzz. The title also seems kind of poorly chosen and/or misleading.)

Really the paper is about mechanistic interpretation and a few results that are maybe surprising. First, the input representation details (base) matter a lot. This is perhaps very disappointing if you liked the idea of "let the models work out the details, they see through the surface features to the very core of things". Second, learning was bursty, with discrete steps rather than smooth improvement. This may or may not be surprising or disappointing; it depends on how well you think you can predict the stepping.
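
To make the first point concrete, the same integer hands the model a very different digit sequence depending on the base. A toy sketch (digit-level tokenization is just my illustration here, not necessarily what the paper does):

  def digits(n, base):
      # Digit sequence of n in the given base, most significant first.
      out = []
      while n:
          out.append(n % base)
          n //= base
      return out[::-1] or [0]

  for base in (2, 3, 10, 16, 24):
      print(base, digits(27, base))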


The model partially solves the problem but fails to learn the correct loop length:

> An investigation of model errors (Section 5) reveals that, whereas large language models commonly “hallucinate” random solutions, our models fail in principled ways. In almost all cases, the models perform the correct calculations for the long Collatz step, but use the wrong loop lengths, by setting them to the longest loop lengths they have learned so far.

The article is saying the model struggles to learn a particular integer function. https://en.wikipedia.org/wiki/Collatz_conjecture


That's a bit of an uncharitable summary. In bases 8, 12, 16, 24 and 32 their model achieved 99.7% accuracy. They would never expect it to achieve 100% accuracy. It would be like if you trained a model to predict whether or not a given number is prime: a model that was 100% accurate would defy mathematical knowledge, but a model that was 99.7% accurate would certainly be impressive.

In this case, they prove that the model works by categorising inputs into a number of binary classes which just happen to be very good predictors for this otherwise random-seeming sequence. I don't know whether or not some of these binary classes are new to mathematics, but either way, their technique does show that transformer models can be helpful in uncovering mathematical patterns even in functions that are not continuous.


A pocket calculator that would give the right numbers 99.7% of the time would be fairly useless. The lack of determinism is a problem and there is nothing 'uncharitable' about that interpretation. It is definitely impressive, but it is fundamentally broken, because when you start making chains of things that are 99.7% correct you end up with garbage after very few iterations. That's precisely why digital computers won out over analog ones, the fact that they are deterministic.
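
A rough sketch of how quickly that compounds (treating the errors as independent, which is a simplification):

  # Chance that a chain of n steps, each 99.7% reliable, is correct end to end.
  p = 0.997
  for n in (1, 10, 100, 1000):
      print(n, round(p ** n, 3))  # 0.997, 0.97, 0.741, 0.05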


Category error. You want 100% accuracy for an impossible problem. This is a famously unsolved conjecture. The only way to get the answer is to fully calculate it. The task was to make a guess and see how well it could do. 99.7% is surprisingly good. If the task was to calculate, the LLM could write a Python program, just like I would have if asked to calculate the answer.


There is a massive difference between an 'unsolved problem' and a problem solved 'the wrong way'. Yes, 99.7% is surprisingly good. But it did not detect the errors in its own output. And it should have.

Besides, we're all stuck on the 99.7% as if that's the across-the-board output, but that's a cherry-picked result:

"The best models (bases 24, 16 and 32) achieve a near-perfect accuracy of 99.7%, while odd-base models struggle to get past 80%."

I do think it is a very interesting thing to do with a model and it is impressive that it works at all.


Category error.

The problem here is deterministic. *It must be for accuracy to even be measured*.

The model isn't trying to solve the Collatz conjecture; it is learning a pretty basic algorithm and then applying it a number of times. The instruction it needs to learn is

  if x % 2 == 0:
      x //= 2
  else:
      x = 3 * x + 1
It also needs to learn to put that in a loop, with the number of iterations as a variable, but the algorithm itself is static.
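
Roughly, the whole procedure it has to emulate is something like this (a sketch, not necessarily the paper's exact task formulation):

  def collatz_steps(x):
      # Number of iterations of the map above until x first reaches 1.
      steps = 0
      while x != 1:
          if x % 2 == 0:
              x //= 2
          else:
              x = 3 * x + 1
          steps += 1
      return steps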

On the other hand, the Collatz conjecture states that C(x) (the above algorithm, iterated) has a fixed point of 1 for all x (where x \in Z+). Meaning that eventually any input will collapse to the loop 1 -> 4 -> 2 -> 1 (or just terminate at 1). You can probably see we know this is true for at least an infinite set of integers...

Edit: I should note that there is a slight modification to this, though the model could get away with learning just this. Their variation is limited to odd numbers, and not all of them. For example, 9 can't be represented by (2^k)m - 1 (but 7 and 15 can). But you can see that there's still a simple algorithm and that the crux is determining the number of iterations. Regardless, this is still deterministic. They didn't use any integers > 2^71, so every input is in a range where we absolutely know the sequences and know they all terminate at 1.

To solve the Collatz conjecture (and probably win a Fields Medal) you must do one of two things:

  1) Provide a counter-example 
  2) Show that this happens for all n, which is an infinite set of numbers, so this strictly cannot be done by demonstration.


Most primality tests aren't 100% accurate either (e.g. Miller-Rabin); they are just "reasonably accurate" while being very fast to compute. You can use them in conjunction to improve your confidence in the result.
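
For example, a textbook Miller-Rabin test only ever errs in one direction, and repeating independent rounds drives the error down geometrically. A rough sketch (my own toy version, not any particular library's implementation):

  import random

  def is_probably_prime(n, rounds=20):
      # "False" is certain (n is composite); "True" means "probably prime",
      # with error probability at most 4**-rounds.
      if n < 2:
          return False
      for p in (2, 3):
          if n % p == 0:
              return n == p
      d, r = n - 1, 0
      while d % 2 == 0:
          d, r = d // 2, r + 1
      for _ in range(rounds):
          a = random.randrange(2, n - 1)
          x = pow(a, d, n)
          if x in (1, n - 1):
              continue
          for _ in range(r - 1):
              x = pow(x, 2, n)
              if x == n - 1:
                  break
          else:
              return False  # a witnesses that n is composite
      return True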


Yes, and we know they are inaccurate, and we know that if you find a prime that way you can only use the test to reject, not to confirm, so if you think that something is prime you need to check it.

But now imagine that, instead of only passing the occasional composite 0.3% of the time, it would also reject valid primes. Now it would be instantly useless, because it fails the test for determinism.


I don't know that people are saying it's useful. Just interesting.


It's uncharitable because the comment purports to summarise the entire paper while simply cherry-picking the worst result. It would be like if I asked how I did on my test and you said, "Well, you got question 1 wrong," and then didn't elaborate.

Now I get your point that a function that is 99.7% accurate will eventually always be incorrect but that's not what the comment said.


I just tried to get to the heart of the claim based on a skim. Please feel free to refine my summary.


Why do people keep using LLMs as algorithms?

LLMs are not calculators. If you want a calculator use a calculator. Hell, have your LLM use a calculator.

>That's precisely why digital computers won out over analog ones, the fact that they are deterministic.

I mean, no not really, digital computers are far easier to build and far more multi-purpose (and technically the underlying signals are analog).

Again, if you have a deterministic solution that is 100% correct all the time, use it, it will be cheaper than an LLM. People use LLMs because there are problems that are either not deterministic or the deterministic solution uses more energy than will ever be available in the local part of our universe. Furthermore a lot of AI (not even LLMs) use random noise at particular steps as a means to escape local maxima.


> Why do people keep using LLMs as algorithms?

I think they keep coming back to this because a good command of math underlies a vast domain of applications, and without a way to do this as part of the reasoning process, the reasoning process itself becomes susceptible to corruption.

> LLMs are not calculators. If you want a calculator use a calculator. Hell, have your LLM use a calculator.

If only it were that simple.

> I mean, no not really, digital computers are far easier to build and far more multi-purpose (and technically the underlying signals are analog).

Try building a practical analog computer for a non-trivial problem.

> Again, if you have a deterministic solution that is 100% correct all the time, use it, it will be cheaper than an LLM. People use LLMs because there are problems that are either not deterministic or the deterministic solution uses more energy than will ever be available in the local part of our universe. Furthermore a lot of AI (not even LLMs) use random noise at particular steps as a means to escape local maxima.

No, people use LLMs for anything and one of the weak points in there is that as soon as it requires slightly more complex computation there is a fair chance that the output is nonsense. I've seen this myself in a bunch of non-trivial trials regarding aerodynamic calculations, specifically rotation of airfoils relative to the direction of travel. It tends to go completely off the rails if the problem is non-trivial and the user does not break it down into roughly the same steps as you would if you were to work out the problem by hand (and even then it may subtly mess up).


>A pocket calculator that would give the right numbers 99.7% of the time would be fairly useless.

Well that's great and all, but the vast majority of LLM use is not for stuff where you can just pluck out a pocket calculator (or run a similarly airtight deterministic algorithm), so this is just a moot point.

People really need to let go of this obsession with a perfect general intelligence that never makes errors. It doesn't and has never existed besides in fiction.


Yeah, it's only correct in 99.7% of all cases, but what if it's also 10,000 times faster? There's a bunch of scenarios where that combination provides a lot of value.
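
One such scenario, as a generic sketch (fast_guess and exact_check are placeholder names, not anything from the paper): use the fast, imperfect predictor as a pre-filter and only spend the slow, exact computation on whatever it flags.

  def filter_then_verify(candidates, fast_guess, exact_check):
      # fast_guess: cheap and ~99.7% right; exact_check: slow but exact.
      # Only items the cheap predictor flags get the expensive treatment.
      return [c for c in candidates if fast_guess(c) and exact_check(c)]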


Ridiculous counterfactual. The LLM started failing 100% of the time 60! orders of magnitude sooner than the point at which we have checked literally every number.

This is not even to mention the fact that asking a GPU to think about the problem will always be less efficient than just asking that GPU to directly compute the result for closed algorithms like this.


Correctness in software is the first rung of the ladder; optimizing before you have correct output is in almost all cases a complete waste of time. Yes, there are some scenarios where having a ballpark figure quickly can be useful, if you can produce the actual result as well and if you are not going to output complete nonsense the other times but something that approaches the final value. There are a lot of algorithms that do this (for instance: Newton's method for finding square roots).
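
Newton's method is a good illustration of the difference: each iteration is cheap and approximate, but it provably moves toward the exact answer instead of occasionally emitting noise. A minimal sketch:

  def newton_sqrt(a, iterations=6):
      # Each step refines the guess: x_next = (x + a / x) / 2.
      x = a if a > 1 else 1.0
      for _ in range(iterations):
          x = (x + a / x) / 2
      return x

  print(newton_sqrt(2.0))  # ~1.41421356, converging on sqrt(2)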

99.7% of the time good and 0.3% of the time noise is not very useful, especially if there is no confidence signal flagging which answers are probably incorrect.


[flagged]


Do you think maybe OP would have asked a language model for the answer if they felt like they wanted a language model to give an answer? Or in your mind parent doesn't know about LLMs, and this is your way of introducing them to this completely new concept?


Funny that the "human" answer above took 2 people to be "complete" (i.e. an initial answer, followed by a correction and expansion of concepts), while the LLM one had mostly the same explanation, but complete and in one answer.


Maybe most of us here don't seek just whatever answer to whatever question, but the human connection part of it is important too, that we're speaking with real humans that have real experience with real situations.

Otherwise I'd just be sitting chatting with ChatGPT all day instead of wast...spending all day on HN.


If life is a jobs program, why don't we dig ditches with spoons?


Oh, I agree. What I found funny is the gut reaction of many other readers that downvoted the message (it's greyed out for me at time of writing this comment). Especially given that the user clearly mentioned that it was LLM generated, while also being cheeky with the "transformer" pun, on a ... transformer topic.


Unfortunately, Netflix thus far seems to lack the creative vision to fully utilize any size of production house (barring rare exceptions).


This really makes me wonder if publicly traded companies are just a bad idea.


FWIW Epic is also privately owned.


> Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name.)

Not Google, and not Amazon. Microsoft is a maybe.


People trust Google with their data in search, Gmail, Docs, and Android. That is quite a lot of personal info, and trust, already.

All they have to do is completely switch the Google homepage to Gemini one day.


The success of Facebook basically proves that public brand perception does not matter at all


Facebook itself still has a big problem with its lack of a youth audience though. Zuck captured the boomers and older Gen X, which are the biggest demos of living people however.


> Zuck captured the boomers and older Gen X, which are the biggest demos of living people however.

In the developed world. I'm not sure about globally.


Ironically, Mozart, as a product of the Classical era, uses a lot of the same 4 chords with incredibly simple melodies.


> Wish we could also do the same for "Beam me up scotty"

You might die every time you do, though, so maybe not.


For some definitions of "you" and "die".


You = your body and mind

Die = don’t exist anymore


That also brings up the philosophical question of whether I will be the same if all my atoms and molecules are copied exactly.


Thinking about such questions before we are capable of doing such an experiment at least with small animals is like discussing how many angels can stand on the point of a pin.


Inventing a Star Trek-style teleporter would be quite something, but I don't see how it would advance the philosophy in any way. We already know the teleportation subjects would report 'feeling just the same' as before. If they didn't then by definition it's not a functioning teleporter, as it accidentally modified the subject in transit.


I am not sure philosophy concerns itself with whether we can do it yet or not.


As defensible as it may be, your behavior is very far from the norm. You may not consider this an aggressive privacy practice, but demographically speaking, it absolutely is.


> the little mini screenshots look gorgeous because it replicates MacOS.

I have the opposite reaction. To me the screenshots look like someone tried to replicate macOS but failed. The text antialiasing is off, the font is different (and worse), the border-radii on menus are off, etc.

Besides, the actual screenshots of the current OS (https://ravynos.com/screenshots) are... really rough.


I didn't expect it to look so... dated[0]? The things are approximately where they are on macOS, but it looks like Windows 2000/ME/98.

[0]or retro, for anyone who's offended by me calling it dated.

