> The practical upshot is that a git commit hash is not enough to know you are distributing/executing the legitimate code, as opposed to a malicious doppelganger
Really now? Mind if I challenge you?
I have on my machine a git repo with commit '75eb4e3b1369706a4dcd61cc80e49660ac341ea4'.
If you can give me a second git repo with such a commit containing different contents, I'll happily send you $10k USD, or donate it to a charity of your choice.
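For context on what the challenge would require: a git object ID is a SHA-1 over a short header plus the object body. The sketch below shows that for a blob; the helper name `git_object_id` is made up for illustration. A commit object is hashed the same way, with a body that names the tree, parents, author and message, so a colliding commit would need a second body that hashes to the same value.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID git would assign to an object.

    git hashes "<type> <size>\\0" followed by the raw body.
    (Hypothetical helper name; the header format is git's.)
    """
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

if __name__ == "__main__":
    # Should match `echo 'hello world' | git hash-object --stdin`
    print(git_object_id("blob", b"hello world\n"))
```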
> If you can give me a second git repo with such a commit containing different contents, I'll happily send you $10k USD, or donate it to a charity of your choice.
Calculating that SHA1 collision is going to be a bit more expensive than $10k, by a couple of orders of magnitude.
Finding it in the wild is improbable, but calculating it is definitely possible, and has been done before. http://shattered.io/
Shattered didn't produce a collision for an arbitrary hash; it produced two documents with the same hash (which is a slightly easier problem, about 100,000x faster).
SHA1 is certainly insecure at this point, but not even close to trivially so.
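To make the distinction concrete, here is a toy sketch that truncates SHA-1 to 16 bits so both searches finish quickly. It is only an illustration of generic brute force, not the Shattered technique; `BITS`, `tiny_hash` and the message formats are invented for the example.

```python
import hashlib
from itertools import count

BITS = 16  # toy size, chosen so the searches finish in well under a minute

def tiny_hash(msg: bytes) -> int:
    """SHA-1 truncated to the top BITS bits (illustration only)."""
    digest = hashlib.sha1(msg).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)

def find_collision():
    """Birthday search: any two distinct messages with the same tiny_hash."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # i + 1 messages were hashed
        seen[h] = msg

def find_second_preimage(target_msg: bytes):
    """Search for a different message matching one fixed, given hash."""
    target = tiny_hash(target_msg)
    for i in count():
        msg = b"try-%d" % i
        if msg != target_msg and tiny_hash(msg) == target:
            return msg, i + 1

if __name__ == "__main__":
    a, b, n = find_collision()
    print("collision after", n, "hashes:", a, b)
    m, k = find_second_preimage(b"the commit I already have")
    print("second preimage after", k, "hashes:", m)
```

The collision search gets to pick both messages, so it typically needs on the order of 2^(BITS/2) hashes, while matching a hash someone else already fixed needs on the order of 2^BITS.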
That is enough to distribute malicious code though, at least in certain scenarios. Someone might create a setup where reviewers check/sign one version of the source code, and what gets distributed is another version with the same hash.
Code review in the Linux kernel still happens by email to a large degree.
Further up in the contribution tree there is additional signing. Would that further complicate the insertion of a false commit? I am not convinced that signing is used all the way down to every contribution.
If your goal was to prove that SHA1 collisions are unimportant, far too hard for any group to exploit within the next X years of processing improvements...
That means math.
In contrast, this "challenge" stuff is just chasing outrage endorphins and internet points.
Think it through, and it's pointless. Any refusal or negative result is utterly compromised and confounded by things like: How trustworthy you appear; whether the amount is reasonable; whether the random commenter has the skillset, free time, and financial assets to try; whether they're part of a larger group they can recruit; etc.
My goal is to see an actual proof of concept showing that whatever the person I replied to claimed is feasible. Not the daily BS from security wannabes that starts with "In certain scenarios it is possible to X and Y" and then never shows proof.
"In certain scenarios I could be a ninja": it means absolutely nothing without proving that I actually have the skills and I could actually use them.
It is not pointless, but if you claim something show the proof.
The math is the proof of concept when an attack costs that much money to pull off. Or the various papers that show successful attacks on reduced-round versions of the hash.
Do you not accept those? What would you accept as a proof of concept?
_That is enough to distribute malicious code though, at least in certain scenarios. Someone might create a setup where reviewers check/sign one version of the source code, and what gets distributed is another version with the same hash._
Well the proof of concept without actually having two colliding files is really simple, so I thought it was generally understood.
Here's the easiest-to-explain way: Upload the malicious version of the file to GitHub. Send an innocuous patch to the kernel devs that creates a file with the same hash. It gets accepted, and anyone that downloads the kernel from GitHub gets the malicious version. Done. That's a small fraction of Linux downloaders, but this is just the proof of concept.
A proof of concept became much easier with C11 Unicode identifiers and email patch review. You can trivially hide Cyrillic chars, e.g. between whitespace changes or other trivial "optimizations". Even without collisions.
And with the current surge of GPUs, even collisions are realistic now. The H100s are not doing much when not in training.
>which is a slightly easier problem, about 100,000x faster
Where did you get this number from? I was under the impression that this is completely infeasible (just like we can generate an md5 collision in seconds, but we still can't do a preimage attack).
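A rough sanity check on that number, assuming the Shattered attack cost about 2^63 SHA-1 computations (the figure as reported on shattered.io) and using the generic brute-force bounds for a 160-bit hash. Under those assumptions the ~100,000x speedup is relative to a brute-force birthday collision search, not relative to a preimage search.

```python
from math import log2

# Generic brute-force costs for a 160-bit hash, in hash computations.
PREIMAGE_BRUTE_FORCE = 2 ** 160  # match one fixed, given hash
BIRTHDAY_BRUTE_FORCE = 2 ** 80   # find any colliding pair

# Assumption: approximate cost of the Shattered collision as reported
# on shattered.io, not something measured here.
SHATTERED_COST = 2 ** 63

speedup_vs_birthday = BIRTHDAY_BRUTE_FORCE // SHATTERED_COST
gap_to_preimage_bits = log2(PREIMAGE_BRUTE_FORCE // SHATTERED_COST)

if __name__ == "__main__":
    print("Shattered vs brute-force collision:", speedup_vs_birthday, "x")
    print("Brute-force preimage is 2^%d times costlier than Shattered"
          % gap_to_preimage_bits)
```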