Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
How fast can browsers process base64 data? (lemire.me)
33 points by mfiguiere 7 hours ago | hide | past | favorite | 20 comments




https://chromium-review.googlesource.com/c/v8/v8/+/7208125 v8 changed yesterday to avoid the temp buffer which will likely double the base64 decode speed.

Looks like this was brought up there as a result of this article too, which is neat! And helpful since I was just messing with a node script that is heavily decoding base64


I like commits like this. Removes unnecessary work, easy to understand optimization, 40 lines gone 32 added so net smaller codebase, which usually means easier to maintain. Has ample comments and even uses one of my favorite tricks to do a ceiling function.

base64 is embarrassingly parallel. So just pipe it to the GPU:

  precision highp float;
  uniform vec2 size;
  uniform sampler2D src,tab;
  void main(){
    vec4 a=(gl_FragCoord-.5)*3.,i=vec4(0,1,2,0)+a.y*size.x+a.x,y=floor(i/size.x),x=i-y*size.x;
    #define s(n)texture2D(src,vec2(x[n],y[n])/size)[0]
    #define e(n)texture2D(tab,vec2(a[n],0))[0]
    a=vec4(s(0),s(1),s(2),0)*255.*pow(vec4(2),-vec4(2,4,6,0)),a=fract(a).wxyz+floor(a)/64.,gl_FragColor=vec4(e(0),e(1),e(2),e(3));
  }

That blog post left me hungry for more. I was expecting Daniel Lemire to provide a SIMD crazy optimized version that shows the default browser implementations are sub-optimal. But it's not in this article. Anyone knows?


> Browsers recently added convenient and safe functions to process base 64 functions Uint8Array.toBase64() and Uint8Array.fromBase64()

Wow, finally! I've had to work around this so many times in the past (btoa/atob do not play nicely with raw binary data - although there are workarounds on the decode path involving generating data URIs)


I remember in the early days of Phoenix LiveView on an intranet app using http1 I noticed it was faster to base64 encode an image, putting it in an img tag and sending the diff through the Channel websocket than the regular http request through Cowboy.

> However, when decoding, we must handle errors and skip spaces.

This had me scratching my head. Why would a base64 decoder need to skip spaces? But indeed, MDN documents this behavior:

> Note that: The whitespace in the space is ignored.

JS never ceases to surprise. Also, check out that typo :D

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Probably so you can put in line breaks? Seems common in base64 data, such as armored PGP keys or emails attachments. HTML attributes allow line breaks, although I haven’t seen it done for base64 images.

This might be for compatibility with XML Schema base64Binary, which collapses all whitespace (such as line breaks) to single spaces.

So technically it’s now possible to hide a payload in somewhat human-readable text, as long as it base64-decodes.

Now? There's no change. Also human readable text substantially consists of letters. But that's most of the base64 alphabet too. So this isn't like steganography. All the letters in the human-readable words are valid base64 characters too. The only thing about this is that you get to choose where to put the spaces and newlines. You can't exactly construct arbitrary payloads starting from arbitrary messages.

Maybe he means invisible whitespace characters that don't render? I haven't verified this but depending on the definition of whitespace it's possible you can pass a base64 string and insert an arbitrary number of them. When decoded per spec they do nothing so nobody notices them. But if you can pass the base64 string through you can receive or verify the hidden message. Lots of reasons you might want to hide data in plain sight.

What do you mean hide a payload?

Base64 isn’t obfuscation or encryption.


Does anyone know why Firefox/Servo are so slow compared to the rest?

A few big things and lots of small things.

Big performance wins recently optimizing some core operations:

https://bugzilla.mozilla.org/show_bug.cgi?id=1994067 https://bugzilla.mozilla.org/show_bug.cgi?id=1995626

which brings it near chrome performance without the new v8 optimizations.

Still more work to do, including avoiding extra copies just like v8, and exploring more simd etc. Generic slow items for toBase64 and fromBase64: https://bugzilla.mozilla.org/show_bug.cgi?id=2003299 https://bugzilla.mozilla.org/show_bug.cgi?id=2003305

extra copying of results: https://bugzilla.mozilla.org/show_bug.cgi?id=2003461 https://bugzilla.mozilla.org/show_bug.cgi?id=1996197

No reason all browsers would not be able to be similar in performance eventually. Pleased this was noticed and being worked on by both v8 and Firefox team


Thanks for sharing!

Incredible that FF is even slower than a JS only implementation running in FF.


Mozilla's "privacy" image prevents them from knowing what their browser actually does in the wild, while Google collects CPU time profiles from user devices, comprehensively, and hammers down the hotspots they find, and that refinement has been going on for many years.

Even if true (and I agree with sibling that I don't think that it is), base64 encoding/decoding feels like one of those things you'd have a micro benchmark for regardless. It's also shocking that the gap is so wide, as I feel like people working on such things would start with a fairly optimized v1.

I wonder if this is why Firefox feels so sluggish with some more complex SPAs.


That’s nonsense. Firefox has telemetry built in, it’s just that you can opt out of it. Your answer doesn’t explain why at all but instead just takes a wild guess at what might have happened. You don’t know if this was discovered in Chrome or in some other use of V8. Or maybe it was always fast in Chrome! What a weird non-answer.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: