People are so focused on the mmap part, and the latency, that the usage is overl...

perbu · 2025-10-24T11:21:23 1761304883

We need to support over 10M files in each folder. JSON wouldn't fare well as the lack of indices makes random access problematic. Composing a JSON file with many objects is, at least with the current JSON implementation, not feasible.

CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.

CDB also makes it possible to access it directly, with ranged HTTP requests. It isn't something I've implemented, but having the option to do so is nice.
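To illustrate why ranged access is possible, here is a minimal sketch, assuming the classic 32-bit CDB layout (a 2048-byte header of 256 table pointers, then records, then hash tables). It is not the implementation discussed in this thread. `cdb_build`, `cdb_get`, `memory_reader` and `http_reader` are hypothetical names, and the URL passed to `http_reader` would be whatever serves the file. The lookup only ever asks for small byte ranges, so the same code works against a local buffer or against HTTP `Range` requests.

```python
import struct
import urllib.request

def cdb_hash(key: bytes) -> int:
    # Hash used by the classic CDB format: h = (h * 33) ^ byte, truncated to 32 bits.
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h

def cdb_build(items: dict) -> bytes:
    """Build a small CDB image in memory (illustration only)."""
    records = bytearray()
    tables = [[] for _ in range(256)]
    pos = 2048  # records start right after the fixed-size header
    for key, value in items.items():
        h = cdb_hash(key)
        tables[h & 255].append((h, pos))
        rec = struct.pack("<II", len(key), len(value)) + key + value
        records += rec
        pos += len(rec)
    header, hash_area = bytearray(), bytearray()
    for entries in tables:
        nslots = len(entries) * 2  # half-empty tables keep probe chains short
        header += struct.pack("<II", pos, nslots)
        slots = [(0, 0)] * nslots
        for h, rec_pos in entries:
            i = (h >> 8) % nslots
            while slots[i][1] != 0:  # linear probing; position 0 marks an empty slot
                i = (i + 1) % nslots
            slots[i] = (h, rec_pos)
        for h, rec_pos in slots:
            hash_area += struct.pack("<II", h, rec_pos)
        pos += nslots * 8
    return bytes(header + records + hash_area)

def cdb_get(read_range, key: bytes):
    """Look up key using only read_range(offset, length) -> bytes."""
    h = cdb_hash(key)
    table_pos, nslots = struct.unpack("<II", read_range((h & 255) * 8, 8))
    if nslots == 0:
        return None
    start = (h >> 8) % nslots
    for step in range(nslots):
        slot = (start + step) % nslots
        slot_hash, rec_pos = struct.unpack("<II", read_range(table_pos + slot * 8, 8))
        if rec_pos == 0:
            return None  # empty slot: the key is not present
        if slot_hash != h:
            continue
        klen, dlen = struct.unpack("<II", read_range(rec_pos, 8))
        if read_range(rec_pos + 8, klen) == key:
            return read_range(rec_pos + 8 + klen, dlen)
    return None

def memory_reader(image: bytes):
    # Serves ranges from a local buffer; handy for testing.
    return lambda offset, length: image[offset:offset + length]

def http_reader(url: str):
    # Assumes the server honours Range and answers 206 Partial Content.
    # A server that ignores the header would return the whole file,
    # which this sketch does not handle.
    def read_range(offset: int, length: int) -> bytes:
        req = urllib.request.Request(
            url, headers={"Range": f"bytes={offset}-{offset + length - 1}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return read_range
```

A lookup touches the header entry, one or more hash slots, and the record itself, so a remote hit costs a handful of small requests regardless of how many keys the file holds. A real client would probably cache the 2048-byte header and merge adjacent reads.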

benjiro · 2025-10-24T13:28:10 1761312490

> CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.

Might have been interesting to actually include this in the article, do you not think so? ;-)

The way the article is written made it seem that you used CDB on edge nodes to store metadata, with no information as to what you're storing or accessing, how, or why. This is part of the reason we have these discussions here.

perbu · 2025-10-24T14:57:50 1761317870

The post is about mmap and my somewhat successful use of it. If I'd described my whole stack, it would have been a small thesis and not really interesting.

dahfizz · 2025-10-24T12:30:59 1761309059

This reads like complete nonsense. If HTTP is involved, let's just give up and make the system as slow as possible?

The HTTP request needs to actually be actioned by the server before it can respond. Reducing the time it takes for the server to do the thing (accessing files) will meaningfully improve overall performance.

Switching out to JSON will meaningfully degrade performance. For no benefit.

benjiro · 2025-10-24T13:31:07 1761312667

> If HTTP is involved, let's just give up and make the system as slow as possible?

Did I write that? Please leave flamebait out of these discussions.

The original author answered (today) why they wanted to use this approach and what the benefits are. This has been missing from the entire discussion, so I really do not understand where you get this confidence.

> Switching out to JSON will meaningfully degrade performance. For no benefit.

We did not know why or how the system was used. Now that we know it is used as a transport medium between the DB and the nodes, it's clearer why JSON is an issue for them. That does not explain how you concluded it will "meaningfully degrade performance" when this information was not available to any of us.