
People are so focused on the mmap part, and the latency, that the usage is overlooked.

> The last couple of weeks I've been working on an HTTP-backed filesystem.

It feels like these are micro-optimizations that are going to be dwarfed by the whole HTTP cycle anyway.

There is also the benchmark issue:

The enhanced CDB format seems to be focused on read-only benefits, as writes introduce a lot of latency and issues with mmap. In other words, you need to freeze for the mmap, then unfreeze and write for updates, then freeze for the mmap again ...

This cycle introduces overhead, does it not? Has this been benchmarked? Because from what I am seeing, the benefits are mostly in the frozen state (aka read-only).

If the data is changed infrequently, why not just use JSON? No matter how slow it is, if you're just going to do HTTP requests for the directory listing, your overhead is not the actual file format.

If this enhanced file format were used as file storage, and you wanted fast file reads, that would be a different matter. Then there are ways around it, such as keeping "part" files where files 1 ... 1000 are in file.01, 1001 ... 2000 in file.02 (thus reducing overhead from the file system), with those parts memory-mapped for fast reading. Updates would then be handled by invalidating and rewriting files (as I do not see any delete/vacuum ability in the file format).

So, the actual benefits just for a file directory listing db escape me.
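To make the part-file idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: 1000 entries per part (so ids 1 to 1000 land in file.01 and 1001 to 2000 in file.02), the `file.NN` naming, and the helper names `part_name` and `read_slice`. It is not taken from the article.

```python
import mmap

FILES_PER_PART = 1000  # assumed part size, purely illustrative


def part_name(file_id: int) -> str:
    """Map a 1-based file id to the part file that would hold it."""
    if file_id < 1:
        raise ValueError("file ids start at 1")
    return "file.%02d" % ((file_id - 1) // FILES_PER_PART + 1)


def read_slice(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` from a part file via a read-only mmap."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]
```

An update under this scheme would rewrite the affected part and remap it, which is where the freeze/unfreeze cost mentioned earlier would show up.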



We need to support over 10M files in each folder. JSON wouldn't fare well as the lack of indices makes random access problematic. Composing a JSON file with many objects is, at least with the current JSON implementation, not feasible.
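For readers unfamiliar with why an indexed format helps here, this is a rough sketch of how a lookup is located in the classic cdb layout: hash the key, pick one of 256 tables from the low byte, and start probing at a slot derived from the remaining bits. The enhanced format discussed in the post may differ; `locate` and `table_lengths` are hypothetical names used only for this illustration.

```python
def cdb_hash(key: bytes) -> int:
    """Classic cdb hash: h = (h * 33) ^ byte, starting at 5381, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def locate(key: bytes, table_lengths):
    """Return (table number, first slot to probe) for a key.

    `table_lengths` is assumed to hold the slot count of each of the
    256 hash tables, as read from the file header.
    """
    h = cdb_hash(key)
    table = h & 0xFF
    slots = table_lengths[table]
    start = (h >> 8) % slots if slots else 0
    return table, start
```

The point is that a single key costs a header read plus a short probe, independent of how many entries the file holds, whereas a flat JSON document has to be parsed before anything can be found.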

CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.

CDB also makes it possible to access it directly, with ranged HTTP requests. It isn't something I've implemented, but having the option to do so is nice.
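As a rough sketch of what direct access over ranged requests could look like (again, not something implemented here), the client would ask for only the bytes it needs using a `Range` header. The URL below is a placeholder, `range_request` is a hypothetical helper, and the 2048-byte header size is that of classic cdb (256 table pointers of 8 bytes each); the enhanced format may use a different layout.

```python
import urllib.request

CDB_HEADER_BYTES = 2048  # classic cdb header size; an assumption for the enhanced format


def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes starting at byte `start`."""
    end = start + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder URL; fetching the header alone would be the first step of a lookup.
req = range_request("http://example.invalid/listing.cdb", 0, CDB_HEADER_BYTES)
```

Passing `req` to `urllib.request.urlopen` would perform the fetch; a server that honours ranges answers with 206 Partial Content, and one that does not may send the whole file with 200.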


> CDB is only a transport medium. The data originates in PostgreSQL and, upon request, is stored in CDB and transferred. Writing/freezing to CDB is faster than encoding JSON.

Might have been interesting to actually include this in the article, do you not think so? ;-)

The way the article is written made it seem that you used CDB on edge nodes to store metadata, with no information as to what you're storing or accessing, how, or why ... This is part of the reason we have these discussions here.


The post is about mmap and my somewhat successful use of it. If I had described my whole stack, it would have been a small thesis and not really interesting.


This reads like complete nonsense. If HTTP is involved, let's just give up and make the system as slow as possible?

The HTTP request needs to actually be actioned by the server before it can respond. Reducing the time it takes for the server to do the thing (accessing files) will meaningfully improve overall performance.

Switching out to JSON will meaningfully degrade performance. For no benefit.


> If HTTP is involved, let's just give up and make the system as slow as possible?

Did I write that? Please leave flamebait out of these discussions.

The original author answered (today) why they wanted to use this approach and what the benefits are. This has been missing from this entire discussion. So I really do not understand where you get this confidence.

> Switching out to JSON will meaningfully degrade performance. For no benefit.

We did not know why or how the system was used. Now that we know it is used as a transport medium between the db and the nodes, it's clearer why JSON is an issue for them. That does not explain how you concluded it would "meaningfully degrade performance" when this information was not available to any of us.



