Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That method adds an extra character to your alphabet, and uses that as a marker in the transformed string to h effectively) store the index of the first character inside the transformed text.

If, as is often the case, your alphabet contains exactly as many letters as a byte/word can store, that makes for awkward encoding. I also think that, typically, that method will require more memory.



No, I'm not talking about that. I'm talking about David Scott's bijective BWT variant. It sorts symbols by a slightly different type of context, and so can do away with a sentinel symbol or a stored index entirely. It was described in the wikipedia page last I checked.

Scott was a bit of a weirdo, but his algorithm actually got recognition in academia, there have been many papers written about it. One character saved doesn't make a lot of difference for compression, but the transform being a permutation does give it some nice properties.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: