Why would we never know? Quoting from the blog post: "The specific reasons given were that [...]" followed by a bullet list. Your guess is unwarranted speculation.
> Why would we never know? Quoting from the blog post: "The specific reasons given were that [...]" followed by a bullet list. Your guess is unwarranted speculation.
The question was: can we find out if hidden political vendettas against OP were the cause of complaints? You're implying that, yes, we can find out, because the complainers provided a bullet point list. If the complainers had hidden political vendettas against OP, do you actually think they would have listed them in the bullet points?
sums = [s for s in [0] for x in data for s in [s + x]]
Why would you do "for s in" twice? Is that intentional? It would make more sense to me if the variables would have been different. And why would you want to add 0 to numbers?! Curious about a real world use case for this.
I wouldn't endorse that code, but it does make sense. You can read it like this:
sums = []
s = 0
for x in data:
    s = s + x
    sums.append(s)
`for s in [0]` assigns 0 to `s`, as an initial value. `for s in [s + x]` adds `x` to `s`. Both instances of `s` are the same variable, there's no shadowing going on.
Ah, I see. That really doesn't deserve to be called an idiom, it's a clever hack. But it's nice to know about it. It seems less ugly than the walrus operator to me, and it doesn't leak the variable outside of the comprehension.
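For anyone who wants to check the equivalence, here's a quick sketch (the `data` list is made up) comparing the comprehension against the standard library's `itertools.accumulate`:

```python
from itertools import accumulate

data = [1, 2, 3, 4]

# the trick: `for s in [0]` seeds s with 0, `for s in [s + x]` rebinds s to s + x
sums = [s for s in [0] for x in data for s in [s + x]]

print(sums)  # [1, 3, 6, 10]
assert sums == list(accumulate(data))
```

So it really is just a running sum; `accumulate` is the readable way to spell it.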
There might be others, but the big advantage in my eyes is that Re, because it is pure-OCaml, is platform independent, whereas Re2, being backed by the C++ lib of the same name, is not. At Jane Street this is relevant because we share a lot of code between Linux servers and JavaScript (running in Chrome) clients.
bollu, multiple people have shown that your claim that the paper doesn't match the code is flat out wrong. I think at this point you should issue a retraction of your wildly inappropriate suggestion of academic dishonesty.
Why do you think English would be the least compressible? Is that based on conjecture, or have you investigated it? Why would an artificial language be more compressible? That seems completely orthogonal to me (by definition, an artificial language can be designed with whatever properties you choose). Fortran may be more compressible due to its limited set of keywords, but my impression is that Ithkuil is by design more information-dense and thus harder to compress than English.
The most efficient language is the least compressible language only in a narrow and arbitrary sense of "efficient". There are many considerations, such as what is efficient for the speaker, what is efficient for the hearer, redundancy against noise, efficiency with respect to particular purposes, etc. We can assume that natural languages generally make a good trade-off across these factors, so searching for the most efficient language in one particular narrow sense is not very useful. Moreover, compression of text operates only on surface form, completely ignoring the dimension of meaning.
I live in the USA. We get labels in English, French and Spanish so that products can be sold in Canada and Mexico. The English labeling is almost always visibly shorter than the French and Spanish. So I hypothesize that English would compress less.
My conjecture is that artificial languages will be more compressible because they haven't had time to get honed down, the way English lost "thee" and "thou", that personal mode of address. Esperanto and Loglan are completely regular in a way natural languages are not, and so they carry that regularity even in cases where it doesn't matter; they haven't had time to lose the mostly-unused features.
For better or for worse, compression of text only uses the surface form, because that's the level compression works on: letters or bytes or some other unit. You can't compress meaning. Meaning doesn't exist per se; colorless green ideas sleep furiously, after all. That is, you can use perfectly sensible words and letters and even legitimate syntax, and still create strings devoid of meaning. A document consisting of perfectly spelled words and legitimate syntax, yet without meaning, like the colorless-green-ideas sentence, will compress about the same as ordinary text with the same orthographic and syntactic validity.
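A quick sanity check of that point with zlib (both sentences are made up for illustration; one is meaningful, one is Chomsky-style nonsense with the same orthographic character):

```python
import zlib

# Two orthographically and syntactically ordinary English sentences of
# similar length; only one of them actually means anything.
meaningful = b"The friendly old professor quietly graded papers in the garden."
meaningless = b"Colorless green ideas sleep furiously beside the quiet garden."

size_meaningful = len(zlib.compress(meaningful))
size_meaningless = len(zlib.compress(meaningless))

# The compressor only sees bytes, so the sizes come out nearly identical.
print(size_meaningful, size_meaningless)
```

On short strings like these the absolute sizes are dominated by zlib's overhead, but the point stands: the compressor has no access to whether the text means anything.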
You mention a single counterexample, but is it "definitely not true"? I think there is a strong publication bias: papers will report when they improve on the baseline, but that's likely not representative for the common case (e.g., limited data, no pretrained model available, no time for extensive parameter tuning).
You're right that it is a representation, but it's also an instance of the vector space model of language. Coupled with a linear model for prediction, it is a strong baseline for text classification problems. See e.g. http://scikit-learn.org/stable/tutorial/text_analytics/worki...
So "bag of words" = "count/tf-idf vectorizer + logistic/ridge/lasso regression"?
Also: a vector space is a set of things that can be added and multiplied by a scalar. So a vector space model should be the proverbial representation where "queen - woman + man = king".
Am I being an insufferable pedant? I follow text analysis only very lightly and keep losing the thread.
Yes, in the context of text classification a bag-of-words model will refer to that, or to word counts combined with some other linear model like a linear SVM or naive Bayes.
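As a concrete sketch of that baseline (the toy texts and labels are made up for illustration, and it assumes scikit-learn is installed):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up sentiment dataset: 1 = positive, 0 = negative.
texts = [
    "great movie, loved it",
    "terrible plot, awful acting",
    "loved the acting",
    "awful movie",
]
labels = [1, 0, 1, 0]

# Bag of words as a tf-idf vectorizer feeding a logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loved it, great"]))  # [1] on this toy data
```

Swap `LogisticRegression` for `LinearSVC` or `MultinomialNB` (with a `CountVectorizer`) and you get the other linear baselines mentioned above.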
The queen - woman example is when you try to make a model of word semantics, such as with word2vec. In a document classification task the vectors represent documents.
That doesn't work the same way as Pi-hole. Pi-hole blocks ads on ALL devices on your network: your computer, your laptop, your phone, your kid's Kindle, etc. As long as they are on your network, they are protected (and when browsing web pages on an older phone, things are much faster).
Yeah, this is something I've been thinking about lately as well. Pi-hole seems cool, but what about most of the time, when I'm somewhere other than on my local network?
I use AdBlock https://www.adblockios.com on iOS which runs a local DNS server that can blackhole domains. It doesn't work well on very large host files so I gave up trying to import https://github.com/StevenBlack/hosts, but it does work well for smaller lists.
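For context on what those lists contain: StevenBlack-style hosts files map blocked domains to an unroutable address. Here's a rough sketch (the function name and sample entries are made up for illustration) of pulling just the blocked domains out of such a file, e.g. to build a smaller list that imports cleanly:

```python
def parse_blocked_domains(lines):
    """Collect domains from '0.0.0.0 domain' / '127.0.0.1 domain' lines."""
    domains = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("0.0.0.0", "127.0.0.1"):
            domains.append(parts[1])
    return domains

sample = [
    "# StevenBlack-style hosts file",
    "0.0.0.0 ads.example.com",
    "127.0.0.1 tracker.example.net  # legacy blackhole address",
    "93.184.216.34 example.org",  # an ordinary mapping, not a block entry
]
print(parse_blocked_domains(sample))  # ['ads.example.com', 'tracker.example.net']
```

From there you could keep only the domains matching the trackers you care about and write out a much shorter hosts file.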
(1) Install "1Blocker X" -- not free but it's cheap.
(2) It has a huge number of rules and protects your Safari pretty damn well.
(3) You can disable the existing rules if you so choose.
(4) You can add new ones based on URL regexes or CSS rules.
I am still using it actively on both my iPhone and iPad; one of the best investments in apps I ever made.
The issue with that is DNS resolution. I noticed that when I disconnected/reconnected my interface, it took >30 seconds for DNS resolution to start working properly. Why? Because I was using a 65,000-entry hosts file on my modern Windows 10 machine.
It seems to only have an impact during NIC changes, but I use a VPN and was moving my computer around enough that it was causing me issues.