Why would we never know? Quoting from the blog post: "The specific reasons given were that [...]" followed by a bullet list. Your guess is unwarranted speculation.
> Why would we never know? Quoting from the blog post: "The specific reasons given were that [...]" followed by a bullet list. Your guess is unwarranted speculation.
The question was: can we find out if hidden political vendettas against OP were the cause of complaints? You're implying that, yes, we can find out, because the complainers provided a bullet point list. If the complainers had hidden political vendettas against OP, do you actually think they would have listed them in the bullet points?
sums = [s for s in [0] for x in data for s in [s + x]]
Why would you do "for s in" twice? Is that intentional? It would make more sense to me if the variables would have been different. And why would you want to add 0 to numbers?! Curious about a real world use case for this.
I wouldn't endorse that code, but it does make sense. You can read it like this:
sums = []
s = 0
for x in data:
    s = s + x
    sums.append(s)
`for s in [0]` assigns 0 to `s`, as an initial value. `for s in [s + x]` adds `x` to `s`. Both instances of `s` are the same variable, there's no shadowing going on.
Ah, I see. That really doesn't deserve to be called an idiom, it's a clever hack. But it's nice to know about it. It seems less ugly than the walrus operator to me, and it doesn't leak the variable outside of the comprehension.
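For anyone who wants to check the equivalence, here's a quick sketch (the `data` list is made up) comparing the comprehension against the standard library's `itertools.accumulate`:

```python
from itertools import accumulate

data = [1, 2, 3, 4]

# the trick: `for s in [0]` seeds s with 0, `for s in [s + x]` rebinds s to s + x
sums = [s for s in [0] for x in data for s in [s + x]]

print(sums)  # [1, 3, 6, 10]
assert sums == list(accumulate(data))
```

So it really is just a running sum; `accumulate` is the readable way to spell it.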
There might be others, but the big advantage in my eyes is that Re, because it is pure-OCaml, is platform independent, whereas Re2, being backed by the C++ lib of the same name, is not. At Jane Street this is relevant because we share a lot of code between Linux servers and JavaScript (running in Chrome) clients.
bollu, multiple people have shown that your claim that the paper doesn't match the code is flat out wrong. I think at this point you should issue a retraction of your wildly inappropriate suggestion of academic dishonesty.
Why do you think English would be the least compressible? Is that based on conjecture, or have you investigated it? Why would an artificial language be more compressible? That seems completely orthogonal to me (by definition, an artificial language can be designed with whatever properties you choose). Fortran may be more compressible due to its limited set of keywords, but my impression is that Ithkuil is by design more information-dense and thus harder to compress than English.
The most efficient language is the least compressible language only in a narrow and arbitrary sense of "efficient". There are many considerations, such as what is efficient for the speaker, what is efficient for the hearer, redundancy against noise, efficiency with respect to particular purposes, etc. We can assume that natural languages generally make a good trade-off across these factors, so searching for the most efficient language in one particular narrow sense is not very useful. Moreover, compression of text operates only on surface form, completely ignoring the dimension of meaning.
I live in the USA. We get labels in English, French and Spanish so that products can be sold in Canada and Mexico. The English labeling is almost always visibly shorter than the French and Spanish. So I hypothesize that English would compress less.
My conjecture is that artificial languages will be more compressible because they haven't had time to get honed down, the way English lost "thee" and "thou", that personal mode of address. Esperanto and Loglan are completely regular in a way natural languages are not, and so they carry that regularity even in cases where it doesn't matter; they haven't had time to lose the mostly-unused features.
For better or for worse, compression of text only uses the surface form, because that's the level compression works on: letters or bytes or some other unit. You can't compress meaning. Meaning doesn't exist per se; colorless green ideas sleep furiously, after all. That is, you can use perfectly sensible words and letters and even legitimate syntax, and still create strings devoid of meaning. A document consisting of perfectly spelled words and legitimate syntax, yet without meaning, like the colorless-green-ideas sentence, will compress about the same as ordinary text with the same orthographic and syntactic validity.
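A quick sanity check of that point with zlib (both sentences are made up for illustration; one is meaningful, one is Chomsky-style nonsense with the same orthographic character):

```python
import zlib

# Two orthographically and syntactically ordinary English sentences of
# similar length; only one of them actually means anything.
meaningful = b"The friendly old professor quietly graded papers in the garden."
meaningless = b"Colorless green ideas sleep furiously beside the quiet garden."

size_meaningful = len(zlib.compress(meaningful))
size_meaningless = len(zlib.compress(meaningless))

# The compressor only sees bytes, so the sizes come out nearly identical.
print(size_meaningful, size_meaningless)
```

On short strings like these the absolute sizes are dominated by zlib's overhead, but the point stands: the compressor has no access to whether the text means anything.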
You mention a single counterexample, but is it "definitely not true"? I think there is a strong publication bias: papers will report when they improve on the baseline, but that's likely not representative for the common case (e.g., limited data, no pretrained model available, no time for extensive parameter tuning).
You're right that it is a representation, but it's also an instance of the vector space model of language. Coupled with a linear model for prediction, it is a strong baseline for text classification problems. See e.g. http://scikit-learn.org/stable/tutorial/text_analytics/worki...
So "bag of words" = "count/tf-idf vectorizer + logistic/ridge/lasso regression"?
Also: a vector space is a set of things that can be added and multiplied by a scalar. So a vector space model should be the proverbial representation where "queen - woman + man = king".
Am I being an insufferable pedant? I follow text analysis only very lightly and keep losing the thread.
Yes, in the context of text classification a bag-of-words model will refer to that, or to word counts combined with some other linear model like a linear SVM or naive Bayes.
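As a concrete sketch of that baseline (the toy texts and labels are made up for illustration, and it assumes scikit-learn is installed):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up sentiment dataset: 1 = positive, 0 = negative.
texts = [
    "great movie, loved it",
    "terrible plot, awful acting",
    "loved the acting",
    "awful movie",
]
labels = [1, 0, 1, 0]

# Bag of words as a tf-idf vectorizer feeding a logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loved it, great"]))  # [1] on this toy data
```

Swap `LogisticRegression` for `LinearSVC` or `MultinomialNB` (with a `CountVectorizer`) and you get the other linear baselines mentioned above.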
The queen - woman example is when you try to make a model of word semantics, such as with word2vec. In a document classification task the vectors represent documents.
That doesn't work the same way as Pi-hole. Pi-hole blocks ads on ALL devices on your network: your computer, your laptop, your phone, your kid's Kindle, etc. As long as they are on your network, they are protected (and when browsing web pages on an older phone, things are much faster).
Yeah, this is something I've been thinking about lately as well. Pi-hole seems cool, but what about most of the time, when I'm somewhere other than on my local network?
I use AdBlock https://www.adblockios.com on iOS which runs a local DNS server that can blackhole domains. It doesn't work well on very large host files so I gave up trying to import https://github.com/StevenBlack/hosts, but it does work well for smaller lists.
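For context on what those lists contain: StevenBlack-style hosts files map blocked domains to an unroutable address. Here's a rough sketch (the function name and sample entries are made up for illustration) of pulling just the blocked domains out of such a file, e.g. to build a smaller list that imports cleanly:

```python
def parse_blocked_domains(lines):
    """Collect domains from '0.0.0.0 domain' / '127.0.0.1 domain' lines."""
    domains = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("0.0.0.0", "127.0.0.1"):
            domains.append(parts[1])
    return domains

sample = [
    "# StevenBlack-style hosts file",
    "0.0.0.0 ads.example.com",
    "127.0.0.1 tracker.example.net  # legacy blackhole address",
    "93.184.216.34 example.org",  # an ordinary mapping, not a block entry
]
print(parse_blocked_domains(sample))  # ['ads.example.com', 'tracker.example.net']
```

From there you could keep only the domains matching the trackers you care about and write out a much shorter hosts file.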
(1) Install "1Blocker X" -- not free but it's cheap.
(2) It has a huge number of rules and protects your Safari pretty damn well.
(3) You can disable the existing rules if you so choose.
(4) You can add new ones based on URL regexes or CSS rules.
I am still using it actively on both my iPhone and iPad; one of the best investments in apps I ever made.
The issue with that is DNS resolution. I noticed that when I disconnected/reconnected my interface, it took >30 seconds for DNS resolution to start working properly. Why? Because I was using a 65,000-entry hosts file on my modern Windows 10 machine.
It seems to only have an impact during NIC changes, but I use a VPN and was moving my computer around enough that it was causing me issues.