K Isn't Lisp (2004) (nsl.com)
51 points by RodgerTheGreat on Feb 13, 2024 | 23 comments


That K solution has O(n^2) complexity, vs O(n*log(n)) for the Lisp one.

I kinda hate how all the K advocacy always uses old-style languages with small stdlibs, _and_ always ignores the complexity. How about MATLAB (which does have "unique" in its stdlib), Python/NumPy, or even R?


Unsure of the performance, but using R's `data.table` (which has uses beyond this example, so installing the library shouldn't be a one-off):

   library(data.table)
   v <- rbindlist(list(
       data.table(e1=1, string='one'),
       data.table(e1=1, string='two'),
       data.table(e1=1, string='three'),
       data.table(e1=1, string='one'),
       data.table(e1=1, string='four'),
       data.table(e1=1, string='two'))
    )
   v[, list(counter = sum(e1)), by = string]


I'm pretty sure the K solution is O(n)? Both group and unique should be O(n).


Pretty sure it's impossible to calculate unique/group in O(n), unless either the data is pre-sorted or the range is extremely limited (and the problem says nothing about this), so it has to be at least O(n*log(n)).

For K, I think the solution is to find the unique values first (n*log(n)), then for each unique value go over the list and find how many input values match (multiply by n). That's O(n^2 * log(n)), even worse than I wrote initially.


No, the k solution computes the unique elements and the groups independently: #:'=v[;1] uses group for the first column, and ?v[;1] uses unique for the second. (;) is list notation.

Your analysis of the algorithm you've described is also wrong: it would be O(n*log(n) + n^2), or O(n^2), because the second part loops over the unique elements, not over operations performed by unique.

In the random-access model, hash table construction/lookup is O(n) average case for n elements, if the elements have bounded size. More generally, it's linear in the total size of the input. If the strings are very long it can take a while to hash and compare them, but there's never a log(n) factor. Maybe you're confusing this with results from comparison sorting?

Both solutions are O(n) with the assumptions of random-access memory and bounded string size (or hash/comparison time), which I consider to be pretty standard.
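
To illustrate the argument (a rough Python sketch, not the K implementation): one hash-table pass over the input, then a pass over the at-most-n unique keys. No comparison sort appears anywhere.

  # Sketch: counting via a hash table is one O(n) pass,
  # plus an O(unique) pass to build the output pairs.
  v = [(1, "one"), (1, "two"), (1, "three"),
       (1, "one"), (1, "four"), (1, "two")]

  counts = {}                          # string -> occurrence count
  for _, s in v:                       # single pass over the input
      counts[s] = counts.get(s, 0) + 1

  print([(n, s) for s, n in counts.items()])
  # [(2, 'one'), (2, 'two'), (1, 'three'), (1, 'four')]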


Python has https://docs.python.org/3/library/collections.html#collectio... (which I'm sure would raise objections over the volume of code – https://github.com/python/cpython/blob/main/Lib/collections/... – but a lot of it is providing syntactic sugar).
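
Presumably that truncated link points at collections.Counter; assuming so, a sketch of the solution would be just:

  # Assuming the link above is to collections.Counter:
  from collections import Counter

  v = [(1, "one"), (1, "two"), (1, "three"),
       (1, "one"), (1, "four"), (1, "two")]
  print(Counter(s for _, s in v))
  # Counter({'one': 2, 'two': 2, 'three': 1, 'four': 1})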


Lisp solution looks like O[n] to me. Where does the log[n] factor come from?


The exact semantics of the problem aren't clear (do we need to maintain order? Do we need to use the initial count? Why do we have counts on subsequent elements if we only add 1?), but the logical Clojure solution would just be:

  (frequencies (map second v))
  ;; => {"one" 2, "two" 2, "three" 1, "four" 1}

  ;; or:

  (update-vals (group-by second v) count)
  ;; => {"one" 2, "two" 2, "three" 1, "four" 1}

  ;; +(#:'=v[;1];?v[;1])
  ;; flip(count each group v[;1];unique v[;1])
So, shorter than the readable K version in a Lisp.

What did we learn here? Probably nothing.

K is great for sequence operations--not sure what we're trying to imply about Lisps.


It's trivial to extend the SERIES library to allow collecting sums into a hash-table (it was just a cut-and-paste of the collect-hash function with setf changed to incf; further extending it to allow generic modification of values is left as an exercise for the reader). That allows a shorter (but not as short as I'd like) implementation. It could be shorter still if the output list had elements like ("one" 1), since the multiple-value-bind exists only to reverse that ordering when unpacking the hash-table at the end.

Here's the result (if you want to optimize further and know that the result will fit in a fixnum (i.e. +/- 2^62 on modern implementations), you can change 'number to 'fixnum):

  (defun unique-sum (list)
    (multiple-value-bind (keys values)
        (scan-hash
         (collect-hash-sum
          (map-fn 'string #'cadr (scan list))
          (map-fn 'number #'car (scan list))
          :test #'equal))
      (map-fn 'list #'list values keys)))
And the extension to SERIES:

  (series::defS collect-hash-sum (keys values &rest option-plist)
    "(collect-hash-sum keys values :test :size :rehash-size :rehash-threshold)
  
  Combines a series of keys and a series of values together into a hash
  table.  Duplicate keys sum the values. The keyword arguments specify
  the attributes of the hash table to be produced.  They are used as
  arguments to MAKE-HASH-TABLE"
    (fragl ((keys t) (values t) (option-plist))
           ((table))
           ((table t (apply #'make-hash-table option-plist)))
           ()
           ()
           ((incf (gethash keys table 0) values)) () () nil)
   :optimizer
    (series::apply-literal-frag
      (list '(((keys t) (values t) (table)) ((table)) () ()
              () ((incf (gethash keys table 0) values)) () () nil)
            keys values `(make-hash-table ,@ option-plist)))
   :trigger t)


Doing the naive thing in CL without thinking about things like 'efficiency' or 'maintainability':

  (let ((test-input '((1 "one") (1 "two") (1 "three") (1 "one") (1 "four") (1 "two"))))
    (mapcar (lambda (string)
              (list (count string test-input :test #'string= :key #'second)
                    string))
      (remove-duplicates (mapcar #'second test-input)
                         :test #'string= :from-end t)))
I will admit it's not as short as:

  v:((1;"one");(1;"two");(1;"three");(1;"one");(1;"four");(1;"two"))
  +(#:'=v[;1];?v[;1])


My naive version looked more like this (with apologies for likely messing up the spacing on HN):

    (let ((in '((1 "one") (1 "two") (1 "three") (1 "one") (1 "four") (1 "two"))))
           (loop with hash = (make-hash-table :test #'equal)
                 for v in in
                 do (incf (gethash (cadr v) hash 0))
                 finally (return (loop for key being the hash-keys of hash
                                       collect (list (gethash key hash) key)))))


Now you have me thinking about how short I can make that. My revision:

    (defun doit (elts)
      (mapcar (lambda (elt) `(,(count elt elts :test 'equal :key 'second) ,elt))
              (delete-duplicates (mapcar 'second elts) :test 'equal)))
or maybe

    (defun doit (elts)
      (loop for elt in (delete-duplicates (mapcar 'second elts) :test 'equal)
            collect `(,(count elt elts :test 'equal :key 'second) ,elt)))


As much as Lisp isn't meant for code golf, I think the loop variation could be used as a (rather long) one-liner.


awk:

  printf '1 "one"\n1 "two"\n1 "three"\n1 "one"\n1 "four"\n1 "two"\n' | awk '{ counters[$2] += $1 } END { for (id in counters) { print counters[id], id } }'

  2 "two"
  1 "four"
  1 "three"
  2 "one"


And you can use the system sort or gawk's internal sort to get them in order.


Does the K solution handle non-unity values in the beginning of each sublist? e.g. ((1 "one")(2 "one")) should result in ((3 "one"))


Bizarrely enough, that didn't seem to be a requirement.

But no, it doesn't. It completely ignores the digit, just groups by the string and counts that.

The problem is very weird and stupid though. The input is logically a list of dictionaries with a string as the key and a count as the value. This is K, not Bash; you should be using a list of dictionaries, not a string. If you have that list (let's call it myList), in K you just need to sum the dictionaries together, and it will automatically add all the counts together. So the code would literally just be "+/myList".

I would just convert the list of lists to a list of dictionaries and do that.
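
For illustration (a rough Python analogue of the same idea, not actual K): treat each pair as a one-entry dictionary and sum them all; this also handles non-unity counts like ((1 "one")(2 "one")).

  # Rough Python analogue of +/myList over dictionaries.
  # Counter supports +, so duplicate keys sum their counts.
  from collections import Counter

  v = [(1, "one"), (2, "one"), (1, "two")]
  total = sum((Counter({s: n}) for n, s in v), Counter())
  print(total)  # Counter({'one': 3, 'two': 1})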


I agree that the problem is weird and stupid. It does highlight a lack of dynamic grouping[1] in lisp iteration tools though; as TFA suggests, arbitrary grouping is done idiomatically in lisp using an explicit associative type (usually a hash-table).

Even my contribution (using the SERIES library) keeps the hash-table explicit.

I don't know how much of this is habit, how much is lisp programmers being control freaks, and how much is "if I need to do grouping, having the hash-table explicitly available will probably be needed at some point anyway".

1: Static grouping (i.e. where all group keys are known in advance) is trivial.
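
For comparison, dynamic grouping sketched in Python still reaches for an explicit hash table (defaultdict here); the group keys are discovered as we iterate rather than declared up front.

  # Sketch: dynamic grouping with an explicit associative type.
  from collections import defaultdict

  v = [(1, "one"), (1, "two"), (1, "one")]
  groups = defaultdict(list)
  for row in v:
      groups[row[1]].append(row)      # key discovered on the fly
  print(dict(groups))
  # {'one': [(1, 'one'), (1, 'one')], 'two': [(1, 'two')]}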


If they wanted to have a language that isn't lisp but whose name started with a K, wouldn't it have made more sense to call it kil?


K is in the APL family of languages and thus not really related to lisp. The language's inventor wrote A+ prior to K, and he also assisted with a prototype of the J language (look for the J incunabulum), so K is a natural next step.

https://www.jsoftware.com/ioj/iojATW.htm


Brian Cantrill interviewed Arthur Whitney for the ACM Queue in 2009[1]:

> BC You had used APL, and then you explored these other languages—Prolog variants and so on—but when you got to Morgan Stanley you came back to APL. What brought you back?

> AW I much preferred implementing and coding in LISP, but once I was dealing with big data sets and then having to do fairly simple calculations, APL just seemed to have the better vocabulary.

> It had to come up one level. Common LISP even then had about 2,000 primitives. I didn't like that. What I liked was the original LISP, which had car, cdr, cons, and cond, but that was too little. Common LISP was way too big, but a stripped-down version of APL was in the middle with about 50 operations. It's about the same size as C. But the thing about the languages that I implement is that there are no libraries: those 50 operations are it. Everybody builds from there, and the resulting programs are extremely short.

[1] https://queue.acm.org/detail.cfm?id=1531242


If you look at the original lisp (i.e., M-expressions) and replace the linked lists with vectors, it's not that far off.



