You are conflating implementation tricks with language semantics. In Lisp, NIL is always an atom, never a cons.
Also, a dotted list is a nonempty list where the cdr of the last cons is not NIL. It is not a notational convention. A list (a b c) is not a dotted list, even if you write it as (a . (b . (c . nil))).
> You are conflating implementation tricks with language semantics.
No, I'm not.
> In Lisp, NIL is always an atom, never a cons.
That depends on what you mean by "atom". If by "atom" you mean something that answers true to the ATOM predicate then yes, NIL is an atom. But if by "atom" you mean something that produces an error if you try to call CAR or CDR on it then no, NIL is not an atom, it is equal to (CONS NIL NIL) because (CAR NIL) and (CDR NIL) are both NIL.
So the result of (ATOM NIL) is an arbitrary choice. And CL arguably got it wrong because it fails to preserve the invariant that if (ATOM X) is true then (CAR X) and (CDR X) will produce errors.
[UPDATE]
> a dotted list is a nonempty list where the cdr of the last cons is not NIL
But this is just terminology. In CL, NIL behaves exactly the same as (NIL . NIL) with respect to CAR and CDR.
Indeed, it is exactly this confusion that is the basis for a lot of criticism of CL's design. In Scheme, the empty list is unambiguously atomic: trying to take the CAR or CDR of the empty list in Scheme is an error, as with any other atom.
When baby Lispers are born, the first thing they are taught is a dichotomy: the universe is split into conses and atoms. Cons cells are simple and composed of two components, which are called the car and cdr. There are functions to retrieve what's stored in these components, which are called CAR and CDR. Atoms may be as complex as you like. They include objects like numbers, characters, strings, arrays, symbols, etc. NIL is not a cons cell, but a symbol. We can represent lists by chaining cons cells. We may start with an empty list, and by convention this is the symbol NIL. If we also choose to designate NIL as the false value, and everything else as true values, then it is useful to modify CAR and CDR so that they take not only conses, but also the symbol NIL, and return NIL, which is false, and the empty list. We then say that CAR and CDR take a list, i.e. an object of type (or cons null). We do not say that NIL is a cons.
Your [UPDATE] shows that you still don't understand what is meant by "dotted list". I already gave a definition of one, but did not give an example. An example of a dotted list is (a b . c) i.e. the last cons has a cdr that is (i) an atom (otherwise, it wouldn't have been the last cons) and (ii) not NIL (which is the conventional empty list designation).
By an informal metonymy, the internal object is called a dotted list. It's only an informal usage among Lisp coders. The correct terminology is "improper" for the object and "dotted" for the spelling.
Note that (a b c . nil) is dotted, but it's the same as (a b c) which is proper.
Unfortunately, the Common Lisp specification encodes the "dotted list" informality in the Glossary. Common Lisp does things like that.
Furthermore Common Lisp uses "dotted list" as an essential term denoting a subset of of "improper list". An "improper list" is circular or not terminated by nil. A dotted list is only the latter.
That's in spite of the fact that a circular list will print with the dot notation: #1=(a b c . #1#)! Circular lists are dotted when completely printed, under the circle notation.
Another problem is that the append function and others support the idea of a non-nil atom being an empty dotted list. For instance (append '(a b c) 'd) will work and produce (a b c . d). Yet the "dotted list" definition excludes such an atom. If we go by the presence of a dot, that is correct, but then we know from circular lists not being dotted that that isn't the criterion.
Right, I just wrote about it in my reply to "lisper". The name "dotted list" may not have been the best choice, but that's not an outlier in human history.
You don't need to be quite so condescending. I understand all this perfectly well. But the concept of a "dotted list" has to do with how a list is serialized, not how it is represented in the internally, which is what I am talking about.
NIL is weird in CL and different implementations handle this weirdness in different ways. One way to handle it is to represent NIL as a symbol whose name is "NIL" and write CAR and CDR to recognize when they see this symbol and return it. Another way to handle it is to represent NIL as a cons cell whose CAR and CDR point to itself and write SYMBOLP and SYMBOL-NAME to recognize when they see this privileged cons cell and return T and "NIL" respectively. (There are a lot of other special cases -- this is not an exhaustive list.)
However you slice it, the concept of a dotted list is a non-sequitur because that has NOTHING to do with how NIL is implemented internally, which what I am talking about.
The name "dotted list" comes from how the list is serialized, yes, but not the concept. You can write a function dotted-list-p that takes an object and returns the value of (and (consp object) (cdr (last object))). Indeed, it is not related to the NIL issue, and I made no such assumption. You can read an equivalent definition that uses different words in the CLHS glossary.
You admit to talking about "how NIL is implemented internally", hence my original remark that you are conflating implementation tricks with language semantics. From language semantics point of view, NIL is not weird, it is an ordinary symbol, like NIK or NIM. Lisp tradition assigns it certain roles, like representing the empty list, representing the false value, representing the empty type, and also has conveniences like having NIL evaluate to itself or having CAR and CDR take lists instead of conses.
I didn't mean to be condescending. I know you are not a baby Lisper. The matter at hand is very basic, and I understand the desire to present a more sophisticated take, but believe it leads to (and reflects) a distorted ontological view of Lisp if taken seriously. NIL is not weird, it is a simple symbol. We just assigned it a few roles and made a few conveniences. Why did we pick it for those roles? Arbitrary choice. Why did we choose to extend the domain of CAR and CDR? Practical choice. Let the puritans complain and build their own ivory tower languages. Lisp is pragmatic. Could an implementor choose to represent NIL as a closet cons for simple CAR and CDR implementations and special case everything else? Sure. But this has nothing to do with the ontology of Lisp.
> having CAR and CDR take lists instead of conses.
(car nil) and (cdr nil) safely returning nil was introduced in InterLisp, according to Gabriel's HOPL palper. At some big Lisp summit in the early 1970's, InterLisp decided to adopt MacLisp's readtables, and MacLisp adopted (car nil) -> nil.
Why the empty list is a symbol is natural: math uses symbols to refer to such things. For instance the empty set is notated both {} and ∅: empty braces or the special null set symbol.
I mean, why have symbols refer to things in a language that is consciously oriented toward symbolic processing, whose initial creators were people with math backgrounds? It's pretty much a forgone no-brainer.
Having a symbol DENOTE the empty list and having a symbol be THE SAME OBJECT as the empty list are two very different things. The former is perfectly reasonable. The latter causes no end of difficulties.
The former is somewhat reasonable for having pi denote 3.14159... and such.
It brings in its own difficulties. If nil is a variable/constant which evaluates to some nil object, then to talk about nil itself, we have to quote it.
The object which it denotes doesn't print as nil; it has its own printed rep like () and we will end up seeing that printed rep and using it. So then we have two nil representations to deal with: the nil variable which we can use in evaluated contexts, and the literal nil like () that we use elsewhere.
If () isn't self-evaluating (like the criminally stupid design in Scheme), we have to quote it: '().
If we have additional semantic roles for () like it being Boolean false and the bottom of the type spindle, those uses are not nicely served by the () notation, from an esthetic point of view.
The list terminator is de facto a symbol because it's exploited for its identity: we care whether the cdr of a cons is or is not nil, and that's all. That's a symbolic behavior; we might as well complete things so that the symbol functions work with nil; it can have a property list, name and so on.
So? That's no different than if you want to talk about 'pi rather than pi a.k.a. 3.14159...
> So then we have two nil representations to deal with
No different than "pi" and "3.14159".
> If () isn't self-evaluating (like the criminally stupid design in Scheme), we have to quote it: '().
I agree, () should be self-evaluating just like numbers and vectors. The behavior of () should be analogous to the behavior of 0 or 0.0 or #() or "".
> If we have additional semantic roles for () like it being Boolean false
And why would you want to do a stupid thing like that?
> The list terminator is de facto a symbol because it's exploited for its identity
First, the list terminator need not be the same thing as the empty list. The list terminator is an implementation thing. The empty list is a language-semantics thing. These need not be the same.
Second, neither of these need to be unique. There can be multiple list terminators, and there can be multiple empty lists, just as there can be multiple empty vectors and multiple empty strings and even multiple instances of the same number (which actually happens with bignums).
> we care whether the cdr of a cons is or is not nil
No, what we care about -- or at least what we should care about -- is whether the CDR of a cons is an (n.b. not the) empty list. (eq #() #()) need not be true (in fact, generally isn't). Why should (eq () ()) be any different?
And BTW under no circumstances should taking the CAR or CDR of an empty list do anything other than signal an error.
> First, the list terminator need not be the same thing as the empty list.
No it doesn't, but that choice happens to give us a compact recursive definition.
> There can be multiple list terminators ... multiple empty lists ...
Sure, and 2022 can be written MMXXII, and whatnot.
Mathematically, there is one empty list, so why proliferate it?
There can be multiple empty strings which is useful if strings are mutable. If strings are immutable, it's silly to have more than one empty string.
The empty list is immutable, so ...
> under no circumstances should taking the CAR or CDR of an empty list do anything other than signal an error.
Lisp 1 and 1.5 had it that way, certainly. It's mostly just inconvenient. A good mix is to have strict vectors and strings which signal on out-of-bounds, but lists which are forgiving. Forgiving lists allow good old Ashwin Ram to have:
> If strings are immutable, it's silly to have more than one empty string.
I guess Common Lisp is silly then.
Clozure Common Lisp Version 1.12.1 (v1.12.1-10-gca107b94) DarwinX8664
? (eq "" "")
NIL
> (cdr (assq key a-list))
Much better to have an abstract associative map (dictionary) type with an opaque implementation rather than punning cons cells (which locks you in to an O(n) implementation). ALists are interesting from a historical point of view but they should never be used in modern code without hiding them under a layer of abstraction.
And even if you are going to pun cons cells to build an associative map, alists are the wrong way to do it because it forces you to duplicate the keys for every frame. Much better to use D-lists ((key1 key2 ...) val1 val2 ...) because that lets you re-use the key list, which can cut your memory usage in half, and provides a much more straightforward path from interpreter to compiler for pedagogical purposes.
Strings aren't immutable in Lisp, so there isn't a huge benefit to making (eq "" "").
It may be undefined behavior to modify "", but it's not the same thing. Suppose that mutating "" signals an error; that still leaves the problem that some other string which you are allowed to mutate can be mutated empty, yet is a distinct object from that empty string.
Moreover, though every string has "" as a suffix, it's not by way of pointing to a "" object. A unique "" wouldn't serve the role of terminator.
A language with immutable strings could intern all strings, so they are de facto symbol, and then exact string comparison is eq. That implies there is only one empty string object.
Neither are lists. Only the empty list is immutable, and likewise empty strings. It really is a completely equivalent situation. There is no principled reason that the empty list should be unique and the empty string not.
Yes there is a principled reason. If we accept that we have a list which is recursively defined using a binary cell aggregate structure, as a right-leaning tree shape, then it is advantageous to have an atomic, unique empty list at the bottom of the recursion. That atomic empty list can be represented as a machine word. We can perform a single comparison to detect the terminator: is it that object or not? Anything else, though workable, is a gratuitous complication.
The empty string is mutable. Even the literal one is potentially mutable: you can try it at your own risk. You ca make a mutable empty string with (make-string 0) or by mutating a non-empty string.
Any mutable string can be mutated to make it empty:
> The name "dotted list" comes from how the list is serialized, yes, but not the concept.
Yes, it does. Dotted lists are entirely related to serialization. The only reason they matter is that, by convention, (a b ... z) is a shorthand notation for (a . (b . (... (z . nil)) ...) and (a b ... z . anything-but-nil) is a shorthand notation for (a . (b . (... (z . anything-but-nil)) ...)
> You can write a function dotted-list-p
Of course you can. So what? You can write a predicate for any (computable) property. I can write a function list-ends-in-3-p. That doesn't mean that lists that end in 3 matter.
The reason "dotted lists" are called dotted lists is entirely because of their serialization behavior: dotted lists have a dot in their serialization and non-dotted-lists don't. And the reason that the I/O behavior is what it is is that it turns out that punning cons cells as linked lists is a useful hack (or at least it was in 1958). But it is only a hack. There is no reason that the data structure used to represent pairs has to be the same data structure that is used to represent linked lists. Indeed, one could argue that this punning is actually a serious mistake. There should be pairs, with CAR and CDR fields which can take on any value, and there should be (linked) lists, with a FIRST field that can take on any value, and a REST field that is restricted to only contain another linked list (including a privileged empty list object). In such a design, the whole concept of "dotted list" would be non-sensical. You could still write (a . nil) or (a . ()) but that would no longer be the same object as (a), the former being a pair and the latter being a list.
So you see the concept of dotted list is rooted entirely in an I/O hack that John McCarthy invented back in 1958 so he could build linked lists out of punned pairs rather than make them a separate data type.
> NIL is not weird, it is an ordinary symbol, like NIK or NIM
No, NIL is not an "ordinary symbol". NIL is both a symbol and a list. NIL answers T to LISTP. No other symbol does that. You can call CAR, CDR, LENGTH, ASSOC etc. on NIL and not get an error. You cannot do that with any other symbol.
NIL is the only symbol to which you cannot give a function binding. In some implementations, attempting to do so triggers its own error message:
Clozure Common Lisp Version 1.12.1 (v1.12.1-10-gca107b94) DarwinX8664
? (defun nil () t)
> Error: Using NIL as a function name is silly.
Also, NIL answers T to LISTP but not to CONSP. So you can call CAR and CDR on it but not RPLACA and RPLACD. And, at the risk of stating the obvious, NIL answers T to NULL.
NIL is the only object with these properties. That is the very definition of "weird". (And note that I have made no reference to implementation details here.)
The point was that dotted-list-p doesn't have anything to do with the "serialized form" (printed representation) of the list. Dotted lists are important because they are one of the two types of improper list, the other type being circular lists. Many Lisp functions work on proper lists.
Of course NIL is an ordinary symbol. It has roles, the same way other symbols have roles. Are you saying the symbol &BODY is not an ordinary symbol because it is a lambda list keyword? Are you saying the symbol T is not an ordinary symbol because type boolean is (member t nil)? Are you saying symbol * is not an ordinary symbol because it is a special variable bound by the REPL, it is used in declaration syntax, and also is the name of the product function? Are you saying MUMBLE:VAPORIZE is not an ordinary symbol because it names the most dangerous operation in the mumble library? They are all ordinary symbols, sometimes with unique roles. Because NIL was chosen to satisfy the role of the empty list, is it any wonder that type list is (or cons null), type null is (eql nil), LENGTH and ASSOC and other list functions can deal with it etc.? Of course not. That doesn't make NIL qua symbol any weirder than any other symbol that has particular roles in a given context.
There are many symbols in Common Lisp that you cannot DEFUN. See section 11.1.2.1.2 in the hyperspec. Since NIL names a constant variable but not a standardized function, macro, or special operator, you can bind it to a function lexically using FLET or LABELS.
Once again, NIL is not a cons, it is an atomic symbol that also represents the empty list, so it's no wonder that LISTP returns true and CONSP returns false. CAR and CDR take lists (again see their entries in the CLHS) so it's no wonder that they can work with it. RPLACA and RPLACD take conses only so it's no wonder that they don't. At the risk of stating the obvious (again and again) the type null is (eql nil) so it's no wonder that NULL returns T for it. That operators treat a symbol specially does not make a symbol weird. It merely means that it serves a particular role. Any symbol can take any number of roles within a program. That doesn't make the symbol weird. I will repeat once more, because I've a feeling you didn't get this. It doesn't make a symbol weird.
It is a basic matter, definitely. I learned all this stuff about conses, atoms, symbols, lists, etc. very early on, as baby Lisper. I don't know why someone who spent 35 years programming Lisp should have trouble understanding all this, and has to keep bringing up new irrelevant/incorrect claims instead of just opening a beginner's Lisp book and re-reading the first chapter or two, where they talk about all this stuff.
> dotted-list-p doesn't have anything to do with the "serialized form" (printed representation) of the list
It does: the details of the serialization are the reason that the concept of "dotted list" even exists. At the risk of belaboring the obvious, dotted lists are called "dotted lists" because there is a syntactically-significant dot in their serialization.
> There are many symbols in Common Lisp that you cannot DEFUN. See section 11.1.2.1.2 in the hyperspec.
It is not that you cannot defun them, it is that this is undefined behavior. At least one implementation (CCL) lets you defun just about everything except NIL:
? (defun &body () t)
&BODY
? (&body)
T
? (defun t () nil)
T
? (t)
NIL
But, as noted earlier...
? (defun nil () t)
> Error: Using NIL as a function name is silly.
> Since NIL names a constant variable but not a standardized function, macro, or special operator, you can bind it to a function lexically using FLET or LABELS.
NIL's lack of weirdness in this one regard actually seems pretty weird to me. IMHO it would actually makes sense to prohibit defining a function whose name is the same object that designates an empty list, just as it makes sense to prohibit defining functions whose names are (say) numbers.
> That operators treat a symbol specially does not make a symbol weird.
That all turns on how you define "weird". One dictionary definition of "weird" is "strikingly odd or unusual, especially in an unsettling way; strange". The fact that the CAR and CDR of NIL are both NIL seems weird to me. The fact that NIL is the canonical boolean false is NIL rather than F seems weird to me. (Actually, the canonical booleans, if they were going to be symbols at all, should have been :T and :F or :TRUE and :FALSE. But that's a different discussion.)
> Are you saying the symbol &BODY is not an ordinary symbol because it is a lambda list keyword?
Yes. Lambda-list keywords are weird. They behave differently from X and Y and Z and BAZ and BAR and BING and BIFF and BOFF and ORDINARY-SYMBOL and even WEIRD-SYMBOL. The same is true for symbols interned in the keyword package. All of these things are weird.
(CONS X NIL) = '(X), a list of length 1 containing X, so (CONS NIL NIL) = '(NIL), a list of length 1 containing NIL.
But NIL is a list of length 0, not a list of length 1.
The wart is that it should never have been the case that you could call CAR and CDR on NIL. But even though you can wartily call them on NIL, there is still clearly a very important distinction between NIL and (CONS (CAR NIL) (CDR NIL))!
Yes, that's true, but that is a special case. For all other objects X and Y, if (CAR X) is equal to (CAR Y) and (CDR X) is equal to (CDR Y) then X and Y are equal. NIL and (NIL) are the only exception. And the fact that they are an exception is a consequence of the design decision to allow CAR and CDR to be called on NIL and return NIL.
> The wart is that it should never have been the case that you could call CAR and CDR on NIL.
Yes, that is the whole point.
> But even though you can wartily call them on NIL, there is still clearly a very important distinction between NIL and (CONS (CAR NIL) (CDR NIL))!
You could have as well said between NIL and (CONS NIL NIL) or just (NIL). And yes, this is true. Nonetheless, it is possible to implement NIL as a privileged cons cell with both CAR and CDR pointing to itself under the hood, and many CL implementations actually do this. It's a design decision. You have to put the warty code somewhere. You can put it in CAR and CDR, or you can put it in NULL, EQUAL, SYMBOLP, etc. But you have to put it somewhere.
Also, a dotted list is a nonempty list where the cdr of the last cons is not NIL. It is not a notational convention. A list (a b c) is not a dotted list, even if you write it as (a . (b . (c . nil))).