Nvidia doesn't share dies between their high-end datacenter products like B200 and consumer products. The high-end consumer dies have many more SMs than a corresponding datacenter die. Each has functionality within an SM/TPC that the other does not, never mind the very different fabric and memory subsystems (with much higher bandwidth per SM on the datacenter parts). They run at very different clock frequencies. It just wouldn't make sense to share dies under these constraints, especially when GPUs already present a fairly obvious yield-recovery strategy.
Correct --- I found a remark on Twitter calling this "Jensen Math".
Same logic applies when Nvidia quotes the "bidirectional bandwidth" of high-speed interconnects to make the numbers look big, instead of the more common bandwidth per direction (e.g. a link advertised as 900 GB/s bidirectional moves 450 GB/s each way), forcing everyone else to adopt the same metric in their marketing materials.
The bigger problem with HOL (or simple type theory) is not the lack of dependencies, but rather the lack of logical strength. Simple type theory is equivalent in logical strength to bounded Zermelo set theory (i.e. ZF without Foundation or Replacement, and with Separation restricted to formulas with bounded quantifiers). This is unfortunately too weak to formalize post-WW2 mathematics in the same style as ordinary mathematicians do. Similarly, it does not offer a great way to deal with the size issues that arise in e.g. category theory.
I haven't followed closely, and I'm only faintly acquainted with algebraic geometry and category theory. But TFA links to a formalization of Grothendieck schemes, which are definitely post-WW2 material, and it relies on Isabelle's locales feature. Are you familiar with this work? How far from the "ordinary mathematician's style" is it?
You can always add axioms to improve logical strength. For example, one common approach for dealing with size issues in set theory is positing so-called 'inaccessible cardinals' which amount to something quite similar to the 'universes' of type theory.
Adding axioms to simple type theory is more awkward than adding them to a set theory like ZFC. One approach to universes I've seen in the Isabelle/HOL world is to postulate the existence of a universe as a model of set theory. But then you're stuck reasoning semantically about a model of set theory. Nobody has scaled this up to a large pluralist math library like Mathlib.
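For contrast, a dependent type theory like Lean's handles size natively with a universe hierarchy (Type 0 : Type 1 : Type 2 : ...), which plays roughly the role that inaccessible cardinals / Grothendieck universes play on the set-theory side. A minimal sketch (the `SmallCat` name is just mine, for illustration):

```lean
universe u

-- Objects and homs live in `Type u`; the structure bundling them
-- necessarily lives one universe up, in `Type (u + 1)`, which gives
-- the "small vs. large" distinction for free.
structure SmallCat where
  Obj  : Type u
  Hom  : Obj → Obj → Type u
  id   : ∀ X, Hom X X
  comp : ∀ {X Y Z}, Hom X Y → Hom Y Z → Hom X Z

#check (SmallCat.{u} : Type (u + 1))
```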
The Rust specification you link is performative and only intended to satisfy requirements of certification processes. No one is actually using it to implement the language, as far as I am aware.
There is other work on specifying Rust (e.g. the Unsafe Code Guidelines Working Group), but nothing approaching a real spec for the whole language. Honestly, it is probably impossible at this point; Rust has many layers of implementation-defined hidden complexities.
“There’s a standard, but it’s performative” is a different argument than the top level one.
But even if we accept that, it doesn’t seem like a good comparative argument: anybody who has written a nontrivial amount of C or C++ has dealt with compiler-defined behavior or compiler language extensions. These would suggest that the C and C++ standards are “performative” in the same sense, but repeated claims about the virtues of standardization don’t seem compatible with accepting that.
The original purpose of the C standard was to solve the problems created by the diversity of increasingly divergent implementations of C. They studied existing behavior across systems, proposed new language constructs, and it was generally a success (look at the proliferation of C in the 90s across many different systems and architectures).
The actual informal semantics in the standard and its successors is written in an axiomatic (as opposed to operational or denotational) style, and is subject to the usual problem of axiomatic semantics: one rule you forgot to read can completely change the meaning of the other rules you did read. There are a number of areas known to be ill-specified in the standard, the worst probably being the implications of the typed memory model. There have since been formalized semantics of C, which are generally narrower than the informal version in the standard and make some additional assumptions.
C++ tried to follow the same model, but C++ is orders of magnitude more complex than C, and thus its standard is overall less well specified than the C standard (e.g. there is still no normative list of all the undefined behavior in C++). It is likely practically impossible to write a formal specification for C++. Still, essentially all of the work on memory models for low-level programming languages originated in the context of C++ (and was then ported back to C and Rust).
Well, the memory ordering model was developed for C++ and is used in C and Rust. But e.g. C++ does not have a pointer provenance model, which is arguably almost as important in this context. It turns out that one of the things we really care about with multi-processing, and thus with memory models, is linked-list hacks, and those only work if you have provenance rules; C++, uh, just has a shrug emoji where the provenance rules would go, so that's not great. C has an ISO document for provenance, though so far it's a separate technical specification rather than part of the ISO C standard itself, while Rust has specified provenance.
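To make the provenance point concrete, here's a minimal Rust sketch (my own toy example, nothing official) of the kind of pointer-tagging trick behind those linked-list hacks; it only has defined behavior because the provenance rules say the tagged pointer keeps the provenance of the original, which the strict-provenance APIs (`addr`, `map_addr`) make explicit:

```rust
// Stash a tag bit in the low bit of an aligned pointer, then recover
// both the tag and a dereferenceable pointer.

fn tag_pointer<T>(p: *mut T, tag: usize) -> *mut T {
    // Assumes T's alignment leaves the low bit free.
    assert!(core::mem::align_of::<T>() >= 2 && tag < 2);
    p.map_addr(|a| a | tag)
}

fn untag_pointer<T>(p: *mut T) -> (*mut T, usize) {
    (p.map_addr(|a| a & !1), p.addr() & 1)
}

fn main() {
    let mut x = 42u64;
    let tagged = tag_pointer(&mut x as *mut u64, 1);
    let (clean, tag) = untag_pointer(tagged);
    assert_eq!(tag, 1);
    // Still valid to dereference: `clean` retained the provenance of `&mut x`.
    unsafe { assert_eq!(*clean, 42) };
}
```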
Also, the C++ ordering model is defective in the sense that while it offers the orders we actually use, it also offers an order (memory_order_consume) that nobody knows how to implement, so it's basically just wishful thinking. For years now the C++ standard has labelled this order "temporarily discouraged" while experts tried to repair the definition, and C++26 is slated to just deprecate it instead. Rust doesn't copy that defect.
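For what it's worth, here's a minimal sketch of the release/acquire pairing Rust did copy from C++; note that Rust's `Ordering` enum simply has no consume variant:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        READY.store(true, Ordering::Release); // publish
    });
    let consumer = thread::spawn(|| {
        // The acquire load pairs with the release store above, so once we
        // observe READY == true, the write to DATA is guaranteed visible.
        while !READY.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    });
    producer.join().unwrap();
    consumer.join().unwrap();
}
```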
To be clear, my argument doesn't hinge on whether FLS is a normative specification of Rust or not. The argument is that being "specified" is neither necessary nor sufficient for language maturity or quality.
My reasons for leaving Apple had nothing to do with this decision. I was already no longer working on Rosetta 2 in a day-to-day capacity, although I would still frequently chat with the team and give input on future directions.
Just went through that thread, I can't believe this wasn't a team of like 20 people.
It's crazy to me that Apple would put one guy on a project this important. At my company (another FAANG), I would have the CEO asking me for updates and roadmaps and everything. I know that stuff slows me down, but even without it, I don't think I could ever do something like this... I feel the way I do when I watch guitar YouTubers: just terrible.
I hope you were at least compensated like a team of 20 engineers :P
Sometimes (often?), one very dedicated and focused person is better than a team of 20+. In fact companies would do well to recognize these situations and accommodate them better.
> Back then, Apple had a sabbatical program that encouraged (mandated?) employees to take six consecutive weeks off every five years.
This is really a good take. I can't imagine companies offering sabbatical programs nowadays. You still had your vacation on top of that, so JK took 12 weeks off (OP mentioned this in the same comment). It was a boon for any systems programmer who needs to clear his mind or deepen his thoughts.
This is amazing. I wonder what it took to port MacOS from PowerPC to Intel. Every assembly-language part must have been rewritten, that's for sure. Anything else?
Didn't NeXTSTEP support x86 long before Mac OS X was a thing? I assumed they always kept it compilable on x86 long before the switch (since Rhapsody supported it). I guess the user-space stuff might have been trickier, but probably not the kernel itself and surrounding components.
Yeah, but from what I read in the Quora answer, it sounds like JK did it from scratch? I could be wrong, though. I just wonder how much effort is supposed to go into such a project.
Likely a few foundational technologies that had seen significant improvements or reimplementations since Rhapsody, like the scheduler/threading infrastructure, memory management, Quartz, and Carbon.
CPS is fairly dead as an IR, but the (local) CPS transform seems more popular than ever with languages implementing "stackless" control effects.
As far as functional IRs go, I would say SSA corresponds most directly to (a first-order restriction of) ANF w/ join points. The main difference being the use of dominance-based scoping rules, which is certainly more convenient than juggling binders when writing transformations. The first-order restriction isn't even essential, e.g. https://compilers.cs.uni-saarland.de/papers/lkh15_cgo.pdf.
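As a rough illustration of that correspondence (my own toy syntax, not taken from the linked paper): in ANF with join points, join points play the role of basic blocks and a jump with arguments plays the role of a branch feeding phi nodes.

```rust
#![allow(dead_code)]

// A toy first-order ANF-with-join-points IR.
enum Atom {
    Var(String),
    Const(i64),
}

enum Expr {
    // let x = prim(args) in body     ~  an SSA instruction defining x
    Let { name: String, prim: String, args: Vec<Atom>, body: Box<Expr> },
    // join j(params) = rhs in body   ~  declaring a basic block with parameters
    Join { label: String, params: Vec<String>, rhs: Box<Expr>, body: Box<Expr> },
    // jump j(args)                   ~  a branch to block j, passing phi arguments
    Jump { label: String, args: Vec<Atom> },
    // if a then e1 else e2           ~  a conditional branch
    If { cond: Atom, then_: Box<Expr>, else_: Box<Expr> },
    // return a
    Ret(Atom),
}

fn main() {
    // join j(x) = ret x in (if c then jump j(1) else jump j(2))
    // i.e. two predecessors branch to block j, whose parameter x is the
    // phi node merging 1 and 2.
    let _example = Expr::Join {
        label: "j".into(),
        params: vec!["x".into()],
        rhs: Box::new(Expr::Ret(Atom::Var("x".into()))),
        body: Box::new(Expr::If {
            cond: Atom::Var("c".into()),
            then_: Box::new(Expr::Jump { label: "j".into(), args: vec![Atom::Const(1)] }),
            else_: Box::new(Expr::Jump { label: "j".into(), args: vec![Atom::Const(2)] }),
        }),
    };
}
```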
If you're interested in an IR that can express continuations (or evaluation contexts) in a first-class way, a much better choice than CPS is an IR based on the sequent calculus. As I'm sure you know (since you work with one of the coauthors), this was first experimented with in a practical way in GHC (https://pauldownen.com/publications/scfp_ext.pdf), but there is a paper in this year's OOPSLA (https://dl.acm.org/doi/10.1145/3720507) that explores this more fundamentally, without the architectural constraint of being compatible with all of the other decisions already made in GHC. Of course, one could add dominance scoping etc. to make an extension of SSA with more symmetry between producers and consumers like the classical sequent calculus.
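Very roughly, the flavor of such an IR (my own stripped-down rendering in the style of the mu/mu-tilde sequent calculus, not the exact language of either paper): producers and consumers are separate syntactic categories, a statement cuts one against the other, and continuations are just ordinary consumer-level terms you can bind and pass around.

```rust
#![allow(dead_code)]

enum Producer {
    Var(String),
    Lit(i64),
    // mu k. s   -- capture the current consumer (continuation) as k, then run s
    Mu(String, Box<Statement>),
}

enum Consumer {
    // a continuation / covariable
    Covar(String),
    // mu~ x. s  -- bind the produced value as x, then run s
    MuTilde(String, Box<Statement>),
}

// <p | c>  -- plug producer p into consumer c
struct Statement(Producer, Consumer);

fn main() {
    // < mu k. <5 | k>  |  mu~ x. <x | halt> >
    let _s = Statement(
        Producer::Mu(
            "k".into(),
            Box::new(Statement(Producer::Lit(5), Consumer::Covar("k".into()))),
        ),
        Consumer::MuTilde(
            "x".into(),
            Box::new(Statement(Producer::Var("x".into()), Consumer::Covar("halt".into()))),
        ),
    );
}
```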
> Call/ret instructions work really well with branch predictors. Lots of jumps (to continuations) might not work quite as well.
On x86, the use of paired call/return is conflated with usage of the stack to store activation records. On AArch64, indirect branches can be marked with a hint bit indicating that they are a call or return, so branch prediction can work exactly the same with activation records elsewhere.
There's an OOPSLA paper (referred to from the link starting that thread) from this year with 2 of the same authors that goes into more detail about using it as a compiler IR: https://dl.acm.org/doi/10.1145/3720507
It shouldn’t be difficult to write a binary translator to run 32-bit executables on a 64-bit userspace. You will take a small performance hit (on top of the performance hit of using the 32-bit architecture to begin with), but that should be fine for anything old enough to not be recompiled.
In some ways, Windows already does that too - the 32-bit syscall wrappers [switch into a 64-bit code segment](https://aktas.github.io/Heavens-Gate) so the 64-bit ntdll copy can call the 64-bit syscall.
Slightly out of my depth, but per Diaconescu's theorem [0] the axiom of choice implies the law of the excluded middle. Does that make Lean non-constructive?
Yes, but for an even more immediate reason. The axiom of choice itself is a non-constructive axiom. It asserts the existence of an inhabitant of a certain type, without in any way furnishing that inhabitant.
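Lean itself makes this concrete: excluded middle is a theorem there, proved from choice via Diaconescu's construction, and `#print axioms` shows the dependency (output reproduced from memory, so treat it as approximate):

```lean
#check (Classical.em : ∀ p : Prop, p ∨ ¬p)

#print axioms Classical.em
-- 'Classical.em' depends on axioms: [propext, Classical.choice, Quot.sound]
```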
But Lean sequesters off the classical logic stuff from the default environment. You have to explicitly bring in the non-constructive tools in order to access all this. So, actually, I would disagree with GP that Lean's type system includes a choice axiom.
Right, I was about to comment the same thing: "Lean" does not itself assume choice. Mathlib4 does, and Lean 4 is designed so that all the built-in stuff is _consistent_ with choice. But you can do purely constructive mathematics in Lean, and it's even designed so that any non-propositions depending on choice have to be annotated "noncomputable", so the walled garden around choice is baked in at the language level. Even within "noncomputable" code, choice still shows up in the axiom list if used.
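A minimal sketch of what that looks like in practice (the definitions are just illustrative):

```lean
-- Constructive: no axioms involved.
def double (n : Nat) : Nat := n + n

#print axioms double
-- 'double' does not depend on any axioms

-- Non-constructive: produce *some* inhabitant of a nonempty type.
-- Lean forces the `noncomputable` annotation here, and the axiom
-- tracker records the use of choice.
noncomputable def pick (α : Type) (h : Nonempty α) : α :=
  Classical.choice h

#print axioms pick
-- 'pick' depends on axioms: [Classical.choice]
```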