Nvidia doesn't share dies between their high-end datacenter products like B200 and consumer products. The high-end consumer dies have many more SMs than a corresponding datacenter die. Each has functionality within an SM/TPC that the other does not, never mind the very different fabric and memory subsystems (with much higher bandwidth per SM on the datacenter parts). They run at very different clock frequencies. It just wouldn't make sense to share dies under these constraints, especially when GPUs already present a fairly obvious yield-recovery strategy.
Correct --- I found a remark on Twitter calling this "Jensen Math".
Same logic applies when Nvidia quotes the "bidirectional bandwidth" of high-speed interconnects to make the numbers look big, instead of the more common bandwidth per direction (e.g. a link advertised as 900 GB/s bidirectional moves 450 GB/s each way), forcing everyone else to adopt the same metric in their marketing materials.
The bigger problem with HOL (or simple type theory) is not the lack of dependencies, but rather the lack of logical strength. Simple type theory is equivalent in logical strength to bounded Zermelo set theory (i.e. ZF without Foundation or Replacement, and with Separation restricted to formulas with bounded quantifiers). This is unfortunately too weak to formalize post-WW2 mathematics in the same style as ordinary mathematicians do. Similarly, it does not offer a great way to deal with the size issues that arise in e.g. category theory.
I haven't followed closely, and I'm only faintly acquainted with algebraic geometry and category theory. But TFA links to a formalization of Grothendieck schemes, which are definitely post-WW2 material, and it relies on Isabelle's locales feature. Are you familiar with this work? How far from the "ordinary mathematician's style" is it?
You can always add axioms to improve logical strength. For example, one common approach for dealing with size issues in set theory is positing so-called 'inaccessible cardinals' which amount to something quite similar to the 'universes' of type theory.
Adding axioms to simple type theory is more awkward than adding them to a set theory like ZFC. One approach to universes I've seen in the Isabelle/HOL world is to postulate the existence of a universe as a model of set theory. But then you're stuck reasoning semantically about a model of set theory. Nobody has scaled this up to a large pluralist math library like Mathlib.
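For contrast, a dependent type theory like Lean's handles size natively with a universe hierarchy (Type 0 : Type 1 : Type 2 : ...), which plays roughly the role that inaccessible cardinals / Grothendieck universes play on the set-theory side. A minimal sketch (the `SmallCat` name is just mine, for illustration):

```lean
universe u

-- Objects and homs live in `Type u`; the structure bundling them
-- necessarily lives one universe up, in `Type (u + 1)`, which gives
-- the "small vs. large" distinction for free.
structure SmallCat where
  Obj  : Type u
  Hom  : Obj → Obj → Type u
  id   : ∀ X, Hom X X
  comp : ∀ {X Y Z}, Hom X Y → Hom Y Z → Hom X Z

#check (SmallCat.{u} : Type (u + 1))
```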
The Rust specification you link is performative and only intended to satisfy requirements of certification processes. No one is actually using it to implement the language, as far as I am aware.
There is other work on specifying Rust (e.g. the Unsafe Code Guidelines Working Group), but nothing approaching a real spec for the whole language. Honestly, it is probably impossible at this point; Rust has many layers of implementation-defined hidden complexities.
“There’s a standard, but it’s performative” is a different argument than the top level one.
But even if we accept that, it doesn’t seem like a good comparative argument: anybody who has written a nontrivial amount of C or C++ has dealt with compiler-defined behavior or compiler language extensions. These would suggest that the C and C++ standards are “performative” in the same sense, but repeated claims about the virtues of standardization don’t seem compatible with accepting that.
The original purpose of the C standard was to solve the problems created by the diversity of increasingly divergent implementations of C. They studied existing behavior across systems, proposed new language constructs, and it was generally a success (look at the proliferation of C in the 90s across many different systems and architectures).
The actual informal semantics in the standard and its successors is written in an axiomatic (as opposed to operational or denotational) style, and is subject to the usual problem of axiomatic semantics: one rule you forgot to read can completely change the meaning of the other rules you did read. There are a number of areas known to be ill-specified in the standard, the worst probably being the implications of the typed memory model. There have since been formalized semantics of C, which are generally narrower than the informal version in the standard and make some additional assumptions.
C++ tried to follow the same model, but C++ is orders of magnitude more complex than C, and thus its standard is overall less well specified than the C standard (e.g. there is still no normative list of all the undefined behavior in C++). It is likely practically impossible to write a formal specification for C++. Still, essentially all of the work on memory models for low-level programming languages originated in the context of C++ (and was then ported back to C and Rust).
Well, the memory ordering model was developed for C++ and is used in C and Rust. But e.g. C++ does not have a pointer provenance model, which is arguably almost as important in this context. It turns out that one of the things we really care about with multi-processing, and thus with memory models, is linked-list hacks, and those only work if you have provenance rules; C++, uh, just has a shrug emoji where the provenance rules would go, so that's not great. C has an ISO document for provenance, though so far it's a separate technical specification rather than part of the ISO C standard itself, while Rust has specified provenance.
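To make the provenance point concrete, here's a minimal Rust sketch (my own toy example, nothing official) of the kind of pointer-tagging trick behind those linked-list hacks; it only has defined behavior because the provenance rules say the tagged pointer keeps the provenance of the original, which the strict-provenance APIs (`addr`, `map_addr`) make explicit:

```rust
// Stash a tag bit in the low bit of an aligned pointer, then recover
// both the tag and a dereferenceable pointer.

fn tag_pointer<T>(p: *mut T, tag: usize) -> *mut T {
    // Assumes T's alignment leaves the low bit free.
    assert!(core::mem::align_of::<T>() >= 2 && tag < 2);
    p.map_addr(|a| a | tag)
}

fn untag_pointer<T>(p: *mut T) -> (*mut T, usize) {
    (p.map_addr(|a| a & !1), p.addr() & 1)
}

fn main() {
    let mut x = 42u64;
    let tagged = tag_pointer(&mut x as *mut u64, 1);
    let (clean, tag) = untag_pointer(tagged);
    assert_eq!(tag, 1);
    // Still valid to dereference: `clean` retained the provenance of `&mut x`.
    unsafe { assert_eq!(*clean, 42) };
}
```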
Also, the C++ ordering model is defective in the sense that while it offers the orders we actually use, it also offers an order (memory_order_consume) that nobody knows how to implement, so it's basically just wishful thinking. For years now the C++ standard has labelled this order "temporarily discouraged" while experts tried to repair the definition, and C++26 is slated to just deprecate it instead. Rust doesn't copy that defect.
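For what it's worth, here's a minimal sketch of the release/acquire pairing Rust did copy from C++; note that Rust's `Ordering` enum simply has no consume variant:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        READY.store(true, Ordering::Release); // publish
    });
    let consumer = thread::spawn(|| {
        // The acquire load pairs with the release store above, so once we
        // observe READY == true, the write to DATA is guaranteed visible.
        while !READY.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    });
    producer.join().unwrap();
    consumer.join().unwrap();
}
```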
To be clear, my argument doesn't hinge on whether FLS is a normative specification of Rust or not. The argument is that being "specified" is neither necessary nor sufficient for language maturity or quality.
My reasons for leaving Apple had nothing to do with this decision. I was already no longer working on Rosetta 2 in a day-to-day capacity, although I would still frequently chat with the team and give input on future directions.
Just went through that thread, I can't believe this wasn't a team of like 20 people.
It's crazy to me that Apple would put one guy on a project this important. At my company (another FAANG), I would have the CEO asking me for updates and roadmaps and everything. I know that stuff slows me down, but even without it, I don't think I could ever do something like this... I feel the way I do when I watch guitar YouTubers: just terrible.
I hope you were at least compensated like a team of 20 engineers :P
Sometimes (often?), one very dedicated and focused person is better than a team of 20+. In fact companies would do well to recognize these situations and accommodate them better.
> Back then, Apple had a sabbatical program that encouraged (mandated?) employees to take six consecutive weeks off every five years.
This is really a good take. I can't imagine companies offering sabbatical programs nowadays. You still had your vacation on top of that, so JK took 12 weeks off (OP mentioned this in the same comment). It was a boon for any systems programmer who needs to clear his mind or deepen his thoughts.
This is amazing. I wonder what it took to port MacOS from PowerPC to Intel. Every assembly-language part must have been rewritten, that's for sure. Anything else?
Didn't NeXTSTEP support x86 long before Mac OS X was a thing? I assumed they always kept it compilable on x86 long before the switch (since Rhapsody supported it). I guess the user-space stuff might have been trickier, but probably not the kernel itself and surrounding components.
Yeah, but from what I read in the Quora answer, it sounds like JK did it from scratch? I could be wrong, though. I just wonder how much effort is supposed to go into such a project.
Likely a few foundational technologies that had seen significant improvements or reimplementations since Rhapsody, like the scheduler/threading infrastructure, memory management, Quartz, and Carbon.
CPS is fairly dead as an IR, but the (local) CPS transform seems more popular than ever with languages implementing "stackless" control effects.
As far as functional IRs go, I would say SSA corresponds most directly to (a first-order restriction of) ANF w/ join points. The main difference being the use of dominance-based scoping rules, which is certainly more convenient than juggling binders when writing transformations. The first-order restriction isn't even essential, e.g. https://compilers.cs.uni-saarland.de/papers/lkh15_cgo.pdf.
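As a rough illustration of that correspondence (my own toy syntax, not taken from the linked paper): in ANF with join points, join points play the role of basic blocks and a jump with arguments plays the role of a branch feeding phi nodes.

```rust
#![allow(dead_code)]

// A toy first-order ANF-with-join-points IR.
enum Atom {
    Var(String),
    Const(i64),
}

enum Expr {
    // let x = prim(args) in body     ~  an SSA instruction defining x
    Let { name: String, prim: String, args: Vec<Atom>, body: Box<Expr> },
    // join j(params) = rhs in body   ~  declaring a basic block with parameters
    Join { label: String, params: Vec<String>, rhs: Box<Expr>, body: Box<Expr> },
    // jump j(args)                   ~  a branch to block j, passing phi arguments
    Jump { label: String, args: Vec<Atom> },
    // if a then e1 else e2           ~  a conditional branch
    If { cond: Atom, then_: Box<Expr>, else_: Box<Expr> },
    // return a
    Ret(Atom),
}

fn main() {
    // join j(x) = ret x in (if c then jump j(1) else jump j(2))
    // i.e. two predecessors branch to block j, whose parameter x is the
    // phi node merging 1 and 2.
    let _example = Expr::Join {
        label: "j".into(),
        params: vec!["x".into()],
        rhs: Box::new(Expr::Ret(Atom::Var("x".into()))),
        body: Box::new(Expr::If {
            cond: Atom::Var("c".into()),
            then_: Box::new(Expr::Jump { label: "j".into(), args: vec![Atom::Const(1)] }),
            else_: Box::new(Expr::Jump { label: "j".into(), args: vec![Atom::Const(2)] }),
        }),
    };
}
```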
If you're interested in an IR that can express continuations (or evaluation contexts) in a first-class way, a much better choice than CPS is an IR based on the sequent calculus. As I'm sure you know (since you work with one of the coauthors), this was first experimented with in a practical way in GHC (https://pauldownen.com/publications/scfp_ext.pdf), but there is a paper in this year's OOPSLA (https://dl.acm.org/doi/10.1145/3720507) that explores this more fundamentally, without the architectural constraint of being compatible with all of the other decisions already made in GHC. Of course, one could add dominance scoping etc. to make an extension of SSA with more symmetry between producers and consumers like the classical sequent calculus.
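Very roughly, the flavor of such an IR (my own stripped-down rendering in the style of the mu/mu-tilde sequent calculus, not the exact language of either paper): producers and consumers are separate syntactic categories, a statement cuts one against the other, and continuations are just ordinary consumer-level terms you can bind and pass around.

```rust
#![allow(dead_code)]

enum Producer {
    Var(String),
    Lit(i64),
    // mu k. s   -- capture the current consumer (continuation) as k, then run s
    Mu(String, Box<Statement>),
}

enum Consumer {
    // a continuation / covariable
    Covar(String),
    // mu~ x. s  -- bind the produced value as x, then run s
    MuTilde(String, Box<Statement>),
}

// <p | c>  -- plug producer p into consumer c
struct Statement(Producer, Consumer);

fn main() {
    // < mu k. <5 | k>  |  mu~ x. <x | halt> >
    let _s = Statement(
        Producer::Mu(
            "k".into(),
            Box::new(Statement(Producer::Lit(5), Consumer::Covar("k".into()))),
        ),
        Consumer::MuTilde(
            "x".into(),
            Box::new(Statement(Producer::Var("x".into()), Consumer::Covar("halt".into()))),
        ),
    );
}
```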
> Call/ret instructions work really well with branch predictors. Lots of jumps (to continuations) might not work quite as well.
On x86, the use of paired call/return is conflated with usage of the stack to store activation records. On AArch64, indirect branches can be marked with a hint bit indicating that they are a call or return, so branch prediction can work exactly the same with activation records elsewhere.
There's an OOPSLA paper (referred to from the link starting that thread) from this year with 2 of the same authors that goes into more detail about using it as a compiler IR: https://dl.acm.org/doi/10.1145/3720507
It shouldn’t be difficult to write a binary translator to run 32-bit executables on a 64-bit userspace. You will take a small performance hit (on top of the performance hit of using the 32-bit architecture to begin with), but that should be fine for anything old enough to not be recompiled.
In some ways, Windows already does that too - the 32-bit syscall wrappers [switch into a 64-bit code segment](https://aktas.github.io/Heavens-Gate) so the 64-bit ntdll copy can call the 64-bit syscall.
Slightly out of my depth, but per Diaconescu's theorem [0] the axiom of choice implies the law of the excluded middle. Does that make Lean non-constructive?
Yes, but for an even more immediate reason. The axiom of choice itself is a non-constructive axiom. It asserts the existence of an inhabitant of a certain type, without in any way furnishing that inhabitant.
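Lean itself makes this concrete: excluded middle is a theorem there, proved from choice via Diaconescu's construction, and `#print axioms` shows the dependency (output reproduced from memory, so treat it as approximate):

```lean
#check (Classical.em : ∀ p : Prop, p ∨ ¬p)

#print axioms Classical.em
-- 'Classical.em' depends on axioms: [propext, Classical.choice, Quot.sound]
```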
But Lean sequesters off the classical logic stuff from the default environment. You have to explicitly bring in the non-constructive tools in order to access all this. So, actually, I would disagree with GP that Lean's type system includes a choice axiom.
Right, I was about to comment the same thing: "Lean" does not itself assume choice. Mathlib4 does, and Lean 4 is designed so that all the built-in stuff is _consistent_ with choice. But you can do purely constructive mathematics in Lean, and it's even designed so that any non-propositions depending on choice have to be annotated "noncomputable", so the walled garden around choice is baked in at the language level. Even within "noncomputable" code, choice still shows up in the axiom list if used.
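A minimal sketch of what that looks like in practice (the definitions are just illustrative):

```lean
-- Constructive: no axioms involved.
def double (n : Nat) : Nat := n + n

#print axioms double
-- 'double' does not depend on any axioms

-- Non-constructive: produce *some* inhabitant of a nonempty type.
-- Lean forces the `noncomputable` annotation here, and the axiom
-- tracker records the use of choice.
noncomputable def pick (α : Type) (h : Nonempty α) : α :=
  Classical.choice h

#print axioms pick
-- 'pick' depends on axioms: [Classical.choice]
```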