You can gradually adopt Unison; it's not all or nothing. It's true that when programming in Unison, you use Unison's tooling (which is seriously one of the best things about it), but there are lightweight ways of integrating with existing systems and services, and that is definitely the intent.
We ourselves make use of this sort of thing since (for instance) Unison Cloud is implemented mostly in Unison but uses Haskell for a few things.
There can be enormous value in creating multiple pieces of tech all designed to work really well together. We've done that for Unison where it made sense while also keeping an eye on ease of integration with other tech.
I think the messaging around this is going to be pretty important in heading off gut-reaction "it's all or nothing, locked into their world" first takes. It's probably attractive marketing to aim things at "look how easy it is to use our entire ecosystem", but there's a risk to that too.
That depends. What are you wanting to accomplish more broadly with the integration?
I'll mention a couple things that might be relevant - you could have the git repo reference a branch or an immutable namespace hash on Unison Share. Then, as part of your git repo's CI, pull the Unison code and compile and/or deploy it, or whatever you need to do.
There's support for webhooks on Unison Share as well, so you can do things like "open a PR to bump the dependency on the git repo whenever a new commit is pushed to branch XYZ on Unison Share".
Basically, with webhooks on GH and/or Unison Share and a bit of scripting you can set up whatever workflow you want.
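For the UCM side of such a CI job, the steps might look roughly like this (a sketch only - the project name is made up, and exact command syntax varies by UCM version):

pull @acme/myservice/main    -- fetch the Unison code from Unison Share
compile main deploy          -- compile an entry point to deploy.uc

ucm run.compiled deploy.uc   -- then run (or ship) the artifact outside UCM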
Feel free to come by the Discord https://unison-lang.org/discord if you're wanting to try out Unison but not sure how best to integrate with an existing git repo.
It's open source, you can create a free account with GitHub OAuth, and you can push projects there and collaborate on them, open PRs, publish releases, etc. It's very quick to pick up if you're already familiar with GitHub.
Unison code is published on https://share.unison-lang.org/ which is itself open source (it's a Haskell + postgres app), as is the language and its tooling. You can use Unison like any other open source general-purpose language, and many people do that. (We ourselves did this when building Unison Cloud - we wrote Unison code and deployed that within containers running in AWS.)
The cloud product is totally separate and optional.
Maybe we'll have a page or a reference somewhere to make the lines more clear.
yeah, Unison Cloud is like the "Heroku for functions" if you don't wanna think about how deployments work. But you can just run Unison programs standalone or in a Docker container or whatever: https://www.unison-lang.org/docs/usage-topics/docker/
For interesting usage - we built Unison Cloud (a distributed computing platform) with the Unison language and also more recently an "AWS Kinesis over object storage" product. It's nice for distributed systems, though you can also use it like any other general-purpose language, of course.
In terms of core language features, the effect system / algebraic effects implementation is something you may not have seen before. A lot of languages have special cases of this (like for async I/O, say, or generators), but algebraic effects are the uber-feature that can express all of these and more.
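If you haven't run into abilities before, here's a minimal sketch of the shape of the feature (`Ask` here is modeled on an ability from Unison's standard library): an ability declares operations, and a handler gives them an interpretation.

ability Ask a where
  ask : a

-- a handler that interprets `ask` by supplying a constant value
Ask.provide : a -> '{Ask a} r -> r
Ask.provide value thunk =
  h = cases
    { r } -> r
    { Ask.ask -> resume } -> handle resume value with h
  handle !thunk with h

Swapping in a different handler gives the same code a different interpretation, which is what makes this one mechanism that can cover async, generators, state, and the rest.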
I think Alvaro's talk at the Unison conference was a pretty cool demonstration of what you can do with this style of algebraic effects (called "abilities" in Unison).
He implements an Erlang-style actor system, and then, by swapping in different handlers for the algebraic effects, he can "run" the actor system but also optionally render a live diagram of the actor communications.
I've been following Unison for a long time, congrats on the release!
Unison is among the first languages to ship algebraic effects (aka Abilities [1]) as a major feature. In early talks and blog posts, as I recall, you were still a bit unsure about how it would land. So how did it turn out? Are you happy with how effects interact with the rest of the language? Do you like the syntax? Can you share any interesting details about how it's implemented under the hood?
> Unison is among the first languages to ship algebraic effects (aka Abilities [1]) as a major feature. In early talks and blog posts, as I recall, you were still a bit unsure about how it would land. So how did it turn out?
No regrets! The Abilities system is really straightforward and flexible. If you were already of the FP affiliation, you find you don't miss monads. And if you weren't, you're glad it means you don't have to understand why a monad is like a burrito to do FP.
> Do you like the syntax?
So this one is very loaded. Yes, we LOVE the syntax and it is very natural to us, but that is because most of us working on the language were either already fluent in Haskell, or had at least reached the basic "I need to be able to read these slides" level of understanding. However, we recognize that the current syntax of the language is NOT natural to the bulk of the audience we would like to reach.
But here's the super cool thing about our language! Since we don't store your code in a text/source representation, but instead as a typechecked AST, we have the freedom to change the surface syntax of the language very easily, which is something we've done several times in the past. We have a unique possibility that other languages don't: we could have more than one "surface syntax" for the language. We could have our current syntax, but also a JavaScript-like syntax, or a Python-like syntax.
And so we have had lots of serious discussions recently about changing the surface syntax to something that would be less "weird" to newcomers. The most obvious candidate is changing function application from the Haskell-style "function arg1 arg2" to the more familiar C-like "function(arg1, arg2)". The difficulty for us will be figuring out how to map some of our more unique features, like "what abilities are available during function application", onto a more familiar syntax.
So changing the syntax is something that we are seriously considering, but don't yet have a short term plan for.
What data do you actually store when caching a successful test run? Do you store the hash of the expression which is the test, plus a value with the semantics of "passed"? Or do you have a way to hash all values (not just expressions/ASTs!) that Unison can produce?
I am asking because if you also have a way to cache all values, this might allow to carry some of Unison's nice properties a little further. Say I implement a compiler in Unison, I end up with an expression that has a free variable, which carries the source code of the program I am compiling.
Now, I could take the hash of the expression, the hash of the term that represents the source code, i.e., what the variable in my compiler binds to, and the hash of the output. Would be very neat for reproducibility, similar to content-addressed derivations in Nix, and extensible to distributed reproducibility like Trustix.
I guess you'll be inclined to say that this is out of scope for your caching, because your caching would only cache results of expressions where all variables are bound (at the top level, evaluating down). And you would be right. But the point is to bridge to the outside of Unison, at runtime, and make this just easy to do with Unison.
Feel free to just point me at material to read, I am completely new to this language and it might be obvious to you...
Yes, we have a way of hashing literally all values in the language, including arbitrary data types, functions, continuations, etc. For instance, here, I'm hashing a lambda function:[1]
> crypto.hash Sha3_256 (x -> x + 1)
⧩
0xs704e9cc41e9aa0beb70432cff0038753d07ebb7f5b4de236a7a0a53eec3fdbb5
The test result cache is basically keyed by the hash of the test expression, with the result itself (passed or failed, plus text detail) as the value.
We only do this caching for pure tests (which are deterministic and don't need to be re-run over and over), enforced by the type system. You can have regular I/O tests as well, and these are run every time. Projects typically have a mix of both kinds of tests.
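For example, a pure test is just a watch expression whose result UCM caches by hash (`square` is a stand-in definition here):

square : Nat -> Nat
square x = x * x

test> square.tests.ex1 = check (square 4 == 16)

Since the test performs no I/O, UCM runs it once and replays the cached result until `square` (and therefore the test's hash) changes.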
It is true that you can only hash things which are "closed" / have no free variables. You might instead hash a function which takes its free variables as parameters.
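For instance, with a hypothetical `compile` function standing in for the compiler in your example:

-- `compile src` with a free `src` isn't hashable on its own,
-- but the function that takes `src` as a parameter is closed:
compilerHash = crypto.hash Sha3_256 (src -> compile src)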
Overall I think Unison would be a nice implementation language for really anything that needs to make interesting use of hashing, since it's just there and always available.
Thank you! (And thanks for following along for all the years!)
I'll speak a bit to the language audience, and others might weigh in as they see fit. The target is pretty broad: Unison is a general-purpose functional language for devs or teams who want to build and ship applications with a minimal amount of ceremony.
Part of the challenge of talking about that (the above might sound specious and bland) is that the difference isn't a one-shot answer: everything from diffing branches to deploying code is built atop a different foundation. For example, in the small: I upgraded our standard lib in some of my projects, and because it is a relatively stable library, it was a single command. In the large: right now we're working on a workflow orchestration engine; it uses our own Cloud (typed, provisioned in Unison code, tested locally, etc.) and works by serializing, storing, and later resuming the continuation of a program. That kind of framework would be more onerous to build, deploy, and maintain in many other languages.
Really cool project. To be honest, I don't think I fully understand the concept of a content-addressed language. Initially I thought this was another BEAM language, but it seems to run on its own VM. How does Unison compare to BEAM languages when it comes to fault tolerance? What do you think is a use case where Unison shines and Erlang maybe falls short?
Erlang is great and was one inspiration for Unison. And a long time ago, I got a chance to show Joe Armstrong an early version of Unison. He liked the idea and was very encouraging. I remember that meant a lot to me at the time since he's a hero of mine. He had actually had the same idea of identifying individual functions via hashes and had pondered if a future version of Erlang could make use of that. We had a fun chat and he told me many old war stories from the early days of Erlang. I was really grateful for that. RIP, Joe.
Re: distributed computing, the main thing that content-addressed code buys you is the ability to move computations around at runtime, deploying any missing dependencies on the fly. I can send you the expression `factorial 4`, and what I'm actually sending is a bytecode tree with a hash of the factorial function. You then look this hash up in your local code cache - if you already have it, you're good to go; if not, you ask me to send the code for that hash, and you cache it for next time.
The upshot of this is that you can have programs that just transparently deploy themselves as they execute across a cluster of machines, with no setup needed in advance. This is a really powerful building block for creating distributed systems.
In Erlang, you can send a message to a remote actor, but it's not really advisable to send a message that is or contains a function since you don't know if the recipient has that function's implementation. Of course, you can set up an Erlang cluster so everyone has the same implementation (analogous to setting up a Spark cluster to have the same version of all dependencies everywhere), but this involves setup in advance and it can get pretty fragile as you start thinking about how these dependencies will evolve over time.
A lot of Erlang's ideas around fault tolerance carry over to Unison as well, though they play out differently due to differences in the core language and libraries.
Mobile or browser clients talk to Unison backend services over HTTP, similar to any other language. Nothing fancy there.[1]
> sending code over the network to be executed elsewhere feels like a security risk to me?
I left out many details in my explanation and was just describing the core code syncing capability the language gives you. You can take a look at [2] to see what the core language primitives are - you can serialize values and code, ask their dependencies, deserialize them, and load them dynamically.
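Roughly, those primitives have shapes like the following (signatures approximate - see [2] for the real ones):

Value.value       : a -> Value                      -- capture a value together with its code
Value.serialize   : Value -> Bytes                  -- bytes suitable for the wire or disk
Value.deserialize : Bytes -> Either Text Value
Value.load        : Value ->{IO} Either [Term] a    -- Left lists the missing dependencies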
To turn that into a more industrial strength distributed computing platform, there are more pieces to it. For instance, you don't want to accept computations from anyone on the internet, only people who are authenticated. And you want sandboxing that lets you restrict the set of operations that dynamically loaded computations can use.
Within an app backend / deployed service, it is very useful to be able to fork computations onto other nodes and have that just work. But you likely won't directly expose this capability to the outside world; you instead expose services with a more limited API that can only be used in safe ways.
[1] Though we might support compiling Unison to the browser, and there have already been efforts in that direction - https://share.unison-lang.org/@dfreeman/warp - which would allow a Unison front end and back end to talk very seamlessly, without manual serialization or networking code.
Not a dumb question at all! Unison's type system uses Abilities (algebraic effects) for functional effect management. At the type level, that means we can prevent effects like "run arbitrary IO" on a distributed runtime. Things that run on shared infrastructure can be "sandboxed", with the restrictions enforced by the type system.
Browser or mobile apps cannot execute arbitrary code on the server. Those would typically call regular Unison services through a standard API.
I'm curious about how the persistence primitives (OrderedTable, Table, etc) are implemented under the hood. Is it calling out to some other database service? Is it implemented in Unison itself? Seems like a really interesting composable set of primitives, together with the Database abstraction, but having a bit of a hard time wrapping my head around it!
Hey there! Apologies for not getting to you sooner. The `Table` is a storage primitive implemented on top of DynamoDB. It's a lower-level storage building block, and as you've rightly identified, these entities were made to be composable, so other storage types can be built from them. Our `OrderedTable` docs might be of interest to you: they talk a bit more about its implementation (BTrees), and `OrderedTable` is one of our most ergonomic storage types: https://share.unison-lang.org/@unison/cloud/code/releases/23...
The Database abstraction helps scope and namespace (potentially many) tables. It is especially important in scoping transactions, since one of the things we wanted to support with our storage primitives is transactionality across multiple storage types.
Congrats on 1.0! I've been interested in Unison for a while now, since I saw it pop up years ago.
As an Elixir/Erlang programmer, the thing that caught my eye about it was how it seemed to really be exploring some high-level ideas Joe Armstrong had talked about. I'm thinking of, I think, [0] and [1], around, essentially, content-addressable functions. Was he at all an influence on the language, or was it an independent discovery of the same ideas?
I'm not 100% sure of the origin of the idea, but I do remember being influenced by git and Nix. Basically: "what if we took git but gave individual definitions a hash, rather than whole working trees?" Then later I learned that Joe Armstrong had thought about the same thing - I met Joe and talked with him about Unison a while back - https://news.ycombinator.com/item?id=46050943
Independent of the distributed systems stuff, I think it's a good idea. For instance, one can imagine build tools and a language-agnostic version of Unison Share that use this per-definition hashing idea to achieve perfect incremental compilation and hyperlinked code, instant find usages, search by type, etc. It feels like every language could benefit from this.
Hi there, and congrats on the launch. I've been following the project from the sidelines, as it has always seemed interesting.
Since everything in software engineering has tradeoffs, I have to ask: what are Unison's?
I've read about the potential benefits of its distributed approach, but surely there must be drawbacks that are worth considering. Does pulling these micro-dependencies or hashing every block of code introduce latency at runtime? Are there caching concerns w.r.t. staleness, invalidation, poisoning, etc.? I'm imagining different scenarios, and maybe these specific ones are not a concern, but I'd appreciate an honest answer about ones that are.
There are indeed tradeoffs; as an example, one thing that trips folks up in the "save typed values without encoders" world is that a stored value of a type won't update when your codebase's version of the type updates. On its face, that should be a self-evident concern (solvable with versioning your records); but you'd be surprised how easy it is to `Table.write personV1` and later update the type in place without thinking about your already written records. I mention this because sometimes the lack of friction around working with one part of Unison introduces confusion where it juts against different mental models.
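A sketch of the versioning mitigation (the types and the migration here are just illustrative):

unique type PersonV1 = PersonV1 Text
unique type PersonV2 = PersonV2 Text Text

-- upgrade old stored values as they're read
migrate : PersonV1 -> PersonV2
migrate = cases PersonV1.PersonV1 name -> PersonV2.PersonV2 name ""

Reads can then accept either version and upgrade on the fly, rather than hoping every stored record matches the current shape of the type.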
Other general tradeoffs, of course, include a team's tolerance for newness and experimentation. Our workflow has stabilized over the years, but it is still off the beaten path, and I know that can take time to adjust to.
I hope others who've used Unison will chime in with their tradeoffs.
These seem to be mostly related to difficulties around adapting to a new programming model, which is understandable, but do you have examples of more concrete tradeoffs?
For example, I don't think many would dispute that, for all the upsides a functional language with immutable state offers, performance can take a significant hit. And it can make certain classes of problems trickier, while simplifying others.
Surely with a model this unique, with the payoffs come costs.
Unison does diverge a bit from the mainstream in terms of its design. There's a class of problems around deploying and serializing code that involves incidental complexity and repetitive work for many dev teams (IDLs at service and storage boundaries, provisioning resources for cloud infrastructure), and a few "everyday programming" pain points that Unison does away with completely (non-semantic merge conflicts, dependency resolution headaches).
What is Unison actually compiled to, and how is it run? Gemini LLM says it compiles down to Scheme, which is then run by Chez Scheme - how much of an LLM hallucination is that?
That information is a bit out of date, though it was correct at the time. We've put the Chez Scheme backend on ice and focused on runtime improvements to the Haskell interpreter. So, currently, Unison runs on an interpreter written in Haskell.
I know how fraught performance/micro-benchmarks are, but do you have any data on how performant it is? Should someone expect it to perform similarly to Haskell?
By and large, our CEO did, but the website content is open source and has been iterated on by many hands over the years. If you have suggestions, feel free to drop a note in a ticket: https://share.unison-lang.org/@unison/website
aha yeah! good question! We have two different kinds of type declarations, and each has its own keyword: "structural" and "unique". So you can define two different types as
structural type Optional a = Some a | None
structural type Maybe a = Just a | Nothing
and these two types would get the same hash, and the types and constructors could be used interchangeably. If you used the "unique" type instead:
unique type Optional a = Some a | None
unique type Maybe a = Just a | Nothing
Then these would be totally separate types with separate constructors, which I believe corresponds to the `BRANDED` keyword in Modula-3.
Originally, if you omitted both and just said:
type Optional a = Some a | None
The default was "structural". We switched that a couple of years ago, so now the default is "unique". Interestingly, we're uniquely positioned to make a change like this: since we don't store source code, just the syntax tree, it doesn't matter which way you specified it before the change - we can simply change the language and pretty-print your code in the new format the next time you need it.
How does the implementation of unique types work? It seems you need to add some salt to the hashes of unique type data, but where does the entropy come from?
There's an algorithm for it. The thing that actually gets assigned a hash IS a mutually recursive cycle of functions. Most cycles are size 1 in practice, but some are 2+ like in your question, and that's also fine.
Does that algorithm detect arbitrary subgraphs with a cyclic component, or just regular cycles? (Not that it would matter in practice - I don't think many people write convoluted mutually recursive messes, because they'd be a maintenance nightmare - just curious about the algorithmic side of things.)
I don’t totally understand the question (what’s a regular cycle?), but the only sort of cycle that matters is a strongly connected component (SCC) in the dependency graph, and these are what get hashed as a single unit. Each distinct element within the component gets a subindex identifier. It does the thing you would want :)
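For example, a minimal sketch of a size-2 cycle:

isEven : Nat -> Boolean
isEven n = if n == 0 then true else isOdd (drop n 1)

isOdd : Nat -> Boolean
isOdd n = if n == 0 then false else isEven (drop n 1)

`isEven` and `isOdd` form one strongly connected component, so they're hashed together as a unit, and each is identified by the cycle's hash plus its subindex.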
The Unison language supports algebraic effects and optimizes handlers that call their continuation at most once in tail position (we call these "affine"), so you can have the best of both worlds. Some links at the end if you're curious.
Here are a few places where "multi-resumable" stacks are still useful, even outside of nondeterminism:
* For instance, in a workflow engine a la Temporal, a `sleep` primitive might serialize and store the continuation in a distributed priority queue. The workarounds you need when you don't have access to the continuation are all not nearly as good.
* A pure interpreter of a structured concurrency ability is quite useful for testing, since it can test different interleavings of threads and produce tests that fail or pass deterministically. For Unison Cloud's distributed programming API, we have an (in-progress) chaos monkey interpreter that you can use for local testing of distributed systems.
Basically, any time you want to stash and do something interesting with the continuation, even if you only end up ultimately using it once, you still need the more general form of algebraic effects.
And then there are various nondeterminism effects that do call the continuation more than once. I'd say these are somewhat niche, but when you need them, you need them, and the code comes out much nicer. I especially like it for testing. You generally want tests to just be the logic, not a bunch of looping code or map/flatMap.
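The classic tiny example of a multi-shot handler is nondeterministic choice - a sketch (the names here are illustrative, not from base):

ability Choice where
  choose : Boolean

-- a handler that resumes the continuation twice, collecting both branches
Choice.all : '{Choice} a -> [a]
Choice.all thunk =
  h = cases
    { a } -> [a]
    { Choice.choose -> resume } ->
      (handle resume true with h) ++ (handle resume false with h)
  handle !thunk with h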
If I understand your question, this would work much the same way as any other language. Suppose you have:
allocationPolicy = 23484
-- two usages of allocationPolicy
foo = allocationPolicy + 1
bar = allocationPolicy + 99
You then later realize you want different allocation policies to be used in different parts of your app. You first might want to rename the existing `allocationPolicy`:
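move.term allocationPolicy defaultAllocationPolicy

Next, in a scratch file, define the specialized policy (the value here is just illustrative) and point `foo` at it:

fooAllocationPolicy = 99999
foo = fooAllocationPolicy + 1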
Then `update` and you're done. The new version of `foo` references `fooAllocationPolicy` while `bar` continues to reference `defaultAllocationPolicy`.
A couple other notes -
* It's very rare for independent implementations to end up with the same hash. It probably only happens for some very simple definitions that exist in base (like the identity function, say).
* If a hash has multiple names in your project because you've used `alias.term` to do so explicitly, the pretty-printer picks one using a deterministic rule (it prefers names you've given that hash in your project, then it consults library dependencies). If you really want to give two definitions different hashes even though they are functionally the same, you can introduce a minor change, like an unused binding.
* The type of a definition is part of its hash, so sometimes you might specialize a more generic function with a narrower type signature, and this gets its own hash (see the example after these notes).
* The OP is slightly out of date re: patches. We use something simpler now for updates and merges. When you update, we compare the new and old namespace to obtain a diff, which is applied to the ASTs in the namespace. If the result typechecks, you're done. If not, we make a minimal scratch file for you to get typechecking - it will contain the minimal transitive dependents of the change.
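To illustrate the type-signature point above: these two definitions have identical bodies but get different hashes, because the signature participates in the hash:

id : a -> a
id x = x

idNat : Nat -> Nat
idNat x = x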
Thank you for the more detailed explanations. I was mainly wondering about the more philosophical concern of "unintended identical hashes" causing coupling between parts of the codebase with different purposes (which may want to evolve apart in the future), which you say is thankfully rare in practice.
For instance, say you have a function which generates a certain business report, and your boss wants you to fiddle with the formatting every quarter. Your colleague has a similar function to generate a report with the same data, but according to their own boss's quarterly formatting requirements. With content-based identity, it would seem like you have to be wary of your own and your colleague's reports ever aligning, lest they have the same hash and lose their distinct identities.
> If you really want to give two definitions different hashes even though they are functionally the same, you can introduce a minor change, like an unused binding.
Interesting. Is there at least any way to detect this condition before it occurs? (I.e., to know if you're trying to define a new function which happens to have the same hash as an existing one.)
I know what you mean with those other tools, but this doesn't happen in Unison. The reason those systems are somewhat flaky is that the cache of what's in memory can diverge from the "source of truth" which is a bag of constantly mutating text files. Maybe put another way, cache invalidation is hard in those systems.
When the source of truth is instead a database, content-addressed by hash, cache invalidation is simple - if the hash has changed, a cached result is invalid and needs recomputing. If the hash is the same, you're good. We use this approach in many places throughout Unison and it's quite robust.
The tooling takes a little getting used to but it’s extremely powerful. Here are a few benefits you’ll see -
UCM keeps a perfect incremental compilation cache as part of its codebase format, so you’re generally never waiting for code to build. When you pull from remote, there’s nothing to build either.
Pure tests are automatically cached rather than being run over and over.
Switching branches is instantaneous and doesn’t require recompiling.
Renaming is instantaneous, doesn’t break downstream usages, and doesn’t generate a huge text diff.
All code (and code diffs) are hyperlinked when rendered, supporting click through to definition.
I don’t know if you saw these getting started guides, they might be helpful -
https://www.unison-lang.org/docs/quickstart/
And then this tour -
https://www.unison-lang.org/docs/tour/
You can come by the Discord (https://unison-lang.org/discord) if you have any questions as you’re getting going! I hope you will give it a shot and sorry for the trouble getting started. There are a lot of new ideas in Unison and it’s been tricky to find the best way to get folks up to speed.
The Unison website and docs are all open source btw -
https://share.unison-lang.org/@unison/website