Hacker News | kikimora's comments

To me being able to query over psql is secondary. I’m fine with any SQL. What is very important is being able to transform the data to better suit analytical queries. That is, define custom transformations, define how the data is partitioned, and which indices are available.

IDK, AWS Zero ETL from Aurora into Redshift really helped us at some point. You’re right that data transformation is very limited, if possible at all. But having data in an analytical store, being able to experiment with queries, understand what is wrong with your OLTP schema, and then build ETL is way better than doing an upfront design.

Of course it is. What you describe is one of the reasons that ELT became popular. If you couple it with a variant type and schema on read, you have a very powerful and flexible architecture.

But there’s no free lunch: building and maintaining data infrastructure that is reliable requires work. Many companies don’t realise that when they start their analytical journey, and aggressive marketing doesn’t help. That’s the point I was trying to make.
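
A minimal sketch of the schema-on-read idea, assuming a hypothetical `raw_events` list standing in for a variant column. The field names are made up for illustration. The load step stores records untouched, and types and defaults are applied only when a query reads them.

```python
import json

# Load step: keep every record exactly as it arrived (the "variant" column).
raw_events = [
    json.dumps({"order_id": 1, "amount": "19.90", "country": "DE"}),
    json.dumps({"order_id": 2, "amount": 5, "coupon": "SPRING"}),  # schema drifted
]

def read_orders(rows):
    """Apply the schema at read time: pick fields, cast types, default gaps."""
    for row in rows:
        doc = json.loads(row)
        yield {
            "order_id": int(doc["order_id"]),
            "amount": float(doc["amount"]),
            "country": doc.get("country", "unknown"),
        }

orders = list(read_orders(raw_events))
total = sum(o["amount"] for o in orders)
```

The transformation lives in `read_orders`, so it can be changed after the data has landed, which is what makes experimenting before an upfront design possible.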


I don’t disagree, just placing emphasis on a different aspect.

In an ideal world there is a tool that moves your schema into an analytical store “as is” with a single click. Then the same tool lets you add arbitrary transformations of the data. Surprisingly I have not come across such a tool. It is either “one click to move your data” or “any transformation you want” but only after a significant upfront investment :(


I think I didn’t articulate myself very well in my reply. I actually wanted to say that I agree with you and emphasise again the need for educating users about the complexity of these projects.

What you describe has been pitched by many different products for different parts of the data platform. Fivetran for example claims to do that for the extraction and loading part, good old Informatica was offering the ETL in a graphical interface etc.

The problem that many teams ended up having is the explosion of the tooling needed by data teams.


> Physical controls for temperature and fan speed.

The lack of these is widely criticized, but I don’t get it.

I set the temperature to 21 C in my Tesla when I bought it and have never changed it since. Why would anyone frequently change a thermostat set temperature?


Because when it’s -20C outside and you’re wearing a thick coat, 21C inside is a sauna. When it’s +30C, I don’t like super freezing air in my face either.

Is this a theory or your personal experience? I lived in Russia for most of my life. Had 21 C set in the car even in -20 C. If the coat is too hot then you just take it off.

One reason why you want warm air in the car is defrosting your windows.


I think this is a glimpse of what’s to come - https://github.com/oven-sh/bun/issues/31463

Why do people say they have big test coverage while others say it segfaults a lot? I looked at a lot of the JS tests, and they are just API surface tests. Edge cases happen inside the API implementation; things like memory leaks or data corruption that show up after a while cannot be caught reliably with these tests.

What is the significance of this?

Used quite a bit by stock exchanges to ensure consumers and publishers have a reasonably aligned time.

it is useful e.g. to align the phase of signals being sent from different locations

Distributed systems spend most of their effort on one problem: agreeing on the order of events across machines. Without synchronized physical clocks you have two options. Logical clocks (Lamport, vector) give you causal order but not wall-clock truth, so you can’t answer “did A really happen before B” for events that don’t have a happens-before relationship. Or you run consensus, which gives total order but costs round trips. At geographic scale that’s tens of milliseconds per decision, and the floor is set by the speed of light.
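
The limit of logical clocks can be sketched with a tiny vector clock. This is an illustrative toy, not taken from any particular system; the node names and helper functions are assumptions.

```python
def merge(a, b):
    """Element-wise max of two vector clocks (dicts of node -> counter)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def tick(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    out = dict(clock)
    out[node] = out.get(node, 0) + 1
    return out

def happened_before(a, b):
    """True if a causally precedes b: a <= b everywhere and a != b."""
    nodes = a.keys() | b.keys()
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and any(
        a.get(n, 0) < b.get(n, 0) for n in nodes
    )

# Node A writes, then sends a message that node B receives.
a1 = tick({}, "A")                 # {"A": 1}
b1 = tick(merge({}, a1), "B")      # {"A": 1, "B": 1}: causally after a1

# Node C writes independently, without hearing from A or B.
c1 = tick({}, "C")                 # {"C": 1}

ordered = happened_before(a1, b1)
concurrent = not happened_before(a1, c1) and not happened_before(c1, a1)
```

`a1` and `b1` are ordered, but `a1` and `c1` are incomparable: the clocks cannot say which one came first in real time.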

Tight clock sync collapses this. If clock uncertainty ε is small and bounded, you can timestamp a write, wait ε, and trust the global order without talking to anyone. Spanner’s external consistency works because TrueTime’s ε was a few milliseconds, so commit-wait was tolerable. The latency cost of planet-scale serializability stops depending on how far apart your replicas are and starts depending on how good your clocks are.
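
A rough simulation of the commit-wait idea, assuming a hypothetical `FakeClock` whose error is bounded by `epsilon`. This is a sketch of the principle, not Spanner's implementation or the TrueTime API.

```python
from dataclasses import dataclass

@dataclass
class FakeClock:
    """Simulated local clock whose error is bounded by `epsilon` seconds."""
    true_time: float   # hidden "real" time, only used by the simulation
    offset: float      # this machine's error, |offset| <= epsilon
    epsilon: float

    def now(self):
        """Return (earliest, latest): real time is guaranteed to lie inside."""
        local = self.true_time + self.offset
        return local - self.epsilon, local + self.epsilon

    def sleep(self, seconds):
        self.true_time += seconds

def commit(clock, step=0.001):
    """Pick a timestamp, then wait until it is certainly in the past."""
    _, commit_ts = clock.now()
    waited = 0.0
    while clock.now()[0] <= commit_ts:   # commit-wait, roughly 2 * epsilon
        clock.sleep(step)
        waited += step
    return commit_ts, waited

clock = FakeClock(true_time=100.0, offset=0.002, epsilon=0.004)
ts, waited = commit(clock)
```

With `epsilon` at 4 ms the wait comes out at roughly 8 ms. Shrinking `epsilon` shrinks the wait in proportion, and no message to another machine is involved.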

That’s the real significance. Time sync converts a coordination problem (bounded by physics) into a local computation (bounded by clock quality). Spanner proved this is possible but required GPS receivers and atomic clocks in every datacenter, which kept the capability inside Google for years. White Rabbit-class sync pushes ε from milliseconds toward sub-nanoseconds over commodity Ethernet hardware, and it’s now in IEEE 1588 as a standard PTP profile. If sub-nanosecond sync becomes baseline network infrastructure, the long-held assumption that strong consistency has to be slow at geographic scale stops holding, and a meaningful chunk of what databases currently work around (HLCs, weak isolation defaults, application-level reconciliation) becomes unnecessary.


Very good explanation and interesting take on the 'humanity scale' or internet scale significance. I work on a phased array system so significance of white rabbit for me was always sample alignment. Assumed CERN had a similar use case of needing to order (sensor data of) physical events happening far apart.

But if we imagine the vast majority of internet and telecom infrastructure is also implemented this way, we can reason about information over time in general. Makes me think of 'earth is a big computer' type of sci fi trope. Neat!


Indeed, time synchronization across detectors is always tricky. Distributed clocks get messy at ATLAS dimensions. WR makes it possible to distribute pretty good time sync over large detector systems. Sometimes it is still not good enough though. Time-of-flight detectors try to get to the single-digit ps level, and almost by definition, you have to synchronize two detectors that are some distance apart.

In other words you can use time as your TX id, add MVCC and now you can transactionally read data from multiple partitions/shards. In a traditional distributed DBMS it would require a global tx manager creating a bottleneck. Did I get it right?
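
The idea in this question can be sketched as a toy, assuming commit timestamps come from well-synchronized clocks. The `Shard` class and the account names are hypothetical, and this is not how any specific database implements it.

```python
class Shard:
    """One partition keeping every committed version of each key."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), sorted by commit_ts

    def write(self, key, value, commit_ts):
        history = self.versions.setdefault(key, [])
        history.append((commit_ts, value))
        history.sort(key=lambda version: version[0])

    def read(self, key, snapshot_ts):
        """Newest value committed at or before snapshot_ts, or None."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

shard_a, shard_b = Shard(), Shard()

# Initial balances, committed at timestamp 10 on each shard.
shard_a.write("alice", 100, commit_ts=10)
shard_b.write("bob", 0, commit_ts=10)

# A transfer of 30 commits on both shards with the same timestamp, 20.
shard_a.write("alice", 70, commit_ts=20)
shard_b.write("bob", 30, commit_ts=20)

def snapshot(ts):
    """Read both shards at one timestamp; no cross-shard coordinator is consulted."""
    return shard_a.read("alice", ts), shard_b.read("bob", ts)

before = snapshot(15)
after = snapshot(25)
```

Both snapshots are consistent: the balances sum to 100 at either timestamp. The sketch only covers reads; committing the transfer on both shards at one timestamp still needs some atomic commit protocol, which is left out here.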

I don’t see why sub-nanosecond sync is useful for Spanner-style ordering. Your average server is more than 1 light-ns wide! Your average cable from server to TOR switch is several light-ns long!
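
The arithmetic behind the "light-ns" remark, as a quick check. The 19-inch rack width is an assumption used as a stand-in for "your average server".

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition of the metre

# Distance light covers in one nanosecond in vacuum, in metres.
light_ns_m = SPEED_OF_LIGHT_M_PER_S * 1e-9

# Assumed server width: a standard 19-inch rack, converted to metres.
rack_width_m = 19 * 0.0254

width_in_light_ns = rack_width_m / light_ns_m
```

One light-nanosecond is about 30 cm, so a rack-width server is already wider than that.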

Awesome explainer, thanks for that

Bun never was great in terms of stability. It has been vibe coded for 6 months, but the code was reviewed by a person.

>It already has been proven that LLM's can maintain such codebases.

Proven is a strong word. In my experience AI fails miserably at anything beyond junior level tasks. We will see soon, once bun goes into production.


> Bun never was great in terms of stability

It's very easy to throw shade like this on software if you've got a bugbear with it. I'm sure you can even come up with a bunch of these "stability" problems when challenged on it. I know I could, for basically any large piece of software that I've ever used.

But really, is bun worse in this regard than any other similarly ambitious open source software within its first few years?


see that's fine with me if they want to take a year or two of human time and do the rewrite properly

this is a piece of software with no architecture, and whose owners have no regard or respect for architecture. I can virtually guarantee that on average every bug they fix will create one new bug, because that's what it's like to work on software with no intentional architecture


What are you talking about?? Bun in Rust is a port, almost exactly the same code base in a different syntax. The architecture did not change at all. Amazing how people comment without even knowing what they are talking about.

Zig and Rust are significantly different languages. If bun has a good architecture in zig (which I don't know if it does or not), that doesn't necessarily mean it has a good architecture for rust. A direct translation of zig code would probably result in pretty unusual rust code, and probably a lot more unsafe usage than if it had been originally written in rust.

I don’t really understand this objection. For every tool that I use, am I supposed to divine the best underlying language for it and then determine whether or not it is written in that language? Don’t I have better things to do?

Because of the borrow checker you would build data structures differently in Rust compared to Zig. Automated translation simply maps Zig constructs onto unsafe Rust code. I have no idea how feasible it is to go from a totally unique way of using Rust that mimics Zig to idiomatic Rust.

I understand that. That’s a specific example of an inaptness moving from one language to another. That’s not what I’m talking about.

I am asking if we are expected to understand this hypothetical condition about all possible tools that we use. Should I have to worry that something is written in Python when it should’ve been written in C? It just seems that in order to have a big concern here, I’d have to be really invested in what language Bun used. I guess the whole matter makes more sense if people are REALLY mad about something else and the choice of language is supposed to serve as a more respectable thing to be mad about.


I'm not saying that bun shouldn't be written in rust. I'm saying that since it was originally written in zig, there were undoubtedly architecture and design decisions that made sense in zig, but not so much in rust. When rewriting something in a different language, especially one significantly different from the original, it is common to need to re-architect some things, and mechanically translating line by line from one language is probably going to result in some low quality code, even if the original was decent.

I think that using AI to translate bun from zig to rust might produce a good starting point. But it was done one file at a time, with minimal human review, and I'm skeptical that the result is quality maintainable code.


I don't want to say that skepticism is unwarranted, but I'm not sure we apply this level of scrutiny with any kind of defensible evenness. I just can't think of a single open source project I use where I'm aware of their refactor cadence and practice. I'm just...not checking in on their feature branches and stuff, and I think most people aren't. I couldn't tell you if the uv maintainers work at 3AM while high on drugs or at 9:30 AM wearing FORTRAN blue ties.

I dunno. I think my sense here is that the bun maintainers did something shocking and dramatic using AI and people are shocked and dramatized. They're not WRONG to be so. But I don't know that the shock comes out of any generalized duty of care we have toward open source tooling. I think the uncomfortable point that bun has been releasing for 6 months with smaller AI code edits hasn't really been reckoned with. If we were actually this invested in what was happening, the migration would've begun months ago when it was clear they were using agents to ship code faster than they were willing to review.


What is that tool in relation to the rest of your workshop though? If it's a simple hammer that you can swap out for $20 and you only use it once a month, who cares what kind of metal it's made out of, as long as it works. But if it's the $6,000 4-axis CNC machine at the heart of your machine shop, every minute of downtime on it costs you money, and it's starting to rust, then no, you don't have better things to do than to look into what it's made of.

Yeah what if the tool is a JavaScript runner released for free?

What is being expressed here about Bun is using the language of due diligence but doesn’t seem to adhere to any of the sensibilities. Should we all be auditing our toolchains to understand internal decisions that each toolmaker undertakes? Maybe! DO WE? Absolutely not. The level of scrutiny bun is getting is *unusual*. They just did an unusual and dramatic thing, so it’s not surprising. But I just don’t believe that bun is being deprecated due to normal engineering discipline that we are constantly carrying and applying everywhere. That’s…just hard to buy.


We're all only human, so the deprecation happening is gonna have some biasing going on due to how people feel about AI, yeah. JavaScript runners are a whole ecosystem separate from, say, Python. If Python went and said that for the next version of Python they're using this new AI transpiled runtime that they baked for nine days, people would also freak out "due to normal engineering discipline that we are constantly carrying and applying everywhere." The real question then is just how far off from normal is it? Maybe you don't do audits that you think you should be doing, but e.g. moving from Python 3.10 to 3.12 can be as rigorous or as yolo as you want. And those versions are old, too. Other ecosystems are going to take their time as well. LLVM isn't going to switch over to something AI banged out in nine days. They're going to have a long dragged out period where they run things in parallel for a while until everyone feels okay about it. If they'd even do something like that.

Very amazing indeed. Here you are making bold assumptions about a huge pile of code not a single human being has ever read in any meaningful amount.

The only assumption you need to make is how the process went about, which was described by Jarred in an HN comment when the PR was first discussed: they had a prompt that described exactly how things should be translated. For each "pattern" they were using in Zig, an appropriate equivalent was described in Rust. Zig and Rust are not that different, both are systems languages and things can be done similarly in both languages, so architecture-wise I would think the exact same thing would work fine. I am not sure whether the LLM actually wrote a transpiler which just followed the rules, or if it did the job itself, since that information is not public yet, as far as I know, but my guess is that the LLM wrote a transpiler to do the job, then reviewed/fixed compilation issues, then fixed tests. And I'm pretty sure some human interaction was part of that as well.

>The only assumption you need to make is how the process went about, which was described by Jarred

This is not how the process went. This is how Jarred thinks it went, a huge difference.

>my guess is that the LLM wrote a transpiler to do the job

My guess is different. I think one agent translated the code, another compiled it, feeding errors back into the translator to fix. Then a last agent modified the code to fix tests. All governed by a set of md files.


So now you've gone from making assumptions to making up wild stories territory. Well, the commit granularity isn't that of transpiler passes, and more importantly, it's completely irrelevant to how the majority of the code hasn't even been read by anyone.

Nobody reviewed the resulting code. Maybe all tests are empty and this is why they pass. Maybe tests were modified to pass because this is the only thing the LLM could do to make them pass. Maybe it hallucinated something in the process. We have no idea.

We do have an idea, and it contradicts your guess: https://news.ycombinator.com/item?id=48133806

Are you suggesting there are only JS tests that do not need a rewrite? This is crazier than I thought…

I took tests as an example. There are so many other things that can go wrong. Rust and Zig standard libraries may have different semantics not picked up by AI. For example, one may guarantee the insertion order of a dictionary while the other does not. There may be differences in how the runtimes react to Linux signals, how they do file IO, etc.

If I were a Bun user I would be moving off Bun unless it has excellent test coverage (which I think it does not). During a normal release cycle I am offered a small increment of functionality with a small number of issues. Here I’ve been offered a complete rewrite, potentially having thousands of issues. I don’t want to be a guinea pig in this experiment.

I’m genuinely curious how this will unfold.
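
To illustrate the kind of semantic gap meant here, a hypothetical sketch. It says nothing about which containers Zig or Rust actually provide; the `ordered` flag simply mimics a map that keeps insertion order versus one that iterates in key order.

```python
def render_headers(pairs, ordered):
    """Join header pairs; `ordered=False` mimics a map that iterates in key order."""
    table = dict(pairs)  # Python dicts keep insertion order (guaranteed since 3.7)
    keys = list(table) if ordered else sorted(table)
    return "; ".join(f"{k}={table[k]}" for k in keys)

pairs = [("host", "example.com"), ("accept", "*/*"), ("cookie", "a=1")]
original = render_headers(pairs, ordered=True)
ported = render_headers(pairs, ordered=False)

# A test that only checks the set of entries passes for both versions...
same_entries = set(original.split("; ")) == set(ported.split("; "))
# ...while the byte-for-byte output differs.
same_output = original == ported
```

A test suite that checks entries but not their order would not notice the change, while a downstream consumer that depends on the order would.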


I don’t think this is about FP in any way. Constructive logic appears naturally in proofs and type systems where it is very useful. Also it is quite fascinating to me to learn that the law of excluded middle can be omitted and still such logic yields useful results.
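
A small Lean 4 illustration of getting useful results without excluded middle. The theorem names are made up; both proofs are plain lambda terms and use no classical axiom.

```lean
-- Any proposition implies its own double negation.
theorem dn_intro (p : Prop) : p → ¬¬p :=
  fun hp hnp => hnp hp

-- Excluded middle itself is not assumed, yet its double negation is provable.
theorem not_not_em (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))
```

The second statement says that excluded middle can never be refuted constructively, even though it is not available as a proof principle.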

I also admit that the blogpost is lacking in many respects.


The original RSL library is 36k LoC. And this is C++. Rust should be like 50% smaller, that is, 18k LoC. This library is so big that I bet the author has no idea if it works or not. 1300 tests generated by AI say nothing about actual quality.

In the end it is just a lot of unmaintainable code quickly generated by AI.


This is uncharitable, but makes a prediction. I imagine you'd bet the author won't be successfully using this, at MS/Uber or wherever they are, in a year's time?

Rust makes no promise of being terser than C++, and RSL does less than this considering the optimization.

Also it's only 45/50k LOC so not so very far from the 36k LOC.


Yes, I would bet it won't go anywhere.

The blog post mentioned the project is 130k LoC multiple times. Where does 45/50k LoC come from?

>Rust makes no promise of being terser than C++

True, but Rust has no header files, this alone is a great LoC saver.


50k LOC would be the rust code without tests.

But it's not apples to apples because they seem to have done much more performance work; this is far from code golfing.


RSL’s 36k LoC includes tests and should be compared with 130k LoC, not 50.

Having 90k LoC of tests for a 50k LoC codebase is also a problem. At least in my experience LLMs generate too many tests. They do not evolve the test suite but throw more code into it as development happens. Unless I aggressively refactor tests I quickly end up with a test suite that I don’t understand. Then the LLM modifies tests to “make code work” and I have no idea if this is a legit edit or the LLM is cheating. I wonder if the same thing is happening or about to happen with this codebase.


Has Rust code generally been found shorter than C++ in practice? I don't see an obvious reason for it.


I see no reason for Rust to be shorter than C++ when using the latest standards.


This is a great example of AI slop and a big problem with AI coding.

The original RSL library has 36 KLoC across C++ source and header files. Rust is supposed to be more expressive and concise. Yet, AI generated 130k LoC. I guess nobody understands how this code works and nobody can tell if it actually works.


All unit tests can pass if you don't assert anything. Just have to make sure to read through all 130k lines of code to check.
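
A minimal sketch of that point. The `add` function and the two checks are hypothetical: a check that asserts nothing passes even against a deliberately broken function.

```python
def add(a, b):
    return a - b  # deliberately wrong

def check_hollow():
    add(2, 2)               # runs the code, checks nothing

def check_real():
    assert add(2, 2) == 4   # actually verifies the result

def passes(check):
    """Run a check the way a test runner would: it fails only on AssertionError."""
    try:
        check()
    except AssertionError:
        return False
    return True

hollow_passes = passes(check_hollow)
real_passes = passes(check_real)
```

Both checks count as one test each in a summary, so a green run and a test count alone say nothing about which kind is in the suite.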

