
ActorNightly · 2026-06-09T20:21:25 1781036485

My general go-to for tasks of n-level complexity is to have the agent store summaries after every generation. I do this for interacting with websites: I'll sit there and type text for the agent to correctly inject JS to do something on a website, and on every iteration it asynchronously writes a history of what it has done and what the result was, using a background thread. On every invocation, it injects that context.
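
A minimal sketch of that pattern, assuming a JSONL file as the store. The `HistoryLog` class, the `build_prompt` helper, and the prompt wording are all made up for illustration and are not the actual setup described above:

```python
import json
import queue
import threading
from pathlib import Path


class HistoryLog:
    """Append step summaries to a JSONL file from a background thread."""

    def __init__(self, path):
        self.path = Path(path)
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, action, result):
        # Returns immediately; the worker thread does the disk write.
        self._queue.put({"action": action, "result": result})

    def _drain(self):
        while True:
            entry = self._queue.get()
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
            self._queue.task_done()

    def flush(self):
        # Block until every queued entry has been written.
        self._queue.join()

    def context(self, last_n=20):
        # Build the text that gets prepended to the next prompt.
        if not self.path.exists():
            return ""
        lines = self.path.read_text(encoding="utf-8").splitlines()
        entries = [json.loads(line) for line in lines[-last_n:]]
        return "\n".join(f"- did: {e['action']} -> {e['result']}" for e in entries)


def build_prompt(log, user_text):
    # Inject the stored history ahead of the new instruction.
    return f"Previous steps:\n{log.context()}\n\nNew instruction:\n{user_text}"
```

The queue keeps the agent loop from blocking on disk writes; `flush()` is only needed if you want to read the history back right after recording.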

ActorNightly · 2026-06-09T00:32:06 1780965126

No, it reflects the nature of misunderstanding Python by people who think their system is better, have no idea how Python in production actually works, and just publish things like the article to make themselves feel better.

Typing is not a huge issue, period. In Python, if you pass a wrong type to something, the program just throws an exception. Exceptions are not the end of the world like people make them seem. Functionally, finding errors by compiling code with type checking is no different than running that code against a set of tests, which every production codebase has (or should have).

The only way typing ever saves you from it is by being absolutely strict: every type defined has a finite range of values, and every operation has a bounded domain and range. I.e. if you have a string field, it's not enough that it's a string; you must also define the total number of characters that string can have and the allowed values for each character, along with more complex rules on sequences of characters.
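
To make "not enough that it's a string" concrete, here is a rough sketch of a bounded string type. The `Username` name and its rules (3 to 16 characters, lowercase letters, digits, underscores, starting with a letter) are invented for the example. Note that in Python this check runs when the value is constructed, at runtime, which is not the same thing as a compiler proving it ahead of time:

```python
import re


class Username(str):
    """A string limited to 3-16 characters from a fixed alphabet.

    The rules here are made up for illustration.
    """

    # First character must be a letter, then 2-15 more allowed characters.
    _PATTERN = re.compile(r"[a-z][a-z0-9_]{2,15}")

    def __new__(cls, value):
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        if cls._PATTERN.fullmatch(value) is None:
            raise ValueError(f"not a valid username: {value!r}")
        return super().__new__(cls, value)
```

Anything that accepts a `Username` can then rely on the length and character rules without re-checking them.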

If you have this system (something like Coq comes close), then if your program compiles, it's by definition correct. But even the strongest proponents of typing don't really want to do this, because they realize how long it would take to write code.

The simple truth is that Python is easy and flexible enough to work in that you don't even need type checking. An LLM can effectively function as a type checker for you if you care enough. For any errors that you encounter due to lack of typing, it's ultimately way faster to fix them in Python than it is to spend time writing in a strongly typed language.

ActorNightly · 2026-06-09T00:24:08 1780964648

This is probably the most intellectualism I've seen anyone put into a comment that is so very, very obviously wrong.

Yeah, in the age of AI where the whole goal is to not have to think, type as fast as you can with misspellings, and copy paste stuff without thinking, it's TOTALLY a better system to worry about the types of whatever you are feeding into LLMs.

VoidWarranty · 2026-06-09T01:14:42 1780967682

It's about limiting surface area.

The gymnastics people are putting their ops teams through in order to validate oceans of generated slop is insane. Just use Rust and half of that work goes away.

ActorNightly · 2026-06-09T19:14:26 1781032466

Please stop trying to make Rust happen. You guys are reaching SO hard now.

There is no way that Rust will be faster than simply specifying tests for an agent to run after it has generated code.
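
Roughly what I mean by specifying tests for the agent, as a sketch. `generate` is a hypothetical callable standing in for whatever agent you use, and the retry limit is arbitrary:

```python
import subprocess
import sys


def run_checks(command, timeout=120):
    """Run a test command and return (passed, combined_output)."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr


def generate_until_green(generate, command, max_attempts=3):
    """Call generate(feedback) until the test command passes or attempts run out.

    `generate` is a hypothetical stand-in for the agent: it receives the
    previous test output (None on the first attempt) and is expected to
    rewrite the code on disk. Returns the attempt number that passed, or
    None if none did.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        generate(feedback)
        passed, output = run_checks(command)
        if passed:
            return attempt
        feedback = output
    return None
```

In practice `command` would be something like `[sys.executable, "-m", "pytest", "-q"]`, assuming pytest is what the project uses.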

VoidWarranty · 2026-06-09T23:46:56 1781048816

If I had said C++ would that have triggered you less?

ActorNightly · 2026-06-09T00:14:57 1780964097

I absolutely adore the historical revisionism that Apple cares about privacy.

Run your router through a Linux laptop as a proxy so you can capture traffic, connect any Apple device to your router, and see the vast amount of data your device sends to Apple.

Apple DGAF about privacy, they want your data as much as anyone else. Their only thing is that they should be the only ones to get it and then other people have to pay them for it, rather than your device sending the data to the third party directly.

And if you think your data is secure, reminder that The Fappening was all done targeting Apple devices.

Cider9986 · 2026-06-09T03:03:17 1780974197

Apple added e2ee and created the most complete end to end encrypted cloud ecosystem to prevent that from happening again.

ActorNightly · 2026-06-09T19:13:01 1781032381

Do you trust them to actually implement that correctly, and also do you trust them not to share your data with others, and if so, why?

Cider9986 · 2026-06-09T20:57:09 1781038629

I don't use it, but yes I think that it works because they have everything to lose and nothing to gain. They definitely could share whatever data they have with whoever, which is why you use e2ee.

At the end of the day you are trusting Apple with pretty much everything if you use this service because they make most of your phone, the entire operating system, host the update servers, etc.

DANmode · 2026-06-09T04:01:12 1780977672

Is that secret code for “rate-limited auth on the Find My API”?

DANmode · 2026-06-09T20:50:49 1781038249

I’m not breaking NDA if I never signed one/was nowhere near the company or situation, you guys can fuck yourselves lol ^_^

ActorNightly · 2026-06-08T23:37:09 1780961829

Very false.

I use small models exclusively. They aren't a replacement for large models. You need decent hardware to run those models efficiently, as smaller parameter models plain suck and are still slow on macbooks. And affordability of higher end hardware is very limited.

Even at non-VC-subsidized $/token prices, it's still much cheaper to run cloud-based models.

dvt · 2026-06-09T00:21:41 1780964501

> Even at non VC subsidized $/token prices, its still much cheaper to run cloud based models.

On a price-per-watt basis, this is not true; people have done the math on /r/LocalLLaMA many times over[1]. Local models, while not as good as premier models (GPT 5.5, etc.), are like ~80%+ of the way there, and often converge to a similar solution after a few dead ends.

[1] https://www.reddit.com/r/LocalLLM/comments/1kshq4f/electrici...

fwip · 2026-06-09T00:43:32 1780965812

Maybe not per watt, but unless you already happen to own the 3090 cited by that post, you'd have to buy that as well, and it's currently selling for around $1400 used.
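
Back-of-the-envelope version of that trade-off, as a sketch. Only the $1400 card price comes from this thread; the wattage, tokens per second, electricity rate, and cloud price in the usage lines are placeholder assumptions you'd swap for your own numbers:

```python
def power_cost_per_mtok(watts, tokens_per_second, price_per_kwh):
    """Electricity cost of generating one million tokens locally."""
    hours = 1_000_000 / tokens_per_second / 3600
    return watts / 1000 * hours * price_per_kwh


def breakeven_million_tokens(hardware_cost, cloud_price_per_mtok, local_power_cost_per_mtok):
    """Millions of tokens after which buying the card beats paying per token.

    Returns None when local electricity alone costs at least as much as the
    cloud price, since the hardware then never pays for itself.
    """
    saving = cloud_price_per_mtok - local_power_cost_per_mtok
    if saving <= 0:
        return None
    return hardware_cost / saving


# Placeholder inputs: 350 W draw, 50 tokens/s, $0.15 per kWh, $2 per million cloud tokens.
local = power_cost_per_mtok(watts=350, tokens_per_second=50, price_per_kwh=0.15)
print(breakeven_million_tokens(1400, cloud_price_per_mtok=2.0, local_power_cost_per_mtok=local))
```

This ignores the rest of the machine, idle power, and resale value, so treat it as a lower bound on the break-even point.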

strictnein · 2026-06-09T01:07:00 1780967220

3090s are running $1400 now? Wowsers. I thought I was overspending when I bought 6x of them for around $800 a pop.

Might be time to sell, to be honest. It's fun to have that at home, but I can't justify having $10k (with memory, mobo, cpu, etc) sitting in my basement without being fully utilized.

karim79 · 2026-06-09T02:06:52 1780970812

I'll take two of them. A thousand a piece.

dvt · 2026-06-09T00:49:25 1780966165

I do have a 3090 Ti on my gaming PC, but even my old M1 MBP (with a mere 32 GB of RAM) is quite competent and can run a quantized `Gemma4-26B-A4B` in the background while I do other stuff.

ActorNightly · 2026-06-09T09:45:31 1780998331

The MBP running Gemma4 is absolutely useless for any real work.

nozzlegear · 2026-06-09T16:27:32 1781022452

What is "real work"?

ActorNightly · 2026-06-09T19:11:08 1781032268

Where you are developing software. It's significantly faster to use Google Gemini and copy paste code back and forth compared to having Gemini edit files for you.

ClikeX · 2026-06-09T07:53:29 1780991609

To be fair, I can also use that 3090 for other things locally. Not just AI.

davnicwil · 2026-06-08T23:44:59 1780962299

Well, to be fair, that's right now. I think the question is: what about in 6 months, 12 months, 2 years?

Where do these improvement curves go? Does the gap close, do they intersect for practical purposes (factoring in cost etc)? Or is the local curve always just a translation of the hosted, lagging behind, or indeed does hosted just pull ahead?

Nobody knows, but it's a very open question I feel, and it certainly appears like the answer might quite reasonably be that yes they intersect on that kind of short-ish term time horizon.

ActorNightly · 2026-06-09T00:12:09 1780963929

>Where do these improvement curves go?

Nowhere.

Large models haven't seen that much improvement, just better performance on small, unique tasks, which is all special-cased RL to game metrics.

For local models, it's the same story. You can download Gemma 3 QAT from last year, and it will be just as good as Gemma:31b on average. Qwen also boasts that it's better because, again, they RLed it to game some metrics. It's better at coding than Gemma, but Gemma is better at more creative thinking (again, all RL).

Fundamentally, you need detail in the gradients for the models to pick up on the smaller details. If you don't have those, your output is gonna suck. No amount of clever architecture is going to fix this.

The only way to improve local models is by training them to fetch context, and then their job becomes much simpler because all they need to do is reinterpret the fetched content and provide an answer. But fundamentally, if you are trying to keep things in house for advertising purposes like what all companies do with search, you want them to go to your service, which means running on your servers. And it's not really that much extra per invocation (i.e. excluding initial hardware costs) to instead just offer a large model as a service, which will be way better than any small models.

iwontberude · 2026-06-09T12:52:14 1781009534

Just need a decent Mac Studio and they are plentiful in used condition and affordable.

ActorNightly · 2026-06-06T22:02:41 1780783361

>I can't help but think that there's got to be a better mechanism

There is.

Transformers are basically autoencoders on the decode step: they take a compressed set of information and expand it into three matrices, which then get combined back into one matrix.

You can unroll the entire self attention step into fully connected layers, just with a lot of zeros for things that don't get multiplied together.
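
A toy sketch of the "lot of zeros" part, with arbitrary numbers and sizes. It only covers the per-token projection (the piece that is a fixed linear map, shown here for one weight matrix named `w_query`); the softmax-weighted mixing depends on the input and is not handled by this example:

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def block_diagonal(weight, copies):
    """Place `copies` copies of `weight` along the diagonal, zeros elsewhere."""
    rows, cols = len(weight), len(weight[0])
    big = [[0.0] * (cols * copies) for _ in range(rows * copies)]
    for c in range(copies):
        for i in range(rows):
            for j in range(cols):
                big[c * rows + i][c * cols + j] = weight[i][j]
    return big


# Toy sizes: 3 tokens, model width 2. The values are arbitrary.
tokens = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
w_query = [[0.1, 0.2], [0.3, 0.4]]

# Usual view: apply the same small matrix to every token.
per_token = [matvec(w_query, t) for t in tokens]

# Unrolled view: one fully connected layer over the flattened sequence.
flat_input = [x for t in tokens for x in t]
dense = block_diagonal(w_query, len(tokens))
flat_output = matvec(dense, flat_input)
unrolled = [flat_output[i:i + 2] for i in range(0, len(flat_output), 2)]
```

Here `dense` is a 6x6 matrix where only the three 2x2 diagonal blocks are nonzero, and `unrolled` holds the same numbers as `per_token`.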

So it stands to reason that there is probably an optimal form of weights that does the same thing as current transformers.

ActorNightly · 2026-06-06T21:41:43 1780782103

Memory is just one part. AMD has had offerings competitive to NVIDIA for quite some time, but nobody uses AMD cards.

The biggest advantage with NVIDIA is CUDA.

overfeed · 2026-06-07T00:53:41 1780793621

> but nobody uses AMD cards

AMD is selling every MI card it makes, and the market wants more of them.

ActorNightly · 2026-06-07T20:19:57 1780863597

They are only selling because Nvidia is hard to get, and something is better than nothing.

ActorNightly · 2026-06-06T02:16:06 1780712166

>I really wish I knew an equivalent for Linux

I highly encourage you to vibecode something. It's really easy. You can get a small, fast library that can do OCR with coordinates, and the rest is just interfacing with the X server to draw stuff over the top.

NateEag · 2026-06-07T18:53:04 1780858384

Thanks for the suggestion.

As one who's driving Claude daily due to corporate mandates, I can see why people fall in love with genAI coding, but my revulsion has only grown as I've learned to do it, so I won't be spending my free time with LLMs.

ActorNightly · 2026-06-03T21:57:29 1780523849

Don't think it's that.

Basically with upcoming spark laptops, the smaller models will likely get fine tuned to interface with Google services. Then, Google can essentially make Chromebook software include those models, which is the same use case as Android.

And you better believe that they will be collecting user data and building advertising models.

ActorNightly · 2026-06-03T21:54:03 1780523643

Nope, lol.

Large models still are quite far ahead, don't be fooled that even Gemma:31b (which is better than the 12b overall) is anywhere close to big models.

There is definitely room for optimization, but fundamentally, for complex tasks, you need visible small gradients for accuracy, which the model can be trained on (and consequently follow during inference). For example, if you specify in instructions not to write code but ask a coding question, Gemma will still write code. Whereas Gemini/Claude will pick up on that and follow your instructions better.

mitkebes · 2026-06-04T00:14:15 1780532055

It doesn't matter if large models are undeniably better, if a local model is "good enough" to handle the task. With API costs ramping up, I think a lot of companies are going to want to look into what can be run locally instead, possibly only using larger models when the local models fall short.
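
Something like this is the shape I have in mind, as a sketch only. `local_model`, `large_model`, and `good_enough` are hypothetical callables; how you decide an answer is "good enough" is the hard part and is left open here:

```python
def answer(prompt, local_model, large_model, good_enough):
    """Try the local model first; escalate only when its answer fails the check.

    All three callables are hypothetical stand-ins. Returns the answer and
    which tier produced it.
    """
    draft = local_model(prompt)
    if good_enough(prompt, draft):
        return draft, "local"
    return large_model(prompt), "large"
```

The check could be as simple as running the project's tests against generated code, so the paid API only gets called on failures.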

dzhiurgis · 2026-06-05T02:45:05 1780627505

"good enough" is a moving target. 3 years ago good enough was gpt3 and copy pasting code.