> And unlike HTTP, most of the other application-layer protocols we use today don't have anything like a standardized way to communicate "user-imposed constraints on how they want you to process their request, while still giving the same result".
This is the wrong approach; what you really want is for LLMs to instead have access to a palette of pre-vetted, bug-tested commands and options implemented by other applications.
i.e. think along the lines of those Python embeds in OpenAI… but instead of building a Python script, the LLM should be building an Ansible playbook or a macOS Shortcut that does the task, rather than banging together shell code directly.
Things like file access or web requests etc. are just more primitives in this model. Don't want it to call out to the web? Don't give it access to the web-request primitives. This has literally been a solved problem for 30 years: macOS App Intents and Windows COM interfaces allow applications to expose these capabilities in a way that can be programmatically interfaced by other code to build up scripts.

https://developer.apple.com/documentation/appintents

https://learn.microsoft.com/en-us/windows/win32/com/the-comp...
This is HyperCard-era stuff; Unix just won so thoroughly that people don't consider these capabilities, and everyone assumes everything has to be a single giant unshaped command of CLI garbage piped together.
The Unix mindset is not the right one for LLMs working at a token level. The idiom you need to mimic is macOS App Intents or Ansible actions… or at least PowerShell cmdlets. The amount of unconstrained shell-scripting involved needs to be minimized rigorously. Every time you allow it, it's a risk; so why make the model write anything more complex than glue code or composable YAML commands?
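To make that concrete, here is a rough sketch of a vetted action palette; every name in it is invented for illustration rather than taken from any real framework:

```python
# A toy "vetted action palette" for an LLM agent. Capabilities are explicit,
# enumerable, and individually withholdable.
import os
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Action:
    name: str
    description: str              # what the model sees when picking a tool
    handler: Callable[..., Any]   # vetted, bug-tested implementation

def list_files(path: str = ".") -> list[str]:
    return os.listdir(path)

def read_file(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# The palette the model is allowed to invoke. There is deliberately no
# "http_request" action registered, so no token sequence the model emits
# can reach the network through this interface.
PALETTE: dict[str, Action] = {
    "list_files": Action("list_files", "List files in a directory", list_files),
    "read_file": Action("read_file", "Read a UTF-8 text file", read_file),
}

def dispatch(action_name: str, **kwargs: Any) -> Any:
    if action_name not in PALETTE:
        raise PermissionError(f"action {action_name!r} is not in the vetted palette")
    return PALETTE[action_name].handler(**kwargs)
```

There is no generic exec or shell primitive in there to abuse; adding web access becomes an explicit decision to register one more action, not an emergent property of whatever shell string the model happens to produce.

---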
You're misunderstanding what I was trying to describe.
I wasn't talking about the direct use-case at hand (iTerm), because iTerm communicates directly with ML APIs, and a "request processing constraint" like this wouldn't be for use when communicating with ML APIs themselves.
If the client application is communicating directly with an ML API, then a technical measure is enough to prevent that: a group-policy flag that tells the app not to do anything that would communicate with those APIs; or, going further, one that alters the app's sandbox to forcibly block access to any and all known ML API servers at the OS network-stack level.
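(To sketch that kind of purely technical measure: the policy key, the environment-variable lookup standing in for a managed preference, and the host list below are all invented for illustration.)

```python
# Illustrative only: the policy key and the host list are made up, and reading
# an environment variable stands in for reading a managed preference / group
# policy / MDM profile.
import os

KNOWN_ML_API_HOSTS = [
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
]

def ml_disabled_by_policy() -> bool:
    return os.environ.get("ORG_POLICY_DISABLE_THIRD_PARTY_ML") == "1"

def call_ml_api(host: str, payload: dict) -> dict:
    """App-level guard: refuse to talk to ML endpoints at all when the flag is set."""
    if ml_disabled_by_policy():
        raise PermissionError("third-party ML features are disabled by org policy")
    raise NotImplementedError("actual request code elided")

def os_level_block_rules() -> list[str]:
    """Stricter variant: emit firewall rules (pf-style syntax, purely as an
    example) so the block is enforced below the app rather than inside it."""
    return [f"block drop out quick to {{ {host} }}" for host in KNOWN_ML_API_HOSTS]
```

The point is just that when the client itself is the thing talking to the ML API, the opt-out can be enforced entirely on the client's side of the wire.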
Rather, what I was referring to with a "request processing constraint" is the other case, the one that can't be addressed through purely technical measures: where a client application is making a request, or syncing its data, to a regular, non-ML business-layer server, and that backend might then decide to make requests to third-party ML APIs using the client's submitted data.
Think: the specific REST or gRPC API that most proprietary "service client" apps are paired with; or an arbitrary SMTP server for your email client; or the hard-coded AMQP broker for an IoT device that relies on async data push to a remote message queue; or the arbitrary S3-compatible object-store endpoint for your `rclone` or `pgbackrest` invocation.
It is for these systems that you'd need a way for the client to signal, to the backend system it's communicating with, that the client is operating under a regime where third-party ML is not to be used.
The spirit of such a "request processing constraint", as communicated to such a backend system, would be: "don't allow any of the data being carried by this request, once it arrives at your [regular non-ML business-layer] backend system, to then be sent to any third-party ML APIs for further processing. Not in the process of resolving the request, and not later on for any kind of asynchronous benefit, either. Skip all those code paths; and tag this data as tainted when storing it, so all such code paths will be skipped for this data in the future. Just do whatever processing you can do locally, within your own backend, and give me the results of that."
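To make the shape of that concrete, here's a minimal sketch under entirely made-up names; no such header exists in any standard today, which is rather the point. The client attaches the constraint as request metadata, and the backend both skips its third-party-ML code paths for that request and persists a taint flag so they stay skipped later:

```python
# Hypothetical wire format: a request header like "ML-Processing: deny-third-party"
# is invented here purely to illustrate the semantics described above.
from dataclasses import dataclass, field

CONSTRAINT_HEADER = "ML-Processing"
DENY_THIRD_PARTY = "deny-third-party"

@dataclass
class StoredNote:
    body: str
    no_third_party_ml: bool = False        # taint flag, persisted with the data
    keywords: list = field(default_factory=list)

def handle_note_upload(headers: dict, body: str, db: list) -> StoredNote:
    constrained = headers.get(CONSTRAINT_HEADER) == DENY_THIRD_PARTY
    note = StoredNote(body=body, no_third_party_ml=constrained)
    if constrained:
        # Skip the third-party code path now, *and* record the taint so that
        # later async jobs (re-indexing, enrichment, ...) skip it too.
        note.keywords = local_keyword_heuristic(body)
    else:
        note.keywords = third_party_keyword_api(body)
    db.append(note)
    return note

def local_keyword_heuristic(text: str) -> list:
    # "Do whatever processing you can do locally": a trivial stand-in heuristic.
    return sorted({w.lower().strip(".,;") for w in text.split() if len(w) > 6})

def third_party_keyword_api(text: str) -> list:
    raise NotImplementedError("call out to an external ML vendor (elided)")
```

On the client side this is just one more header on the sync request the app was already making; the interesting part is the persisted taint flag, which is what makes the "not later on, either" clause enforceable against future code paths.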
(Ideally, there would also be some level of legal regulation requiring vendors to at least put a "best-effort attempt" into creating and maintaining a fallback backend-local [if not client-side!] implementation for people who opt out of third-party ML processing, rather than just saying "the best we can do is nothing; your data goes unprocessed, and so the app just does nothing at all for you, sorry.")
---
You're also misunderstanding the misapprehensions that the people in the comments section of this post have about data being passed to ML models.
Most of the people here who are having a visceral reaction to iTerm including ML capabilities aren't thinking through the security concerns of this specific case of iTerm generating a shell script. I do understand that that's what you were trying to address here by talking about high-level capability-based APIs; but it wasn't what I was trying to suggest fixing, because "iTerm could generate a dangerous shell script" isn't what anyone was really complaining about here.
Rather, what people here were doing was echoing a cached-thought response, one they had long since decided on, to the use or capture of their data by third-party ML-model service APIs generally.
The things that make people reject the idea of applications integrating with third-party ML APIs generally are:
1. that the model would "do its job", but produce biased or censored results, creating unexpected "memory holes" in the resulting data (for recognition / transformer models), or tainting their processed data with arbitrary Intellectual Property captured from others (for generative models); or
2. that the ML company being sent your data would have Terms of Service (which the vendor of the product or service you're using agreed to, but which you never did) saying that any data submitted to them in a request can be used for model-training purposes. (And where, legally, due to the Terms of Service you consented to with the vendor, whatever minimal right they have over "your data" makes it "their data" just enough for them to be able to consent to the use of that data by third parties, without any additional consent on your part.)
Regarding problem 1: imagine, as I said above, the Evernote use-case. You add a bunch of notes and web clips to an Evernote notebook through their client app. This app syncs with their cloud backend; and then this cloud backend sends each note and clip in turn through several phases of third-party ML-assisted processing during the indexing process: "AI OCR", "AI fulltext keyword extraction", etc. But maybe the models don't know some words they weren't trained on; or worse yet, maybe the creators of those models intentionally trained the AI to not emit those words, but didn't document this anywhere for the vendor to see. Now some of your documents are unsearchable, lost in the depths of your notebook. And you don't even realize it's happening, because the ML-assisted indexing isn't "flaky"; it's rock-solid at 100% recognition for all your other documents... it just entirely ignores the documents with certain specific words in them.
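A toy illustration of why that failure mode is so hard to notice (the suppressed-term list is invented): if the extraction model silently refuses to emit certain words, the index entries for those documents never get created, so every search that should find them comes back empty while everything else keeps working perfectly:

```python
# Toy model of silent "memory holes" in ML-assisted indexing. SUPPRESSED stands
# in for terms a third-party model was quietly trained never to emit.
SUPPRESSED = {"blockedterm"}

def opaque_keyword_model(text: str) -> list:
    # Behaves flawlessly for everything else, so the pipeline looks rock-solid.
    return [w.lower() for w in text.split() if w.lower() not in SUPPRESSED]

def build_index(docs: dict) -> dict:
    index: dict = {}
    for doc_id, text in docs.items():
        for kw in opaque_keyword_model(text):
            index.setdefault(kw, set()).add(doc_id)
    return index

docs = {
    "note-1": "quarterly budget spreadsheet",
    "note-2": "blockedterm incident writeup",
}
index = build_index(docs)
print(index.get("budget"))       # {'note-1'}  -> works, so you trust the search
print(index.get("blockedterm"))  # None        -> note-2 is quietly unfindable
```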
Regarding problem 2: think about ML built into a professional creative tool. A tool like Adobe Illustrator (where it's probably invoked client-side), or even Adobe Bridge (where it might just be happening in their "creative cloud", analyzing your Bridge library to suggest stock image assets that might "go well with" your projects). What if your own Illustrator illustrations, sitting in Bridge, are being fed by Adobe to some third-party image2text model... owned by a company that also does generative image models? In other words, what if your own Illustrator illustrations, done as a work-for-hire for a company that is paying you to create them a unique and cutting-edge brand appearance, are being used to train a generative image model to ape the novel style you created for this client; and that version of the model, trained on your images, gets published for use; and people end up discovering and using your new style through it, putting out works on the cheap that look just like the thing you were going to charge your client for, before you even present it to them?
A "third-party ML processing opt-out" request-constraint, would be something intended to circumvent these two problems, by disallowing forwarding of "your" data to third-party ML APIs for processing altogether.
This constraint would disallow vendors from relying on third-party models that come with opaque problems of bias or censorship that the vendors were never informed of and so can't work around. (Such vendors, to enable their backend to still function with this constraint in play, would likely fall back either to non-ML heuristics, or to a "worse" but "more transparent" first-party-trained ML model: one trained specifically on and for the dataset of their own vertical, and so not suffering from unknown, generalized biases, but rather from domain-specific or training-set-specific biases that can be found and corrected and/or hacked around in response to customer complaints.)
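A sketch of what that vendor-side decision might look like (all the function names are placeholders); the constraint just determines which rungs of a fallback ladder the backend is allowed to use:

```python
# Placeholder names throughout; only the fallback ordering matters here.
def extract_keywords(text: str, allow_third_party_ml: bool) -> list:
    if allow_third_party_ml:
        return third_party_model(text)       # best quality, opaque biases
    try:
        return first_party_model(text)       # trained on the vendor's own vertical
    except NotImplementedError:
        return heuristic_extraction(text)    # "worst" quality, fully transparent

def third_party_model(text: str) -> list:
    raise NotImplementedError("external vendor call elided")

def first_party_model(text: str) -> list:
    raise NotImplementedError("vendor-trained model elided")

def heuristic_extraction(text: str) -> list:
    return sorted({w.lower() for w in text.split() if len(w) > 6})

print(extract_keywords("confidential launch plan for the new brand identity", False))
# -> ['confidential', 'identity']
```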
And by disallowing vendors from submitting data to third-party model companies at all, this constraint would of course prevent anyone (other than the vendor themselves) from training a model on the submitted data. Which would in turn mean that any applicable "data use" by the vendor could be completely encapsulated by a contract (i.e. by the Terms of Use of the software and accompanying service) that the client can see and consent to; rather than being covered in part by agreements the vendor makes with third parties for use of "the vendor's" data that currently just happens to contain (but isn't legally considered to be) "the user's" data.