Years ago, I ran into a similar problem working on a program that did named entity recognition to assist humans with data entry. We found that, for our purposes, there seemed to be no realistic accuracy threshold beyond which the tool would save clients money, because double-checking the machine-generated output was inherently more work than doing the task by hand.
So we pivoted the product to being something you would run on full auto, for situations where you didn't need a high level of quality. I'm not sure if that option is available to programmers, though.
Maybe Copilot could be turned into a context-aware search engine? That is, invoking it would return a list of examples that it thinks do the same thing as what you're trying to do, based on your work-in-progress code.
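To make the idea concrete, here's a toy sketch of that retrieval mode: rank a corpus of known-good snippets by similarity to your work-in-progress code and surface the closest matches. A real tool would use learned code embeddings; plain token-overlap (Jaccard) similarity and the `rank_examples` helper are stand-ins I made up for illustration.

```python
import re

def tokens(code: str) -> set:
    """Lowercased identifier/keyword tokens from a code snippet."""
    return set(re.findall(r"[A-Za-z_]\w*", code.lower()))

def rank_examples(wip: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Return the corpus snippets most similar to the work-in-progress code."""
    wip_toks = tokens(wip)

    def score(snippet: str) -> float:
        snip_toks = tokens(snippet)
        union = wip_toks | snip_toks
        # Jaccard similarity: shared tokens over all tokens seen.
        return len(wip_toks & snip_toks) / len(union) if union else 0.0

    return sorted(corpus, key=score, reverse=True)[:top_k]

# A tiny example corpus of snippets the tool already "knows".
corpus = [
    "def read_lines(path):\n    with open(path) as f:\n        return f.readlines()",
    "def fetch_json(url):\n    import urllib.request\n    return urllib.request.urlopen(url).read()",
    "total = sum(x * x for x in values)",
]

wip = "with open(log_path) as f:\n    lines = f.readlines()"
best = rank_examples(wip, corpus, top_k=1)[0]
print(best)  # the file-reading snippet scores highest
```

The point isn't the scoring function; it's the interaction model: instead of generating code you must audit line by line, the tool shows you existing examples it believes are equivalent, and you judge at a glance whether any of them fit.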
I honestly think this is where things are going: human-machine partnership on creative tasks. Not only is that more amenable to current models, but it's also higher leverage and less likely to be completely automated away.