Interesting how people sourcing this software say China = bad, but Israel = good.
"Trusted by more than 50% of Fortune 100 companies".
You choose to give your most precious data and the keys to your infrastructure to an outfit whose job was to steal information, staffed by people who are still NSA/8200 employees.
Don't be surprised if one day they are compelled to share data or to find dirt on people (they protect one well-known LLM company).
It doesn't mean they are doing it, but the incentive clearly exists, plus you are exposed to both US and IL jurisdictional risk.
The founder came from Unit 8200, an Israeli cyberwarfare operation; that's where the alignment comes from, not simply US foreign policy, which is coincidental.
OpenAI found a way to circumvent the exclusivity; the deal was poorly defined by Microsoft. OpenAI had started selling a service on AWS that had a stateful component to it, not purely an API. Microsoft obviously didn't like that and confronted Altman, and this is the settlement of that confrontation: OpenAI no longer needs to do workarounds, Microsoft won't sue to enforce exclusivity, and Microsoft doesn't have to pay the dev share to OpenAI. AWS is a much bigger market, so OpenAI doesn't care.
I don't think GitHub even set a precedent for this. My understanding is that they don't train on private repositories per se, though if you access a private repository through copilot, the data flow through copilot can be trained on, which pulls in data from the repo.
So a private repo should be safe as long as you don't use Copilot. Atlassian, meanwhile, wants to pull in data from private issue trackers/wikis.
Listen to yourself. Take a moment and try to unpack the mental gymnastics you just did. Ask yourself: why does the fact that you have a Copilot subscription make it okay to train on all your private repos?
GitHub does not have any models of its own; it routes to partners like OpenAI. Just because some data comes from private repos doesn't mean all data is flowing, nor that data should be trained on just because it's being inferenced on. There is a difference between the data that was actually used and all the data in that repo, and between that one repo and all private repos. And they opted everyone in by default. Draconian.
So yes, they did set a precedent and you’re here arguing why it’s okay.
I'd say those CMU researchers are out of touch with reality. GitHub could easily replace this with a much better system than what those researchers recommended, but chooses not to.
GitHub has all kinds of private internal metrics that could power a much higher-quality signal score: one that is impervious to manipulation and extremely well correlated with actual quality, popularity, and value, not noise.
Two projects could look exactly the same from visible metrics, while one is a complete shell and the other a great project.
But they choose not to publish it.
And those same private signals spot the signal-rich stargazers more effectively than PageRank does.
Much more important is who starred it, and whether they are selective about giving out stars or bookmark everything. Forks are a closer signal to usage than stars.
Indeed, GitHub should set up a monthly quota of stars to give out and correlate it with account age: either invent something like a "trusted-age factor" that multiplies the weight of any given star, or scale the available quota by that factor (and let users star repos repeatedly).
GitHub should also introduce a way to bookmark a repo, in addition to the existing options of sponsoring/watching/forking/starring it.
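A minimal sketch of what the quota + trusted-age-factor idea could look like. Everything here is hypothetical (the names, the base quota of 20, the 0.5/year growth and 3x cap), not anything GitHub actually implements:

```python
from dataclasses import dataclass

@dataclass
class Account:
    age_years: float       # account age in years
    stars_this_month: int  # stars already given out this month

BASE_QUOTA = 20  # hypothetical base monthly star quota

def trust_factor(account: Account) -> float:
    """Hypothetical 'trusted-age factor': grows with account age, capped at 3x."""
    return min(1.0 + account.age_years * 0.5, 3.0)

def star_weight(account: Account) -> float:
    """Weight of one star from this account.

    The quota scales with the trust factor; once it is spent,
    further stars count for nothing this month.
    """
    quota = int(BASE_QUOTA * trust_factor(account))
    if account.stars_this_month >= quota:
        return 0.0
    return trust_factor(account)

# A fresh account's star counts for much less than a veteran's.
new_user = Account(age_years=0.1, stars_this_month=0)
veteran = Account(age_years=6.0, stars_this_month=0)
print(star_weight(new_user))
print(star_weight(veteran))
```

Under this sketch a repo's score would be the sum of `star_weight` over its stargazers rather than a raw count, which is what makes bot farms of fresh accounts cheap to discount.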