People look at the beta release of the software, and interpret flaws in an early...

TeMPOraL · on July 12, 2021

> People look at the beta release of the software, and interpret flaws in an early version as critical, fundamental problems.

Maybe because they realize the flaws are fundamentally inherent in the very core of the product. They're using a GPT-3 derivative here. DNN models are not the right tool for this job.

kozikow · on July 12, 2021

Why wouldn't the following work for licenses? Train 3 models:

1. Permissive only - The training set includes only repos with permissive licenses (MIT, Apache).

2. Copy-left - Step 1 + GPL, excluding AGPL and other "hardcore copy-left licenses".

3. All - Include all code, even unlicensed and AGPL.

Users could then choose whichever version fits the profile of their project and their company. The majority of GitHub repos have a LICENSE file, so it doesn't seem implausible?
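A rough sketch of what that split could look like, assuming each repo comes with an already-detected license identifier. The `training_tiers` function, the repo names, and the exact grouping of SPDX IDs are all made up for illustration; detecting the license reliably is the hard part and is not shown here.

```python
# Hypothetical license groupings using SPDX identifiers (an assumption,
# not a statement about how any real training set is built).
PERMISSIVE = {"MIT", "Apache-2.0"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only"}  # AGPL deliberately left out


def training_tiers(repos):
    """Split (name, license_id) pairs into three nested training sets."""
    tiers = {"permissive": [], "copyleft": [], "all": []}
    for name, license_id in repos:
        # Tier 3: everything, including unlicensed and AGPL code.
        tiers["all"].append(name)
        # Tier 2: tier 1 plus GPL.
        if license_id in PERMISSIVE | COPYLEFT:
            tiers["copyleft"].append(name)
        # Tier 1: permissive licenses only.
        if license_id in PERMISSIVE:
            tiers["permissive"].append(name)
    return tiers


# Made-up sample data; None stands for a repo with no detected license.
repos = [
    ("a/web", "MIT"),
    ("b/kernel-mod", "GPL-2.0-only"),
    ("c/service", "AGPL-3.0-only"),
    ("d/scratch", None),
]
tiers = training_tiers(repos)
```

Each tier is a superset of the one before it, so a user picking the "copyleft" model would also get everything the "permissive" model was trained on.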

nullc · on July 12, 2021

Almost all permissively licensed code still requires preserving copyright notices or other attribution. So where Copilot is creating copyright violations, restricting its training to MIT or Apache licensed code will not resolve the issue.

only_as_i_fall · on July 12, 2021

Does Microsoft actually have a repository of permissively licensed code though?

My guess would be that some significant portion of GitHub code published under a permissive license is actually licensed improperly.

Working that out at scale seems intractable, but maybe the training set doesn't need to be as big as I'm assuming.

nullc · on July 12, 2021

There is still something to be said for making a reasonable effort. There are no guarantees in the world.

tyingq · on July 12, 2021

I'd be more optimistic if the beta were crafted with the idea that it might have issues. If so, it would likely have some way of gathering feedback on suggestions that was a little more nuanced than just accepted/rejected.

skohan · on July 12, 2021

This could actually be interesting. If it turns out that copy-left based code completion is better than other options, it will create a strong incentive to spread it.