> "Yes but this copilot model takes that, adds value and doesn't itself join the...

nightski · on June 23, 2022

That's the whole point. Without the data, it would be worthless. Microsoft is not paying the full cost because it is ripping the data without asking consent. I'm not saying what they are doing is illegal per se, but it's definitely immoral.

Guid_NewGuid · on June 23, 2022

But why is it immoral? All that code is still out there, if I had the time and the resources I could build a language model. Unlike commons in the real world (e.g. land, fresh water, etc) a code commons is purely additive. With the release of Copilot (which I don't intend to pay for or use) nothing has been destroyed, instead we'll get more code for less work where companies do pay for their developers to use it, some might even find its way back into the commons as new open-source code (whether more code of copilot generated quality in general is an unalloyed good is left as an exercise to the reader).

bayindirh · on June 23, 2022

Because Copilot is violating the terms I set for my code. My code is GPL. It cannot be put into projects with incompatible licenses. That’s my code, and I share it with strings attached. You can’t just copy my code and sell it to other parties with no strings attached.

If that’s fine and dandy, Microsoft should also train Copilot on their source code repositories, so we can use that knowledge, too.

ShamelessC · on June 24, 2022

I guess I've just never had to work with GPL code before, but the complaints essentially only seem to be coming from coders who like this style of open source where you still get to make it kind of a pain in the ass to actually use your software.

I guess you have the right to do this, but it doesn't mesh at all with why I personally contribute (without any expectation of attribution), which is that (much like stack overflow), programmers mostly agreed awhile ago that it's just easier if we all share.

So much of what's wrong with the modern economy comes down to seeking rent on an idea that should just be public knowledge.

Sorry if my viewpoint towards your work is apathetic, but the whole field is already infested with academics who only understand citation as a useful metric. Further, the point remains that anyone with enough money could do this, not just Microsoft (Salesforce has released several models for Python competitive with Copilot). Times are changing - maybe don't share code anymore? I imagine in ten-twenty years this whole conversation will seem pretty petty though when your entire program is trivially recreated from its GitHub description without ever needing to have seen it in the first place.

imtringued · on June 24, 2022

>from coders who like this style of open source where you still get to make it kind of a pain in the ass to actually use your software.

Most "coders" don't publish anything if they don't have to. Using proprietary code is an even worse pain in the ass because you don't have access to it.

The point of the GPL is to force people to share their code.

>which is that (much like stack overflow), programmers mostly agreed awhile ago that it's just easier if we all share.

>So much of what's wrong with the modern economy comes down to seeking rent on an idea that should just be public knowledge.

The entire point of the GPL is to force e.g. hardware vendors to share their driver code under the GPL or any other open-source license to be included in the Linux kernel.

>Times are changing - maybe don't share code anymore?

The entire point of the GPL is to force people to share their code.

> I imagine in ten-twenty years this whole conversation will seem pretty petty though when your entire program is trivially recreated from its GitHub description without ever needing to have seen it in the first place.

What the hell are you talking about? If that is the case then why did humans ever bother with extensively documenting and testing their software if three sentences are enough to encode it? Your perspective is particularly annoying because Copilot isn't learning to write its own code; it's entirely reliant on an army of unpaid software engineers publishing code on the internet. If it knows how to recreate a project from just the GitHub description, it basically just had the codebase inside its model to begin with and merely pretends that it did everything on its own. That is actually a form of rent seeking.

ShamelessC · on June 24, 2022

> extensively documenting and testing their software if three sentences are enough to encode it

Was just hyperbole for "from plain English specs/requirements".

I'll admit to being uninformed about GPL, but your understanding of large language models is also limited. They actually learn to interpolate between data points meaning they can compose sequences not found in the training data. Further, GitHub added a feature that checks existing code for a match and rejects predictions if any match occurs.
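To make the "check for a match and reject" idea concrete, here is a minimal sketch of one way such a filter could work. It is not GitHub's actual implementation, which is not described in this thread; the function names, the tokenizer and the 8-token window are all assumptions made up for illustration.

```python
from __future__ import annotations

import re

# Hypothetical threshold: how many consecutive tokens count as a verbatim match.
WINDOW = 8


def tokenize(code: str) -> list[str]:
    """Split code into identifiers, digit runs and single symbols, ignoring whitespace."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def windows(tokens: list[str], size: int = WINDOW) -> set[tuple[str, ...]]:
    """Return every run of `size` consecutive tokens (empty if the input is shorter)."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def build_index(known_sources: list[str]) -> set[tuple[str, ...]]:
    """Collect the token windows of all known (e.g. public) source files."""
    index: set[tuple[str, ...]] = set()
    for src in known_sources:
        index |= windows(tokenize(src))
    return index


def accept_suggestion(suggestion: str, index: set[tuple[str, ...]]) -> bool:
    """Accept a suggestion only if none of its token windows appears in the index."""
    return windows(tokenize(suggestion)).isdisjoint(index)
```

Note what a filter like this does and does not do: it catches verbatim copies even when the whitespace differs, but a suggestion with renamed variables passes, and anything shorter than the window is always accepted.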

bayindirh · on June 24, 2022

Nobody disputes their ability to interpolate, I think (at least not me), but the problem is that the starting points for these interpolations contain GPL-licensed code, hence the model derives GPL-licensed code.

This derivation brings the GPL in, and the model doesn't understand this. As a result, every time GPL training data is mixed into the interpolation, you're converting the code to GPL, or if you're not converting your code to GPL, you're violating the GPL.

It's plain and simple.

On the other hand, I've been hearing the "we'll write the specs, and the computer will just auto-generate it" gospel since 2002. This time won't be different. The human brain, intuition and creativity are beyond algorithmic modeling.

So, no, the computer will not autogenerate the code from specs. It might link boilerplate together, which can already be done today.

namose · on June 24, 2022

But GPL owners aren’t seeking rent, so you’re just asking those who believe all code should be open source to unilaterally let large companies use all their code, while they reap no such benefits from the large companies.

ShamelessC · on June 24, 2022

Like I said, I understand the premise, just not the emotion behind why you want to release code to the public at all if it isn't simply a donation to all human knowledge.

There are better ways to gain notoriety as a coder than by essentially legally requiring your name is attached to a thing for all time.

I personally would be thrilled to know my work was valuable enough to be used by a company because I really just couldn't care less about the "credit" part of it. I know what I've done and don't have anything to prove.

bayindirh · on June 24, 2022

It's not an emotion. It's a stance.

> Why you want to release code to the public at all if it isn't simply a donation to all human knowledge.

On the contrary. I donate my code to all human knowledge. Just not to a corporation's private code corpus. I intend my code to be open to all humans to run, study, modify and share, forever. I don't give you the freedom to take it to a closed domain, and not share the further knowledge you derived from my code. If your primary intention is to return this knowledge to humankind, GPL is an enabler, not a hindrance.

> I personally would be thrilled to know my work was valuable enough to be used by a company because I really just couldn't care less about the "credit" part of it. I know what I've done and don't have anything to prove.

I personally don't care whether my code is good enough to be used by a company. If I want to contribute code which can be used by a company, I can contribute to MIT projects (which I also do). I don't have anything to prove.

I release my code in the hope that it'll be useful for somebody, and I don't want it to be included in any permissive or closed-source code base. It doesn't matter whether it saves your beef today or not. That's not my problem. Go write a better one, then. I don't care.

bayindirh · on June 24, 2022

When actually using the software means "taking it, adding it to commercial software and never telling anyone, incl. the developer of the original code, and not giving any attribution whatsoever, and earning money over that piece of code", yes, GPL makes it hard. It's by design, and this is why I license anything and everything I put in the open under GPLv3+.

If anyone contributes to GPL software, they're clearly attributed. Moreover, Git makes this attribution irrevocably visible. Before that, patches were sent in by email, and mailing lists were open, so attribution was also visible back then. So, no, GPL makes attribution visible, and irrevocable, by design.

GPL doesn't seek rent over any idea. It forces ideas to stay open, and forces you to put your improvements back in the open. You'll be attributed, your code will be in the open all the time, and nobody can grab your code, run with it and hide it in their software to make any kind of unjust profit, which makes "Open Source" coders visibly and literally wince and cringe, because they can't grab and paste a piece of code and make their days easier.

Again, this is by design.

Sorry if my viewpoint towards your view is apathetic, but the whole field is already infested with programmers who only understand being able to copy and paste code left and right to develop software as a useful metric.

It's not about Microsoft, it's just about honoring a license. A case-tested, lawyer-written, trusted license which many developers chose for licensing their work. It's a breach of contract, plain and simple.

As I said elsewhere, some of the code I'm writing is backed by papers. I don't obfuscate my papers to prevent anyone from implementing them, but if I open my reference implementation as GPL, this is because I don't want someone to grab it and run with the code, change it a little, put it into a closed-source program and call the idea theirs, possibly patenting it in the process.

I have a serious piece of research, my Ph.D. actually, and I'm still developing the code powering the whole idea. I was planning to open it under GPL license, to force its evolution in the open, but I understood that people don't appreciate that. So, probably I won't open the code. Binaries maybe. Highly obfuscated, protected binaries, probably.

Banana699 · on June 23, 2022

You can say the exact same about piracy, when I take a game or a pdf book from a pirate site, nothing is destroyed, nothing is subtracted. The server still owns the data and can copy and share it infinitely, all that changed is that I now have a copy too, and I use it to enrich my own intellectual life.

The argument has 2 main flaws:

1- It's not symmetric. The massive corporations with paid armies of lawyers aren't hugging trees and talking about how "Knowledge is - like - just free, man" with dreamy eyes, I would love if they were like that but no. They are constantly on the lookout for anyone remotely using their work. They don't deserve the language of free knowledge and open data, that would be like extending peace to an invading army, or defending a tyrant with the lingo of free speech. He Who Lives By The Sword Dies By The Sword.

2- If the person(s) behind the data or the code lives off their intellectual labor, you are ripping them off by using it without compensation. Sometimes the compensation is as little as simply citing them, just mention their names so that they get visibility and prestige they deserve for toiling in the intellectual field to produce the ideas and brain patterns you use and benefit from.

The whole thing is a huge minefield; digital reproduction of information and abstract structures is an extremely novel phenomenon that breaks tons of human intuitions about how ideas and thinking work and spread. But the involvement of a corporation allows you to shortcut the entire thing by invoking (1), also known as the fundamental theorem of ethics: Do Unto Others As You Wish They Do Unto You. Do corporations allow you to freely take and mix their intellectual produce and sell it back to them? No? Then they DON'T get to do that either, except maybe among themselves.

What I find strange is how nobody talks about how inherently repulsive and ugly the "Copilot" philosophy is, how it is fundamentally a dead end and how much it betrays a lack of understanding of how programming works on the part of those who fund and market it. Code is different from natural language; the fact that we call the symbols we write algorithms in "Programming Languages" is purely a historical accident. Code doesn't have the redundant resilience and error-correcting properties of natural language: removing or modifying or adding even a tiny bit to correct code can give you atrociously-slow correct code, or full-of-security-holes correct code, or non-correct code, or any of the 3 mixed together with other disasters. If you're going to steal people's open source code, at least do something interesting and intelligent with it, don't be a lazy fuck and apply an NLP technique to a highly formal and rigid domain then smile smugly and charge people for it as if this is going to end anywhere useful.