AI assistance when contributing to the Linux kernel

(github.com)

173 points | by hmokiguess 7 hours ago

21 comments

qsort 6 hours ago
Basically the rules are that you can use AI, but you take full responsibility for your commits and code must satisfy the license.
That's... refreshingly normal? Surely something most people acting in good faith can get behind.
- pibaker 13 minutes ago
  I agree this is very sane and boring. What is insane is that they have to state this in the first place.
  I am not against AI coding in general. But there are too many people "contributing" AI generated code to open source projects even when they can't understand what's going on in their code, just so they can say in their resumes that they contributed to a big open source project once. And when the maintainers call them out, they just blame it on the AI coding tools they are using, as if they are not opening PRs under their own names. I can't blame any open source maintainer for being at least a little sceptical when it comes to AI generated contributions.
- galaxyLogic 6 hours ago
  But then if AI output is not under GNU General Public License, how can it become so just because a Linux-developer adds it to the code-base?
  - jillesvangurp 5 hours ago
    AIs are not human and therefore their output is a human authored contribution and only human authored things are covered by copyright. The work might hypothetically infringe on other people's copyright. But such an infringement does not happen until a human decides to create and distribute a work that somehow integrates that generated code or text.
    The solution documented here seems very pragmatic. You as a contributor simply state that you are making the contribution and that you are not infringing on other people's work with that contribution under the GPLv2. And you document the fact that you used AI for transparency reasons.
    There is a lot of legal murkiness around how training data is handled, and the output of the models. Or even the models themselves. Is something that in no way or shape resembles a copyrighted work (i.e. a model) actually distributing that work? The legal arguments here will probably take a long time to settle but it seems the fair use concept offers a way out here. You might create potentially infringing work with a model that may or may not be covered by fair use. But that would be your decision.
    For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of say a for loop in the contribution to some for loop in somebody else's code base would be anything else than coincidence or fair use.
    - nitwit005 3 hours ago
      That you can't copyright the AI's output (in the US, at least), doesn't imply it doesn't contain copyrighted material. If you generate an image of a Disney character, Disney still owns the copyright to that character.
    - ninjagoo 4 hours ago
      IANAL; this is what my limited understanding of the matter is. With that caveat: it is easy to forget that copyright is on output. Verbatim or exact reproductions and derivatives of a covered work are already covered under copyright.
      So if the AI outputs Starry Night or Starry Night in different color theme, that's likely infringement without permission from van Gogh, who would have recourse against someone, either the user or the AI provider.
      But a starry-night style picture of an aquarium might not be infringing at all.
      >For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of say a for loop in the contribution to some for loop in somebody else's code base would be anything else than coincidence or fair use.
      I would argue that if it was a verbatim reproduction of a copyrighted piece of software, that would likely be infringing. But if it was similar only in style, with different function names and structure, probably not infringing.
      Folks will argue that some things might be too small to do any different, for example a tiny snippet like python print("hello") or 1+1=2 or a for loop in your example. In that case it's too lacking in original expression to qualify for copyright protection anyway.
    - Lerc 3 hours ago
      >AIs are not human and therefore their output is a human authored contribution and only human authored things are covered by copyright.
      That is a non sequitur. Also, I'm not sure if copyright applies to humans, or persons (not that I have encountered particularly creative corporations, but Taranaki Maunga has been known for large scale decorative works)
    - mcv 3 hours ago
      Didn't a court in the US declare that AI generated content cannot be copyrighted? I think that could be a problem for AI generated code. Fine for projects with an MIT/BSD license I suppose, but GPL relies on copyright.
      However, if the code has been slightly changed by a human, it can be copyrighted again. I think.
      - simonw 3 hours ago
        Thaler v. Perlmutter said that an AI system cannot be listed as the sole author of a work - copyright requires a human author.
        US Copyright Office guidance in 2023 said work created with the help of AI can be registered as long as there is "sufficient human creative input". I don't believe that has ever been qualified with respect to code, but my instinct is that the way most people use coding agents (especially for something like kernel development) would qualify.
      - RussianCow 3 hours ago
        > Didn't a court in the US declare that AI generated content cannot be copyrighted?
        No, my understanding is that AI generated content can't be copyrighted by the AI. A human can still copyright it, however.
      - tadfisher 3 hours ago
        No, a court did not declare that. The case involved a person trying to register a work with only the AI system listed as author. The Supreme Court decided that you can't do that, you need to list a human being as author to register a work with the Copyright Office. This stems from existing precedent where someone tried to register a photograph with the monkey photographer listed as author.
        I don't believe the idea that humans can or can't claim copyright over AI-authored works has been tested. The Copyright Office says your prompt doesn't count and you need some human-authored element in the final work. We'll have to see.
        papercrane 47 minutes ago
        It's almost a certainty that you can't copyright code that was generated entirely by an AI.
        Copyright requires some amount of human originality. You could copyright the prompt, and if you modify the generated code you can claim copyright on your modifications.
        The closest applicable case would be the monkey selfie.
        https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
        manwe150 58 minutes ago
        I’m curious to see if subscription vs free ends up mattering here. If it is a work for hire, generally it doesn’t matter how the work was produced, the end result is mine, because I contracted and instructed (prompted?) someone to do it for me. So will the copyright office decide it cares if I paid for the AI tool explicitly?
      - singpolyma3 1 hour ago
        Public domain code is GPL compatible
  - afro88 6 hours ago
    Same as if a regular person did the same. They are responsible for it. If you're using AI, check the code doesn't violate licenses
    - rzmmm 5 hours ago
      In certain legal cases, a finding of plagiarism can be influenced by whether the person was exposed to the copyrighted work. AI models are exposed to a very large corpus of works.
      - cxr 4 hours ago
        Copyright infringement and plagiarism are not the same or even very closely related. They're different concepts and not interchangeable. Relative to copyright infringement, cases of plagiarism are rarely a matter for courts to decide or care about at all. Plagiarism is primarily an ethical (and not civil or criminal) matter. Rather than be dealt with by the legal system, it is the subject of codes of ethics within e.g. academia, journalism, etc. which have their own extra-judicial standards and methods of enforcement.
        dekhn 3 hours ago
        I suspect they were instead referring to patents; for example, when I worked at Google, they told the engineers not to read patents because then the engineer might invent something infringing. I think it's called willful infringement. No other employer I've worked for has ever raised this as an issue, while many lawyers at Google would warn against this.
    - sarchertech 6 hours ago
      How could you do that though? You can’t guarantee that there aren’t chunks of copied code that infringe.
      - Andrex 5 hours ago
        Let me introduce you to the concept of submarine patents...
      - shevy-java 6 hours ago
        But the responsible party is still the human who added the code. Not the tool that helped do so.
        aargh_aargh 5 hours ago
        The practical concern of Linux developers regarding responsibility is not being able to ban the author, it's that the author should take ongoing care for his contribution.
        Cytobit 5 hours ago
        That's not going to shield the Linux organization.
        cxr 4 hours ago
        A DCO bearing a claim of original authorship (or assertion of other permitted use) isn't going to shield them entirely, but it can mitigate liability and damages.
        sarchertech 1 hour ago
        Can it though? As far as I know this hasn’t been tested.
        sarchertech 6 hours ago
        In a court case the responsible party very well could be the Linux Foundation because this is a foreseeable consequence of allowing AI contributions. There’s no reasonable way for a human to make such a guarantee while using AI generated code.
        Chance-Device 5 hours ago
        It’s not about the mechanism: responsibility is a social construct, it works the way people say that it works. If we all agree that a human can agree to bear the responsibility for AI outputs, and face any consequences resulting from those outputs, then that’s the whole shebang.
        sarchertech 5 hours ago
        Sure we could change the law. It would be a stupid change to allow individuals, organizations, and companies to completely shield themselves from the consequences of risky behaviors (more than we already do) simply by assigning all liability to a fall guy.
        Chance-Device 5 hours ago
        What law exactly are you suggesting needs to be changed? How is this any different from what already happens right now, today?
        sarchertech 5 hours ago
        Right now it's very easy not to infringe on copyrighted code if you write the code yourself. In the vast majority of cases, if you infringed it's because you did something wrong that you could have prevented (in the case where you didn't do anything wrong, independent creation is an affirmative defense against copyright infringement).
        That is not the case when using AI generated code. There is no way to use it without the chance of introducing infringing code.
        Because of that if you tell a user they can use AI generated code, and they introduce infringing code, that was a foreseeable outcome of your action. In the case where you are the owner of a company, or the head of an organization that benefits from contributors using AI code, your company or organization could be liable.
        Chance-Device 4 hours ago
        It’s a foreseeable outcome that humans might introduce copyrighted code into the kernel.
        I think you’re looking for problems that don’t really exist here, you seem committed to an anti AI stance where none is justified.
        sarchertech 4 hours ago
        A human has to willingly violate the law for that to happen though. There is no way for a human to use AI generated code that doesn't have a chance of producing copyrighted code. That's just expected.
        If you don't think this is a problem, take a look at the terms of the enterprise agreements from OpenAI and Anthropic. Companies recognize this is an issue and so they were forced to add an indemnification clause, explicitly saying they'll pay for any damages resulting from infringement lawsuits.
        johnisgood 2 hours ago
        > Right now it's very easy not to infringe on copyrighted code if you write the code yourself.
        Humans routinely produce code similar to or identical to existing copyrighted code without direct copying.
        sarchertech 1 hour ago
        They don’t produce enough similar code to infringe frequently. And if they did independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs since they have the demonstrated capability to produce code directly from their training set.
        johnisgood 1 hour ago
        You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.
        On independent creation: you are conflating the tool with the user. The defense applies to whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly, they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose independent creation defense because they have "demonstrated capability to produce code directly from" their memory.
        LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory: we do not deny them independent creation defense wholesale because of that capability!
        In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for independent creation defense creates a double standard that human-written code does not face.
        bpt3 5 hours ago
        In this case, the "fall guy" is the person who actually introduced the code in question into the codebase.
        They wouldn't be some patsy that is around just to take blame, but the actual responsible party for the issue.
        sarchertech 5 hours ago
        Imagine you're a factory owner and you need a chemical delivered from across the country, but the chemical is dangerous and if the tanker truck drives faster than 50 miles per hour it has a 0.001% chance per mile of exploding.
        You hire an independent contractor and tell him that he can drive 60 miles per hour if he wants to but if it explodes he accepts responsibility.
        He does and it explodes, killing 10 people. If the families of those 10 people have evidence you created the conditions to cause the explosion in order to benefit your company, you're probably going to lose in civil court.
        Linus benefits from the increased velocity of people using AI. He doesn't get to put all the liability on the people contributing.
        bpt3 1 hour ago
        That is a nonsensical analogy on multiple levels, and doesn't even support your own argument.
        sarchertech 1 hour ago
        Nice rebuttal.
        bpt3 1 hour ago
        Why would I put much effort into responding to a post like yours, which makes no sense and just shows that you don't understand what you're talking about?
        lo_zamoyski 3 hours ago
        Responsibility is an objective fact, not just some arbitrary social convention. What we can agree or disagree about is where it rests, but that's a matter of inference, and an inference can be more or less correct. We might assign certain people certain responsibilities before the fact, but that's to charge them with the care of some good, not to blame them for things before they were charged with their care.
        bitwize 3 hours ago
        Because contributions to Linux are meticulously attributed to, and remain property of, their authors, those authors bear ultimate responsibility. If Fred Foobar sends patches to the kernel that, as it turns out, contain copyrighted code, then provided upstream maintainers did reasonable due diligence the court will go after Fred Foobar for damages, and quite likely demand that the kernel organization no longer distribute copies of the kernel with Fred's code in it.
        sarchertech 1 hour ago
        Anyone distributing infringing material can be liable, and it’s unlikely that this technicality would actually shield anyone.
        Anyone who thinks they have a strong infringement case isn’t going to stop at the guy who authored the code, they’re going to go after anyone with deep pockets with a good chance of winning.
    - martin-t 5 hours ago
      As opposed to an irregular person?
      LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).
      A human has moral value a text model does not. A human has limitations in both time and memory available, a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en-masse.
      The rules of copyright allow humans to do certain things because:
      - Learning enriches the human.
      - Once a human consumes information, he can't willingly forget it.
      - It is impossible to prove how much a human-created intellectual work is based on others.
      With LLMs:
      - Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.
      - It's perfectly possible to create a model based only on content with specific licenses or only public domain.
      - It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.
  - noosphr 5 hours ago
    Tab complete does not produce copyrightable material either. Yet we don't require software to be written in nano.
  - panzi 6 hours ago
    If the output is public domain it's fine as I understand it.
    - galaxyLogic 6 hours ago
      Makes sense to me. But so anybody can take Public Domain code and place it under the GNU General Public License (by dropping it into a Linux source-code file)?
      Surely the person doing so would be responsible for doing so, but are they doing anything wrong?
      - robinsonb5 6 hours ago
        > Surely the person doing so would be responsible for doing so, but are they doing anything wrong?
        You're perfectly at liberty to relicense public domain code if you wish.
        The only thing you can't do is enforce the new license against people who obtain the code independently - either from the same source you did, or from a different source that doesn't carry your license.
        cwnyth 5 hours ago
        This is correct, and it's not limited to code. I can take the story of Cinderella, create something new out of it, copyright my new work, but Cinderella remains public domain for someone else to do something with.
        If I use public domain code in a project under a license, the whole work remains under the license, but not the public domain code.
        I'm not sure what the hullabaloo is about.
        manwe150 38 minutes ago
        If someone else uses your exact same prompt to generate the exact same code, can you claim copyright infringement against them? If the output is possible to copyright, then you could claim their prompt is infringement (just like if it reproduced Harry Potter). If it isn’t copyrightable, then the kernel would not have legal standing to enforce the GPL on those lines of code against any future AI reproduction of them. The developers might need to show that the code is licensed under GPL and only GPL, otherwise there is the possibility the same original contributor (eg the AI) did permit the copy. The GPL is an imposed restriction on what the kernel can legally do with any code contributions. That seems legally complicated for some projects—probably not the kernel with the large amount of pre-AI code, but maybe it spells trouble for smaller newer projects if they want to sue over infringement. IANAL.
      - miki123211 5 hours ago
        Linux code doesn't have to strictly be GPL-only, it just has to be GPL-compatible.
        If your license allows others to take the code and redistribute it with extra conditions, your code can be imported into the kernel. AFAIK there are parts of the kernel that are BSD-licensed.
      - jaggederest 5 hours ago
        The core thing about licenses, in general, is that they only grant new usage. If you can already use the code because it's public domain, they don't further restrict it. The license, in that case, is irrelevant.
        Remember that licenses are powered by copyright - granting a license to non-copyrighted code doesn't do anything, because there's no enforcement mechanism.
        This is also why copyright reform for software engineering is so important, because code entering the public domain cuts the gordian knot of licensing issues.
      - sambaumann 6 hours ago
        Sqlite’s source code is public domain. Surely if you dropped the sqlite source code into Linux, it wouldn’t suddenly become GPL code? I’m not sure how it works
    - martin-t 5 hours ago
      This ruling is IMO/IANAL based on lawyers and judges not understanding how LLMs work internally, falling for the marketing campaign calling them "AI" and not understanding the full implications.
      LLM-creation ("training") involves detecting/compressing patterns of the input. Inference generates statistically probable output based on similarities of patterns to those found in the "training" input. Computers don't learn or have ideas, they always operate on representations, it's nothing more than any other mechanical transformation. It should not erase copyright any more than synonym substitution.
      - supern0va 4 hours ago
        >LLM-creation ("training") involves detecting/compressing patterns of the input.
        There's a pretty compelling argument that this is essentially what we do, and that what we think of as creativity is just copying, transforming, and combining ideas.
        LLMs are interesting because that compression forces distilling the world down into its constituent parts and learning about the relationships between ideas. While it's absolutely possible (or even likely for certain prompts) that models can regurgitate text very similar to their inputs, that is not usually what seems to be happening.
        They actually appear to be little remix engines that can fit the pieces together to solve the thing you're asking for, and we do have some evidence that the models are able to accomplish things that are not represented in their training sets.
        Kirby Ferguson's video on this is pretty great: https://www.youtube.com/watch?v=X9RYuvPCQUA
        martin-t 4 hours ago
        So? Why should it be legal?
        If people find this cool and wanna play with it, they can, just make sure to only mix compatible licenses in the training data and license the output appropriately. Well, the attribution issue is still there, so maybe they can restrict themselves to public domain stuff. If LLMs are so capable, it shouldn't limit the quality of their output too much.
        Now for the real issue: what do you think the world will look like in 5 or 10 years if LLMs surpass human abilities in all areas revolving around text input and output?
        Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded? Or will the rich reap most of the benefit while also simultaneously turning us into beggars?
        Even if you assume 100% of the people doing intellectual work now will convert to manual work (i.e. there's enough work for everyone) and robots don't advance at all, that'll drive the value of manual labor down a lot. Do you have it gamed out in your head and believe somehow life will be better for you, let alone for most people? Or have you not thought about it at all yet?
      - timmmmmmay 5 hours ago
        fortunately, you aren't only operating on representations, right? lemme check my Schopenhauer right quick...
- shevy-java 6 hours ago
  But why should AI then be attributed if it is merely a tool that is used?
  - lonelyasacloud 3 hours ago
    Having an honesty-based tag could be the only way to monitor impact, or to go after a fix in code bases, if things go south.
    That is, at the moment:
    - Nobody knows for sure what agents might add and their long term effects on codebases.
    - It's at best unclear that AI content in a codebase can be reliably determined automatically.
    - Even if it's not malicious, at least some of its contributions are likely to be deleterious and pass undetected by human review.
  - hgoel 46 minutes ago
    This is a good point but I'd take it in the opposite direction from the implication, we should document which tools were used in general, it'd be a neat indicator of what people use.
  - plmpsu 5 hours ago
    it makes sense to keep track of what model wrote what code to look for patterns, behaviors, etc.
  - streetfighter64 5 hours ago
    It isn't?
    > AI agents MUST NOT add Signed-off-by tags. Only humans can legally certify the Developer Certificate of Origin (DCO).
    They mention an Assisted-by tag, but that also contains stuff like "clang-tidy". Surely you're not interpreting that as people "attributing" the work to the linter?
ninjagoo 4 hours ago
```
  > Signed-Off ...
  > The human submitter is responsible for:
    > Reviewing all AI-generated code
    > Ensuring compliance with licensing requirements
    > Adding their own Signed-off-by tag to certify the DCO
    > Taking full responsibility for the contribution

  > Attribution: ... Contributions should include an Assisted-by tag in the following format:
```
Responsibility assigned to where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.
I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.
Hopefully this will set the trend and provide definitive guidance for a number of devs who were seeing not only the utility of AI assistance but also the acrimony from some quarters, causing some fence-sitting.
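For anyone who wants to check the quoted rules mechanically, here is a minimal sketch in Python. It only looks for the two trailer names mentioned in this thread (Signed-off-by and Assisted-by). The helper names, the "trailers live in the last paragraph" rule, and the example commit are all hypothetical, and the Assisted-by value is a placeholder because the excerpt above elides the real format. It also cannot tell whether a human or an agent typed the Signed-off-by line.

```python
import re

# Assumption: a trailer is a "Key: value" line in the last paragraph of the message.
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z-]*):\s*(?P<value>.+)$")


def parse_trailers(message):
    """Return (key, value) pairs found in the final paragraph of a commit message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        # Only a subject line (or nothing): treat it as having no trailer block.
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group("key"), match.group("value").strip()))
    return trailers


def check_commit(message):
    """Return a list of problems; an empty list means the basic check passed."""
    keys = [key.lower() for key, _ in parse_trailers(message)]
    problems = []
    if "signed-off-by" not in keys:
        problems.append("missing Signed-off-by from the human submitter")
    return problems


def is_ai_assisted(message):
    """True if the message carries at least one Assisted-by trailer."""
    return any(key.lower() == "assisted-by" for key, _ in parse_trailers(message))


# Hypothetical commit message; "example-tool" is a placeholder, not the real tag format.
EXAMPLE = """example: fix off-by-one in buffer length check

Describe the change here.

Assisted-by: example-tool
Signed-off-by: Jane Developer <jane@example.org>
"""

print(check_commit(EXAMPLE))
print(is_ai_assisted(EXAMPLE))
```

A check like this can only flag a missing sign-off. Reviewing the code and making sure the licensing is right stay with the human submitter, as the quoted policy says.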
ipython 6 hours ago
Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.
- pixel_popping 6 hours ago
  Literally insane that some projects blanket-ban AI despite it being the human's responsibility in the end.
  - tom_ 3 hours ago
    It is no more insane than doing the opposite. This whole business has yet to play itself out.
  - KoftaBob 3 hours ago
    It's just a form of sanctimonious virtue-signaling that's trendy right now.
  - daveguy 6 hours ago
    Not insane at all. Just a very useful shortcut. Not everyone wants to move fast and break shit.
    - pixel_popping 6 hours ago
      I still think it's insane, why would you care about the "origin" of the code as long as there is a human accountable (that you can ban anyway)?
      - 59nadir 6 hours ago
        Because you don't want to deal with people who can't write their own code. If they can, the rule will do nothing to stop them from contributing. It'll only matter if they simply couldn't make their contribution without LLMs.
        pixel_popping 5 hours ago
        So tomorrow, if a model genuinely finds a bunch of real vulnerabilities, you would just ignore them? That makes no sense.
        59nadir 5 hours ago
        An LLM finding problems in code is not the same at all as someone using it to contribute code they couldn't write or haven't written themselves to a project. A report stating "There is a bug/security issue here" is not itself something I have to maintain, it's something I can react to and write code to fix, then I have to maintain that code.
      - jeremyjh 1 hour ago
        Because they aren’t accountable - after it is merged only I am. And why would I want to go back and forth with an LLM through PR comments when I could just talk to the agent myself in real time? Anytime I want to work through a pile of slop I can ask for one, but I don’t work that way. I work with the agent to create plans first and refine them, and the author of a PR who couldn’t do that adds nothing.
      - streetfighter64 5 hours ago
        If your doctor told you he used an ouija board to find your diagnosis, would you care about the origin of the diagnosis or just trust that he'll be accountable for it?
        pixel_popping 5 hours ago
        If the Ouija board was powered by Opus, who knows :D
  - pydry 6 hours ago
    And yet it puts a stop to the tsunami of slop and it's pretty much impossible to prove anything of value was lost.
    - pixel_popping 6 hours ago
      but why? it's a human making the PR and you can shame/ban that human anyway.
      - materielle 1 hour ago
        I think AI bans are more common in projects where the maintainers are nice people that thoughtfully want to consider each PR and provide a reasoned response if rejected.
        That’s only feasible when the people who open PRs are acting in good faith, and control both the quality and volume of PRs to something that the maintainers can realistically (and ought to) review in their 2-3 hours of weekly free time.
        Linux is a bit different. Your code can be rejected, or not even looked at in the first place, if it’s not a high quality and desired contribution.
        Also, it’s not just about PR quality, but also volume. It’s possible for contributions to be a net benefit in isolation. But most open source maintainers only have an hour or so a week to review PRs and need to prioritize aggressively. People who code with AI agents would benefit themselves to ask “does this PR align with the priorities and time availability of the maintainer?”
        For instance, I’m sure we could point AI at many open source projects and tell it to optimize performance. And the agent would produce a bunch of high quality PRs that are a good idea in isolation. But what if performance optimization isn’t a good use of time for a given maintainer’s weekly code review quota?
        Sure, maintainers can simply close the PR without a reason if they don’t have time.
        But I fear we are taking advantage of nice people, who want to give a reasoned response to every contribution, but simply can’t keep up with the volume that agents can produce.
      - podgietaru 4 hours ago
        Volume - things take time to review. If you’re inundated with so many PRs then it’s harder to curate in general
      - yoyohello13 5 hours ago
        > it's a human making the PR
        Is it? Remember when that agent wrote a hit piece about the maintainer because he wouldn't merge its PR?
        pixel_popping 5 hours ago
        That's a different issue actually.
      - Ekaros 2 hours ago
        You are treating humans as reasonable actors. They very often are not. On easy to access platforms like GitHub you can have humans just working as intermediaries between the LLM and GitHub, not actually checking or understanding what they put in a pull request. Banning these people outright with clear rules is much faster and easier than trying to argue with them.
        Linux is somewhat harder to contribute to and they already have sufficient barriers in place so they can rely on more reasonable human actors.
sarchertech 6 hours ago
This does nothing to shield Linux from responsibility for infringing code.
This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.
It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.
[-]
- SirHumphrey 5 hours ago
  Quite a lot of companies use and release AI-written code. Are they all liable?
  [-]
  - nitwit005 3 hours ago
    Yep, and honestly it's going to come up with things other than lawsuits.
    I've worked at a company that was asked as part of a merger to scan for code copied from open source. That ended up being a major issue for the merger. People had copied various C headers around in odd places, and indeed stolen an odd bit of telnet code. We had to go clean it up.
  - sarchertech 5 hours ago
    1. Almost definitely if discovered
    2. Infringement in closed source code isn’t as likely to be discovered
    3. OpenAI and Anthropic enterprise agreements agree to indemnify (pay for damages essentially) companies for copyright issues.
dataviz1000 6 hours ago
This is discussed in the Linus vs Linus interview, "Building the PERFECT Linux PC with Linus Torvalds". [0]
[0] https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003
dec0dedab0de 6 hours ago
> All code must be compatible with GPL-2.0-only
Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?
[-]
- compyman 6 hours ago
  You might be being too pedantic :)
  https://spdx.org/licenses/GPL-2.0-only.html It's a specific GPL license (as opposed to GPL-2.0-or-later)
- philipov 6 hours ago
  GPL-2.0-only is the name of a license. One word. It is an alternative to GPL-2.0-or-later.
  [-]
  - kbelder 3 hours ago
    Right, the final hyphen changes the meaning of the sentence.
    "GPL-2.0-only" "GPL-2.0 only"
newsoftheday 6 hours ago
> All code must be compatible with GPL-2.0-only
How can you guarantee that will happen when AI has been trained on a world full of multiple licenses and even closed source material without permission of the copyright owners... I confirmed that with several AIs just now.
[-]
- philipov 6 hours ago
  You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.
  [-]
  - sarchertech 6 hours ago
    There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.
    The whole “use it, but if it behaves as expected, it’s your fault” stance is ridiculous.
    [-]
    - philipov 6 hours ago
      If you think it's an unacceptable risk to use a tool you can't trust when your own head is on the line, you're right, and you shouldn't use it. You don't have to guarantee anything. You just have to accept punishment.
      [-]
      - sarchertech 5 hours ago
        That’s just it, though: it’s not just your head. The liability could very likely also fall on the Linux Foundation.
        You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.
        [-]
        philipov 4 hours ago
        This policy effectively punts on the question of what tools were used to create the contribution, and states that regardless of how the code was made, only humans may be considered authors.
        From the foundation's point of view, humans are just as capable of submitting infringing code as AI is. If your argument is sound, then how can Linux accept contributors at all?
        EDIT: To answer my own question:
        Instead of a signed legal contract, a DCO is an affirmation that a certain person confirms that it is (s)he who holds legal liability for the act of sending of the code, that makes it easier to shift liability to the sender of the code in the case of any legal litigation, which serves as a deterrent of sending any code that can cause legal issues.
        This is how the Foundation protects itself, and the policy is that a contribution must have a human as the person who will accept the liability if the foundation comes under fire. The effectiveness of this policy (or not) doesn't depend on how the code was created.
        [-]
        sarchertech 1 hour ago
        Anyone distributing copyrighted material can be liable; that DCO isn’t going to stop anyone.
        If that worked, any corporation that wanted to use code they legally couldn’t could just use a fork from someone who assumed responsibility, and worst case they’d have to stop using it if someone found out.
        empath75 3 hours ago
        The only lawsuits so far have been over training on open source software. You're inventing a liability problem that essentially does not exist.
        [-]
        sarchertech 1 hour ago
        OpenAI and Anthropic added an indemnity clause to their enterprise contracts specifically to cover this scenario because companies wouldn’t adopt otherwise.
      - streetfighter64 5 hours ago
        Yeah, but that's not a useful thing to do because not everybody thinks about that or considers it a problem. If somebody's careless and contributes copyrighted code, that's a problem for Linux too, not only the author.
        For comparison, you wouldn't say, "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down", because then of course somebody would be careless enough to build a bridge that falls down.
        Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.
        [-]
        philipov 4 hours ago
        It was already necessary to solve the problem of humans contributing infringing code. It was solved by having contributors assume liability with a DCO. The policy being discussed today asserts that, because AI may not be held legally liable for its contributions, AI may not sign a DCO. A human signature is required. This puts the situation back to what it was with human contributors. What you are proposing goes beyond maintaining the status quo.
        [-]
        sarchertech 1 hour ago
        It’s not solved. It hasn’t been tested in court to my knowledge and in my opinion is unlikely to hold up to serious challenge. You can be held liable for just distributing copyrighted code even if the whole “the Linux foundation doesn’t own anything” holds up.
    - adikso 4 hours ago
      Their position is probably that LLM technology itself does not require training on code with incompatible licenses, and they probably also tend to avoid engaging in the philosophical debate over whether LLM-generated output is a derivative copy or an original creation (like how humans produce similar code without copying after being exposed to code). I think that even if they view it as derivative, they're being pragmatic - they don't want to block LLM use across the board, since in principle you can train on properly licensed, GPL-compatible data.
  - newsoftheday 5 hours ago
    > That means if the AI messes up
    I'm not talking about maintainability or reliability. I'm talking about legal culpability.
- tmp10423288442 6 hours ago
  Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.
KhayaliY 3 hours ago
We've seen in the past, for instance in the world of compliance, that if companies/governments want something done or make a mistake, they just have a designated person act as scapegoat.
So what's preventing lawyers/companies having a batch of people they use as scapegoats, should something go wrong?
themafia 4 hours ago
> All contributions must comply with the kernel's licensing requirements:
I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.
If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.
baggy_trough 6 hours ago
Sounds sensible.
NetOpWibby 3 hours ago
inb4 people rage against Linux
lowsong 5 hours ago
At least it'll make it easy to audit and replace it all in a few years.
spwa4 5 hours ago
Why does this file have an extension of .rst? What does that even mean for the fileformat?
[-]
- jdreaver 5 hours ago
  https://en.wikipedia.org/wiki/ReStructuredText
  This format really took off in the Python community in the 2000s for documentation. The Linux kernel has used it for documentation as well for a while now.
- adikso 5 hours ago
  reStructuredText. Just like you have .md files everywhere.
martin-t 6 hours ago
This feels like the OSS community is giving up.
LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].
The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which
1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies
2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.
I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking[2] even among people who understand how the models operate, let alone the general public.
Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.
[0]: https://news.ycombinator.com/item?id=47356000
[1]: http://prize.hutter1.net/
[2]: https://en.wikipedia.org/wiki/ELIZA_effect
[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...
[-]
- ninjagoo 3 hours ago
  > Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.
  I think you'll find that this is not settled in the courts, depending on how the data was obtained. If the data was obtained legally, say a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).
  Morally the case gets interesting.
  Historically, there was no such thing as copyright. The English 1710 Statute of Anne establishing copyright as a public law was titled 'for the Encouragement of Learning' and the US Constitution said 'Congress may secure exclusive rights to promote the progress of science and useful arts'; so essentially public benefits driven by the grant of private benefits.
  The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?
  The more people who copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1]
  You'll do it for the kudos. [2][3]
  *Post-Scarcity Future.
  [1] https://en.wikipedia.org/wiki/Post-scarcity
  [2] https://en.wikipedia.org/wiki/The_Quiet_War, et al.
  [3] https://en.wikipedia.org/wiki/Accelerando
  [-]
  - martin-t 2 hours ago
    > The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?
    Yes.
    I have 2 issues with "post-scarcity":
    - It often implicitly assumes humanity is one homogeneous group where this state applies to everyone. In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases. All else being equal, I'd prefer being in the first group and my chance for that is being economically relevant.
    - It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have. The second group is the largest cause of exploitation and suffering in the world. And the second group will continue existing in a post-scarcity world and will work hard to make scarcity a real thing again.
    ---
    Back to your question:
    I made the mistake of publishing most of my public code under GPL or AGPL. I regret it because even though my work has brought many people some joy and a bit of my work was perhaps even useful, it has also been used by people who actively enjoy hurting others, who have caused measurable harm and who will continue causing harm as long as they're able to - in a small part enabled by my code.
    Permissive licenses are socially agnostic - you can use the work and build on top of it no matter who you are and for what purpose.
    (A)GPL is weakly pro-social - you can use the work no matter what but you can only build on top of it if you give back - this produces some small but non-zero social pressure (enforced by violence through governments) which benefits those who prefer cooperation instead of competition.
    What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good, not having committed any serious offenses, not taking actions to restrict other people's rights without a valid reason, etc.
    There have been attempts in this direction[0] but not very successful.
    In a world without LLMs, I'd be writing code using such a license but more clearly specified, even if I had to write my own. Yes, a lawyer would do a better job, but that does not mean anything written by a non-lawyer is completely unenforceable.
    With LLMs, I have stopped writing public code at all because the way I see it, it just makes people much richer than me even richer at a much faster rate than I can ever achieve myself. It just makes inequality worse. And with inequality, exploitation and oppression tend to soon follow.
    [0]: https://json.org/license.html
    [-]
    - ninjagoo 8 minutes ago
      > In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still be dying of hunger, exposure and preventable diseases.
      By definition, that's not a post-scarcity world; and that's already today's world.
      > It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have.
      Do you think that's genetic, or environmental? Either way, maybe it will have been trained out of the kids.
      > it has also been used by people who actively enjoy hurting others, who have caused measurable harm
      Taxes work the same way too. "The Good Place" explores these second-order and higher-order effects in a surprisingly nuanced fashion.
      Control over the actions of others, you have not. Keep you from your work, let it not.
      > What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good
      These are all things necessary in a society with scarcity. Will they be needed in a post-scarcity society that has presumably solved all disorder that has its roots in scarcity?
      > With LLMs, I have stopped writing public code at all because the way I see it, it just makes people much richer than me even richer at a much faster rate than I can ever achieve myself.
      Yes, the futility of our actions can be infuriating, disheartening, and debilitating. The story that comes to mind is the one about the chap who was tossing washed-ashore starfish back into the sea one by one. There were thousands. When asked why he bothered with this futile task, since he couldn't throw them all back, he answered as he threw the next ones: it matters to this one, it matters to this one, ...
- KK7NIL 5 hours ago
  > I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").
  I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.
  We're well past the Turing test now, whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming.
  [-]
  - martin-t 5 hours ago
    Would you say "assisted by vim" or "assisted by gcc"?
    It should be either something like "(partially/completely) generated by" or if you want to include deterministic tools, then "Tools-used:".
    The Turing test is an interesting thought experiment but we've seen it's easy for LLMs to sound human-like or make authoritative and convincing statements despite being completely wrong or full of nonsense. The Turing test is not a measure of intelligence, at least not an artificial one. (Though I find it quite amusing to think that the point at which a person chooses to refer to LLMs as intelligence is somewhat indicative of his own intelligence level.)
    > whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming
    It absolutely makes a difference: you can't own a human but you can own an LLM (or a corporation which is IMO equally wrong as owning a human).
    Humans have needs which must be continually satisfied to remain alive. Humans also have a moral value (a positive one - at least for most of us) which dictates that being rendered unable to remain alive is wrong.
    Now, what happens if LLMs have the same legal standing as humans and are thus able to participate in the economy in the same manner?
    [-]
    - zbentley 5 hours ago
      If a linter insists on a weird line of code, I’m probably commenting that line as “recommended by whatever-linter”, yes.
      [-]
      - martin-t 2 hours ago
        I wouldn't but I can see why some people would.
        I can't point out where I draw the line clearly but here's one difference I notice:
        A recommendation can be both a thing and an action. A piece of text is a recommendation and it does not matter how it was created.
        Assistance implies some parity in capabilities and cooperative work. Also it can pretty much only be an action, you cannot say "here is some assistance" and point to a thing.
- tmp10423288442 5 hours ago
  On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.
  [-]
  - martin-t 5 hours ago
    The comment says "Opus 4.6 without tool use or web access"
bitwize 6 hours ago
Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.
shevy-java 6 hours ago
Fork the kernel!
Humans for humans!
Don't let skynet win!!!
[-]
- aruametello 4 hours ago
  > Fork the kernel!
  pre "clanker-linux".
  I am more intrigued by the inevitable Linux distro that will refuse any code that has AI contributions in it.
the_biot 5 hours ago
Linux has fallen. Linus Torvalds is now just another vibe coder. I give it less than a year, or maybe a month, until Linux gets vibe-coded patches approved by LLMs.
Open source is dead, having had its code stolen for use by vibe-coding idiots.
Make no mistake, this is the end of an era.
[-]
- _blaise_ 3 hours ago
  Linus is the original vibe coder. He barks orders at a cadre of human contributor agents and subsystem maintainer agents until the code looks the way he likes.
  [-]
  - ninjagoo 3 hours ago
    > Linus is the original vibe coder.
    LoL.
    Jesting aside, OpenHub lists Linus Torvalds as having made 46,338 commits. 45,178 for Linux, 1,118 for Git. His most recent commit was 17 days ago. [1]
    That is a far cry from a vibe-coder, no? :-)
    Bit unfair to call his leadership vibe-coding, methinks.
    [1] https://openhub.net/accounts/9897