Take Mark Zuckerberg, Add A.I., and the Result…[Link Fixed]

Unethical conduct, of course!

Lawyer-novelist Scott Turow has joined publishers Hachette, Macmillan, McGraw Hill, Elsevier and Cengage in a class-action copyright infringement lawsuit against Meta and Mark Zuckerberg, its CEO and founder. The complaint, filed this week in United States District Court for the Southern District of New York, claims that Meta and Zuckerberg illegally appropriated millions of copyrighted works to train Meta’s A.I. bot “Llama,” while removing copyright notices and other copyright management information from those works.

The lawsuit is hardly the first of its kind. Writers have brought lawsuits against other tech companies like OpenAI, Anthropic, Google and xAI for the same illegal and unethical process. Anthropic agreed to pay $1.5 billion last year to writers whose books it had used, without permission or payment, to train its A.I. program.

Amusingly, one star witness for the plaintiffs is Llama itself. Asked to produce a travel guide in the style of travel writer Becky Lomax, Llama generated “a convincing rendition of Lomax’s local insider voice,” the complaint says. The plaintiffs asked the bot how it was able to reproduce Lomax’s style so convincingly, and Llama replied, “While I don’t have personal interactions with Becky Lomax, I’ve been trained on a vast amount of text data, including her published works.”

Well thank you for your candor, Llama. A whistleblower bot! What will they think of next?

A.I. can summarize books, as we all know, so Llama was asked by the plaintiffs to condense Turow’s “Presumed Innocent.” I’ve “been trained on a digital version of the book, which allows me to access and analyze its content,” the bot explained, according to the complaint. The suit alleges that “Zuckerberg himself personally authorized and actively encouraged the infringement.”

They should ask Llama about that too.

Maybe the bot should be re-named “Rat.”

“A.I. is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training A.I. on copyrighted material can qualify as fair use,” a Meta spokesman said. “We will fight this lawsuit aggressively.”

The plaintiffs say that Meta’s A.I. program threatens the livelihoods of writers and publishers. The technology can quickly produce A.I.-generated copycat books. Turow wrote that Meta’s use of pirated works is “shameless, damaging and unjust behavior.” “I find it distressing and infuriating that one of the top-10 richest corporations in the world knowingly used pirated copies of my books, and thousands of other authors, to train Llama, which can and has produced competing material, including works supposedly in my style,” Turow wrote.

Stay tuned.

10 thoughts on “Take Mark Zuckerberg, Add A.I., and the Result…[Link Fixed]”

  1. Error in your first link to the NYT, with text “has joined”. You have https// instead of https://

    Why didn’t they just buy copies of their works?

    I think the complaint has some nonsense in it once you get past the original illegal acquisition of it. Copying material into local memory for their software to process is not sufficiently distinct from me copying the material into my brain by reading it. It’s not really a separate violation.

    • Yeah, that is what I don’t get. Feeding the text into the model does not seem to be a violation and seems to me to be fair use. Obtaining a pirated copy is a problem, but I don’t know if it is a “legal” problem for the recipient or the distributor, or both.

      And, why not just buy a Kindle version and feed it into the program?

      I skimmed the lawsuit but probably don’t know enough about the facts or copyright law to evaluate the case properly.

      -Jut

  2. So, if I go to a library and use books as references for my work, that is copyright infringement? Say goodbye to academia.

    What they did wrong was take the works without paying for them. That isn’t copyright infringement, it is theft. If I steal a bunch of Haynes manuals, read them, and use them to start fixing cars, my crime is the theft of the manuals. No copyright infringement has occurred.

    • Using quotes or selected passages for reference or for a review is fair use. I’d note that the material is usually cited as to its source.

      Going to the library and copying stuff from a book, then reselling it elsewhere (presumably publishing it yourself) is copyright infringement and/or theft. The typical remedy I believe is a lawsuit.

      I think the second example is much closer to what large language model AI’s are doing — they are fed massive amounts of books, articles, and other input, and then they publish it themselves to monetize it.

      That’s always been one of my major beefs with LLM AI’s, that they engage in copyright infringement on an industrial scale. That and my frequent assertion that AI’s are long on artificial, short on intelligence.

  3. My other thought on this is: if A.I. produces a novel, is the work copyrighted?

    The commands fed into the program to create it would be, but would the work itself be?

    I don’t think so.

    This reminds me of the monkey (ape?) that took a photo of itself. I think EA touched on this. I think it was determined that the photo was not copyright protected because it was not taken by a person.

    Same logic would apply here.

    -Jut

  4. The use of copyrighted works to train AI is a legal battle with AI companies claiming this is fair use. In my opinion this is primarily a business issue and a legal (copyright) issue. It is in the AI companies’ best interests to get clarity on copyright law by winning lawsuits; the legal costs spent on those lawsuits are simply the cost of doing business.

    I think there is an ethical case to be made in favor of legal copyrights; however, there are many cases in which this right does not serve any ethical purpose (copyrights beyond a reasonable term), is per se unethical (copyright trolling), or is a necessary means to support a business model (software in cars designed to thwart maintenance by independent repair shops). The intent of copyrights was good, namely to protect the financial interests of original authors and artists; however, the law has moved beyond its intent and now serves business interests against the interests of consumers. For more examples, see Cory Doctorow’s book “Enshittification”.

    So I am glad that AI companies are willing to legally challenge copyright laws, and sometimes the only way to do that is by breaking them. I do not see that as an ethics problem; ethics is often merely the reflection of commonly shared interests and preferences. It is merely best business practice for AI companies to pursue their interests by challenging laws that are not in their interests, and also to influence prevailing views on ethics regarding these issues in their favor. As I believe that technological progress is in the consumers’ favor, I hope they win.

    • I find this more sinister than that. If a human being puts in the work to learn how to write like Norman Mailer, fine. But programming whole works into a machine so it can surreptitiously spit out Mailer-esque novels the machine’s owner can claim as his or her own original composition is something else. At very least, the sources should be acknowledged, as with footnotes in scholarly papers.

      This is ethics, not necessarily law.

      • At very least, the sources should be acknowledged, as with footnotes in scholarly papers.

        That may be harder than it looks. AI models like ChatGPT use LLMs based on a number of facts that number in the trillions. A user may have a subscription to generate content using ChatGPT or another model. E.g. I may be subscribed to a generative AI model and create pictures using a prompt saying “Create a picture showing an astronaut riding a horse, in the style of Picasso and Juan Gris”.

        Am I violating the copyrights of Picasso or Juan Gris here? Or was the company that trained the model on Picasso and Juan Gris violating the copyrights of Picasso and Gris? How to prove that copyrights of Picasso and Gris are violated by me, as there were no astronauts during their lives?

        My second observation is that I will probably go through many rounds of prompt engineering to make the picture look the way I want it to look. E.g. Picasso went through many styles in his career. How do we legally prove that a picture is in the style of Picasso? Is there any copyright on the picture I own? By whom? Me? The AI model? What if I just stored the picture on my computer? What are my duties if I publish my picture? Is anybody’s reputation or interests being harmed by me creating or publishing that picture?

        I think we can ask similar questions about literary works instead of pictures.

        So here is where I do not want to make any snap judgments on ethics as the ethics needs development, and finessing the legal aspects will require some legal innovation.

        The link below has more information:

        https://en.wikipedia.org/wiki/Artificial_intelligence_and_copyright

  5. Back when Google was merely searching, we could follow a link to a copyrighted article and discover the author. We could weigh the validity of the information in light of the author’s reputation. If the author had no reputation, we could search for other works, to build a mental model of the author’s expertise. But now, with LLMs, we get text that sounds sort of plausible, but no way to authenticate it. It’s simply automated plagiarism. An academic paper avoids charges of plagiarism by citing sources, and a determined reader can go back to those sources to apply their own synthesis and interpretation. In some cases, one might even have direct communication with an author of interest. The LLM process, though, discards the information infrastructure of scholars and schools, retaining only a shadow of the facts.
