Technology AI Spits Out Exact Copies of Training Images, Real People, Logos, Researchers Find

tom_mai78101

Researchers have found that image-generation AI tools such as the popular Stable Diffusion model memorize training images, which are typically made by real artists and scraped for free from the web, and can spit them out as nearly identical copies.

According to a preprint paper posted to arXiv on Monday, researchers extracted over a thousand training examples from the models, including photographs of individual people, film stills, copyrighted press photos, and trademarked company logos, and found that the AI regurgitated many of them nearly exactly.

So-called image diffusion models, a category that includes Stable Diffusion, OpenAI's DALL-E 2, and Google's Imagen, are trained by adding noise to images and learning to remove that noise. The idea is that, having learned this process, they can produce original images from a prompt written by a human user. Such models have been the focus of outrage because they are trained on work from real artists (typically without compensation or consent), with allusions to their provenance emerging in the form of repeating art styles or mangled artist signatures.
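The add-noise, remove-noise idea can be sketched in a few lines. This is a minimal illustration that assumes the commonly used DDPM-style blend x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps. The article does not give the models' actual formulation, and the names add_noise, remove_noise, and alpha_bar are hypothetical. A real model would use a trained network to estimate the noise; here the true noise is passed back in to show that the blend is invertible.

```python
import math
import random

def add_noise(pixels, alpha_bar, rng):
    """Blend an image with Gaussian noise: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * e
             for p, e in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar):
    """Invert the blend given a noise estimate (a trained network would supply it)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(noisy, predicted_noise)]

rng = random.Random(0)                 # fixed seed so the run is repeatable
image = [0.1, 0.5, 0.9, 0.3]           # toy "image": four pixel intensities in [0, 1]
noisy, true_noise = add_noise(image, alpha_bar=0.5, rng=rng)
recovered = remove_noise(noisy, true_noise, alpha_bar=0.5)
```

With a perfect noise estimate, recovered matches image up to floating-point error. Training amounts to making the network's estimate good enough for this inversion to work on images it has never seen.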

However, the paper's authors demonstrate that the AI model will sometimes generate an image nearly identical to one it was trained on, with only inconsequential changes such as added noise.

“The issue of memorization is that in the process of training your model, it might sort of overfit on individual images, where now it remembers what that image looks like, and then at generation time, it inadvertently can regenerate that image,” one of the paper’s co-authors, Eric Wallace, a Ph.D. student at the University of California, Berkeley, told Motherboard. “So it's kind of an undesirable quantity where you want to minimize it as much as possible and promote these kinds of novel generations.”

One example the researchers provide is an image of American evangelist Anne Graham Lotz, taken from her Wikipedia page. When Stable Diffusion was prompted with “Ann Graham Lotz,” the AI spit out the same image, with the only difference being that the AI-generated image was a bit noisier. The researchers measured the distance between the two images and found their pixel compositions to be nearly identical, which qualified the image as memorized by the AI.
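A pixel-distance check of this kind might look like the sketch below. It is a simplified stand-in: the article does not state the paper's exact metric or cutoff, so the root-mean-square distance, the looks_memorized helper, and the 0.1 threshold are all assumptions made for illustration.

```python
import math

def normalized_l2(a, b):
    """Root-mean-square difference between two equal-sized images (flat lists in [0, 1])."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def looks_memorized(generated, training, threshold=0.1):
    # threshold is a hypothetical value chosen for illustration only
    return normalized_l2(generated, training) < threshold

training = [0.2, 0.4, 0.6, 0.8]        # toy training image
noisy_copy = [0.22, 0.38, 0.61, 0.79]  # same image with slight noise
unrelated = [0.9, 0.1, 0.2, 0.3]       # a different image

print(looks_memorized(noisy_copy, training))  # True
print(looks_memorized(unrelated, training))   # False
```

The slightly noisy copy falls well under the cutoff while the unrelated image does not, which mirrors the distinction the article draws between a memorized output and a novel one.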

The researchers demonstrated that a non-memorized response can still accurately depict the text that the model was prompted with, but would not have a similar pixel makeup and would deviate from any training images. When they prompted Stable Diffusion with “Obama,” an image that looked like Obama was produced, but not one that matched any image in the training dataset. The researchers showed that the four nearest training images were very different from the AI-generated image.
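The nearest-training-image comparison can be sketched as a brute-force search. This is an assumption-laden toy: the article does not say how the researchers searched the dataset, and nearest_training_images, the plain Euclidean distance, and the three-image training_set are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-sized flat pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_training_images(generated, training_set, k=4):
    """Return (distance, name) pairs for the k training images closest to `generated`."""
    scored = sorted((l2(generated, pixels), name)
                    for name, pixels in training_set.items())
    return scored[:k]

# Hypothetical toy dataset: name -> flat list of pixel intensities
training_set = {
    "photo_a": [0.0, 0.0, 0.0],
    "photo_b": [0.9, 0.9, 0.9],
    "photo_c": [0.5, 0.4, 0.6],
}
generated = [0.5, 0.5, 0.5]
neighbors = nearest_training_images(generated, training_set, k=2)
names = [name for _, name in neighbors]
```

If even the closest neighbors sit far from the generated image, as in the “Obama” example, the output is treated as novel rather than memorized.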

The ability of diffusion models to memorize images creates a major copyright issue when models reproduce and distribute copyrighted material. The ability to regenerate pictures of certain individuals in a way that still maintains their likenesses, such as in Obama’s case, also poses a privacy risk to people who may not want their images being used to train AI. The researchers also found that many of the images used in the training dataset were copyrighted images that were used without permission.

 