Number 2 is not a tractable problem because AI simply doesn't work like the kind of stochastic auto-collage people have in mind. Once an input goes into that black box, it dissolves into the mathematics of pattern recognition. I've not yet seen an image generator that can so much as produce a satisfactory Mona Lisa, the most frequently reproduced painting in history.
The AI doesn't know what a hand is - it uses fancy math to work out what pattern differentiates abstract noise from every image it has seen with a label involving a hand, according to some mathematical approximation of what a word is. That's why AI is terrible at drawing hands. The mathematical "gist" of billions of examples labeled according to no coherent system is not what those of us who have and use hands would recognize to be a hand.
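To make that "gist" point concrete, here's a toy sketch - made-up feature vectors and an average, deliberately nothing like a real diffusion model - showing how the center of an inconsistently labeled pile of examples ends up resembling none of the things actually in it:

```python
import numpy as np

# Toy illustration: everything below is fabricated data, not a real model.
rng = np.random.default_rng(0)
dim = 64

# Pretend feature clusters that all ended up captioned "hand":
# photo hands, anime hands, gloves, the odd mislabeled foot.
clusters = {name: rng.normal(size=dim) * 3
            for name in ["photo_hand", "anime_hand", "glove", "mislabeled_foot"]}
examples = np.vstack([center + rng.normal(size=(2_500, dim))
                      for center in clusters.values()])

# The "concept" of a hand: the center of the whole messy pile.
hand_gist = examples.mean(axis=0)

# The gist sits far from every real cluster - a no-man's-land that
# matches none of the things people actually drew or photographed.
for name, center in clusters.items():
    print(f"{name:16s} distance to 'hand' gist: {np.linalg.norm(center - hand_gist):.2f}")
```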
That's why you can make up artist names and get works for those prompts. The works of "Minamato no Ichigo von Bismark III of Transylvania" are to be found in the generator because, mathematically, that nonsense name necessarily has a relationship to real names. The same thing happens if you put in the name of an artist who does exist but wasn't in the training set: you get results for that name that look nothing like the real artist. The generator will give you results for a prompt like "SFNS$^F&NSIFsj&is[jrf9830343", for that matter.
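If you want a feel for why that happens, here's a rough sketch where a crude character-trigram embedding stands in for whatever text encoder a real generator actually uses - the only point being that every string maps to some vector, and made-up names inevitably overlap with real ones:

```python
from collections import Counter
import math

# Crude stand-in for a text encoder: embed a string as its character trigrams.
def embed(text: str) -> Counter:
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

real_names = ["Minamoto no Yoshitsune", "Otto von Bismarck", "Vlad III of Wallachia"]
fake_name = "Minamato no Ichigo von Bismark III of Transylvania"
gibberish = "SFNS$^F&NSIFsj&is[jrf9830343"

# The nonsense name overlaps heavily with real names; the keyboard mash
# barely overlaps with anything - but it still gets a vector, and a
# generator conditioned on that vector will still produce *something*.
for name in real_names:
    print(f"{name:24s} vs fake name: {cosine(embed(fake_name), embed(name)):.2f}"
          f"   vs gibberish: {cosine(embed(gibberish), embed(name)):.2f}")
```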
Given that the machines produce false positives and false negatives, and can't distinguish either from works identifiably in the style of an actual artist, there's no metric for the level of inspiration involved. If you tried to define one, you'd end up with the same labeling problems you get everywhere in AI - where the machine mistakes a banana for a cat.
There's also the issue that there are billions of images in, for instance, the anime style. AI relies a lot on what's generically common between inputs like that. Even if you could pry something useful out of the black box, what do you do with a metric suggesting that thousands of people draw basically the same way and aren't special little snowflakes? What do you do when prompting for an artist clearly gives you work in the style of fans of that artist, suggesting most of its "inspiration" was itself plagiarized? From what I've seen, AI isn't even that good at mimicking art styles, because it reduces everything to a generic mass - Gustav Klimt and Egon Schiele, Claude Monet and Pierre-Auguste Renoir, Edgar Degas and Henri de Toulouse-Lautrec, Piet Mondrian and Kazimir Malevich, Vincent van Gogh and Paul Gauguin, Andy Warhol and Roy Lichtenstein, Frida Kahlo and Diego Rivera: as far as the AI is concerned, these kinds of pairs might as well be a single artist working under different names.
Astute assessment.
If that's the case, then properly attributing where art is taken from IS going to be a problem.
I do suspect that this data is findable in the models, though. I remember reading during the initial hype phase that the AI techs had to tweak the models so they wouldn't create perfect copies of art, merely approximations - meaning that connecting creators and works was doable once, before it got Gaussian-blurred into oblivion.
Of course, the current fluid state of the Internet being what it is, I can no longer find any reference to this claim.
But I believe the right amount of pressure applied to the AI companies in the form of regulation will eventually get us there again.
In the old days of StyleGAN-type models and so on, "over-fitting" was indeed a problem. If you trained a big model on seven images, it would pretty much spit out those same seven images. The models themselves were often larger than the dataset, so you ended up re-encoding those images much the same way that a bitmap can be converted into an imperfect, compressed JPEG. With much larger datasets this was generally less of a problem, because the "latent space" ended up full of interpolations that mixed thousands or even millions of those images together, all at once, in various ratios. You'd give it lots of dogs, lots of cats, and get mostly dog-cats.
You could often, but not always, find a facsimile of an original picture by probing the latent space for generated images that mathematically approximated it - but, oddly, you could also give it similar images it had never seen before and get a pretty good approximation back: I gave my own photo to a StyleGAN trained on photographs of faces and, sure enough, it "found" a facsimile of my face with some slight alterations to my hair and ears, and so on. It's a bit like how law enforcement can now DNA-fingerprint criminals based on DNA a cousin or the like uploaded to one of those ancestry websites (see the case of Joseph James DeAngelo Jr., for instance).
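For anyone curious, that latent-space probing looks roughly like this - a toy stand-in generator rather than an actual StyleGAN, but the inversion loop (gradient descent on a latent code until the output matches a target image) is the same basic idea:

```python
import torch

torch.manual_seed(0)

# Stand-in for a trained generator; a real StyleGAN would go here.
generator = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 32 * 32), torch.nn.Tanh(),
)
for p in generator.parameters():      # the model itself stays frozen
    p.requires_grad_(False)

target = torch.rand(32 * 32) * 2 - 1  # the image we're trying to "find"

z = torch.zeros(64, requires_grad=True)  # the latent code we optimize
opt = torch.optim.Adam([z], lr=0.05)

for step in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(generator(z), target)
    loss.backward()
    opt.step()

# If the target resembles the training data, the reconstruction can be
# eerily close - even for a face the model never actually saw.
print(f"final reconstruction error: {loss.item():.4f}")
```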
Diffusion-based and transformer-based models are quite a bit different. One works on the principle of entirely "hallucinating" results out of noise, and the other is based on a principle similar to the Markov-chain text generator any programmer could have written you 50 years ago, where you randomly predict the next word based on the previous words. Either could reproduce one of its inputs, but only in the same way that a million monkeys on a million typewriters could eventually write Shakespeare.
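For reference, that fifty-year-old text generator really is about this much code - a plain word-level Markov chain, nothing remotely like a modern transformer, but the sampling loop has a recognizably similar shape:

```python
import random
from collections import defaultdict

# Word-level Markov chain: record which words followed each pair of words.
def train(text: str, order: int = 2) -> dict:
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

# Generate by repeatedly sampling a next word given the last `order` words.
def generate(chain: dict, length: int = 30, seed: int = 0) -> str:
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length):
        nxt = rng.choice(chain.get(state, ["."]))
        out.append(nxt)
        state = tuple(out[-len(state):])
    return " ".join(out)

corpus = "to be or not to be that is the question " * 3
print(generate(train(corpus)))
```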
I think the training set is much more fertile ground for holding tech accountable, especially because that's information the companies need to keep well-maintained for the sake of their models. But the fact that a computer can recognize my face, or my DNA, or my voice, without me even being in the training set is much, much creepier. Even without being in the training set, there's a good chance the models could act as a replacement for you anyway if enough people opt in, or make their work open source, or if enough of the big evil companies who own most artists' work decide to sell it for training purposes.
Compensating artists for using their works in training sets is definitely a good place to start.