Weirder in that it gets better at “photorealism” (textures, etc) but subjects might be nonsensical. Only teaching it how to avoid automated detection will not teach it to understand what scenes mean.
I believe most image generating models are too small (like only 4GB RAM). Deepseek R1 is 1.5TB ram (or half or quarter that at reduced precision) to get some semblance of “general knowledge”. So to get the “semantics” of an image right, not just the “syntax” you’d need bigger models and probably more data describing images. Of course, do we really want that?
Weirder? Interesting, like how for example?
Weirder in that it gets better at “photorealism” (textures, etc) but subjects might be nonsensical. Only teaching it how to avoid automated detection will not teach it to understand what scenes mean.
I believe most image generating models are too small (like only 4GB RAM). Deepseek R1 is 1.5TB ram (or half or quarter that at reduced precision) to get some semblance of “general knowledge”. So to get the “semantics” of an image right, not just the “syntax” you’d need bigger models and probably more data describing images. Of course, do we really want that?