Text to Image AI Generation

GM3 · Dec 14, 2023

-Matt- said:
It is quite easy to demonstrate the current lack of comprehension.

Using this prompt:

Code:

Create an image showing the line drawing on a page of paper that would result from the following sequence of actions: 1) Press the pencil tip down onto page. 2) Move the tip of the pencil right by 5cm. 3) Move the pencil tip down the page by 5cm. 4) Move the pencil tip left by 5cm. 5) Move the pencil tip up the page by 5cm. 6) Lift the pencil away from the page.

Microsoft Bing Image Creator (Dall.E 3) creates this:
View attachment 334221

Ok scratch my previous response... ChatGPT does a lot better:

Question: if you created a drawing using the following steps, what kind of result would you get? Here are the steps: 1) Press the pencil tip down onto page. 2) Move the tip of the pencil right by 5cm. 3) Move the pencil tip down the page by 5cm. 4) Move the pencil tip left by 5cm. 5) Move the pencil tip up the page by 5cm. 6) Lift the pencil away from the page.

answer:
If you follow the given steps, you would create a square with sides measuring 5cm each. The sequence of movements forms a square shape by moving the pencil horizontally to the right, vertically downward, horizontally to the left, and vertically upward. The starting point and ending point coincide, creating a closed shape, specifically a square, as a result of these movements.
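For what it's worth, the six steps are easy to check mechanically. Below is a minimal Python sketch (an illustration only, not something ChatGPT ran). The names `trace` and `is_square` are made up for this example, and y is assumed to grow downward, as on a page.

```python
# Trace the pencil tip through the four moves; y grows downward, as on a page.
MOVES = [("right", 5), ("down", 5), ("left", 5), ("up", 5)]
STEP = {"right": (1, 0), "down": (0, 1), "left": (-1, 0), "up": (0, -1)}


def trace(moves, start=(0, 0)):
    """Return the line segments ((x0, y0), (x1, y1)) drawn by the moves, in order."""
    x, y = start
    segments = []
    for direction, cm in moves:
        dx, dy = STEP[direction]
        nx, ny = x + dx * cm, y + dy * cm
        segments.append(((x, y), (nx, ny)))
        x, y = nx, ny
    return segments


def is_square(segments):
    """True if four equal, axis-aligned, end-to-end segments close into a loop."""
    if len(segments) != 4:
        return False
    closed = segments[0][0] == segments[-1][1]
    # For axis-aligned segments, |dx| + |dy| is simply the length.
    lengths = {abs(x1 - x0) + abs(y1 - y0) for (x0, y0), (x1, y1) in segments}
    horizontal = [y0 == y1 for (x0, y0), (x1, y1) in segments]
    # Consecutive sides must alternate between horizontal and vertical.
    alternating = all(horizontal[i] != horizontal[(i + 1) % 4] for i in range(4))
    return closed and len(lengths) == 1 and alternating


segments = trace(MOVES)
for number, segment in enumerate(segments, start=1):
    print(number, segment)
print("square:", is_square(segments))
```

The pencil ends where it started, and the four 5 cm sides alternate between horizontal and vertical, so the answer of "a square" checks out.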

Question:
Great! How did you figure out that answer? Are pen, paper and drawings concepts you understand?

Answer:
Yes, I understand the concepts of pen, paper, and drawings. While I don't have personal experiences or physical senses like a human, I've been trained on a diverse range of internet text, which includes information about various topics, including art and basic geometric concepts.
In your question, the steps described a series of movements with a pencil on a piece of paper, forming a square. I used my understanding of spatial relationships and geometric shapes to interpret the sequence of actions. The descriptions of movements—moving right, down, left, and up by 5cm—correspond to the creation of a square shape. This interpretation is based on the common understanding of how drawing instruments interact with a surface, such as paper.

Question:
how long before txt2img in open source programs such as stable diffusion give similar results? It appears like image generation AIs like stable diffusion aren't very smart and don't really understand even simple sentences.

Answer:
As of my last knowledge update in January 2022, the field of AI, including text-to-image synthesis and image generation, has been advancing rapidly. Open-source projects like Stable Diffusion are continuously evolving, and improvements are being made to address the limitations of earlier models.
However, the development and release of new models or updates to existing ones depend on the research community and the organizations working on these projects. The timeline for when a specific open-source program will achieve better performance in understanding and generating images from text can vary.
If there have been updates or new releases since my last update, I recommend checking the official sources, documentation, or community discussions associated with the particular open-source project for the latest information on improvements and capabilities. Additionally, ongoing research and advancements in the AI field contribute to the development of more sophisticated models over time.

So yeah, as my previous comment implied, if you ask a question within the bounds of its training dataset, it can answer, and it appears to understand certain concepts. But with unknown contexts, such as the evolution of software or AI, it can't really answer.

by your estimate, given Stable Diffusion improvements, how long would that be?

I don't have the ability to predict specific future developments or provide estimates for advancements in AI models like Stable Diffusion. The pace of progress in the field of artificial intelligence can be rapid, but predicting when specific improvements will occur is challenging.
Improvements in AI models depend on a variety of factors, including research breakthroughs, computational resources, and the priorities of the organizations and researchers involved. It's recommended to stay updated with the official channels of the project or research community associated with Stable Diffusion to get the latest information on any advancements or improvements to the model.
If there are specific research papers, conference proceedings, or official announcements related to Stable Diffusion, those would be valuable sources to monitor for updates on the model's capabilities.

So as long as you stay within its bounds, it does fine, but that line isn't very clear, and if you cross it, you get either nothing or garbage... Exactly like in Stable Diffusion: terms it knows work well, terms it doesn't know don't work; if it wasn't programmed/trained for it, zero.

GM3 · Dec 14, 2023

I'm sure SD, Dall-E, etc., will all get better over time, and then we'll get better prompt understanding. The tech is there apparently; all that's missing is a merging of the two technologies. Maybe ChatGPT AI could gain an understanding of the prompt & network of the text2img 'AI', and use its own capabilities to generate an image by creating its own super precise prompts, or by using different types of tools (regional prompt, open pose, etc.). But testing it a bit more, it doesn't really seem to understand the concept of a line on a piece of paper....

let's start all over. Let's say you're drawing on a piece of paper and do the following:
- Press the pencil tip down onto the page.
- Move the tip of the pencil right by 5cm.
- Move the pencil tip down the page by 5cm.
- Move the pencil tip left by 5cm.
- Move the pencil tip up the page by 5cm.
- Lift the pencil away from the page.

And then you:
- erase the second and fourth line.
- draw a line between the 1st point of the 1st line and 1st point of the 2nd line

Answer
Got it, let's go through the steps again:

Press the pencil tip down onto the page.

Move the tip of the pencil right by 5cm.

Move the pencil tip down the page by 5cm.

Move the pencil tip left by 5cm.

Move the pencil tip up the page by 5cm.

Lift the pencil away from the page.

After these initial steps:

Erase the second line (the rightward movement).

Erase the fourth line (the leftward movement).

Draw a line between the first point of the first line and the first point of the second line.

With these modifications, you'd be erasing the horizontal movements and connecting the starting points of the original horizontal and vertical lines. The result would be a vertical line (from the initial downward movement) connected to a diagonal line (from the initial rightward movement). The final shape would resemble the letter "L" on the paper.

Not quite...

How can you erase a movement? I'm talking about the lines you drew on the piece of paper.

I apologize for the confusion in my responses. My earlier statements about "erasing" movements were not accurate; you cannot erase movements on a physical piece of paper. I appreciate your clarification.
Let's revise the description based on your instructions:

Press the pencil tip down onto the page.

Move the tip of the pencil right by 5cm.

Move the pencil tip down the page by 5cm.

Move the pencil tip left by 5cm.

Move the pencil tip up the page by 5cm.

Lift the pencil away from the page.

Now, considering your instructions:

Erase the second line (the rightward movement).

Draw a line between the first point of the first line and the first point of the second line.

With these adjustments, the result would be a shape similar to an inverted letter "L," formed by the vertical line and the diagonal line connecting the starting points of the original lines.

Maybe v4 would do better; given the above, 3.5 doesn't seem like it should be able to solve any complex problem, lol. It doesn't seem to understand the concept of a line drawn on a piece of paper.

If your last answer was correct, if you just followed the first 6 steps and didn't erase any line, which line would be the first line?
ChatGPT

I appreciate your patience, and I apologize for any confusion in my responses. Let's clarify:

If you follow the first 6 steps without erasing any line:

Press the pencil tip down onto the page.
Move the tip of the pencil right by 5cm.
Move the pencil tip down the page by 5cm.
Move the pencil tip left by 5cm.
Move the pencil tip up the page by 5cm.
Lift the pencil away from the page.

In this case, the first line you drew would be the vertical line created in step 3 when you moved the pencil down the page by 5cm.

---------------------------------------------------------
User
So in step #2, you didn't draw any line?
ChatGPT

I appreciate your patience, and I apologize for the oversight in my responses. Let's revisit the steps:

Press the pencil tip down onto the page.
Move the tip of the pencil right by 5cm.
Move the pencil tip down the page by 5cm.
Move the pencil tip left by 5cm.
Move the pencil tip up the page by 5cm.
Lift the pencil away from the page.

I made an error in my previous responses. In step #2, you indeed draw a line when you move the pencil right by 5cm. So, the correct statement is:

In step #2, you draw a horizontal line by moving the pencil right by 5cm.

I appreciate your diligence in clarifying, and I apologize for any confusion caused by the previous inaccuracies.

-----------------------------------------
User
ok, so if you follow the first 6 steps, then add these steps:

7. Erase the second line.
8. Erase the fourth line.
9. Draw a line between the first point of the first line and the first point of the second line.

What kind of shape or geometric form would you get?
ChatGPT

Let's go through the revised steps:

Press the pencil tip down onto the page.
Move the tip of the pencil right by 5cm.
Move the pencil tip down the page by 5cm.
Move the pencil tip left by 5cm.
Move the pencil tip up the page by 5cm.
Lift the pencil away from the page.
Erase the second line (the rightward movement).
Erase the fourth line (the leftward movement).
Draw a line between the first point of the first line and the first point of the second line.

With these steps, you'd be left with a vertical line (from step 3), and a horizontal line connecting to the vertical line (from step 9). So, the resulting shape would be an "L" formed by a vertical line and a horizontal line connected at their starting points.

I give up... lol

[edit] Ok I didn't give up... I tried to teach it what the 1st and 2nd lines are and all of that, a bit like here, but it can't seem to learn anything... Again, this is 3.5. 4.0 is hopefully better; otherwise, I can't understand how it could solve any sort of complex problem (well, one that it wasn't trained on or doesn't have an understanding of, at least...).

that's not correct. You'd get a vertically flipped 'Z', the second and fourth line are the vertical lines of the square. So if they are erased, you have two horizontal parallel lines left: The top one is the first line, and the other remaining line is the 3rd line drawn initially, now the 2nd line in the drawing. Does that make sense? Or I am making a mistake?
ChatGPT

I appreciate your clarification, and you are correct. I apologize for any confusion caused by my previous responses. Let's reevaluate the steps:

Press the pencil tip down onto the page.
Move the tip of the pencil right by 5cm.
Move the pencil tip down the page by 5cm.
Move the pencil tip left by 5cm.
Move the pencil tip up the page by 5cm.
Lift the pencil away from the page.
Erase the second line (the rightward movement).
Erase the fourth line (the leftward movement).
Draw a line between the first point of the first line and the first point of the second line.

With these steps, as you correctly pointed out, the result is a vertically flipped 'Z' shape. The top line is the initial horizontal line, and the bottom line is the initial vertical line.

I appreciate your patience and diligence in clarifying this, and I apologize for any confusion caused by the earlier inaccuracies in my responses.
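The corrected geometry can be checked with a few lines of Python. This is only a sketch: the coordinates, the `erase` and `orientation` helpers, and the choice to renumber the lines after erasing are assumptions made for illustration (y grows downward, as on a page).

```python
# The four sides of the 5 cm square in drawing order; y grows downward.
square = [
    ((0, 0), (5, 0)),  # 1st line: right (top edge)
    ((5, 0), (5, 5)),  # 2nd line: down (right edge)
    ((5, 5), (0, 5)),  # 3rd line: left (bottom edge)
    ((0, 5), (0, 0)),  # 4th line: up (left edge)
]


def erase(segments, ordinals):
    """Drop the segments at the given 1-based positions."""
    return [s for i, s in enumerate(segments, start=1) if i not in ordinals]


def orientation(segment):
    """Classify a segment as horizontal, vertical, or diagonal."""
    (x0, y0), (x1, y1) = segment
    if y0 == y1:
        return "horizontal"
    if x0 == x1:
        return "vertical"
    return "diagonal"


remaining = erase(square, {2, 4})
# Assumption: lines are renumbered after erasing, so the old 3rd line is now the 2nd.
connector = (remaining[0][0], remaining[1][0])
drawing = remaining + [connector]

for segment in drawing:
    print(segment, orientation(segment))
```

What is left is two horizontal lines plus a diagonal from the top-left corner to the bottom-right corner, which is the flipped 'Z'. No vertical line survives, so neither the "L" answers nor the "bottom line is the initial vertical line" remark match the drawing. Under the other reading, where lines keep their original numbers, the first point of the second line is (5, 0) and the new line would simply retrace the top edge.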

=========================
User
Ok, so given the above, if you take the last result; the flipped Z, and you:
- you erase the center line,
- draw a line between the first point of the first line and the first point of the second line,

what kind of shape do you get?
ChatGPT

Let's go through the steps:

Press the pencil tip down onto the page.
Move the tip of the pencil right by 5cm.
Move the pencil tip down the page by 5cm.
Move the pencil tip left by 5cm.
Move the pencil tip up the page by 5cm.
Lift the pencil away from the page.
Erase the second line (the rightward movement).
Erase the fourth line (the leftward movement).
Draw a line between the first point of the first line and the first point of the second line.
Erase the center line (the remaining horizontal line).

With these steps, you'd be left with two diagonal lines meeting at a common point, forming a "V" shape. The result is a shape resembling an arrow pointing upward.

antcollinet · Mar 11, 2024

GM3 said:
just a very large dataset and some randomness. No thinking, just an extremely large index of data, which it can somewhat reliably contextualize to an input.

There is an argument:

What you've just described is pretty much what the human brain is doing. Concepts such as "Thinking" and "Consciousness" are simply emergent properties of the much larger dataset, and interconnection complexity in the brain.

Or, think of it in another way. How far down the complexity of the animal kingdom would you have to get for the behaviour of the animal brain to be less capable than that of the AI - taking into account the different targeted objectives.

Or another way. Imagine the following.

A llama in a pirate ship drinking from a bottle of rum.

If you imagine this image, you are using all your lifetime experience of what llamas look like, what pirate ships look like, and what someone on such a ship might look like drinking from a rum bottle, and putting that knowledge together to create the image in your head. You are doing that without any effort or difficulty - your brain just does it - almost without thinking. If you read "a pig on a motorbike riding down the highway" - an image appears in your mind almost involuntarily.

How different is that from what bing image creator is doing - other than that you are conscious of the process, and it is not? (we think)

anmpr1 · Mar 11, 2024

amirm said:
I think we are on a path for the technology to pass the Turing Test in many areas. Language comprehension is already there now with the latest developments.

Basic AI language translation is actually pretty good. Especially within similarly constructed languages. And is generally acceptable for logographic (non-alphabetic) script.

Where it falls down is mostly within the vernacular, especially slang or 'sayings' that everyone knows, but cannot be parsed literally.

For instance, on the Chinese blogs you often encounter the translated into English phrase, 'melon-eaters'. Roughly meaning on-lookers or gawkers reacting to a public scene. Below is from the free MS Bing (where do they come up with these names?) Translator, which again is a pretty useful tool, but not yet fully operational semantically. I'd say it's about at 80%.

In English we have phrases such as 'fish or cut bait', which usually have nothing to do with fishing but could, within a certain context. The machine would have to be capable of distinguishing intention, which is not an easy thing to accomplish, for reasons.

Right now visual AI is on the level of cartoon images, at least at the consumer level. The big push from companies such as Nvidia, MS, et al. will soon make it almost impossible for the average person to reliably tell what is actual from what is virtual, for sure.

dasdoing · Mar 12, 2024

antcollinet said:
There is an argument:

What you've just described is pretty much what the human brain is doing. Concepts such as "Thinking" and "Consciousness" are simply emergent properties of the much larger dataset, and interconnection complexity in the brain.

Or, think of it in another way. How far down the complexity of the animal kingdom would you have to get for the behaviour of the animal brain to be less capable than that of the AI - taking into account the different targeted objectives.

Or another way. Imagine the following.

A llama in a pirate ship drinking from a bottle of rum.

If you imagine this image, you are using all your lifetime experience of what llamas look like, what pirate ships look like, and what someone on such a ship might look like drinking from a rum bottle, and putting that knowledge together to create the image in your head. You are doing that without any effort or difficulty - your brain just does it - almost without thinking. If you read "a pig on a motorbike riding down the highway" - an image appears in your mind almost involuntarily.

How different is that from what bing image creator is doing - other than that you are conscious of the process, and it is not? (we think)

View attachment 355552

it's an interesting argument, but I'll tell you why AI can't be considered intelligent yet: it isn't capable of really evaluating the info. If you feed it flat-earther propaganda, and then all the info in the world except hard evidence that the earth is round, it will tell you it is flat.

antcollinet · Mar 12, 2024

dasdoing said:
it's an interesting argument, but I'll tell you why AI can't be considered intelligent yet: it isn't capable of really evaluating the info. If you feed it flat-earther propaganda, and then all the info in the world except hard evidence that the earth is round, it will tell you it is flat.

I don't think that argument works well in the way you are thinking it will, because... so will people.

Evidenced by the fact that flat earthers exist, in significant numbers, and seem to believe in what they say.

So that is just an argument that people are not really that intelligent either. Our own "NI" will spew out garbage if that is what we train ourselves on by, for example, participating in internet echo chambers. We even have built in social biases that make it more likely we will do that. (Fitting in with the (our) crowd)

See also how many people now believe in ridiculous conspiracy theories.

Perhaps we are much closer to Chat GPT than we think

dasdoing · Mar 12, 2024

antcollinet said:
I don't think that argument works well in the way you are thinking it will, because... so will people.

Evidenced by the fact that flat earthers exist, in significant numbers, and seem to believe in what they say.

So that is just an argument that people are not really that intelligent either. Our own "NI" will spew out garbage if that is what we train ourselves on by, for example, participating in internet echo chambers. We even have built in social biases that make it more likely we will do that. (Fitting in with the (our) crowd)

See also how many people now believe in ridiculous conspiracy theories.

Perhaps we are much closer to Chat GPT than we think

I see your point, but it should be able to conclude that the earth is round by analyzing the (soft) evidence (as we as humanity did way before seeing it).....but it is not; you have to tell it.
You could answer that it is still at an early stage of its intelligence, but if it is then it means it should become more intelligent (and rapidly); does it?
I remember 20 years ago or so they would say that once we have AI, it would create new, better AI... stuff like that. So its evolution would be at a crazy, inhuman, exponential speed. I don't see a database-analyzing algorithm being capable of doing that.
EDIT2: And maybe that's the main point: it isn't really creating anything; if it says it does, it is just copying. It can't take info a and info b and create info c. It doesn't even understand the concept of "creating c".

Blumlein 88 · Mar 12, 2024

If AI could advance at the rate of Nvidia's stock price it would really improve quickly. Humans would be obsolete by next year.

antcollinet · Mar 12, 2024

dasdoing said:
I see your point, but it should be able to conclude that the earth is round by analyzing the (soft) evidence (as we as humanity did way before seeing it).....but it is not; you have to tell it.
You could answer that it is still at an early stage of its intelligence, but if it is then it means it should become more intelligent (and rapidly); does it?
I remember 20 years ago or so they would say that once we have AI, it would create new, better AI... stuff like that. So its evolution would be at a crazy, inhuman, exponential speed. I don't see a database-analyzing algorithm being capable of doing that.
EDIT2: And maybe that's the main point: it isn't really creating anything; if it says it does, it is just copying. It can't take info a and info b and create info c. It doesn't even understand the concept of "creating c".

I am not arguing that it is intelligent - although if you look at the definition of intelligence as applied to animals (an ability to learn and solve problems) you could conclude that it is. A significant difference, though, is that it is so far only specifically intelligent rather than generally intelligent, i.e. specific systems are only able to solve specific problems.

My argument is that the way it is working is not too dissimilar to the way the autonomous parts of our brain work.

And yes - it is creating. Take the image I posted above. I'm gonna bet there is no image in the training set of a Llama drinking rum on a pirate ship while wearing a pirate hat.

So it has taken the concepts
A - a llama
B - a pirate ship
C - pirate attire
D - a bottle of rum
E - the way a person drinks from a bottle.

And put those together into Z - the posted image. If you want to put it into terms that fit the definition of intelligence - it has solved the problem of creating an image fitting the text prompt.

Then add in all the odd detail, like the various llama like but not quite Llama animals dotted around - and at different sizes. It looks like an image from a fever dream. Is Dall-E dreaming? EDIT : Definitely should have put an emoji in here.....

Again - I am not arguing for intelligence here. But I think we sometimes blur the line between the word "Intelligence" and the word "Consciousness" and we vastly overestimate the way consciousness influences the way our brain actually works. Most of what it does is subconscious.

IOW the functionality of the brain - the way it processes and handles information - and then directs how the owner of the brain will respond to the information - evolved before consciousness did.

GM3 · Mar 12, 2024

AI can certainly fool people into thinking it's intelligent... But is a calculator intelligent because it can do math? What about a really complex calculator that can do really really complicated math problems?

What most people mean by intelligence is more akin to this definition: "The ability to acquire, understand, and use knowledge." It kinda implies sentience and autonomy. Current AI involves no real intelligence; it's little more than a pre-programmed calculator, just with a huge dataset and randomness to its answers. Is that similar to human intelligence, to how our brain functions? Maybe... Would that mean it's similar to intelligence, or that it's actual intelligence?

There was the case of the human dog, if it looks like a dog and barks like a dog, certainly it's a dog? In nature today, we do see the entire spectrum of evolution of intelligence; from bacteria, to monkeys that can type on keyboards, but what's the line for 'true' intelligence and sentience? How do you tell if something is intelligent / conscious? There's the mirror test, but if you program a computer to react to a mirror as if it was sentient, does that make the 'AI' sentient?

If humanity and AI had more time, I'm sure that at one point a true AI would have emerged. But we're just not there yet... IMHO, current image and chat AI, no real intelligence; certainly the appearance of intelligence, as it can accomplish complex tasks, but not 'real' intelligence. Just like a fancy calculator.

Types of Artificial Intelligence | IBM

Early iterations of the AI applications we interact with most today were built on traditional machine learning models. These models rely on learning algorithms that are developed and maintained by data scientists.

www.ibm.com

antcollinet · Mar 12, 2024

GM3 said:
but not 'real' intelligence. Just like a fancy calculator.

I agree with the first part of that. But not the second.

A fancy calculator is still deterministic. As is a conventional algorithm. Same input, same output.

But put the same text prompt into the image generator, over and over again - you'll get a different image every time. Many of them will be dramatically different.

GM3 · Mar 13, 2024

antcollinet said:
I agree with the first part of that. But not the second.

A fancy calculator is still deterministic. As is a conventional algorithm. Same input, same output.

But put the same text prompt into the image generator, over and over again - you'll get a different image every time. Many of them will be dramatically different.

View attachment 355988

Not true! Take Stable Diffusion, you can set seed # to a specific number and get the same result every time. You have to use the same parameters/algorithm; can't go from Euler to DPM 2 Karras, or 20 vs 30 steps, denoising 0.7 vs 0.75, etc. otherwise yes images will differ. But you can send seed # and prompt + values to a friend and he'll generate 100% of the time the exact same image on a different computer if he's using the same model.

The others are likely similar, with the seed # hidden. Or maybe their noise generation, if they use techniques similar to SD's, is always random. But anyhow, the process itself (the algorithm) is fully deterministic! There's really no 'intelligence' that would make the process vary with, say, time. It's really like a calculator: you give it an input, it 'crunches' the data through a very specific, predetermined process, and poops out the answer.
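To illustrate the seed point without installing anything, here is a toy Python sketch. It is not Stable Diffusion: `toy_generate` is a made-up stand-in where a seeded random generator supplies the starting noise and a fixed rule refines it. The assumption being illustrated is only that all of the randomness enters through the seed.

```python
import random


def toy_generate(prompt, seed, steps=20):
    """Toy stand-in for an image generator: seeded noise refined by a fixed rule."""
    rng = random.Random(seed)  # every random draw comes from this seed
    pixels = [rng.random() for _ in range(8)]  # the initial "noise"
    # A deterministic value derived from the prompt (not a real text encoder).
    target = (sum(prompt.encode("utf-8")) % 256) / 255.0
    for _ in range(steps):
        # Each step nudges the noise a little toward the prompt-dependent target.
        pixels = [p + 0.1 * (target - p) for p in pixels]
    return [round(p, 6) for p in pixels]


first = toy_generate("llama on a pirate ship", seed=42)
again = toy_generate("llama on a pirate ship", seed=42)
other_seed = toy_generate("llama on a pirate ship", seed=43)
more_steps = toy_generate("llama on a pirate ship", seed=42, steps=30)

print(first == again)       # same prompt, seed, and settings
print(first == other_seed)  # only the seed changed
print(first == more_steps)  # only the step count changed
```

Same prompt, seed, and settings reproduce the same list; changing the seed or the step count changes it. With a real model, the sampler, model version, and software stack would also have to match.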

And practically, it can become a bit bothersome if the model was trained with few photos of a particular subject. Oftentimes you can generate, say, 100 Superman images and notice blatant similarities between them, because the model used maybe 50 images to train for the subject, and these 50 images are its 'base' for generating images; the fewer the reference images, the less variety you'll get. So the more specific the thing you're trying to generate, the less variety you tend to get.

Ex: 'cat', probably thousands of cat images, so not very likely to get the 'same' cat every time. But if you ask for, I dunno, Leo Buttkins, 5th year college teacher at Blargh school, well ok, it won't recognize the name, but if it did and there were only 5 pictures of the guy in the dataset, your 100 generated images of the guy would each look like one of his 5 images, or maybe a mix of them. Really depends on the subject & model.

And if you ask very complex things that are not built-in; not part of its predetermined answer, it'll just result in absolute garbage. Like I said earlier, these AI often recognize some words and not others, so at least with SD, it can be a challenge to get what you want to generate... One day, there will be 'real' AI working on image generation, likely like a mix of chat AI & image AI, but anyway, not there yet! SD at least is very 'calculator'-like in its 'intelligence'...

Pretty darn sure it's the same for chatbots, some randomness to answer, but it's just that; randomness built-in so that you get different answers. There's really not like one 'mind' that decides to give an answer one day and another the next. Take away the random number generator, same input would result in the same answer all the time, exactly like a calculator!

Hiten · Mar 13, 2024

Why did the AI misjudge the proportional size of the llama with respect to the ship?

JaMaSt · Mar 13, 2024

GM3 said:
Pretty darn sure it's the same for chatbots, some randomness to answer, but it's just that; randomness built-in so that you get different answers.

Yes. There are HITL (Humans-In-The-Loop) behind all chatbots. No weighting takes place that isn't set by humans.

What is ChatGPT Doing?

The first thing to explain is that what ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far, where by “reasonable” we mean “what one might expect someone to write after seeing what people have written on billions of webpages, etc.”

So let’s say we’ve got the text “The best thing about AI is its ability to”. Imagine scanning billions of pages of human-written text (say on the web and in digitized books) and finding all instances of this text—then seeing what word comes next what fraction of the time. ChatGPT effectively does something like this, except that (as I’ll explain) it doesn’t look at literal text; it looks for things that in a certain sense “match in meaning”. But the end result is that it produces a ranked list of words that might follow, together with “probabilities”:

And the remarkable thing is that when ChatGPT does something like write an essay what it’s essentially doing is just asking over and over again “given the text so far, what should the next word be?”—and each time adding a word. (More precisely, as I’ll explain, it’s adding a “token”, which could be just a part of a word, which is why it can sometimes “make up new words”.)

But, OK, at each step it gets a list of words with probabilities. But which one should it actually pick to add to the essay (or whatever) that it’s writing? One might think it should be the “highest-ranked” word (i.e. the one to which the highest “probability” was assigned). But this is where a bit of voodoo begins to creep in. Because for some reason—that maybe one day we’ll have a scientific-style understanding of—if we always pick the highest-ranked word, we’ll typically get a very “flat” essay, that never seems to “show any creativity” (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a “more interesting” essay. [Emphasis on "we" - meaning humans determine the weighting]

The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time. And, in keeping with the idea of voodoo, there’s a particular so-called “temperature” parameter that determines how often lower-ranked words will be used, and for essay generation, it turns out that a “temperature” of 0.8 seems best. (It’s worth emphasizing that there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice. And for example the concept of “temperature” is there because exponential distributions familiar from statistical physics happen to be being used, but there’s no “physical” connection—at least so far as we know.)
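A rough Python sketch of the sampling step described in the quote. The quoted text gives no formula, so this assumes the usual softmax-with-temperature formulation. The function names and the word scores are hypothetical, chosen only to illustrate.

```python
import math
import random


def next_word_probs(scores, temperature):
    """Softmax over raw scores, with each score divided by the temperature first."""
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return {word: w / total for word, w in zip(scores, weights)}


def pick(scores, temperature, rng):
    """Take the top-ranked word at temperature 0, otherwise sample."""
    if temperature == 0:
        return max(scores, key=scores.get)
    probs = next_word_probs(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical scores for words following "The best thing about AI is its ability to"
scores = {"learn": 2.0, "predict": 1.5, "make": 1.2, "understand": 1.0, "do": 0.8}

rng = random.Random(0)  # a fixed seed makes the "random" picks repeatable
print(pick(scores, 0, rng))                        # always the top-ranked word
print([pick(scores, 0.8, rng) for _ in range(5)])  # lower-ranked words can appear
print(next_word_probs(scores, 0.8))
```

At temperature 0 the pick is always the highest-ranked word, so the same input gives the same output every time. At 0.8 the picks vary from run to run unless the random generator is seeded, in which case the sequence repeats exactly.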

antcollinet · Mar 13, 2024

Hiten said:
Why did the AI misjudge the proportional size of the llama with respect to the ship?

Who knows.

GM3 said:
Not true! Take Stable Diffusion, you can set seed # to a specific number and get the same result every time.

Was not aware of that - thanks.

somebodyelse · Mar 13, 2024

Hiten said:
Why did the AI misjudge the proportional size of the llama with respect to the ship?

Why pick that one out? It doesn't understand the relationship between physical things in general, and there are a lot more examples in there. Whatever it is on the end of the knotted rope floating in space. The ship's wheel at an odd angle, its non-matching spindles, and with another rope that just disappears into it somehow. The rigging that doesn't match when passing behind the flag and hat. The last spindle on the rail when compared to the rest of them. Liquid flow at the bottle neck.

To be fair, human artists often make a few similar mistakes when not understanding something mechanical, or when patching things together in Photoshop.

GM3 · Mar 13, 2024

somebodyelse said:
Why pick that one out? It doesn't understand the relationship between physical things in general, and there are a lot more examples in there. Whatever it is on the end of the knotted rope floating in space. The ship's wheel at an odd angle, its non-matching spindles, and with another rope that just disappears into it somehow. The rigging that doesn't match when passing behind the flag and hat. The last spindle on the rail when compared to the rest of them. Liquid flow at the bottle neck.

To be fair, human artists often make a few similar mistakes when not understanding something mechanical, or when patching things together in Photoshop.

Bicycle anyone? LOL https://www.mensjournal.com/adventure/people-asked-to-draw-bicycles-from-memory-fail-miserably

According to Gimini, the way many of the people drew bicycles correlated to whether the person was male or female.

"Some diversities are gender driven," wrote Gimini. "Nearly 90 percent of drawings in which the chain is attached to the front wheel (or both to the front and the rear) were made by females. On the other hand, while men generally tend to place the chain correctly, they are more keen to over-complicate the frame when they realize they are not drawing it correctly."

(more human drawn sketches here)

"A side shot of a man on a bicycle. The man is wearing shorts and a t-shirt, and has a bicycle helmet, and bicycle shoes. The man is sweating. "

(Stable Diffusion; SD 1.0)

The AI is very impressive though, bottom row, center image. Actually managed to accurately portray a cyclist on his phone!

On average, though, AI seems better than humans at drawing the general shape of a bicycle. It doesn't really 'understand' the concept of a bicycle, but then neither do many people. It also has no spatial understanding of anything: no comprehension of human anatomy, how a person should sit on a bike, etc...

GM3 · Mar 13, 2024

At some point, AI should understand concepts (faces, bodies, etc.), and drawing would be more like drawing a basic body (arms, legs, etc.), then assigning a specific shape (thin, muscular, fat, etc.), then a face, and so on. But right now, there's really like zero intelligence in drawing images... It's just de-noising noise using (the result of) a very large dataset.

SSD1B (SDXL 'successor'; faster, newer) does even worse, it looks like. But like I said earlier, results will always vary with the dataset; the SD model likely had more or better bike representation, leading to better results. I'm sure someone could make a model dedicated to bicycles, like some do for anime, comics, fantasy, realistic people, etc., and it would give better results. But yeah, 100%, currently, for TXT2IMG, AI really isn't intelligent...

Still, it correctly set the context (a street, trees, etc.), which is very impressive and would denote some intelligence, but... It's just because the reference pictures of bicycles are in that setting, so... Sure... Kinda intelligent, but not really? *shrug*

But like I mentioned earlier, just from the bike results, you can somewhat detect the underlying images: the t-shirt in the 1st picture set often gets overridden by a bike shirt, and you can tell that the bike pictures seemed to include more older people. The SSD1B set in contrast has younger people, who for some reason are bearded in many cases; not 1 old dude in there. The shorts are also vastly different... It's the same algorithm for both. Everything is really in the dataset, so some models are good for some things and bad for others; it's all about the data the model was trained on.

(all of the above is for Stable Diffusion; other txt2img technologies could be different, not sure!)

somebodyelse · Mar 13, 2024

Oddly enough it was thinking of some of the 'bicycles' people have drawn that made me add the qualifier. I didn't know anyone had done any research on it as a phenomenon though - thanks for pointing it out. AI managed about as well as I would expect too.

GM3 · Mar 17, 2024

Another interesting aspect of AI is what I'd call pollution.

- First is humans polluting AI. You would assume that AI, being an AI (no feelings, no bias, etc.), would be impartial, but AIs are already polluted by humans: humans have decided that AI cannot be allowed to be impartial, so they've built their own biases and 'isms' into AIs, to the point of making some AIs very skewed/biased/unusable/etc. This can be a real threat on a human level; many people will and do rely on AI as a source of information, so that sort of built-in human bias can influence history and all: https://www.aljazeera.com/news/2024/3/9/why-google-gemini-wont-show-you-white-people

- Second aspect: images on the internet. There are more and more AI-generated images on the internet, and they can be hard to distinguish from real images (drawn art, paintings, photography, etc.), which means many of them are likely to be used as references in newer models as the internet gets scoured for training images. Watermarks could have helped, but already today AI images are everywhere and basically indistinguishable from real images... So it's a challenge to have clean reference images, and a challenge for us humans to know whether an image is real or AI generated (fake?)...

- Third aspect: it's the same for comments, messages, reviews, articles, etc. Now that AI can easily generate such posts, anything you read, even from multiple people, could be AI. So like images, already today the internet is polluted with AI content that can hardly be distinguished from human content. Ex: I could be an AI, how would you know? So any text, comment, tweet, like, dislike, etc., becomes unreliable. At least there's videos, right? Wrong...

- Fourth aspect: with deep fakes, it's already the same thing for audio/video... It's crazy how just a few short years ago a video could be said to be reliable evidence, but today a video of someone doing or saying something can't even be said to be reliable, given deep fakes.

What a time to be alive! (sarcasm) AI is actually quite a bit like fossil fuels, computers, the internet, cell phones, etc., in the sense that it might seem like a small innovation, but its repercussions will be huge: a far bigger effect than most people can imagine, and in ways few people can imagine... Whether it will have more negative effects than benefits, only the future will tell... But personally, I fear it will be rather disastrous for humanity, see Orwell's 1984... Just another way we are f***'d, as some would say...
