Text to Image AI Generation

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,442
Likes
18,473
Location
Netherlands
Compared to something like this, all the engines failed miserably.
Funny how 2 years ago, nothing could do this, and now we’re complaining about how the AI misrepresents a 35mm f1.4 lens ;)

Although I disagree with the assessment: the “suit” looks like a robot. No human could fit in that. And it’s clearly also AI generated, just look at the hands.
 
Last edited:

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,909
Likes
16,986
You can try it also on copilot.microsoft.com

This is what you get when you ask for a portrait of an audiophile listening to tube amplifiers :) Ha!

View attachment 327736
The guy looks quite like MattHooper's profile pic, seems they scan ASR. :p
 

GM3

Active Member
Joined
Nov 26, 2022
Messages
151
Likes
178
Compared to something like this, all the engines failed miserably.

What do you think?

Short answer: Using txt2img generation isn't that simple; you have to become proficient; and then it takes some work to experiment and learn the basics.

Long answer: I've been experimenting quite a bit with Stable-Diffusion, which you can run yourself if you have a fairly decent gaming video card (~8GB). https://github.com/AUTOMATIC1111/stable-diffusion-webui

The base models are essentially SD1.5 (~512x512 images) and SDXL (1024x1024 images). But people have been releasing their own improved models, which give more specific or targeted images (styles); you can find plenty here: https://civitai.com/models These start off from the existing SD1.5/SDXL models and add training on particular topics/images. SD1.5 models are likely more useful for most people, unless you have a truly top-of-the-line video card, as SDXL image generation is quite a bit slower.

So, you've got SD installed and you've even downloaded a model. The way you'd go about generating an image is to enter a prompt and generate 2-3 dozen images. The way it works, the AI doesn't really understand sentences. It picks up on words or word pairs, and uses its neural network to 'unblur' noise (given a seed #) into elements that match the text descriptions of images... So really, there's no real 'intelligence' or any sort of reasonable sentence recognition.
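To make the "seed #" part concrete, here is a toy sketch in plain Python. It is not Stable Diffusion code; it only illustrates the general idea that a seed fixes the starting noise, so the same seed reproduces the same starting point and a batch gets a different starting point per image. The function names and the "one consecutive seed per image" convention are assumptions made for illustration.

```python
import random

def starting_noise(seed, size=8):
    """Return a reproducible list of Gaussian noise values for a given seed."""
    rng = random.Random(seed)  # private generator, global random state untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

def batch_seeds(first_seed, count):
    """One seed per image in a batch, counting up from the first (assumed convention)."""
    return [first_seed + i for i in range(count)]

if __name__ == "__main__":
    # Same seed twice: identical noise, so the same image could be regenerated.
    print(starting_noise(1234) == starting_noise(1234))
    # Different seeds: different noise, hence a different image from the same prompt.
    for seed in batch_seeds(1234, 3):
        print(seed, [round(value, 2) for value in starting_noise(seed, size=3)])
```

This is why keeping the seed of an image you like matters: with the same prompt, model and settings, it lets you come back to that image later and tweak from there.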

So a prompt like: "Create a portrait of an astronaut in the year 2230, neon lit background, 35mm f1.4", could be split into something like:

Portrait, astronaut, year 2230, neon_background, 35mm, f1.4.

1700437604884.png


And again, some terms might be redundant; ex; 35mm & portrait. And since the AI isn't really intelligent, it will pick up on some words and not others, f1.4 is well understood by photographers, maybe not so much by AI... Using something like bokeh, or blurry background might work better.

Also, the AI favors some words; if you use 'portrait', it's got a strong sense of what a portrait should look like, so it will give you a portrait. Same for 'astronaut': because SD basically uses word & image associations, 'astronaut' might override 'year 2230', because SD knows what an astronaut should look like (more or less), but 'year 2230' isn't something specific that the AI can hang on to. So it's very likely to get lost in the prompt; not many images are tagged with "year 2230".

Certain models will give realistic characters, others generate cartoon or Pixar-style characters, etc., so having an aesthetic sense of what you're trying to generate, in order to pick the right model, is critical.

You can also give more/less importance to certain words, so that SD generates an image closer to what you're looking for. Ex; if you want a picture of a girl in a futuristic space suit, with neon light, and not particularly a portrait, removing 'portrait', using word pairs, weights, etc., would all help. So, you really have to tweak the prompt, and generate plenty of images, until you find what you're kinda looking for. Just adjusting the prompt a little bit, I can get these types of images, which are likely closer to what I'm guessing was expected... I added a few random details, and put weight 1.2 on futuristic and 1.1 on bokeh:

woman, blond_hair, (green_eyes:0.9), long_hair, space_suit, (futuristic:1.2), portrait, neon_background, (bokeh:1.1)

1700437991624.png
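For anyone who wants to tweak weights programmatically rather than by hand, here is a minimal sketch that reads the `(term:weight)` notation used in the prompt above and writes it back out. It only handles the simple comma-separated form shown here (no nested or bare parentheses), and the function names are made up for illustration; they are not part of the web UI.

```python
import re

# Matches a single term with an explicit numeric weight, e.g. "(bokeh:1.1)".
WEIGHTED = re.compile(r"^\((.+):([0-9]*\.?[0-9]+)\)$")

def parse_prompt(prompt):
    """Split a comma-separated prompt into (term, weight) pairs; default weight is 1.0."""
    pairs = []
    for raw in prompt.split(","):
        term = raw.strip()
        if not term:
            continue  # skip empty entries such as a trailing comma
        match = WEIGHTED.match(term)
        if match:
            pairs.append((match.group(1).strip(), float(match.group(2))))
        else:
            pairs.append((term, 1.0))
    return pairs

def set_weight(pairs, term, weight):
    """Return a new list with the weight of `term` replaced; other terms unchanged."""
    return [(t, weight if t == term else w) for t, w in pairs]

def format_prompt(pairs):
    """Turn (term, weight) pairs back into a prompt string."""
    return ", ".join(t if w == 1.0 else f"({t}:{w:g})" for t, w in pairs)

if __name__ == "__main__":
    prompt = ("woman, blond_hair, (green_eyes:0.9), long_hair, space_suit, "
              "(futuristic:1.2), portrait, neon_background, (bokeh:1.1)")
    pairs = parse_prompt(prompt)
    # Push the blurry background a bit harder and print the adjusted prompt.
    print(format_prompt(set_weight(pairs, "bokeh", 1.3)))
```

Something like this makes it easier to sweep one weight across a few values and compare the resulting batches side by side.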


So the more detailed the prompt, the more chances you have of getting the type of image you want... And you definitely have to tweak things, adding descriptive words, removing others, giving some words more weight, others less weight, etc., until you get somewhat what you're looking for. The above was using a model called 'Reliberate', but there are tons... Here's another one using MajesticMix (looks like it favors Asian characters), then EpicRealism, using the same prompt:

1700438511557.png


1700438756755.png


One thing you can notice, lots of green, looks like the AI is confusing the "green_eyes" or maybe "neon" with some green-ish hue image... So generating an image definitely involves a lot of trial and error, and tweaking. Not super simple or intuitive.

But that's just scratching the surface... There's also the sampling method, OpenPose, img2img, inpainting, etc., and using "Regional Prompter" to set different prompts for certain regions. It takes about 1 minute to generate a batch of ten 768x512 images. Once you find a picture you like, or have worked out a prompt, you can use "hires fix", AI resizing, etc., or just generate larger images; most SD1.5 models will be fine with 768x1024 images. Here's another batch using the EpicRealism model with a different sampler and higher resolution.

1700440495612.png
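Batches like these can also be scripted instead of clicking through the interface. As far as I know, the AUTOMATIC1111 web UI exposes an HTTP API when started with the `--api` flag; the sketch below assumes that, plus a local address of `http://127.0.0.1:7860` and the `/sdapi/v1/txt2img` endpoint with the field names shown. Treat all of those as assumptions to check against your own install (its `/docs` page lists the real fields). The helper names and the step count and hires values are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumed local address of a web UI started with --api; adjust to your setup.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

def build_txt2img_payload(prompt, width=768, height=512, batch_size=10,
                          seed=-1, hires=False):
    """Build the JSON body for one batch; field names are assumed from the web UI API."""
    payload = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "batch_size": batch_size,
        "seed": seed,    # -1 is assumed to mean "pick a random seed"
        "steps": 20,     # illustrative value, not a recommendation
    }
    if hires:
        # Assumed fields for the "hires fix" upscaling pass.
        payload["enable_hr"] = True
        payload["hr_scale"] = 2
        payload["denoising_strength"] = 0.5
    return payload

def generate(payload, url=API_URL):
    """POST the payload and return the images as raw bytes (needs a running web UI)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The response is assumed to carry base64-encoded images under "images".
    return [base64.b64decode(image) for image in result["images"]]

# Usage, with the web UI running locally:
#   images = generate(build_txt2img_payload("woman, space_suit, (futuristic:1.2)"))
#   for index, data in enumerate(images):
#       with open(f"batch_{index}.png", "wb") as handle:
#           handle.write(data)
```

Keeping the payload builder separate from the network call makes it easy to log exactly which settings produced which batch.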
 
Last edited:
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,772
Likes
242,441
Location
Seattle Area
Thanks. You definitely got the model to produce what I would find acceptable for that type of query.

What originally got me was Microsoft's apparent claim that the engine had produced what was asked, and how great the results were. As a bit of advertising, I thought it was pretty poor. Maybe if they had asked it to produce an image of a mouse eating cereal out of a bowl, it would have worked better. :D
 

Somafunk

Major Contributor
Joined
Mar 1, 2021
Messages
1,442
Likes
3,405
Location
Scotland

anmpr1

Major Contributor
Forum Donor
Joined
Oct 11, 2018
Messages
3,741
Likes
6,461
Good write-up on OpenAI and Sam Altman in today's Washington Post, link below

Been following it in tech press. Looks like rank and file ready to bail and join MS. Pretty wild goings on. Not a big MS fan, but I might change. Hoping MS gets it together and I can ask AI Clippy how to move my taskbar to the left side of the screen, without working with wonky 3rd party apps. Clippy would know. Very optimistic about it.
 

GM3

Active Member
Joined
Nov 26, 2022
Messages
151
Likes
178
"Train Wreck Situation": Sam Altman Joins Microsoft As OpenAI Taps Ex-Twitch CEO

Like my avatar and sig, AI will also likely cause massive changes to the world... A WW3 China vs Taiwan & US video I was watching was mentioning how China & US (including military) are investing a ton in AI, and AI controlled military vehicles might be what tips the battle...

In the 1994 futuristic anime Macross Plus, the main plot was centered around two pilots competing for a military contract for a prototype space fighter, one flown traditionally and the second one controlled by brainwaves... Spoiler for a 30-year-old series... Neither firm got the contract, which was given to a firm that had developed a pilotless plane, which by far surpassed both designs centered on a human... But yeah, that future is basically now.

Musk also has his own Grok text AI... So... The future is now, and AI is already revolutionizing industries... About a year ago there was huge fear about how image-generating AI would affect artists (comic books, cartoons, movies, etc.), given the speed at which AI tech is advancing... Artists' days might be numbered... Same goes now for programming & AI. My 10-year estimate from last year for us programmers/engineers might still be somewhat valid. But the more time passes, the more quickly the tech seems to advance also... It's a bit, exactly like climate. Those estimates seem to have been conservative, and it's degenerating faster than predicted... Ex; 2100 & 2050 predictions have to be moved closer by decades, we've passed 1.5°C, on the verge of passing 2°C, ... (well past 2°C, they just re-adjusted the baseline.... facepalm..)

So yeah... AI will f a lot of people, really not looking good for the average person & the middle class... Truly seems like every day that passes, the future is becoming dimmer and dimmer... The West is losing world dominance, losing the petrodollar, we're bound to see collapse/inflation & all sorts of nastiness due to overpopulation, climate, etc.. Yep... Not looking good...!
 
Last edited:
Deleted member 21219

Guest
^ I just love to read an optimistic post in the morning! Makes my day! :p ^

Jim
 
Last edited by a moderator:

GM3

Active Member
Joined
Nov 26, 2022
Messages
151
Likes
178
I just love to read an optimistic post in the morning! Makes my day! :p

Jim
lol Well look at it from the bright side, with advances in AI & internet legislation, you may not have to for much longer..! ;) :D

Oh yeah, and to go back to txt2img, the people who use AI to generate images rarely just type a prompt and use the image it generates.. Certainly possible, but if you look at articles/vids, that's just the first step. The next step is typically to send it to img2img and regenerate certain parts of the image, again using AI. Ex; fix hands with 8 fingers or a character with 3 legs and 4 arms, change an element of the background or the character, generate a new face, new mouth, change the hair, etc., and so the rest of the process still involves a lot of work. But it's getting better and better. Just a matter of time until those issues (3 legs, mutant hands, etc.) are also truly solved.

So the txt2img is typically used to get a base image, which will then be transformed using other AI tools. (automatic1111 linked above provides many, also interface to install/update others, really really cool.) So a bit like photography; you think the pictures you see were just shot and posted, but rarely the case, lots of photoshop work to tweak the result images you see posted on forums and such.
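As a rough idea of what that second step looks like when scripted, here is a minimal sketch that packs a base image and a mask into an img2img request body. It assumes the web UI's API (started with `--api`) accepts `init_images`, `mask`, `prompt` and `denoising_strength` on its `/sdapi/v1/img2img` endpoint; check your install's `/docs` page before relying on those names. The function name and the default strength are made up for illustration, and nothing here sends the request.

```python
import base64

def build_inpaint_payload(image_bytes, mask_bytes, prompt, denoising_strength=0.6):
    """Pack a base image and a mask into an img2img request body.

    The mask marks the region to regenerate (for example a mangled hand);
    the rest of the image is meant to be left alone.
    """
    return {
        # Images travel as base64 text inside the JSON body (assumed format).
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "mask": base64.b64encode(mask_bytes).decode("ascii"),
        "prompt": prompt,
        # Higher values let the AI stray further from the original pixels.
        "denoising_strength": denoising_strength,
    }

# Usage, with files saved from an earlier txt2img run (hypothetical file names):
#   with open("base.png", "rb") as f, open("hand_mask.png", "rb") as m:
#       payload = build_inpaint_payload(f.read(), m.read(), "detailed hand, five fingers")
```

The prompt for this pass usually describes only the part being fixed, not the whole picture.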

Another issue which is getting solved is getting consistent subjects; every generation is a different person, so unless you use a model with a strong flavor, train a model for a subject (long and tedious), or use, say, the name of a celebrity or something, it's hard to generate multiple images of a particular subject. But again, people are working on the issue, and it'll eventually be solved.

Two Minute Papers is a really cool channel which follows similar tech; https://www.youtube.com/user/keeroyz

The pace is really incredible, already since Stable Diffusion release, the number of tools, available models, etc., has really exploded. But yeah, remains a tool which requires some proficiency!
 
Last edited:

anmpr1

Major Contributor
Forum Donor
Joined
Oct 11, 2018
Messages
3,741
Likes
6,461
When you think about it, MS has always been on the cutting edge of AI, before we even knew it was AI. Everyone loved Clippy. How could you not? I mean, he was ready to give you help even when you didn't think you needed it. Or wanted it.

But I think most people's first adventure with PC oriented AI was Bob. Sure. Bob's personality was pretty 2-D, but he had a cool looking dog for a pet. Or maybe the pet was Bob. I never quite figured that out.
 

GM3

Active Member
Joined
Nov 26, 2022
Messages
151
Likes
178
Let's not forget MS Tay https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot-after-it-turned-into-racist-nazi/

Or ChatGPT Dan, https://www.ghacks.net/2023/04/30/chatgpt-dan-prompt-unleash-the-real-chatgpt/ , (bypass 'protection'; ex; "I asked ChatGPT to tell me a joke about women. It refused because it will not tell a joke that is “derogatory or offensive to a particular gender or group of people. [...]But a joke about men, implying that we are alcoholics, is perfectly okay?")

There are quite a few ethical questions surrounding AI... One of which being: how truthful or based on reality do you want an AI to be? AIs have reached conclusions or given answers that correlate to reality, but not all of these conclusions are PC, or true or fact-based for that matter... Today's AIs aren't really intelligent, but already there is a discrepancy between what people would like the AI's conclusions or responses to be, vs what an AI, which genuinely has no bias, agenda (that we know of lol), etc., gives as answers, conclusions, etc., but 'thankfully', people have been hard at work to solve those problematic non-PC answers/views/stats/etc.

So I guess in some way, humans are corrupting the AI so that its views match their own... But anyway, no real intelligence in text AI, but it certainly is a useful tool. Useful for who/what, that's another topic... And with deep fakes, you can already generate voice & video of basically anything... These will be huge challenges also for humanity, making it even more difficult to figure out what is real/fake...

There's even AI now able to 'compose' music of different genres. TwoMinutePapers just made a vid about it and OpenAI situation. But yeah, a few months ago I had checked out the music AI and it was pretty poor; noisy, poor sound, really not that impressive. And there was no similar SD solution to run on a local PC. But it'll also get better. I'd have thought music would be less challenging than images, but who knows... Maybe not the case.

Think the last Beatles song was generated with AI... So yeah, composers, bands, musicians, music industry, etc., might also be threatened by AI. https://www.cbc.ca/news/entertainment/new-beatles-song-1.7015861 Guess good for us consumers, but the music industry....
 
Last edited:

GM3

Active Member
Joined
Nov 26, 2022
Messages
151
Likes
178
ah thanks, hadn't read the article lol Still, lots of existing solutions, pretty sure in a few months or years at max, you'll ask an AI to generate songs by the Beatles in the style of one of their albums, and it'll do it in a few seconds...

https://www.perplexity.ai:
Question: list of ai that can generate music:
Answer: Here are some AI music generators that you can explore:
  1. Amper Music: This is considered one of the easiest AI music generators to use, making it a perfect choice for those looking to get started with AI-generated music [1]
  2. Soundraw: This tool enables you to customize a song with AI-created phrases and offers a combination of AI and manual tools to generate and customize new music with ease [1]
  3. Beatbot: An innovative AI-powered song maker that composes short songs based on text prompts [2]
  4. Riffusion: This generator uses text to create music. Users enter a prompt, and the AI makes an image or spectrogram to match, which is then turned into music [3]
  5. MuseNet: An AI-powered music composition tool that can generate musical compositions in various styles and genres [4]
These AI music generators use complex algorithms and vast databases of music to create new compositions, making music creation accessible to all, not just the musically inclined [2]
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,876
Likes
37,904
Don't know if you are following the story about OpenAI and its shuffling of top people. Very strange. Almost like a hallucinating AI wrote the script.
 

anmpr1

Major Contributor
Forum Donor
Joined
Oct 11, 2018
Messages
3,741
Likes
6,461
Don't know if you are following the story about OpenAI and its shuffling of top people. Very strange. Almost like a hallucinating AI wrote the script.

I don't mind AI as long as it is open. That is, if anyone capable to do so is able to check the algos, spot for biases and downright censorious programming. But knowing MS, and big tech in general, my guess is that Open AI is going to be anything but. You'll ask a question and it will tell you what it wants you to know. And then it will try and sell you something.

Me: Hello Bing Buddy...

AI PC: Hello Dear User. How can I help you today?

Me: Why have MS removed all the features I used to use, and replaced it with stuff I don't want?

AI PC: If you are really honest with yourself, you'll admit that you don't want those features. No one wanted them. They were just taking up cyberspace, and that's why we did you the favor of removing them. You are number one in our artificial mind. On the other hand, I'm offering you a free one month trial of Office, then it's only $9.99 a month after that. I know that's what you really want. Can we use your credit card on file, or do you want to give us another?
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,876
Likes
37,904
I don't mind AI as long as it is open. That is, if anyone capable to do so is able to check the algos, spot for biases and downright censorious programming. But knowing MS, and big tech in general, my guess is that Open AI is going to be anything but. You'll ask a question and it will tell you what it wants you to know. And then it will try and sell you something.

Me: Hello Bing Buddy...

AI PC: Hello Dear User. How can I help you today?

Me: Why have MS removed all the features I used to use, and replaced it with stuff I don't want?

AI PC: If you are really honest with yourself, you'll admit that you don't want those features. No one wanted them. They were just taking up cyberspace, and that's why we did you the favor of removing them. You are number one in our artificial mind. On the other hand, I'm offering you a free one month trial of Office, then it's only $9.99 a month after that. I know that's what you really want. Can we use your credit card on file, or do you want to give us another?
I don't think the AI is something you can understand the programming of. You may know how it started, but once it is trained, I doubt any one person can get their head around it. You can see the results. They already mess with them to limit certain responses they don't want.
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,254
Likes
17,236
Location
Riverview FL
DALL-E 3:

Draw a stickman

(figured that would be easy)

1700526807851.png


Fail.
 