Compared to something like this, all the engines failed miserably.
What do you think?
Short answer: txt2img generation isn't that simple; it takes some work to experiment, learn the basics, and become proficient.
Long answer: I've been experimenting quite a bit with Stable Diffusion, which you can run yourself if you have a fairly decent gaming video card (~8 GB of VRAM):
https://github.com/AUTOMATIC1111/stable-diffusion-webui
The base models are SD1.5 (~512x512 images) and SDXL (1024x1024). But people have been releasing their own fine-tuned models that target specific styles or subjects; you can find plenty here:
https://civitai.com/models
These start from the existing SD1.5/SDXL models and add training on particular topics/images. SD1.5 models are likely more useful for most people: unless you have a truly top-of-the-line video card, SDXL image generation is quite a bit slower.
So, you've got SD installed and you've even downloaded a model. The way you'd go about generating an image is entering a prompt and generating 2-3 dozen images. The way it works, the AI doesn't really understand sentences: it picks up on words or word pairs, and uses its neural network to 'unblur' noise (given a seed #) into elements matching the text descriptions of its training images... So really, there's no actual 'intelligence' or any meaningful sentence comprehension.
So a prompt like "Create a portrait of an astronaut in the year 2230, neon-lit background, 35mm f1.4" could be split into something like:
Portrait, astronaut, year 2230, neon_background, 35mm, f1.4.
And some terms might be redundant (e.g. 35mm & portrait). Since the AI isn't really intelligent, it will pick up on some words and not others: f1.4 is well understood by photographers, maybe not so much by the AI... Something like 'bokeh' or 'blurry background' might work better.
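To make this concrete: if you'd rather script it than use the webui, here's roughly what that same prompt looks like with Hugging Face's diffusers library. This is just a minimal sketch; the model name, seed, and settings are example values, nothing special:

import torch
from diffusers import StableDiffusionPipeline

# Load an SD1.5-class checkpoint; a civitai .safetensors download could be
# loaded with StableDiffusionPipeline.from_single_file(...) instead.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Tag-style prompt: the model keys on words/word pairs, not full sentences
prompt = "portrait, astronaut, year 2230, neon background, 35mm, bokeh"

# Fixing the seed makes a run reproducible; a different seed = different noise to 'unblur'
generator = torch.Generator("cuda").manual_seed(42)
images = pipe(
    prompt,
    num_inference_steps=25,
    guidance_scale=7.5,
    num_images_per_prompt=4,  # small batch to pick from
    generator=generator,
).images
for i, img in enumerate(images):
    img.save(f"astronaut_{i}.png")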
Also, the AI favors some words: if you use 'portrait', it has a strong sense of what a portrait should look like, so it will give you a portrait. Same for 'astronaut'. Because SD basically uses word & image associations, 'astronaut' might override 'year 2230': SD knows what an astronaut should look like (more or less), but 'year 2230' isn't something specific the AI can hang on to, so it's very likely to get lost in the prompt; not many training images are tagged with "year 2230".
Certain models produce realistic characters, others generate cartoon or Pixar-style characters, etc., so having an aesthetic sense of what you're trying to generate and picking the right model to use is critical.
You can also give more/less importance to certain words, so that SD generates an image closer to what you're looking for. E.g., if you want a picture of a girl in a futuristic space suit with neon light, and not necessarily a portrait, then removing 'portrait', using word pairs, adjusting weights, etc., would all help. So you really have to tweak the prompt and generate plenty of images until you find roughly what you're looking for. Just adjusting the prompt a little, I can get these types of images, which are likely closer to what I'm guessing was expected... I added a few random details, and put weights of 1.2 on 'futuristic' and 1.1 on 'bokeh':
woman, blond_hair, (green_eyes:0.9), long_hair, space_suit, (futuristic:1.2), portrait, neon_background, (bokeh:1.1)
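(A side note: that (word:1.2) weight syntax is an AUTOMATIC1111 webui feature; plain diffusers doesn't parse it, and you'd need an add-on like the compel library for that.) Either way, the core loop is the same: keep the seed fixed and vary the prompt, so any difference between images comes from the wording and not the noise. A rough sketch, with made-up prompt variants:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = "woman, blond_hair, long_hair, space_suit, neon_background"
variants = {
    "portrait": base + ", portrait, futuristic, bokeh",
    "no_portrait": base + ", futuristic, bokeh",
    "plain": base,
}

for name, prompt in variants.items():
    # Re-seed each run so every variant starts from the exact same noise
    generator = torch.Generator("cuda").manual_seed(1234)
    image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
    image.save(f"variant_{name}.png")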
So the more detailed the prompt, the better your chances of getting the type of image you want... And you definitely have to tweak things: adding descriptive terms, removing others, giving some words more weight and others less, etc., until you get somewhat what you're looking for. The above was using a model called 'Reliberate', but there are tons... Here's another one using MajesticMix (favors Asian-looking characters, it seems), then EpicRealism, using the same prompt:
One thing you can notice: lots of green. It looks like the AI is conflating 'green_eyes' (or maybe 'neon') with an overall green-ish hue... So generating an image definitely involves a lot of trial and error and tweaking. Not super simple or intuitive.
But that's just scratching the surface... There are also sampling methods, OpenPose, img2img, inpainting, etc., plus things like "Regional Prompter" to set different prompts for certain regions of the image. It takes about a minute to generate a batch of ten 768x512 images. Once you find a picture you like, or have worked out a prompt, you can use "hires fix", AI upscaling, etc., or just generate larger images; most SD1.5 models will be fine with 768x1024. Another batch using the EpicRealism model, with a different sampler and at a higher resolution.
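If you're scripting with diffusers rather than the webui, the "hires fix" idea can be approximated as a two-pass workflow: generate small, upscale, then run img2img at low strength to add detail. A rough sketch (the sizes, strength, and seed below are just example values):

import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model = "runwayml/stable-diffusion-v1-5"
txt2img = StableDiffusionPipeline.from_pretrained(
    model, torch_dtype=torch.float16
).to("cuda")

prompt = "woman, space_suit, futuristic, portrait, neon_background, bokeh"
low = txt2img(
    prompt, width=512, height=768,
    generator=torch.Generator("cuda").manual_seed(7),
).images[0]

# Second pass reuses the same weights; low strength keeps the composition
# and mostly adds detail at the higher resolution.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)
final = img2img(
    prompt,
    image=low.resize((768, 1152)),  # simple upscale before the refinement pass
    strength=0.4,
    generator=torch.Generator("cuda").manual_seed(7),
).images[0]
final.save("final_hires.png")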