Comparing and Contrasting Three Artificial Intelligence Text-to-Art Tools: Stable Diffusion, Midjourney, and DALL-E 2 (Plus a Tantalizing Preview of AI Text-to-Video Editing!)

HOUSEKEEPING NOTE: Yes, I know, I know—I’m off on yet another tangent on this blog! Please know that I will continue to post “news and views on social VR, virtual worlds, and the metaverse” (as the tagline of the RyanSchultz.com blog states) in the coming months! However, over the next few weeks, I will be focusing a bit on the exciting new world of AI-generated art. Patience! 😉

Artificial intelligence (AI) tools that can create art from a natural-language text prompt are evolving at such a fast pace that it makes me a bit dizzy. Two years ago, if somebody had told me that you would be able to generate a convincing photograph or a detailed painting from a text description alone, I would have scoffed! Many felt that the realm of the artist or photographer would be among the last holdouts where a human being was necessary to produce good work. And yet, here we are, in mid-2022, with any number of public and private AI initiatives that can be used by both amateurs and professionals to generate stunning art!

In a recent interview of David Holz (a co-founder of the augmented reality hardware firm Magic Leap, who went on to found Midjourney) by The Register‘s Thomas Claburn, there’s a brief explanation of how this burst of research and development activity got started:

The ability to create high-quality images from AI models using text input became a popular activity last year following the release of OpenAI’s CLIP (Contrastive Language–Image Pre-training), which was designed to evaluate how well generated images align with text descriptions. After its release, artist Ryan Murdock…found the process could be reversed – by providing text input, you could get image output with the help of other AI models.

After that, the generative art community embarked on a period of feverish exploration, publishing Python code to create images using a variety of models and techniques.

“Sometime last year, we saw that there were certain areas of AI that were progressing in really interesting ways,” Holz explained in an interview with The Register. “One of them was AI’s ability to understand language.”

Holz pointed to developments like transformers, a deep learning model that informs CLIP, and diffusion models, an alternative to GANs [models using Generative Adversarial Networks]. “The one that really struck my eye personally was the CLIP-guided diffusion,” he said, developed by Katherine Crowson…
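
If you are curious what “CLIP-guided diffusion” actually means in practice, here is a heavily simplified, conceptual sketch in Python. To be clear: this is not the real code behind any of the tools discussed below, and the model-loading helpers (load_diffusion_model, load_clip_scorer) and the denoise/step methods are hypothetical stand-ins. It is only meant to show the basic idea: using CLIP’s image–text similarity score to steer each denoising step.

import torch

def clip_guided_diffusion(prompt: str, steps: int = 250, guidance_scale: float = 1000.0):
    # Hypothetical helpers: a pretrained image diffusion model, and a scorer
    # that returns CLIP's similarity between an image and the text prompt.
    diffusion = load_diffusion_model()
    clip_score = load_clip_scorer(prompt)

    # Start from pure noise and denoise it step by step.
    x = torch.randn(1, 3, 256, 256)
    for t in reversed(range(steps)):
        x = x.detach().requires_grad_(True)

        # The diffusion model predicts a cleaner version of the current image.
        denoised = diffusion.denoise(x, t)

        # Score how well that prediction matches the text prompt (a scalar)...
        similarity = clip_score(denoised)

        # ...and compute its gradient with respect to the noisy image.
        grad = torch.autograd.grad(similarity, x)[0]

        # Take the usual diffusion sampling step, nudged in whatever direction
        # makes the image look more like the prompt.
        x = diffusion.step(x, t, denoised) + guidance_scale * grad

    return x.detach()

Intuitively, CLIP acts as a critic: at every step of the denoising process, the sample gets pulled a little further in the direction that makes CLIP say, “yes, this looks more like the prompt.”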

If you need a (relatively) easy-to-understand explainer on how this new diffusion model works, well then, YouTube comes to your rescue with this video, which offers four explanations at various levels of difficulty!


Before we get started, a few updates since my last blogpost on AI-generated art: After using up my free Midjourney credits, I decided to purchase a US$10-a-month subscription to continue to play around with it. This is enough credit to generate approximately 200 images per month. Also, as a thank-you for being among the early beta testers of DALL-E 2, OpenAI’s AI art-generation tool, the company has awarded me 100 free credits to use. You can buy additional credits in 115-generation increments for US$15, which works out to roughly 13 US cents per prompt; given the hit-or-miss nature of the results returned, this makes DALL-E 2 among the most expensive of the artificial intelligence art generators. It will be interesting to see if and how OpenAI will adjust their pricing as the newer competitors start to nip at their heels in this race!

And I can hardly believe my good fortune, because I have been accepted into the relatively small beta test group for a third AI text-to-art generation program! This new one is called Stable Diffusion, by Stability AI. Please note that if you were to try to get into the beta now, it’s probably too late; they have already announced that they have all the testers they need. I submitted my name 2-3 weeks ago, when I first heard about the project. Stable Diffusion is still available for researcher use, however.

Like Midjourney, Stable Diffusion uses a special Discord server with commands: instead of Midjourney’s /imagine, you use the command !dream, followed by a text description of what you want to see, plus optional parameters to set the aspect ratio, the number of images returned, and so on (see the example below). However, the Stable Diffusion team has already announced that they plan to move from Discord to a web-based interface like DALL-E 2 (we will be beta-testing that, too). Here’s a brief video glimpse of what the web interface could look like:
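
For example, a typical command looks something like the line below. A caveat: I am reproducing the flags as I have seen other beta testers use them (-W and -H for width and height in pixels, -n for the number of images, and -S for a fixed random seed), but the syntax is still evolving during the beta, so check the pinned instructions on the Discord server for the current list.

!dream a thatched cottage by a lake at golden hour, very highly detailed oil painting -W 1024 -H 512 -n 4 -S 42

Re-running the same prompt with the same seed should reproduce the same image, while changing the seed produces variations; that is what “yes (using seeds)” means in the Generate Variations row of the chart below.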


Given that I am among the relatively few people who currently have access to all three of the top publicly-available AI art-generation tools, I thought it would be interesting to create a chart comparing and contrasting all three programs. Please note that I am neither an artist nor an expert in artificial intelligence, just a novice user of all three tools! Almost all of the information in this chart has been gleaned from the projects’ websites and online news reports, as well as the active subreddit communities for all three programs, where users post pictures and ask questions. Also, all three tools are constantly being updated, so this chart might go out of date very quickly (although I will make an attempt to keep it updated).

Name of Tool | DALL-E 2 | Midjourney | Stable Diffusion
Company | OpenAI | Midjourney | Stability AI
AI Model Used | Diffusion | Diffusion | Diffusion
# Images Used to Train the AI | 400 million | “tens of millions” | 2 billion
User Interface | website | Discord | Discord (moving to website)
Cost to Use | credit system (115 for US$15) | subscription (US$10-30 per month) | currently free (beta)
Uses Text Prompts | yes | yes | yes
Can Add Optional Arguments | no | yes | yes
Non-Square Images? | no | yes | yes
In-tool Editing? | yes | no | no
Uncropping? | yes | no | no
Generate Variations? | yes | yes | yes (using seeds)
A comparison chart of three AI text-to-art tools: DALL-E 2, Midjourney, and Stable Diffusion

I have already shared a few images from my previous testing of DALL-E 2 and Midjourney here, here, and here, so I am not going to repost those images, but I wanted to share a couple of the first images I was able to create using Stable Diffusion (SD). To make these, I used the text prompt “a thatched cottage with lit windows by a lake in a lush green forest golden hour peaceful calm serene very highly detailed painting by thomas kinkade and albrecht bierstadt”:

I must admit that I am quite impressed by these pictures! I had asked SD for images with a height of 512 pixels and a width of 1024 pixels, but to my surprise, the second image came back wider, presented neatly in a white frame, which I cropped away using my trusty SnagIt image editor! Also, it was not until after I submitted my prompt that I realized that the second artist’s name is actually ALBERT Bierstadt, not Albrecht! My typo doesn’t appear to have made a big difference in the final output; perhaps, for well-known artists, the last name alone is enough to indicate a desired art style?

Here are a few more samples of the kind of art which Stable Diffusion can create, taken from the pod-submissions thread on the SD Discord server:

Text prompt: “a beautiful landscape photography of Ciucas mountains mountains a dead intricate tree in the foreground sunset dramatic lighting by Marc Adamus”
Text prompt: “incredible wide screenshot ultrawide simple watercolor rough paper texture katsuhiro otomo ghost in the shell movie scene backlit distant shot”
Text prompt: “an award winning wallpaper of a beautiful grassy sunset clouds in the sky green field DSLR photography clear image”
Text prompt: “beautiful angel brown skin asymmetrical face ethereal volumetric light sharp focus”
Painting of people swimming (no text prompt shared)

You can see many more examples over at the r/StableDiffusion subreddit. Enjoy!

If you are curious about Stable Diffusion and want to learn more, there is a 1-1/2 hour podcast interview with Emad Mostaque, the founder of Stability AI (highly recommended!). You can also visit the Stability AI website, or follow them on social media: Twitter or LinkedIn.


I also wanted to submit the same text prompt to each of DALL-E 2, Midjourney, and Stable Diffusion, to see how the AI models in each would respond. Under each prompt you will see three square images: the first from DALL-E 2, the second from Midjourney, and the third from Stable Diffusion. (Click on each thumbnail image to see it in its full size on-screen.)

Text prompt: “the crowds at the Black Friday sales at Walmart, a masterpiece painting by Rembrandt van Rijn”

Note that none of the AI models are very good at getting the facial details correct for large crowds of people (all work better with just one face in the picture, like a portrait, although sometimes they struggle with matching eyes or hands). I would say that Midjourney is the clear winner here, although a longer, much more detailed prompt in DALL-E 2 or Stable Diffusion might have created an excellent picture.

Text prompt: “stunning breathtaking photo of a wood nymph with green hair and elf ears in a hazy forest at dusk. dark, moody, eerie lighting, brilliant use of glowing light and shadow. sigma 8.5mm f/1.4”

When I tried to generate a 1024-by-1024 image in Stable Diffusion, it kept giving me more than one wood nymph, even when I added words like “single” or “alone”; this is a known bug in the current, early state of the program. I finally gave up and used a 512×512 image. The clear winner here is DALL-E 2, which has a truly impressive ability to mimic various camera styles and settings!

Text prompt: “a very highly detailed portrait of an African samurai by Tim Okamura”

In this case, the clear winner is Stable Diffusion with its incredible detail, even though, once again, I could not generate a 1024×1024 image because it kept giving me multiple heads! The DALL-E 2 image is a bit too stylized for my taste, and the Midjourney image, while nice, has eyes that don’t match (a common problem with all three tools).

And, if you enjoy this kind of thing, here’s a 15-minute YouTube video with 21 more head-to-head comparisons between Stable Diffusion, DALL-E 2, and Midjourney:


As I have said, all of this is happening so quickly that it is making my head spin! If anything, the research and development of these tools is only going to accelerate over time. And we are going to see this technology applied to more than still images! Witness a video shared on Twitter by Patrick Esser, an AI research scientist at Runway, where the entire scene around a tennis player is changed simply by editing a text prompt, in real time:


I expect I will be posting more later about these and other new AI art generation tools as they arise; stay tuned for updates!

To Teleport or Not to Teleport: Teleporting Versus Walking in the Metaverse

Ever wish you could teleport in real life?
(Photo by Chris Briggs on Unsplash)

Earlier this week, I had a guided tour of the blockchain-based social VR platform Somnium Space, where I was informed by my tour guide that the virtual world had just implemented teleporting. Scattered throughout the one large, contiguous virtual landscape which comprises Somnium Space were teleporter hubs, where you could pull up a map, click on the teleporter hub you wanted to travel to, press a button, et voilà! You were instantly transported to your destination.

A teleporter hub in the central city square of Somnium Space (at night)
The red arrows indicate the location of teleporter hubs on the map

What makes Somnium Space unusual among metaverse platforms is that you cannot simply teleport from one place to another distant location; you must either make use of the provided teleporters, or walk/run/fly/swim to your destination. (Of course, you can certainly “short hop” using a limited form of teleporting, but that is only for shorter distances, not for instantly getting from one end of a large, contiguous landmass to the other.)

In other words, the teleporter hubs of the Somnium Transportation System are set up much like a modern urban subway system: you can only travel to a particular, pre-built station situated nearest to your intended destination, and then walk the rest of the way. Many people might remember that Second Life also had teleporter hubs in its very earliest days, before avatars could instantly teleport themselves from one location to another!

Another thing that sets Somnium Space apart from other social VR platforms is that there are only going to be so many “public” teleporter hubs. In fact, some of these hubs are going to be auctioned off as NFTs (Non-Fungible Tokens), and the successful bidders, with such a teleporter hub on their properties, will be able to charge other users a cryptocurrency fee to use it! (In other words, these hubs would operate much the same as a real-life toll road or highway.)

Closely intertwined with the idea of teleporting versus walking is the layout of a metaverse platform. Is it one large, contiguous landmass, like Somnium Space, Decentraland, Cryptovoxels, and (to a certain extent) Second Life? Or is it a collection of smaller worlds, like VRChat, Rec Room, Sansar, and Sinespace? If it is the former, then the means of transportation (and ease of access to transportation) become more important. If it is the latter, then another tool which many of the newer social VR platforms offer is the ability to create a portal (either temporary or permanent) between two worlds. (Of course, you could consider a teleporter hub a portal.)

So, keeping all this in mind (particularly the distinction between SHORT HOP teleporting and teleporting to a DISTANT location), we can create a chart outlining the transportation affordances of the various metaverse platforms:

Name of Platform (Layout) | Walk/Run? * | Distance Teleport? ** | Create Portals? †
Second Life (mostly one contiguous landmass, with private islands) | YES | YES | YES
Sinespace (separate worlds) | YES | NO | YES
Sansar (separate worlds) | YES | NO (but you can create teleport hubs) | YES
VRChat (separate worlds) | YES | NO | YES
Rec Room (separate worlds) | YES | NO | YES
AltspaceVR (separate worlds) | YES | NO | YES
NeosVR (separate worlds) | YES | NO | YES
Cryptovoxels (one contiguous landmass with some islands) | YES | NO (you can add coordinates to a URL, though) | YES
Decentraland (one contiguous landmass) | YES | YES (/goto X,Y) | NO
Somnium Space (one contiguous landmass) | YES | NO (but there are teleport hubs) | NO (unless you count teleport hubs)
* – Can a user walk/run/fly/swim from one location to another? This includes SHORT HOP teleporting.
** – Can a user personally choose to teleport from one location to a second, DISTANT location?
† – Can a user create a temporary or permanent portal from one location to another?

Obviously, all metaverse platforms offer some form of personal locomotion for your avatar (walk, run, fly, swim, short-hop teleporting, etc.). This is standard.

It is also clear from this table that the metaverse platforms which consist of many smaller worlds (Sinespace, Sansar, VRChat, Rec Room, AltspaceVR, and NeosVR) all prefer the creation of temporary and permanent portals to letting users teleport great distances under their own steam. On the other hand, all the social VR platforms and virtual worlds which consist of one contiguous landmass tend to allow some form of teleportation across great distances.

You will notice that Cryptovoxels uses a rather brute-force method of “teleporting”, which consists of appending coordinates to the end of the URL you enter into your web browser client. This is much the same as the coordinates which form part of the SLURLs used in Second Life, but not nearly as convenient, in my opinion.
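
To make the difference concrete, here is a purely illustrative pair of examples. I am reconstructing the formats from memory, so treat the exact syntax (and the coordinates themselves) as approximate, and check each client before relying on it:

Cryptovoxels (coordinates appended to the play URL in your browser): https://www.cryptovoxels.com/play?coords=N@123E,456N
Decentraland (a chat command giving the parcel coordinates): /goto 52,-68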

Transportation affordances are yet another way to classify metaverse platforms in my continuing effort to create a taxonomy of social VR platforms and virtual worlds.

So, what do you think? Have I made an error in my table? Do you have an opinion about the benefits of teleporting and portals versus walking around and exploring the landscape? I’d love to hear your opinions, so please leave a comment, thank you!

UPDATED! Comparing Clubhouse with Twitter Spaces: A Chart Comparing the Features of the Two Leading Drop-In Audio Chat Social Apps for Mobile Devices

Clubhouse (photo by Erin Kwon on Unsplash)
Twitter Spaces (source)

I don’t know what lucky star I was born under, but as of very early this morning, Thursday, March 4th, 2021, I am now part of not one but two beta tests of competing drop-in audio chat apps: Clubhouse (which I have been on for a little over a week), and the newer Twitter Spaces, which I was invited to join today, after participating in my first-ever Twitter Spaces group chat that lasted into the wee hours of this morning!

This morning, I tried out my new abilities, setting up Twitter Spaces chatrooms to talk with various people one-on-one, like Michael Zhang, Kent Bye, Will Burns and Andy Fidel. With those chats, and last night’s group chat, under my belt, I now feel confident enough to compile a comparison chart between the two platforms.

Please note that the situation is evolving rapidly (for example, the press have reported that Twitter Spaces works for Android, but when I tried to connect with an Android user, she told me she received a message that it’s not yet available for Android). So this chart will age rapidly, and I will NOT be keeping it up to date; consider it just a current snapshot of the race between the two social audio companies! (And yes, you can bet your bottom dollar that Facebook is feverishly working on a competing drop-in audio chat app to dominate the nascent marketplace*.)

(I apologize for the somewhat messy dimensions of this table; I was unable to find an easy way to make the columns all the same size! I need to brush up on my HTML/CSS.)

Features/Details | CLUBHOUSE | TWITTER SPACES
Company | Alpha Exploration Company, founded in April 2020 by Rohan Seth and Paul Davison, funded by venture capital firm Andreessen Horowitz | Twitter, founded by Jack Dorsey, Noah Glass, Biz Stone, and Evan Williams in March 2006
Current Number of Users | 10 million users (and growing quickly!) | Unknown number of users since its private beta launch in late December 2020, mostly on iOS (Twitter itself has 330 million users)
Supported Mobile Devices | iOS only | iOS only; the press has already reported that Android support has just launched, but I have had at least one report of an Android user who could not get in, and one report of someone who could, so…
Current Growth Model | Invite only (you have to have someone text you an invitation) | Invite only (Twitter seems to be selecting the longest-standing accounts first)
Number of Rooms You Can Create | As many as you like (three kinds: open, public/followers only, or closed/invite only) | It appears to be just one, reusable room linked to your Twitter profile (you can retitle the room every time you spin it up, though)
Number of Clubs (Recurring Rooms) You Can Create | You need to ask Clubhouse to set up a club for you, but soon they plan to launch the ability for you to create your own clubs | There does not appear to be a regularly-scheduled room or club feature yet (but it’s early days!)
Number of People You Can Invite into a Room | Seems to have no upper limit (the Elon Musk interview room had over 6,000 people) | UPDATE: It would appear you can invite as many Twitter users and lists of users as you like (thanks, Navah!). You can also send out a general invitation tweet to your Twitter feed, or generate a special link to post to places like Discord (I tested both, and they do indeed work).
Emojis | Encouraged in user profiles and searchable, but when you are in a room and not speaking, you are limited to clicking your microphone button repeatedly (similar to clapping), or changing your user icon and PTR (pull to refresh) the screen | Yes (but the selection is limited to only 5 emojis). Of course, you can also use emojis in your Twitter profile and tweets!
Direct Messaging | No (you must use Instagram or Twitter to send direct messages, although you could create a private room for just the two of you to chat) | Yes, built in from the start
Cost | The platform is free to all users and doesn’t yet offer any kind of premium plan or method of charging users, nor is it ad-supported; they plan to monetize by adding ways for users to pay other users, with Clubhouse taking a cut for its services | Free (Twitter makes its money through advertising and data licensing)

And if you want to ping me on either Clubhouse or Twitter, my handle on both is the same: @quiplash. Quiplash is short for “quipster whiplash”, because I am very well known for my snappy comebacks 😉 (and no, I am not named after the Quiplash game). Hit me up if you want to experience Twitter Spaces, and perhaps we can schedule a group discussion; I’d like to extend the same invitation for Clubhouse (if you can get an invite; I might be able to help you out there, too, if you join my Patreon).

Feel free to give me a shout! (photo by Jason Rosewell on Unsplash)

UPDATE 4:13 p.m.: Well, I have been testing out Twitter Spaces with small groups of three to five people; thanks to Navah Berg and my European social VR blogger counterpart Niclas Johansson, and to Thomas for helping me test! (I’m sorry but given the problems I report below, I was unable to add Thomas as a friend, and I didn’t catch his last name.)

Unfortunately, this afternoon, the Twitter Spaces app performed horribly. At one point it muted my microphone, forcing me to use the very limited set of 5 emojis to express myself (like some sad mime!); at another point, it slowed down so badly that it took me several painful minutes to search for a username, waiting 5-10 seconds for each and every key press to register. Then, not once but twice in a row, it crashed me out of the app and caused my iPhone to lock up completely! I haven’t had that happen in a while… So, after four tries, I gave up.

So I would very strongly recommend that you wait a day or two before trying Twitter Spaces, even if you have been invited to participate as a host today. It seems to be buckling under the load, and in my opinion, it’s just not ready for prime time. Very buggy, very beta. (Sorry, Twitter!)

Navah, who has been on Spaces for a couple of weeks now and says she prefers Twitter Spaces to Clubhouse, told us that the previous day’s performance was much better. She suggested that all these serious problems are happening to us today because Twitter launched Spaces for Android users today, and they are getting hammered with Android device traffic (which makes sense to me).

UPDATE 8:31 p.m.: Well, things are looking up! Navah is hosting a Twitter Space this evening with approximately 55 people present, with only occasional audio issues. One of the features I do quite like about Twitter Spaces is the ability for either the host or a speaker to share a tweet with everybody in the room. Somebody posted a copy of my tweet of this blogpost to tonight’s meeting!

UPDATE 8:43 p.m.: Aaaand the room crashed again! Back to the drawing board, Twitter…

*UPDATE March 6th, 2021: Well, surprise, surprise… word has leaked out that Facebook is working on adding audio chat rooms to Instagram:

Here’s a link to the tweet and resulting comment thread if you’re interested.

My Projects for November

Have you joined the RyanSchultz.com Discord yet? You’re invited to be a part of the first ever cross-worlds discussion group, with over 460 people participating from every single social VR platform and virtual world! More details here


I tried.

I mean, I really, really tried, people.

My vow today was to spend the entire day (a vacation day) cleaning up both my spectacularly messy apartment and Vanity Fair’s overstuffed inventory, and assiduously avoiding any social media and any news media for any snippet of U.S. election news, good or bad.

My resolve lasted an hour. First, I peeked at my Twitter, just to see what hashtags were trending. Then, I opened up Google News, just to check the coronavirus headlines. After that, the floodgates were wide open. It looks like I, like so many other people, am going to be glued to the news media today and tomorrow, just to find out what happens.

*sigh* Oh well.

Image by Lena Helfinger from Pixabay

You should know that I do have two projects to work on over my holidays.

First, it is time—far past time—for me to reorganize and categorize my popular Comprehensive List of Social VR Platforms and Virtual Worlds. It’s waaay overdue. (And I’m curious to see what projects and platforms have thrived or folded.)

It’s also time for my annual November update of my Comparison Chart of Popular Social VR Platforms (and yes, I know, “Popular” is subjective). I do plan to draw on the readers of my blog and the 460-plus members of the RyanSchultz.com Discord server to crowdsource a lot of the information contained in the updated comparison chart. (Expect a separate, more detailed blogpost on this topic later this week.)

I will also have to rely on others to help me fill in all the details in the updated comparison chart for Facebook Horizon, as I intend to continue my personal boycott of all Facebook/Oculus products and services (in protest against the company forcing Oculus VR device users to set up accounts on the Facebook social network).

I am not naïve; I realize full well that the Oculus Quest 2 is gonna sell like hotcakes anyway, and no doubt I will continue to feel pressure (both from myself and from my readers) to cave in and buy one, just so I can report directly on the social VR platforms that will inevitably find fertile ground on the headset. I have zero doubt that, much like the vibrant communities such as Bray’s Place which have sprung up in Second Life over the seventeen years of its existence, healthy communities will spring up within Facebook Horizon (in fact, Facebook is counting on it).