UPDATE Sept. 14th, 2023: I have updated the comparison chart above to version 1.2, because somebody informed me that Kalhene is now charging L$99 to join the Erika group (it used to be L$50). Please note that the Flickr snapshots of the chart below have not been updated.
Note that I have deliberately excluded some mesh bodies, for example, the free Altamura bodies you can pick up at various locations (because you cannot change the skin, and you cannot use Bakes on Mesh with them). I have also left out those bodies which have poor or even non-existent third-party designer support. An example of this would be the Ultra Vixen mesh body, which is now only free to avatars under 30 days old and, as far as I am aware, can only wear clothing made and sold by the body’s creator.
Looking forward to hearing your comments and suggestions!
HOUSEKEEPING NOTE: Yes, I know, I know—I’m off on yet another tangent on this blog! Please know that I will continue to post “news and views on social VR, virtual worlds, and the metaverse” (as the tagline of the RyanSchultz.com blog states) in the coming months! However, over the next few weeks, I will be focusing a bit on the exciting new world of AI-generated art. Patience! 😉
Artificial Intelligence (AI) tools which can create art from a natural-language text prompt are evolving at such a fast pace that it is making me a bit dizzy. Two years ago, if somebody had told me that you would be able to generate a convincing photograph or a detailed painting from a text description alone, I would have scoffed! Many felt that the realm of the artist or photographer would be among the last holdouts where a human being was necessary to produce good work. And yet, here we are, in mid-2022, with any number of public and private AI initiatives which can be used by both amateurs and professionals to generate stunning art!
The ability to create high-quality images from AI models using text input became a popular activity last year following the release of OpenAI’s CLIP (Contrastive Language–Image Pre-training), which was designed to evaluate how well generated images align with text descriptions. After its release, artist Ryan Murdock…found the process could be reversed – by providing text input, you could get image output with the help of other AI models.
After that, the generative art community embarked on a period of feverish exploration, publishing Python code to create images using a variety of models and techniques.
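The "reversal" described above can be pictured with a toy sketch: keep a scoring function fixed, and repeatedly adjust a candidate so that its score against the text goes up. The Python below is only an illustration under heavy assumptions. The vectors are made-up numbers, the score is plain cosine similarity, and the search is simple hill climbing; it is not CLIP, and not the code the community actually published.

```python
import math
import random


def cosine_similarity(a, b):
    """Score how closely two vectors point in the same direction (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def guided_search(text_vector, steps=500, step_size=0.1, seed=0):
    """Nudge a random 'image' vector so that its score against the text rises.

    Toy stand-in only: a real system scores actual images with a trained model.
    """
    rng = random.Random(seed)
    image_vector = [rng.uniform(-1, 1) for _ in text_vector]
    start_score = cosine_similarity(image_vector, text_vector)
    best_score = start_score
    for _ in range(steps):
        # Propose a small random change and keep it only if the score improves.
        candidate = [x + rng.gauss(0, step_size) for x in image_vector]
        score = cosine_similarity(candidate, text_vector)
        if score > best_score:
            image_vector, best_score = candidate, score
    return start_score, best_score


# Made-up numbers standing in for an encoded text prompt (an assumption).
text_vector = [0.2, -0.7, 0.5, 0.9]
start_score, final_score = guided_search(text_vector)
print(f"score before: {start_score:.3f}, after: {final_score:.3f}")
```

The point is only the direction of the loop: the scorer judges, and the candidate changes until the scorer is happier with it.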
“Sometime last year, we saw that there were certain areas of AI that were progressing in really interesting ways,” Holz explained in an interview with The Register. “One of them was AI’s ability to understand language.”
Holz pointed to developments like transformers, a deep learning model that informs CLIP, and diffusion models, an alternative to GANs [models using Generative Adversarial Networks]. “The one that really struck my eye personally was the CLIP-guided diffusion,” he said, developed by Katherine Crowson…
If you need a (relatively) easy-to-understand explainer on how this new diffusion model works, well then, YouTube comes to your rescue with this video with 4 explanations at various levels of difficulty!
Before we get started, a few updates since my last blogpost on A.I.-generated art: After using up my free Midjourney credits, I decided to purchase a US$10-a-month subscription to continue to play around with it. This is enough credit to generate approximately 200 images per month. Also, as a thank you for being among the early beta testers of DALL-E 2, the AI art-generation tool by OpenAI, they have awarded me 100 free credits to use. You can buy additional credits in 115-generation increments for US$15, but given the hit-or-miss nature of the results returned, this means that DALL-E 2 is among the most expensive of the artificial intelligence art generators. It will be interesting to see if and how OpenAI will adjust their pricing as the newer competitors start to nip at their heels in this race!
And I can hardly believe my good fortune, because I have been accepted into the relatively small beta test group for a third AI text-to-art generation program! This new one is called Stable Diffusion, by Stability AI. Please note that if you were to try to get into the beta now, it’s probably too late; they have already announced that they have all the testers they need. I submitted my name 2-3 weeks ago, when I first heard about the project. Stable Diffusion is still available for researcher use, however.
Like Midjourney, Stable Diffusion uses a special Discord server with commands (instead of Midjourney’s /imagine, you use the prompt !dream, followed by a text description of what you want to see, plus you can add optional parameters to set the aspect ratio, the number of images returned, etc.). However, the Stable Diffusion team has already announced that they plan to move from Discord to a web-based interface like DALL-E 2 (we will be beta-testing that, too). Here’s a brief video glimpse of what the web interface could look like:
Given that I am among the relatively few people who currently have access to all three of the top publicly-available AI art-generation tools, I thought it would be interesting to create a chart comparing and contrasting all three programs. Please note that I am neither an artist nor an expert in artificial intelligence, just a novice user of all three tools! Almost all of the information in this chart has been gleaned from the websites of the projects, and online news reports, as well as the active subreddit communities for all three programs on Reddit, where users post pictures and ask questions. Also, all three tools are constantly being updated, so this chart might go very quickly out-of-date (although I will make an attempt to update it).
Name of Tool
AI Model Used
# Images Used to Train the AI
“tens of millions”
Discord (moving to website)
Cost to Use
DALL-E 2: credit system (115 for US$15)
Midjourney: subscription (US$10-30 per month)
Stable Diffusion: currently free (beta)
Uses Text Prompts
Can Add Optional Arguments
yes (using seeds)
A comparison chart of three AI text-to-art tools: DALL-E 2, Midjourney, and Stable Diffusion
I must admit that I am quite impressed by these pictures! I had asked SD for images with a height of 512 pixels and a width of 1024 pixels, but to my surprise, the second image was a wider one presented neatly in a white frame, which I cropped using my trusty SnagIt image editor! Also, it was not until after I submitted my prompt that I realized that the second artist’s name is actually ALBERT Bierstadt, not Albrecht! It doesn’t appear as if my typo made a big difference in the final output; perhaps for well-known artists, the last name alone is enough to indicate a desired art style?
Here are a few more samples of the kind of art which Stable Diffusion can create, taken from the pod-submissions thread on the SD Discord server:
I also wanted to submit the same text prompt to each of DALL-E 2, Midjourney, and Stable Diffusion, to see how the AI models in each would respond. Under each prompt you will see three square images: the first from DALL-E 2, the second from Midjourney, and the third from Stable Diffusion. (Click on each thumbnail image to see it in its full size on-screen.)
Text prompt: “the crowds at the Black Friday sales at Walmart, a masterpiece painting by Rembrandt van Rijn”
Note that none of the AI models are very good at getting the facial details correct for large crowds of people (all work better with just one face in the picture, like a portrait, although sometimes they struggle with matching eyes or hands). I would say that Midjourney is the clear winner here, although a longer, much more detailed prompt in DALL-E 2 or Stable Diffusion might have created an excellent picture.
Text prompt: “stunning breathtaking photo of a wood nymph with green hair and elf ears in a hazy forest at dusk. dark, moody, eerie lighting, brilliant use of glowing light and shadow. sigma 8.5mm f/1.4”
When I tried to generate a 1024-by-1024 image in Stable Diffusion, it kept giving me more than one wood nymph, even when I added words like “single” or “alone”; this is a known bug in the current early state of the program. I finally gave up and used a 512×512 image. The clear winner here is DALL-E 2, which has a truly impressive ability to mimic various camera styles and settings!
Text prompt: “a very highly detailed portrait of an African samurai by Tim Okamura”
In this case, the clear winner is Stable Diffusion with its incredible detail, even though, once again, I could not generate a 1024×1024 image because it kept giving me multiple heads! The DALL-E 2 image is a bit too stylized for my taste, and the Midjourney image, while nice, has eyes that don’t match (a common problem with all three tools).
And, if you enjoy this kind of thing, here’s a 15-minute YouTube video with 21 more head-to-head comparisons between Stable Diffusion, DALL-E 2, and Midjourney:
As I have said, all of this is happening so quickly that it is making my head spin! If anything, the research and development of these tools is only going to accelerate over time. And we are going to see this technology applied to more than still images! Witness a video shared on Twitter by Patrick Esser, an AI research scientist at Runway, where the entire scene around a tennis player is changed simply by editing a text prompt, in real time:
I expect I will be posting more later about these and other new AI art generation tools as they arise; stay tuned for updates!
You might remember that I was one of the lucky few who received an invitation to be part of the closed beta test (or “research preview”, as they called it) of DALL-E 2, a new artificial intelligence tool from a company called OpenAI, which can create art from a natural-language text prompt. (I blogged about it, sharing some of the images I created, here and here.)
Here are a few more pictures I generated using DALL-E 2 since then (along with the prompt text in the captions):
Meanwhile, other DALL-E 2 users have generated much better results than I could, by skillful use of the text prompts. Here are just a few examples from the r/dalle2 subReddit community of AI-generated images which impressed and sometimes even stunned me, with a direct link to the posts in the caption underneath each picture:
As you can see by the last two images, you can get very detailed and technical in your text prompts, even including the model of camera used! (However, also note that in the fourth picture, DALL-E 2 ignored some specific details in the prompt.)
Yesterday, OpenAI sent me an email to announce that DALL-E 2 was moving into open beta:
Our goal is to invite 1 million people over the coming weeks. Here’s relevant info about the beta:
Every DALL·E user will receive 50 free credits during their first month of use, and 15 free credits every subsequent month. You can buy additional credits in 115-generation increments for $15.
You’ll continue to use one credit for one DALL·E prompt generation — returning four images — or an edit or variation prompt, which returns three images.
We welcome feedback, and plan to explore other pricing options that will align with users’ creative processes as we learn more.
As thanks for your support during the research preview we’ve added an additional 100 credits to your account.
Before DALL-E 2 announced their new credits system, I had spent most of one day’s free prompts during the research preview to try and generate some repeating, seamless textures to apply to full-permissions mesh clothing I had purchased from the Second Life Marketplace. Most of my attempts were failures, pretty designs but not 100% seamless. However, I did manage to create a couple of floral patterns that worked:
So, instead of purchasing texture packs from within and outside of Second Life, I could, theoretically, generate unique textile patterns, apply them to mesh garments, and sell them, because according to the DALL-E 2 beta announcement I received:
Starting today, you get full rights to commercialize the images you create with DALL·E, so long as you follow our content policy and terms. These rights include rights to reprint, sell, and merchandise the images.
You get these rights regardless of whether you used a free or paid credit to generate images, and this includes images you’ve created before today during the research preview.
Will I? Probably not, because it took me somewhere between 20 and 30 text prompts to generate only two useful seamless patterns, so it’s just not cost effective. However, once AI art tools like DALL-E 2 learn how to generate seamless textures, they are probably going to have some sort of impact on the texture industry, both within and outside of Second Life! (I can certainly see some enterprising soul setting up a store and selling AI-generated art in a virtual world; SL is already full of galleries with human-generated art.)
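To put a rough number on that, here is a minimal Python sketch. It assumes every prompt is paid for with a purchased credit (one credit per prompt, at the announced rate of 115 credits for US$15) and ignores the free monthly credits; the function name is just an illustration.

```python
CREDITS_PER_PACK = 115      # from the beta announcement
PRICE_PER_PACK_USD = 15.00  # from the beta announcement


def cost_per_usable_result(prompts_used, usable_results):
    """Return the paid cost in US$ of each result worth keeping."""
    price_per_credit = PRICE_PER_PACK_USD / CREDITS_PER_PACK
    return prompts_used * price_per_credit / usable_results


# 20 to 30 prompts produced two usable seamless patterns.
low = cost_per_usable_result(prompts_used=20, usable_results=2)
high = cost_per_usable_result(prompts_used=30, usable_results=2)
print(f"US${low:.2f} to US${high:.2f} per seamless pattern")
# prints: US$1.30 to US$1.96 per seamless pattern
```

The figure scales directly with the hit rate, so a tool that reliably produced seamless tiles would change the math considerably.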
Another cutting-edge AI art-generation program, called Midjourney (WARNING: ASCII art website!), has also announced an open beta. I had signed up to join the waiting list for an invitation several weeks ago, and when I checked my email, lo and behold, there it was!
We’re excited to have you as an early tester in the Midjourney Beta!
To expand the community sustainably, we’re giving everyone a limited trial (around 25 queries with the system), and then several options to buy a full membership.
Full memberships include; unlimited generations (or limited w a cheap tier), generous commercial terms and beta invites to give to friends.
Although both DALL-E 2 and Midjourney use human text prompts to generate art, they operate differently. While DALL-E 2 uses a website, Midjourney uses a special Discord server, where you enter your prompt as a special command, generating four rough thumbnail images, which you can then choose to upscale to a full-size image, or use as the basis for variations.
I took some screen captures of the process, so you can see how it works. I typed in “/imagine a magnificent sailing ship on a stormy sea”, and got this back:
The U buttons will upscale one of the four thumbnails, adding more details, while the V buttons generate variations, using one of the four thumbnails as a starting point. I chose thumbnail four and generated four variations of that picture:
Then, I went back and picked one of my original four images to upscale. You can actually watch as Midjourney slowly adds details to your image; it’s fascinating!
I then clicked on the Upscale to Max button, to receive the following image:
Now, I am not exactly satisfied with this first attempt (that sailing ship looks rather spidery to me), but as with DALL-E 2, you get much better results with more specific, detailed text prompts. Here are a few examples I took from the Midjourney subReddit community (with links back to the posts in the captions):
So, as you can see, you can get some pretty spectacular results, with incredible levels of detail! And unlike DALL-E 2, you can set the aspect ratio of your pictures (as was done in the fourth image generated). You do this with a special “--ar” command in your text prompt to Midjourney, e.g. “--ar 16:9” (here’s the online documentation explaining the various commands you can use).
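If you think in pixel sizes rather than ratios, a small helper can do the conversion for you. The following is a minimal Python sketch, not anything Midjourney provides: both function names are hypothetical, and the only things taken from Midjourney itself are the /imagine command and the aspect ratio parameter, which is typed with two plain hyphens.

```python
from math import gcd


def aspect_ratio_flag(width, height):
    """Reduce pixel dimensions to the smallest whole-number ratio, e.g. 1920x1080 -> 16:9."""
    divisor = gcd(width, height)
    return f"--ar {width // divisor}:{height // divisor}"


def build_imagine_command(description, width=None, height=None):
    """Assemble the text typed into Discord; this helper itself is hypothetical."""
    command = f"/imagine {description}"
    if width and height:
        command += " " + aspect_ratio_flag(width, height)
    return command


print(build_imagine_command("a magnificent sailing ship on a stormy sea", 1920, 1080))
# prints: /imagine a magnificent sailing ship on a stormy sea --ar 16:9
```

Leaving out the width and height simply returns the plain command, which matches the default square output.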
And one area in which Midjourney appears to excel is horror:
You can see many more examples of depictions of horror in the postings to the Midjourney SubReddit; some are much creepier than these!
So, in comparing the two tools, I think that Midjourney offers more parameters to users (e.g. setting an aspect ratio), which DALL-E 2 currently lacks. Midjourney also seems to produce much more detailed images than DALL-E 2 does, whereas DALL-E 2 is often astoundingly good at a much wider variety of tasks. For example, how about some angry bison logos for your football team?
I think these images are all very good! (Note that DALL-E 2 still struggles with text! Midjourney does too, but it gets the text correct more often than DALL-E 2 does at present. But note that might change over time as both systems evolve.)
So, the good news is that both DALL-E 2 and Midjourney are now in open beta, which means that more people (artists and non-artists alike) will get an opportunity to try them out. The bad news is that both still have long waiting lists, and with the move to beta, both DALL-E 2 and Midjourney have put limits in place as to how many free images you can generate.
Midjourney gives you a very limited trial period (about 25 prompts), and then urges you to pay for a subscription, with two options:
Basic membership gives you around 200 images per month for US$10 monthly; standard membership gives you unlimited use of Midjourney for US$30 a month.
For now, OpenAI has decided to set DALL-E 2’s pricing based on a credit system (similar to their GPT-3 AI text-generation tool), as described in the first quote in this blogpost. There’s no option for unlimited use of DALL-E 2 at any price, just options for buying credits in different amounts (and there are no volume discounts for purchasing larger amounts of credits at one time, either). The most you can buy at once is 5,750 credits, which costs US$750. So, yes, it can get quite expensive! (As far as I am aware, your unused credits carry over from one month to the next.)
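The lack of a volume discount is easy to check with a little arithmetic. Here is a minimal Python sketch using only the figures quoted in this post (115 credits for US$15, a 5,750-credit maximum, and Midjourney's US$30 unlimited tier); the function names are my own, and it assumes credits are only sold in whole 115-credit increments.

```python
CREDITS_PER_PACK = 115   # credits are sold in 115-generation increments
PRICE_PER_PACK_USD = 15  # price of one increment


def price_of(credits):
    """Price in US$ of a whole number of packs, with no volume discount."""
    packs, remainder = divmod(credits, CREDITS_PER_PACK)
    if remainder:
        raise ValueError("credits are only sold in multiples of 115")
    return packs * PRICE_PER_PACK_USD


def credits_for(budget_usd):
    """Credits obtainable for a budget, buying whole packs only."""
    return (budget_usd // PRICE_PER_PACK_USD) * CREDITS_PER_PACK


print(price_of(5750))   # largest single purchase -> 750
print(credits_for(30))  # same monthly spend as Midjourney's unlimited tier -> 230
```

So the largest purchase is just 50 packs at the ordinary price, and the US$30 that buys unlimited Midjourney use buys 230 DALL-E 2 prompts.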
In my experience, using Dall-E 2 to generate concept arts for our next project, it takes me between 10 to 20 attempts to get something close to what I want (and I never got exactly what I was asking for)…
Dall-E 2, at this point, is not a professional tool. It’s not viable as one, unless you produce exactly the type of content the AI can produce instantly just the way you want it.
Dall-E 2, at this point, IS A TOY! And that’s OpenAI’s mistake right now. You can’t sell a toy the way you sell a professional service! I’m ready to pay for it because I’m experimenting with it. I’m having fun with it and, when it works, it provides me with images I can also use for professional project. However, I wont EVER spend hundreds of dollars on this just for fun, and I certainly wont pay that amount for it as a tool until it can provide me with better and more consistent results!
OpenAI is going after the WRONG TARGET! OpenAI should be seeling it at a much lower price for everyday people and enthusiasts who want to experiment with it because this is literally the only people who can be 100% satisfied with it at this point and these people wont pay hundreds of dollars per month to keep playing when there are other shiny toys out there, cheaper and more open, existing or about to.
Several commenters said that they will be moving from DALL-E 2 to Midjourney because of its more favourable pricing model, but of course it’s still early days. Also, there are any number of open-source AI art-generation projects in the works, and competition will likely mean more features (and better results!) at less cost over time. One thing is certain: we can anticipate an acceleration in improvement of these tools over time.
The future looks to be both exciting and scary! Exciting in the ability to generate art in a new way, which up until now has been restricted to experienced artists or photographers, and scary in that we can no longer trust our eyes to tell whether a photograph is real or has been generated by artificial intelligence! Currently, both systems have rules in place to prevent the creation of deepfake images, but in future, things could get Black Mirror weird, and the implications to society could be substantial. (Perhaps now you will understand the first three DALL-E 2 text prompts I used, at the top of this blogpost!)
P.S. Fun fact: the founding CEO of Linden Lab (the makers of Second Life), Philip Rosedale, is one of the advisors to Midjourney, according to their website. Philip gets around! 😉
UPDATE July 22nd, 2022: Of course, the images generated by DALL-E 2 and Midjourney can then be used in other AI tools, such as WOMBO and Reface (please click the links to see all the blogposts I have written about these mobile apps).
What you see here is an AI-generated image, “animated” using another deep learning tool. This is a tantalizing glimpse into the future, where artificial intelligence can not only create still images, but eventually, video!
Earlier this week, I had a guided tour of the blockchain-based social VR platform Somnium Space, where I was informed by my tour guide that the virtual world had just implemented teleporting. Scattered throughout the one large, contiguous virtual landscape which comprises Somnium Space were teleporter hubs, where you could pull up a map, click on the teleporter hub you wanted to travel to, press a button, et voilà! You were instantly transported to your destination.
What makes Somnium Space unusual among metaverse platforms is that you cannot simply teleport from one place to another distant location; you either must make use of the provided teleporters, or walk/run/fly/swim to your destination. (Of course, you can certainly “short hop” using a limited form of teleporting, but that is only for shorter distances, not for instantly getting from one end of a large, contiguous landmass to another.)
In other words, the teleporter hubs of the Somnium Transportation System are set up much like a modern urban subway system, where you can only travel to a particular, pre-built subway station that is situated the nearest to your intended destination, and then walk the rest of the way. Many people might remember that in the very earliest days of Second Life, there were also teleporter hubs in the days before avatars could instantly teleport themselves from one location to another!
Another thing that sets Somnium Space apart from other social VR platforms is that there are only going to be so many “public” teleporter hubs. In fact, some of these hubs are going to be auctioned off as NFTs (Non-Fungible Tokens), and the successful bidders with such a teleporter hub on their properties will be able to charge a cryptocurrency fee in order to use their teleporters! (In other words, they would operate much the same as a real-life toll road or highway.)
Closely intertwined with the idea of teleporting vs. walking is the layout of a metaverse platform. Is it one large contiguous landmass, like Somnium Space, Decentraland, Cryptovoxels, and (to a certain extent) Second Life? Or is it a collection of smaller worlds, like VRChat, Rec Room, Sansar, and Sinespace? If it is the former, then means of transportation (and ease of access to transportation) becomes more important. If it is the latter, then another tool which many of the newer social VR platforms offer is the ability to create a portal—either temporary or permanent— between two worlds. (Of course, you could consider a teleporter hub a portal.)
So, keeping all this in mind (particularly the distinction between SHORT HOP teleporting and teleporting to a DISTANT location), we can create a chart outlining the transportation affordances of the various metaverse platforms:
Name of Platform (Layout)
Distance Teleport? **
Create Portals? †
Second Life (mostly one contiguous landmass, with private islands)
Sinespace (separate worlds)
Sansar (separate worlds)
NO (but you can create teleport hubs)
VRChat (separate worlds)
Rec Room (separate worlds)
AltspaceVR (separate worlds)
NeosVR (separate worlds)
Cryptovoxels (one contiguous landmass with some islands)
NO (you can add coordinates to a URL, though)
Decentraland (one contiguous landmass)
YES (/goto X,Y)
Somnium Space (one contiguous landmass)
NO (but there are teleport hubs)
NO (unless you count teleport hubs)
* – Can a user walk/run/fly/swim from one location to another? This includes SHORT HOP teleporting.
** – Can a user personally choose to teleport from one location to a second, DISTANT location?
† – Can a user create a temporary or permanent portal from one location to another?
Obviously, all metaverse platforms offer some form of personal locomotion for your avatar (walk, run, fly, swim, short-hop teleporting, etc.). This is standard.
It is also clear from this table that the metaverse platforms which consist of many smaller worlds (Sinespace, Sansar, VRChat, Rec Room, AltspaceVR, and NeosVR) all favour the creation of temporary and permanent portals over allowing users to teleport great distances under their own steam. On the other hand, the social VR platforms and virtual worlds which consist of one contiguous landmass tend to allow some form of teleportation across great distances.
You will notice that Cryptovoxels uses a rather brute-force method of “teleporting”, which consists of appending the coordinates to the end of the URL you enter into your web browser client (which are much the same as the coordinates which form part of the SLURLs used in Second Life, but not nearly as convenient in my opinion).
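The two coordinate-based approaches in the table can be sketched in a few lines of Python. The chat command follows the "/goto X,Y" form listed for Decentraland above. The URL helper is only a guess at the general idea: the base address, the "coords" parameter name, and the "x,y" value format are all placeholders, not the actual Cryptovoxels scheme.

```python
from urllib.parse import urlencode


def goto_command(x, y):
    """Chat command in the '/goto X,Y' form listed for Decentraland in the table."""
    return f"/goto {x},{y}"


def url_with_coordinates(base_url, x, y, param="coords"):
    """Append coordinates to a URL.

    The parameter name and value format are placeholders (assumptions),
    not the real Cryptovoxels URL scheme.
    """
    query = urlencode({param: f"{x},{y}"})
    return f"{base_url}?{query}"


print(goto_command(-12, 34))
print(url_with_coordinates("https://example.com/play", -12, 34))
```

Either way, the user has to know the destination coordinates in advance, which is why I find this less convenient than clicking on a map.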
So, what do you think? Have I made an error in my table? Do you have an opinion about the benefits of teleporting and portals versus walking around and exploring the landscape? I’d love to hear your opinions, so please leave a comment, thank you!