Editorial: Artificial Intelligence and the Metaverse (And a Look at an AI-Assisted Social VR Platform, Riff XR)

I created this image using OpenAI’s DALL-E AI art-generation tool, using the text prompt “artificial intelligence in the metaverse” (source)

Housekeeping Note: I first started writing this editorial back in April, and from time to time I have picked up the draft, tinkered with it a bit more, added a bit more to it—and then promptly filed it away again as a draft, because I still wasn’t satisfied with it, and I always felt that I had something more to say.

Enough. I finally decided that the perfect was the enemy of the good, and I decided today to just go ahead and publish what I already had, and then write follow-up blogposts on the topic of AI in general, and AI in the metaverse in particular. And I do expect that I will return to this topic often! So please stay tuned.

I have written before on this blog about artificial intelligence (AI) applications, such as the image manipulation and animation tools WOMBO and Reface, the text-to-art creation programs DALL-E 2, Midjourney, and Stable Diffusion, and most recently, the AI-powered chatbot Replika and the text-generation app ChatGPT. Most people, myself included, treated them as toys, mere curiosities (I entertained myself for hours making my Second Life and Sansar avatars “come alive” using WOMBO). John Herrman, in a recent article for New York magazine titled The AI Magic Show (original; archived version), wrote:

In 2022, artificial-intelligence firms produced an overwhelming spectacle, a rolling carnival of new demonstrations. Curious people outside the tech industry could line up to interact with a variety of alluring and mysterious machine interfaces, and what they saw was dazzling.

The first major attraction was the image generators, which converted written commands into images, including illustrations mimicking specific styles, photorealistic renderings of described scenarios, as well as objects, characters, textures, or moods. Similar generators for video, music, and 3-D models are in development, and demos trickled out.

Soon, millions of people encountered ChatGPT, a conversational bot built on top of a large language model. It was by far the most convincing chatbot ever released to the public. It felt, in some contexts, and especially upon first contact, as though it could actually participate in something like conversation. What many users suggested felt truly magical, however, were the hints at the underlying model’s broader capabilities. You could ask it to explain things to you, and it would try — with confident and frequently persuasive results. You could ask it to write things for you — silly things, serious things, things that you might pass off as work product or school assignments — and it would.

As new users prompted these machines to show us what they could do, they repeatedly prompted us to do a little dirty extrapolation of our own: If AI can do this already, what will it be able to do next year?

As Charlie Warzel writes in The Atlantic, in a recent article titled What Have We Just Unleashed? (original; archived version), not even the AI experts know exactly what will come next:

Over the past few weeks, I’ve put questions like these to AI researchers, academics, entrepreneurs, and people who are currently building AI applications. I’ve become obsessive about trying to wrap my head around this moment, because I’ve rarely felt less oriented toward a piece of technology than I do toward generative AI. When reading headlines and academic papers or simply stumbling into discussions between researchers or boosters on Twitter, even the near future of an AI-infused world feels like a mirage or an optical illusion. Conversations about AI quickly veer into unfocused territory and become kaleidoscopic, broad, and vague. How could they not?

The more people I talked with, the more it became clear that there aren’t great answers to the big questions. Perhaps the best phrase I’ve heard to capture this feeling comes from Nathan Labenz, an entrepreneur who builds AI video technology at his company, Waymark: “Pretty radical uncertainty.”


You might be surprised to learn that we librarians are already preparing for the AI onslaught (or perhaps it’s not so surprising, since some of the users we serve have already come to us with fake, AI-generated citations to information resources which do not exist!). I have seen no fewer than three events for Canadian librarians on this subject in the past couple of weeks, and at my employer, the University of Manitoba in Winnipeg, a fellow librarian has launched an AI Community of Practice (CoP), which will meet regularly to discuss ChatGPT and other AI apps and topics.

Librarians meeting via Zoom at the inaugural meeting of the brand-new University of Manitoba Libraries’ AI Community of Practice (I’m in the top row, second from the left!)

You.com is one of the newer entrants in the AI-powered search engine marketplace, and it also offers AI-generated text and images. You can try out YouWrite (their AI text generator) for free ten times before you have to shell out for the service, so I decided to give it a whirl, asking the site to create a professional-sounding blogpost about the impact of artificial intelligence on the metaverse:

And here’s the blogpost YouWrite spit back, exactly as generated, with zero edits:

The advent of artificial intelligence (AI) has revolutionized the way we interact with technology. AI is now being used to create virtual worlds, or “metaverses”, that are populated by digital avatars and objects that can be manipulated in real-time. As AI continues to evolve, it will have a profound impact on the metaverse and its inhabitants.

One of the most significant impacts of AI on the metaverse will be increased realism for users interacting within these virtual worlds. With advanced machine learning algorithms and natural language processing capabilities, computer-generated characters can become more lifelike than ever before – responding naturally to user input in real time while exhibiting realistic emotions such as joy or sadness depending on their environment. This could potentially lead to an unprecedented level of immersion for those engaging with these digital realms as they become indistinguishable from reality itself!

Another major advantage brought about by artificial intelligence is improved efficiency when it comes to managing large numbers of players at once – something which becomes particularly important when dealing with massive multiplayer online games (MMOs). By utilizing intelligent agents capable of autonomous decision making based off data collected from other players’ actions, developers can ensure smooth gameplay even during peak hours without sacrificing any quality due performance issues caused by too many people playing at once!

Finally, one area where AI could really shine in regards to improving user experience within a metaverse would be through automated content creation tools powered by deep learning models trained using vast amounts existing assets found across different platforms such as video games or movies etcetera… Such systems would allow game designers/developers quick access powerful resources allowing them craft unique experiences much faster than was previously possible – thus providing gamers exciting new adventures every time they log into their favourite world!

In conclusion then; It’s clear see how Artificial Intelligence stands revolutionize our concept what constitutes an immersive experience both inside out Metaverses – giving us greater control over our interactive environments alongside richer storylines filled dynamic characters all thanks advances this cutting edge technology brings us closer achieving true Virtual Reality today tomorrow alike…

That last paragraph is absolutely dreadful (almost as if the AI had a stroke!), but the rest reads well, and the text contains a few somewhat plausible, if borderline-fanciful, ideas. But if I could sum it all up in one word, that word would be: bullshit. This is computer-generated bullshit, generated by a large language model trained to sound like an actual human expert, but it’s just parroting human writing, without any grasp of the knowledge it is talking about! (I can’t remember who said it first, but somebody once memorably and hilariously referred to AI-generated text apps like ChatGPT as “mansplaining as a service.” 😜 In fact, I would go so far as to say that generative AI tools like ChatGPT offer white, cisgender mansplaining as a service! All the biases in the mountains of data—scraped off the greater unwashed internet—used to train these tools sometimes come out in their responses, despite the best efforts of the companies building them to eradicate those biases.)

Despite appearances, ChatGPT doesn’t really understand the world the way a human brain, with all of its lived experiences, does; it only knows how to generate plausible-sounding sentences and assemble them into coherent paragraphs! It solves a narrowly-defined problem; it is not a general AI that is good at a wide variety of tasks, and it is certainly not a rival to humans.
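The “plausible-sounding sentences” point is easiest to see with a deliberately tiny stand-in. The sketch below is my own toy, nothing like ChatGPT’s actual architecture (a neural network with billions of parameters): it just tallies which word most often follows which in a scrap of text, then “autocompletes” by always emitting the most frequent follower. The output looks fluent but means nothing, which is the parrot effect, scaled way down:

```python
from collections import defaultdict

def train_bigrams(text):
    # Count, for each word, how often each possible next word follows it.
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def autocomplete(counts, start, length=8):
    # Repeatedly emit the most frequent follower of the last word:
    # fluent-looking output, zero understanding.
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(max(followers, key=followers.get))
    return " ".join(out)

corpus = ("the cat sat on the mat the cat ate the fish "
          "the dog sat on the rug")
model = train_bigrams(corpus)
print(autocomplete(model, "the"))  # → the cat sat on the cat sat on the
```

The toy happily loops forever on its most common phrase; a large language model does the same kind of next-word prediction, just with a vastly richer statistical model of what “usually comes next.”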


Herrman, in his New York magazine article, paints a somewhat disquieting picture of what could happen in the future, as the AI wave accelerates:

Models trained on flawed, biased, and often secret sets of data will be used to attempt to perform an assuredly ambitious range of tasks, jobs, and vital economic and social processes that affect the lives of regular people. They will depend on access to massive amounts of computing power, meaning expensive computer hardware, meaning rare minerals, and meaning unspeakable amounts of electricity. These models will be trained with the assistance of countless low-paid labourers around the world who will correct bogus statistical assumptions until the models produce better, or at least more desirable, outputs. They will then be passed on for use in various other workplaces where their outputs and performances will be corrected and monitored by better-paid workers trying to figure out if the AI models are helping them or automating them out of a job, while their bosses try to figure out something similar about their companies. They will shade our constant submissions to the vast digital commons, intentional or consensual or mandatory, with the knowledge that every selfie or fragment of text is destined to become a piece of general-purpose training data for the attempted automation of everything. They will be used on people in extremely creative ways, with and without their consent.

Charlie Warzel goes even further, likening the potential impact of artificial intelligence to that of nuclear fission and nuclear war:

Trying to find the perfect analogy to contextualize what a true, lasting AI revolution might look like without falling victim to the most overzealous marketers or doomers is futile. In my conversations, the comparisons ranged from the agricultural revolution to the industrial revolution to the advent of the internet or social media. But one comparison never came up, and I can’t stop thinking about it: nuclear fission and the development of nuclear weapons.

As dramatic as this sounds, I don’t lie awake thinking of Skynet murdering me—I don’t even feel like I understand what advancements would need to happen with the technology for killer AGI [Artificial General Intelligence] to become a genuine concern. Nor do I think large language models are going to kill us all. The nuclear comparison isn’t about any version of the technology we have now—it is related to the bluster and hand-wringing from true believers and organizations about what technologists might be building toward. I lack the technical understanding to know what later iterations of this technology could be capable of, and I don’t wish to buy into hype or sell somebody’s lucrative, speculative vision. I am also stuck on the notion, voiced by some of these visionaries, that AI’s future development might potentially be an extinction-level threat.

ChatGPT doesn’t really resemble the Manhattan Project, obviously. But I wonder if the existential feeling that seeps into most of my AI conversations parallels the feelings inside Los Alamos in the 1940s. I’m sure there were questions then. If we don’t build it, won’t someone else? Will this make us safer? Should we take on monumental risk simply because we can? Like everything about our AI moment, what I find calming is also what I find disquieting. At least those people knew what they were building.

The point these authors are making is that, with AI, we are dealing with something which has the potential to dramatically impact (and, in some cases, up-end) our current society, in ways which might not be readily apparent at first.

Amy Castor and David Gerard, who have been busy dissecting and critiquing the ongoing three-ring circus that is blockchain, crypto, and NFTs, have turned their attention to artificial intelligence, in a two-part series (part one; part two). I strongly suggest you read both blogposts, but here’s a sample:

Much like crypto, AI has gone through booms and busts, with periods of great enthusiasm followed by AI winters whenever a particular tech hype fails to work out.

The current AI hype is due to a boom in machine learning — when you train an algorithm on huge datasets so that it works out rules for the dataset itself, as opposed to the old days when rules had to be hand-coded.

ChatGPT, a chatbot developed by Sam Altman’s OpenAI and released in November 2022, is a stupendously scaled-up autocomplete. Really, that’s all that it is. ChatGPT can’t think as a human can. It just spews out word combinations based on vast quantities of training text — all used without the authors’ permission.

The other popular hype right now is AI art generators. Artists widely object to AI art because VC-funded companies are stealing their art and chopping it up for sale without paying the original creators. Not paying creators is the only reason the VCs are funding AI art.

Do AI art and ChatGPT output qualify as art? Can they be used for art? Sure, anything can be used for art. But that’s not a substantive question. The important questions are who’s getting paid, who’s getting ripped off, and who’s just running a grift.

OpenAI’s AI-powered text generators fueled a lot of the hype around AI — but the real-world use case for large language models is overwhelmingly to generate content for spamming. [Vox]

The use case for AI is spam web pages filled with ads. Google considers LLM-based ad landing pages to be spam, but seems unable or unwilling to detect and penalize it. [MIT Technology Review; The Verge]

The use case for AI is spam books on Amazon Kindle. Most are “free” Kindle Unlimited titles earning money through subscriber pageviews rather than outright purchases. [Daily Dot]

The use case for AI is spam news sites for ad revenue. [NewsGuard]

The use case for AI is spam phone calls for automated scamming — using AI to clone people’s voices. [CBS]

The use case for AI is spam Amazon reviews and spam tweets. [Vice]

The use case for AI is spam videos that advertise malware. [DigitalTrends]

The use case for AI is spam sales sites on Etsy. [The Atlantic, archive]

The use case for AI is spam science fiction story submissions. Clarkesworld had to close submissions because of the flood of unusable generated garbage. The robot apocalypse in action. [The Register]

You can confidently expect the AI-fueled shenanigans to continue.


Riff XR: Artificial Intelligence in the Metaverse

However, there have been some rather interesting applications of AI to the metaverse. A brand-new social VR platform called Riff XR offers a tantalizing (if still somewhat buggy) glimpse of the AI-assisted metaverse of the future.

Among the AI-assisted features of Riff XR are NPCs (non-player characters, i.e. bots) with whom you can have surprisingly open-ended conversations, as well as a “cutting-edge Stable Diffusion-powered Generative Art System”:

Now, I have not visited Riff XR myself (yet), but a good friend of mine, metaverse videographer Carlos Austin, has, and he posted a video of his explorations on this new metaverse platform, including verbal conversations with a number of NPCs using generative AI to “listen” and “respond” to his spoken sentences.

One was a constable droid roaming the night-time central plaza in Riff XR, a scene straight out of Ready Player One; another played the role of Vincent Van Gogh in an exhibition of AI-generated artworks in a museum just off the plaza; a third was a woman, named Molly Millions, working at the back bar in a cyber-disco with pulsating music and gyrating NPCs of various kinds, with whom Carlos had a surprisingly in-depth conversation about cocktails!

Carlos demonstrated that you could even speak to these NPCs in different languages, including German, Japanese, and Spanish (although let me just add that the faux Van Gogh’s German accent was absolutely atrocious!). Here’s his full video (please fast-forward through all the technical bugs and mishaps; Riff XR is still quite buggy!). Carlos’ conversation with Molly Millions is nearer the end of the video:

We can expect to see more such applications of artificial intelligence coming soon (and perhaps sooner than we might expect!) to a virtual world or social VR platform near you. And you can expect more blogposts from me on this topic in future, as the technology continues to develop and evolve over time. Stay tuned!


Many thanks to Jim Carnicelli (a.k.a. Galen from Sansar), with whom I had a couple of wide-ranging online discussions via Discord on the topic of AI while I was working on this blogpost over the summer! While I did not use many of the ideas we talked about, they did give me much food for thought (and possible topics for future blogposts!). You can visit Jim’s store selling his AI-generated artwork here: Snuggle Hamster Designs.

Comparing and Contrasting Three Artificial Intelligence Text-to-Art Tools: Stable Diffusion, Midjourney, and DALL-E 2 (Plus a Tantalizing Preview of AI Text-to-Video Editing!)

HOUSEKEEPING NOTE: Yes, I know, I know—I’m off on yet another tangent on this blog! Please know that I will continue to post “news and views on social VR, virtual worlds, and the metaverse” (as the tagline of the RyanSchultz.com blog states) in the coming months! However, over the next few weeks, I will be focusing a bit on the exciting new world of AI-generated art. Patience! 😉

Artificial Intelligence (AI) tools which can create art from a natural-language text prompt are evolving at such a fast pace that it is making me a bit dizzy. Two years ago, if somebody had told me that you would be able to generate a convincing photograph or a detailed painting from a text description alone, I would have scoffed! Many felt that the realm of the artist or photographer would be among the last holdouts where a human being was necessary to produce good work. And yet, here we are, in mid-2022, with any number of public and private AI initiatives which can be used by both amateurs and professionals to generate stunning art!

In a recent interview by The Register’s Thomas Claburn with David Holz (co-founder of the augmented reality hardware firm Magic Leap, and founder of Midjourney), there’s a brief explanation of how this burst of research and development activity got started:

The ability to create high-quality images from AI models using text input became a popular activity last year following the release of OpenAI’s CLIP (Contrastive Language–Image Pre-training), which was designed to evaluate how well generated images align with text descriptions. After its release, artist Ryan Murdock…found the process could be reversed – by providing text input, you could get image output with the help of other AI models.

After that, the generative art community embarked on a period of feverish exploration, publishing Python code to create images using a variety of models and techniques.

“Sometime last year, we saw that there were certain areas of AI that were progressing in really interesting ways,” Holz explained in an interview with The Register. “One of them was AI’s ability to understand language.”

Holz pointed to developments like transformers, a deep learning model that informs CLIP, and diffusion models, an alternative to GANs [models using Generative Adversarial Networks]. “The one that really struck my eye personally was the CLIP-guided diffusion,” he said, developed by Katherine Crowson…
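The “reversal” Murdock discovered can be caricatured in a few lines of ordinary Python: start from random noise, repeatedly perturb it, and keep only the changes that a scoring function (standing in for CLIP’s image-text similarity score) rates as a better match for the prompt. This toy uses characters and exact matching instead of pixels and a neural network, so treat it strictly as an analogy of the optimization loop, not as how CLIP-guided diffusion actually works:

```python
import random

def match_score(candidate, target):
    # Stand-in for CLIP: rates how well the candidate matches the
    # "prompt" (here, simply the number of matching characters).
    return sum(a == b for a, b in zip(candidate, target))

def generate(target, steps=5000, seed=42):
    # Hill-climbing: mutate the candidate at random and keep the
    # mutation only if the score does not get worse. CLIP-guided
    # generation has the same overall shape, but nudges image
    # pixels/latents along gradients of a neural similarity score.
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    candidate = [rng.choice(alphabet) for _ in target]
    best = match_score(candidate, target)
    for _ in range(steps):
        i = rng.randrange(len(candidate))
        old = candidate[i]
        candidate[i] = rng.choice(alphabet)
        new = match_score(candidate, target)
        if new >= best:
            best = new
        else:
            candidate[i] = old  # revert a harmful mutation
    return "".join(candidate)

print(generate("a cat in a hat"))
```

Starting from pure noise, the loop gradually assembles something the scorer approves of, which is the essential trick: a model built only to *judge* image-text matches ends up steering image *generation*.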

If you need a (relatively) easy-to-understand explainer of how this new diffusion model works, well then, YouTube comes to your rescue with this video, which offers explanations at four levels of difficulty!


Before we get started, a few updates since my last blogpost on A.I.-generated art: After using up my free Midjourney credits, I decided to purchase a US$10-a-month subscription to continue to play around with it. This is enough credit to generate approximately 200 images per month. Also, as a thank you for being among the early beta testers of DALL-E 2, the AI art-generation tool by OpenAI, they have awarded me 100 free credits to use. You can buy additional credits in 115-generation increments for US$15, but given the hit-or-miss nature of the results returned, this means that DALL-E 2 is among the most expensive of the artificial intelligence art generators. It will be interesting to see if and how OpenAI will adjust their pricing as the newer competitors start to nip at their heels in this race!

And I can hardly believe my good fortune, because I have been accepted into the relatively small beta test group for a third AI text-to-art generation program! This new one is called Stable Diffusion, by Stability AI. Please note that if you were to try to get into the beta now, it’s probably too late; they have already announced that they have all the testers they need. I submitted my name 2-3 weeks ago, when I first heard about the project. Stable Diffusion is still available for researcher use, however.

Like Midjourney, Stable Diffusion uses a special Discord server with commands (instead of Midjourney’s /imagine, you use the prompt !dream, followed by a text description of what you want to see, plus you can add optional parameters to set the aspect ratio, the number of images returned, etc.). However, the Stable Diffusion team has already announced that they plan to move from Discord to a web-based interface like DALL-E 2 (we will be beta-testing that, too). Here’s a brief video glimpse of what the web interface could look like:
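To give a concrete, purely illustrative example of the syntax, a !dream command in the beta looked something like this (the flag names are from my own notes and may well have changed since, so treat them as an assumption rather than a reference):

```
!dream a thatched cottage by a lake in a lush green forest, golden hour, highly detailed painting -W 1024 -H 512 -n 4 -S 42
```

Here -W and -H set the image width and height in pixels, -n sets the number of images returned, and -S supplies a seed you can reuse to generate variations on the same composition.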


Given that I am among the relatively few people who currently have access to all three of the top publicly-available AI art-generation tools, I thought it would be interesting to create a chart comparing and contrasting all three programs. Please note that I am neither an artist nor an expert in artificial intelligence, just a novice user of all three tools! Almost all of the information in this chart has been gleaned from the websites of the projects, and online news reports, as well as the active subreddit communities for all three programs on Reddit, where users post pictures and ask questions. Also, all three tools are constantly being updated, so this chart might go very quickly out-of-date (although I will make an attempt to update it).

| Name of Tool | DALL-E 2 | Midjourney | Stable Diffusion |
|---|---|---|---|
| Company | OpenAI | Midjourney | Stability AI |
| AI model used | Diffusion | Diffusion | Diffusion |
| # of images used to train the AI | 400 million | “tens of millions” | 2 billion |
| User interface | website | Discord | Discord (moving to website) |
| Cost to use | credit system (115 for US$15) | subscription (US$10-30 per month) | currently free (beta) |
| Uses text prompts | yes | yes | yes |
| Can add optional arguments | no | yes | yes |
| Non-square images? | no | yes | yes |
| In-tool editing? | yes | no | no |
| Uncropping? | yes | no | no |
| Generate variations? | yes | yes | yes (using seeds) |

A comparison chart of three AI text-to-art tools: DALL-E 2, Midjourney, and Stable Diffusion

I have already shared a few images from my previous testing of DALL-E 2 and Midjourney here, here, and here, so I am not going to repost those images, but I wanted to share a couple of the first images I was able to create using Stable Diffusion (SD). To make these, I used the text prompt “a thatched cottage with lit windows by a lake in a lush green forest golden hour peaceful calm serene very highly detailed painting by thomas kinkade and albrecht bierstadt”:

I must admit that I am quite impressed by these pictures! I had asked SD for images with a height of 512 pixels and a width of 1024 pixels, but to my surprise, the second image was a wider one presented neatly in a white frame, which I cropped using my trusty SnagIt image editor! Also, it was not until after I submitted my prompt that I realized that the second artist’s name is actually ALBERT Bierstadt, not Albrecht! It doesn’t appear as if my typo made a big difference in the final output; perhaps for well-known artists, the last name alone is enough to indicate a desired art style?

Here are a few more samples of the kind of art which Stable Diffusion can create, taken from the pod-submissions thread on the SD Discord server:

Text prompt: “a beautiful landscape photography of Ciucas mountains mountains a dead intricate tree in the foreground sunset dramatic lighting by Marc Adamus”
Text prompt: “incredible wide screenshot ultrawide simple watercolor rough paper texture katsuhiro otomo ghost in the shell movie scene backlit distant shot”
Text prompt: “an award winning wallpaper of a beautiful grassy sunset clouds in the sky green field DSLR photography clear image”
Text prompt: “beautiful angel brown skin asymmetrical face ethereal volumetric light sharp focus”
Painting of people swimming (no text prompt shared)

You can see many more examples over at the r/StableDiffusion subreddit. Enjoy!

If you are curious about Stable Diffusion and want to learn more, there is a 1-1/2 hour podcast interview with Emad Mostaque, the founder of Stability AI (highly recommended!). You can also visit the Stability AI website, or follow them on social media: Twitter or LinkedIn.


I also wanted to submit the same text prompt to each of DALL-E 2, Midjourney, and Stable Diffusion, to see how the AI models in each would respond. Under each prompt you will see three square images: the first from DALL-E 2, the second from Midjourney, and the third from Stable Diffusion. (Click on each thumbnail image to see it in its full size on-screen.)

Text prompt: “the crowds at the Black Friday sales at Walmart, a masterpiece painting by Rembrandt van Rijn”

Note that none of the AI models are very good at getting the facial details correct for large crowds of people (all work better with just one face in the picture, like a portrait, although sometimes they struggle with matching eyes or hands). I would say that Midjourney is the clear winner here, although a longer, much more detailed prompt in DALL-E 2 or Stable Diffusion might have created an excellent picture.

Text prompt: “stunning breathtaking photo of a wood nymph with green hair and elf ears in a hazy forest at dusk. dark, moody, eerie lighting, brilliant use of glowing light and shadow. sigma 8.5mm f/1.4”

When I tried to generate a 1024×1024 image in Stable Diffusion, it kept giving me more than one wood nymph, even when I added words like “single” or “alone”; this is a known bug in the current early state of the program. I finally gave up and used a 512×512 image. The clear winner here is DALL-E 2, which has a truly impressive ability to mimic various camera styles and settings!

Text prompt: “a very highly detailed portrait of an African samurai by Tim Okamura”

In this case, the clear winner is Stable Diffusion with its incredible detail, even though, once again, I could not generate a 1024×1024 image because it kept giving me multiple heads! The DALL-E 2 image is a bit too stylized for my taste, and the Midjourney image, while nice, has eyes that don’t match (a common problem with all three tools).

And, if you enjoy this kind of thing, here’s a 15-minute YouTube video with 21 more head-to-head comparisons between Stable Diffusion, DALL-E 2, and Midjourney:


As I have said, all of this is happening so quickly that it is making my head spin! If anything, the research and development of these tools is only going to accelerate over time. And we are going to see this technology applied to more than still images! Witness a video shared on Twitter by Patrick Esser, an AI research scientist at Runway, where the entire scene around a tennis player is changed simply by editing a text prompt, in real time:


I expect I will be posting more later about these and other new AI art generation tools as they arise; stay tuned for updates!