NOTICE: In this blogpost, I go into sometimes great detail about how these three generative AI tools work, comparing them in two ways:
– comparing how these tools work with the exact same text prompt; and
– comparing how they worked in August 2025 versus February 2026.
There’s an executive summary (Section 4) at the very bottom of this long, loooong blogpost if you just want to skip to the highlights and my ranking.
If you need an introduction or a refresher, you might want to read this blogpost first: An Introduction to Artificial Intelligence in General, and Generative AI in Particular, which includes slides from lectures I gave on the topic in November and December of 2025.
SECTION 1: Introduction
In his 2024 book Co-Intelligence (still my go-to layperson’s guide to generative AI), Ethan Mollick says that one of the best ways to determine how well a particular generative AI tool works is to ask it questions about a subject that you already are an expert in. Why? Because it will be much easier for you, the human expert in the topic, to find errors and hallucinations in the answers.
Since last summer, I have been typing the exact same prompt into the “big three” general-purpose GenAI tools Ethan recommends: OpenAI’s ChatGPT, Anthropic’s Claude, and Google Gemini. I have been meaning to write a blogpost about my experiences with this first round of testing since September, but I have been too occupied with my paying job as an academic librarian to find an opportunity to do so—until now. (Please note that I have been using an em-dash, correctly, for many years before generative AI came along!)
So, today I decided to redo my original text prompt, using the latest versions of these three GenAI tools as outlined by Ethan in the latest edition of his AI Guide, which he posted to his Substack newsletter on Feb. 17th, 2026 (here’s a link).

I consider his advice to be quite valuable, as he seems to spend a lot of time working with the most popular and powerful GenAI tools, and keeping on top of the changes and advances in the technology. In this newest edition of his AI Guide, he discusses the shift from chatbots (where you have a conversation with the tool) to agents (where you give a specific, defined task with instructions to the tool, and it goes away and does the task and returns with results).
In all cases, the initial text prompt is the following:
What are some characteristics common to all metaverse platforms? How do these characteristics apply to social VR platforms? Please give me a chart comparing these characteristics for the most popular social VR platforms.
Please note that I have deliberately given the task of defining “popular,” and picking the social VR platforms, over to the generative AI tool (and I got some rather interesting results back!). Because I consider myself an expert on social VR and the metaverse, I should be able to spot inaccuracies, errors, or outright hallucinations in the responses I get back from these GenAI tools. In the next section (section 2), I compare and contrast the results I received from the above text prompt from:
- Claude by Anthropic
- ChatGPT by OpenAI
- Gemini by Google
All three of these tools come with different versions. In all cases, I will use the most powerful version recommended by Ethan Mollick in his latest AI Guide I linked to above (but please note that in at least one case, I had made a mistake and not selected the correct option, as you will see below with Claude in Sections 2 and 3):
- Claude Opus 4.6 Extended Thinking
- ChatGPT 5.2 Thinking
- Gemini 3.0 Pro Deep Research
In addition, in section 3 of this long blogpost, I will very briefly compare and contrast the results I received when I first ran this text prompt through all three GenAI tools on August 7th, 2025, with what I received when I ran them again on Feb. 18th, 2026.
All comparison charts in the February 2026 results in sections 2 and 3 will include some quick stats in a small table under each generative AI tool discussed, namely:
- the number of characteristics common to all metaverse platforms (and their names); and
- the number of social VR platforms in the comparison chart (and their names).
Section 4, the final section, contains my overall thoughts after spending a day working with these tools, and a ranking of how well I think these GenAI tools accomplished the given task.
SECTION 2: Comparing Searches Done Feb. 18th, 2026
Feb. 18th, 2026: Claude Opus 4.6 (and Cowork)
First up is Claude. I did this prompt two ways: once via the chatbot interface on the Claude website, and a second time using the Claude app and the new Cowork agent feature. (I was prompted to download and install the Claude app on my Mac, and authenticate using my email address.) First, the chatbot version:

This first report I got back compared eight metaverse characteristics across eight platforms:
| 8 Metaverse Characteristics | 8 Social VR Platforms |
|---|---|
| Persistent Virtual Environments | VRChat |
| Real-Time Interactivity | Rec Room |
| User Identity/Avatars | Meta Horizon Worlds |
| Social Presence & Co-Experience | Resonite |
| User-Generated Content | Second Life |
| Virtual Economy | Spatial |
| Cross-Platform Accessibility | ChilloutVR |
| Interoperability | NeosVR |
Well, right off the bat, I see some problems. First, Second Life is not social VR. Second, it included both Resonite and NeosVR (although Claude told me, “I included both since NeosVR still has historical relevance, but noted it as legacy since the core team transitioned to Resonite”). However, that isn’t a good enough reason to include it in the table.
Next, I turned to the Claude app (which was suggested to me when I did the first text prompt above, so I downloaded and installed it on my MacBook Pro). I selected the Cowork (agent) tab from the three tabs along the top, as suggested by Ethan, and entered the exact same text prompt:

After beavering away for a few minutes, it gave me the following result:

And when I click on the Open in Firefox button, I get this neatly formatted table (I’m not crazy about the chosen colour scheme, but that’s a minor quibble). It looks good at first:

However, the output, which might look impressive at first, is only as good as the quality of the sources used in its research. If the good information is locked behind a paywall (and therefore, not able to be scraped to add to its knowledge base), then the GenAI tool will use freely-available sources on the web, which can vary quite a bit in quality! There is an acronym in computer science called GIGO: Garbage In, Garbage Out, and I am reminded of this when I decide to take a closer, more critical look at the six sources listed.
All of them were non-academic sources, mostly generic market overviews from websites that I had never heard of before. The six sources included my own list of metaverse platforms on this blog (which is just a list, and doesn’t give any details about the platforms). While I’m flattered they included me, I expected something…more. And I absolutely hated that they mentioned cryptocurrencies, blockchain, DAOs, and NFTs, and included Somnium Space and Decentraland in the resulting table. While Somnium Space is social VR, Decentraland absolutely is not, and I have made my opinions on blockchain-based metaverse platforms very clear in the past on this blog.
| 8 Metaverse Characteristics | 6 Social VR Platforms |
|---|---|
| Persistence | VRChat |
| Immersion & Presence | Meta Horizon Worlds |
| User-Generated Content | Rec Room |
| Built-In Economy | Engage VR |
| Social Interaction | Decentraland |
| Interoperability | Somnium Space |
| Digital Ownership | |
| Decentralized Governance | |
In fact, I was so dissatisfied with this report that I went back into the Claude Cowork app, added a qualifier to my prompt, and made sure that I had turned on Extended Thinking! (I’m almost positive I did that the first time around, but maybe I forgot, and unfortunately, once you’ve done your prompt, the results don’t tell you what modes you used in asking the original question.)

Only to get pretty much the same result: a pretty table with only six websites listed as sources! So much for being more specific and asking for Extended Thinking.

| 10 Metaverse Characteristics | 6 Social VR Platforms |
|---|---|
| Persistence | VRChat |
| Immersive 3D Environments | Rec Room |
| User Identity & Avatars | Meta Horizon Worlds |
| Real-Time Social Interaction | Resonite |
| User-Generated Content | ChilloutVR |
| Economy & Monetization | Engage VR |
| Cross-Platform Access | |
| Scalability & Concurrency | |
| Safety & Moderation | |
| Interoperability | |
While better than the previous round, I was still disappointed in the results I received from Claude Cowork. But read on; in section 3, I have an update on what I think went wrong here!
Feb. 18th, 2026: ChatGPT 5.2 Thinking
Next, I turned to OpenAI’s ChatGPT, using the ChatGPT 5.2 Thinking mode suggested by Ethan:

And I got back the following table, comparing six social VR platforms on ten metaverse characteristics:

While the resulting table might not be as pretty as the one produced by Claude Opus 4.6 Cowork, I appreciate that there are actual citations which you can hover over and click through to see the source material behind the comparison chart entries (and not just a list of websites checked, tacked on to the end). Also, ChatGPT seems to have checked a lot more sources than Claude, and made some sort of attempt to find authoritative sources (often, from the metaverse product’s own online documentation, as shown in this example).

| 10 Metaverse Characteristics | 6 Social VR Platforms |
|---|---|
| Shared Multi-User Spaces | VRChat |
| Avatars/Embodied Identity | Rec Room |
| Real-Time Voice/”Hangout” Core Loop | Meta Horizon Worlds |
| Persistence (Account, Inventory) | Bigscreen Beta |
| User-Generated Worlds | Spatial |
| In-World Creation Tools | Resonite |
| Scripting | |
| Economy & Monetization | |
| Cross-Platform Access | |
| Safety Governance | |
Overall, I think that ChatGPT 5.2 Thinking gave me a better answer than Claude…but as we will see later on, it doesn’t compare to the best results I got from my day of testing and retesting. Let’s move on to the third of Ethan Mollick’s recommended, general-purpose GenAI tools, Google’s Gemini:
Feb. 18th, 2026: Gemini 3 Pro (first without, and then with, Deep Research)
The first go-round, I selected Gemini 3 Pro mode, as Ethan suggested:

And I got a resulting table comparing three social VR platforms across seven characteristics:

| 7 Metaverse Characteristics | 3 Social VR Platforms |
|---|---|
| Core Philosophy | VRChat |
| Visual Style | Rec Room |
| Creation Tools | Meta Horizon Worlds |
| Hardware Access | |
| Target Audience | |
| Economy | |
| “Metaverse” Strength (?!) | |
I was so unhappy with this first Gemini result that I redid the prompt, this time making sure that I turned on the Deep Research mode, just to see if I would get better results, or even some actual citations to sources used:

Wow, what a difference!!

This time around, the task took a lot longer than with either Claude or ChatGPT, and it included what appeared to be extremely detailed feedback on what was happening behind the scenes (this seems to be turned on by default, and I’m not certain if this mode could have been enabled on Claude or ChatGPT):

And the report I got back was worth the longer wait:

And, at the end, not one but three comparison charts!

Here’s the quick stats, from all three tables in the final report (and notice how technical many of these “metaverse characteristics” are, compared to the other results!):
| 12 Metaverse Characteristics | 5 Social VR Platforms |
|---|---|
| Engine Core | VRChat |
| Scripting Language | Rec Room |
| Persistence Type | Roblox |
| Asset Pipeline | Meta Horizon Worlds |
| Audio Engine | Resonite (only mentioned in one table) |
| Economic Model | |
| Currency | |
| Identity System | |
| Tracking Support | |
| Instance Cap | |
| Network Model | |
| Culling Tech | |
SECTION 3: Comparing August 2025 Prompt Results with the February 2026 Ones
I also wanted to compare the results I got when I did the testing last year (August 7th, 2025) with the results I got today (Feb. 18th, 2026), across all three GenAI tools. This was very enlightening.
Then Versus Now: Claude
You will understand why I was so disappointed with today’s results, when you see what the results were when I did the same prompt last year (dated August 7th, 2025):

The report I got back was extremely detailed, with actual citations to sources! I still don’t understand why I got such dramatically different—and worse—results. The difference is so astounding to me that I began to wonder if I had done something wrong this time around.
It was then that I realized that I had literally forgotten to turn on Research mode in the left-hand drop-down menu (previously, I had only had Web Search mode turned on):

So I went to check the Claude app, to see if there was that option available, and, of course, it was there—but under the Chat tab, not the Cowork tab!! So perhaps Cowork still has some user interface bugs to work out. Perhaps sending everything to an agent isn’t the better option; certainly, not in this case!!
Once I had selected both Research and Web Search from the left drop-down menu, and Opus 4.6 Extended from the right drop-down menu, I hit send and waited…until I got a message that I had used up all my credits on my $20-a-month plan!!!
AAAAAAAAAAAAAARGH!!!!

By this point, I was so frustrated with Claude that I simply exited the app. I had had enough frustration for one day.
The next morning, February 19th, 2026, after my daily credits reset at 6:00 pm, I once again tried my prompt with Claude Opus 4.6 Extended Thinking, with both Research and Web Search turned on (using the Claude app I had installed on my Mac, as opposed to the web version; they appear to be identical in terms of features).

Right off the bat, I got a better response (and Claude even remembered that I was going to be working on an OER about the metaverse!):


Again, similar to Google Gemini, I had a bit of a wait while Claude did its thing. I much preferred how Gemini described what it was doing as it went about its task, as opposed to…well, no updates at all from Claude other than me sitting and staring at an animated cursor:
Ten minutes later, I got the detailed report I wanted in the first place, and which Claude Cowork stubbornly refused to give me:

The response back included a concise summary taken from the sources examined:

The final report included citations to the academic literature (which I could hover over and click on to go to the source, see the red arrow below), and it cited experts in the field such as Matthew Ball and Tim Sweeney. It’s pretty much all I wanted, and it compares quite favourably to the similarly detailed report from Google Gemini, in the previous section. I am happy.

And this was the only report which had a listing of metaverse characteristics, separate from the ones used in the social VR platforms comparison chart:

Here’s the quick stats from the comparison chart. As you can see, there are some problems here, with the inclusion of platforms which are clearly not social VR (e.g. Second Life) and platforms that no longer exist (AltspaceVR shut down on March 10th, 2023). These sorts of mistakes make me wonder about the accuracy and currency of the report overall.
| 12 Metaverse Characteristics | 9 Social VR Platforms |
|---|---|
| Persistence | VRChat |
| Synchronous Real-Time | Horizon Worlds (note: old name used) |
| Massive Scale/Concurrency | Rec Room |
| Cross-Platform Access | Resonite |
| Virtual Economy | ChilloutVR |
| User-Generated Content | AltspaceVR (was shut down) |
| Interoperability | Second Life (not social VR!) |
| Avatar/Identity Systems | Roblox |
| Immersive 3D/Spatial Computing | Fortnite (not social VR!) |
| Open Standards/Decentralization | |
| Spanning Physical-Digital | |
| Ethical Governance/Accessibility | |
Then Versus Now: ChatGPT
An interesting difference between the August 2025 report from ChatGPT and today’s report is this: in last year’s report, for whatever reason, the tool asked me a follow-up question to clarify what was wanted (I did use the Deep Research feature in the 2025 report, as well):

Based on that clarification ChatGPT prompted me for, I actually think I preferred the 2025 report format over this new one. So why didn’t ChatGPT 5.2 Thinking ask me any follow-up questions this time around? And that’s part of the frustration with these tools; the way that they operate is still very much a black box, where you don’t understand how the tool is processing what you ask of it.
Then Versus Now: Gemini
The last comparison is between the Google Gemini report I produced on August 7th, 2025, and today’s report. One thing I noticed about the Aug. 7th report is how hard it tried to shoehorn an overarching narrative into the final result, in a way that seemed a bit ham-fisted, frankly. But the result was still a very detailed report with an extensive list of citations, comparable to today’s report. I prefer today’s version.
SECTION 4: Executive Summary and Ranking
This is going to be concise, I promise! Five points.
First, while we might be entering what Ethan Mollick calls “the agentic era,” my experience today shows that simply handing something off to an agent, as opposed to the back-and-forth conversation with a chatbot interface, does not always give the best result. In particular, Claude Cowork gave me terrible results, and eventually, I ran out of daily use credits to actually run the report I wanted in the first place.
Second, the user interface for these GenAI tools is awful and NON-intuitive. Hiding critical options like Deep Research under drop-down menus, and not making it clear what options have been selected when you do a text prompt, is a major problem. All three companies need to hire some good user interface/user experience staff. If I, with decades of computer experience and a goddamn computer science degree, can’t figure this shit out, God help the average non-technical user—and isn’t that what the point of generative AI is supposed to be, to make it easier for the user to do things??
Third, when these tools work, they are astoundingly good (the Gemini 3.0 Pro report with Deep Research turned on, and the Claude Opus 4.6 report with Research, Web Search, and Extended Thinking turned on). But when they don’t, they can still fail spectacularly (Claude Cowork). So you still have to be the human in the loop here, to figure out when you get a good result versus a bad one. What is frustrating is that all these GenAI tools operate in a black box, with only Gemini making some attempt at explaining what it was doing, as it was doing it.
Fourth, as Ethan himself said in his latest AI Guide:
The top models are remarkably close in overall capability and are generally “smarter” and make fewer errors than ever. But, if you want to use an advanced AI seriously, you’ll need to pay at least $20 a month (though some areas of the world have alternate plans that charge less). Those $20 get you two things: a choice of which model to use and the ability to use the more advanced frontier models and apps. I wish I could tell you the free models currently available are as good as the paid models, but they are not.
In other words, you get what you pay for. And sometimes, even the $20-a-month level isn’t enough, as seen with my experience on Feb. 18th with Claude (and yes, using the cutting-edge features does eat into your usage limits pretty quickly, as I learned to my chagrin).
Finally, I have found that one of the best ways to see the strengths and weaknesses of these GenAI tools is to enter the exact same text prompt into each of them, and then compare and contrast the results you get back. However, that approach is gonna cost you at least US$60 a month, so it might not be worth it to you. (And will I be doing this forever? No; at some point, I will just pick one or perhaps two tools and cancel my subscriptions to the rest of them.)
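(If you are comfortable with a bit of Python, there is another way to run this kind of same-prompt comparison: the three companies’ developer APIs, which are billed per use rather than $20 a month per subscription. This is not what I did for this blogpost, but here is a minimal sketch of what it could look like, assuming you have API keys for all three services set as environment variables; the model names below are placeholders, not the exact versions I tested above, and you won’t get the chat interfaces’ research modes this way.)

```python
# A minimal sketch: send the same text prompt to OpenAI, Anthropic, and Google
# models via their official Python SDKs, and print the three answers one after
# another for a quick side-by-side read. Assumes the openai, anthropic, and
# google-generativeai packages are installed, and that OPENAI_API_KEY,
# ANTHROPIC_API_KEY, and GOOGLE_API_KEY are set in the environment.
# The model names are illustrative placeholders only.

import os

import anthropic
import google.generativeai as genai
from openai import OpenAI

PROMPT = (
    "What are some characteristics common to all metaverse platforms? "
    "How do these characteristics apply to social VR platforms? "
    "Please give me a chart comparing these characteristics for the most "
    "popular social VR platforms."
)

# OpenAI (ChatGPT's underlying models)
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chatgpt_text = openai_client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

# Anthropic (Claude)
claude_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
claude_text = claude_client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=2048,
    messages=[{"role": "user", "content": PROMPT}],
).content[0].text

# Google (Gemini)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini_text = genai.GenerativeModel("gemini-1.5-pro").generate_content(PROMPT).text

for name, text in [("ChatGPT", chatgpt_text), ("Claude", claude_text), ("Gemini", gemini_text)]:
    print(f"\n===== {name} =====\n{text}")
```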
So, in this current round of testing, I would rank the results as follows (separating the results from Claude into the chatbot-generated report and the Cowork report):
- Google Gemini 3.0 Pro (with Deep Research turned on) provided me with a very detailed report with citations, as well as giving me a detailed play-by-play on how it was answering my query, which I really appreciated.
- Claude Opus 4.6 report (with Research, Web Search, and Extended Thinking turned on) also gave me a detailed report with citations, but several errors in the comparison chart made me question the overall quality and currency of the report. I also really hated how I had to futz around to get the results I really wanted!
- ChatGPT 5.2 Thinking is in a clear third place, in my opinion. Not bad, but not as detailed a result as Gemini and Claude provided.
- Claude Opus 4.6 Cowork, with perhaps the prettiest output but easily the least substantial result, using lower-quality sources of information, clearly failed at this task. For those reasons, I ranked it in last place. Ethan’s “Agentic Era” might be true for some applications, but certainly not this one!
I have found these little excursions into generative AI to be quite enlightening, and they have definitely given me some new ideas of topics to explore when I begin my research and study leave to write an OER about the metaverse. Hopefully, you found it enlightening, too. Please go subscribe to Ethan Mollick’s free Substack newsletter; he tends to update his AI Guide recommendations fairly regularly, and it’s really the best way to stay on top of a rapidly changing and evolving field!

