I have used generative AI (GenAI) tools (specifically, Claude) in researching and writing up the Some Definitions section of this blogpost. The rest is all me, with no AI assistance (and yes, I have been using em-dashes since long, long before ChatGPT came out in 2022!). And, of course, I do document my (mis)adventures with Elicit, NotebookLM, and Undermind here.
I freely admit that this was not the next blogpost I was planning to write, but as a follow-up to my previous detailed discussion of what I have started to call the “Big Three” of good (sometimes, good enough) general-purpose GenAI tools—ChatGPT, Claude, and Gemini—I wanted to write a little bit more about two particular subsets of GenAI tools which are focused on the academic research process. And, since I have two things coming up on my calendar which necessitate academic research, namely:
- first, a keynote speech I am delivering in the virtual world of Second Life on Friday, March 20th, as part of the 2026 Virtual Worlds Best Practices in Education (VWBPE) conference; and
- second, a one-year research and study leave, where I will be writing a special kind of textbook called an Open Educational Resource (OER) on the topic of the metaverse—
I figured, well, what better time to demonstrate some of these GenAI tools than now, with some real-world, real-life examples from my own use?
Some Definitions

These two categories of tools are:
- AI research assistants: tools specifically designed to help researchers search, discover, synthesize, and analyze academic and scientific literature. Each of them uses large language models (LLMs, i.e. GenAI) combined with scholarly databases (e.g. PubMed for medicine; AGRICOLA for agriculture) to help users find relevant papers, extract key findings, and synthesize evidence across studies. Examples of such tools are Elicit, Undermind, Consensus, and Assistant by Scite. Keep in mind that such tools are only as good as the scholarly databases they access! For example, while Consensus proudly announces partnerships with major academic publishers like Sage, Wiley, Taylor & Francis, and ACS on their front page, Elicit only seems to use freely-accessible sources like Semantic Scholar and OpenAlex, as you will see below.
- AI-powered document analysis tools: While AI research assistants search across published scholarly literature, GenAI-powered personal library/document analysis tools are built around the concept of “source-grounding” — you upload your own documents (e.g. PDFs of journal articles and conference papers, word processor documents, websites, YouTube videos, audio files, etc.) and the GenAI tool then works exclusively from those materials (a toy sketch of this idea follows below, after the summary). They’re intended to help researchers make sense of a large personal collection of material. The best-known of this relatively new category of GenAI tools is Google’s NotebookLM, but there are similar products, such as Nouswise and the open-source tool Open Notebook.
To summarize the difference between the two: AI research assistants (Elicit, Consensus, etc.) help you discover literature, while AI-powered document analysis tools (NotebookLM, etc.) help you analyze and synthesize literature you’ve already collected. They occupy different stages of the research workflow.
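For the more technically-minded, here is a deliberately over-simplified Python sketch of what “source-grounding” means under the hood. To be clear, this is my own toy illustration, not how NotebookLM or any of these products actually works: real tools use vector embeddings and far more sophisticated retrieval, and the `call_llm` function below is just a stand-in.

```python
# Toy illustration of "source-grounding": answers are built ONLY from
# excerpts of the documents you uploaded, never from the model's own
# general knowledge. (My own sketch; not any vendor's actual code.)

def chunk(text: str, size: int = 500) -> list[str]:
    """Split an uploaded document into fixed-size chunks of text."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(chunk_text: str, question: str) -> int:
    """Crude relevance score: how many question words appear in the chunk.
    Real tools use vector embeddings here, not simple word overlap."""
    return sum(1 for w in set(question.lower().split()) if w in chunk_text.lower())

def call_llm(prompt: str) -> str:
    """Stand-in for the actual language model call (e.g. Gemini inside NotebookLM)."""
    return "[the LLM's answer, grounded in the supplied excerpts, would appear here]"

def ask(question: str, documents: list[str], top_k: int = 5) -> str:
    """Find the most relevant excerpts and hand ONLY those to the model."""
    excerpts = [c for doc in documents for c in chunk(doc)]
    best = sorted(excerpts, key=lambda c: score(c, question), reverse=True)[:top_k]
    prompt = ("Answer using ONLY the excerpts below. If the answer is not "
              "in them, say you cannot answer.\n\n" + "\n---\n".join(best) +
              f"\n\nQuestion: {question}")
    return call_llm(prompt)

# Placeholder documents standing in for uploaded PDFs.
print(ask("How has the meaning of the term metaverse changed over time?",
          ["...full text of one uploaded paper...", "...another paper..."]))
```

The key point is in the prompt: the model is told to answer only from the retrieved excerpts, which is why Google’s own FAQ (quoted near the end of this post) warns that NotebookLM simply won’t respond if the answer isn’t in your sources.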
Undermind
I currently have a Pro account with Undermind, at US$16 per month, which is one step up from their limited-use, free service. My initial question to Undermind was as follows:
I am researching the topic of the metaverse, both older virtual worlds (e.g. Second Life) and newer social VR/AR platforms (e.g. VRChat). I am interested in the history of the concept of the metaverse, and how the meaning of the term “metaverse” has evolved over time.
Undermind took this initial question, and asked a series of follow-up questions in order to clarify what I was looking for. Here’s part of that chat:

Eventually, I was able to come up with a more specific search, as follows:

The question I finally sent Undermind off to work on was as follows:
Find academic literature on the history of platforms and user practices associated with what is now discussed as the metaverse, staying broad across decades. Focus on the history of virtual world platforms and how people used them, including older virtual worlds such as Second Life and newer social VR/AR platforms such as VRChat, while also including adjacent predecessor platforms that predate the coining and later popularity of the term “metaverse.” Emphasize user practices broadly rather than narrowing to a single type of practice, and help trace how the meaning of the term “metaverse” evolved over time in relation to these platforms and practices.
My search returned 80 papers which Undermind determined were relevant to my final question, covering a publication date range of 1970 to 2024:

Note, at the bottom of this screen capture, how Undermind actually went through and sorted these papers into eight broad categories or subtopics, in essence giving me a nice overview of these 80 published academic papers. This kind of context/overview work is something at which GenAI tools tend to excel, and it can save an academic researcher hours of work (but, of course, you still have to be the human in the loop, and actually read and digest all the papers retrieved!).
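Just to illustrate the general idea of grouping a pile of papers into subtopics (and emphatically not how Undermind actually does it, which is presumably LLM-driven and much cleverer), here is a rough sketch using plain old text clustering with scikit-learn. The paper titles are made-up placeholders.

```python
# Rough illustration of sorting papers into broad subtopics by clustering
# their titles/abstracts. This is classical TF-IDF + k-means, NOT Undermind's
# actual method; it just shows the kind of grouping such tools present.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical placeholder titles standing in for the 80 retrieved papers.
papers = [
    "Avatar identity and self-presentation in Second Life",
    "Teaching and learning in virtual worlds: a review",
    "Social VR platforms: a study of VRChat user practices",
    "The metaverse: origins of a concept from Snow Crash onward",
    "Virtual economies and commerce in online worlds",
    "Embodiment and presence in head-mounted social VR",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(papers)
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(vectors)

for cluster in sorted(set(labels)):
    print(f"Subtopic {cluster + 1}:")
    for title, label in zip(papers, labels):
        if label == cluster:
            print("  -", title)
```

With only six toy titles this is trivial, of course, but scaled up to 80 real papers it gives you a feel for the kind of overview Undermind generated automatically.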
Even more important to note, though, is how GenAI tools like Undermind mark a dramatic change in information retrieval: a shift away from the sometimes-arduous task of using keyword searching, controlled thesaurus vocabulary, and Boolean logic to search traditional academic databases (e.g. PubMed and its MeSH, or Medical Subject Headings), towards actually having a conversation with the search tool: you start with a plain English statement, answer follow-up questions to clarify and refine that initial prompt into a final search question, and then submit it.
If you like what you see (and I did), you can click on the Generate Report button to start a new process, which prompts you:

I’d like to write a report based on papers from the search “History of metaverse platforms and practices”.
Let’s briefly discuss the content before you start writing.
And again, Undermind asks a series of clarifying questions to help you figure out what you want from a report on all the research data it has dredged up:

The final report (which I could save as a PDF or markdown file, using one of three citation styles) looked like this:

The resulting report had 36 citations. However, unlike the Elicit report (discussed below), the Undermind report did not have a section getting into the nuts and bolts of which sources it used to discover the papers cited, or the method by which it selected them. So, while the initial read looked good, it would take actually retrieving and reading the full text of the cited papers to determine exactly how good it was.
Elicit
I also decided to spring for a Plus-level account with a tool similar to Undermind, called Elicit (again, one step up from the free, Basic account, which offers a more limited service).
Having already done the Undermind search mentioned in the previous section of this blogpost, I decided to use the final search statement as my starting point, plugging it verbatim into Elicit to see what would happen…

…only to discover that Elicit doesn’t consider that a very concise search question at all! (Actually, I kind of agree here. But Undermind let me do it, anyway!) However, instead of asking a series of follow-up questions as Undermind did, Elicit offered a series of buttons which, when pressed, rewrote the question to be much more narrowly focused, for example:

So, I clicked on the offered “Temporal and conceptual scope” button, edited the rewritten question a bit to include specific examples of what I was talking about, and hit the green Send button, using the default settings (searching research papers, and asking for a general review). Elicit then asked me what level of detail I wanted in the answer (with the most detailed option greyed out unless I pony up more money for their Pro plan, one level up from my measly Plus plan):

I went with the Balanced report. However, I am not crazy about the limitations, especially when I could do a more traditional database search, using one of the over 650 databases offered by my university’s library service, without such petty limits as “the top 500 sources” (and, remember, that ranking comes from a newish GenAI algorithm, not from keyword matching using a controlled thesaurus vocabulary and Boolean logic to construct a search strategy, the old-school way). Essentially, it’s a trade-off: you get a search that starts in plain English, with prompts to refine it, but only a pre-limited number of sources is examined (500)—and there are even tighter restrictions on the number of sources from which a comparative chart can be constructed (25). If you want more—and many users would want more—then you’ll have to pay extra for it.
However, for all of its limitations, the final report looked pretty good, at first. You can save a PDF version of the report, and you can even ask questions of it, via a chatbot interface (using the chat box located in the bottom-right corner of the screen capture below):

However, on a closer read of the PDF report, I was struck by several things:
- Again, the hard limit of 25 papers from which data was extracted, which essentially makes Elicit useless to me at this subscription level;
- The fact that zero papers of the 500 retrieved were screened out by the selection criteria (see the image below, taken from the report; although, to be honest, this screening technique would probably have worked much better for examining clinical research studies in, say, medicine, than for papers about metaverse platforms);
- The search was performed against “over 138 million academic papers from the Elicit search engine, which includes all of Semantic Scholar and OpenAlex,” but again, my librarian mind kept thinking that a lot of full-text content would be locked away behind academic publisher paywalls. And indeed, of the 25 sources picked for this report, only 15 had the full text of the article retrieved; for the other ten, Elicit likely relied only on the (freely-available) author-provided abstract (I sketch a quick way to check this sort of thing against OpenAlex below). Many of these GenAI research tools tend to rely on scraping free sources such as Semantic Scholar and OpenAlex, rather than entering into potentially expensive agreements with academic publishers such as Elsevier and Wiley, which would give their users full access to the content those publishers own, and, frankly, more complete data from which to write reports.

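For the curious (or the skeptical), OpenAlex has a free public API, so it is easy enough to spot-check whether a given paper exposes an open-access full-text copy or only metadata and an abstract. The sketch below assumes the documented /works search endpoint and its open_access field; the example title is made up.

```python
# Spot-check whether OpenAlex (one of the free sources Elicit draws on)
# offers a free full-text copy of a paper, or only metadata/abstract.
# Uses the public OpenAlex REST API; no API key required.

import requests

def check_open_access(title: str) -> None:
    resp = requests.get(
        "https://api.openalex.org/works",
        params={"search": title, "per-page": 1},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        print(f"No OpenAlex record found for: {title}")
        return
    work = results[0]
    oa = work.get("open_access", {})
    print(work.get("display_name"))
    print("  open access?  ", oa.get("is_oa"))
    print("  full-text URL:", oa.get("oa_url") or "none (metadata/abstract only)")

# Hypothetical example title, standing in for one of the 25 report sources.
check_open_access("A history of the metaverse concept in virtual worlds research")
```

If is_oa comes back False and there is no oa_url, a tool that only scrapes free sources has little more than the abstract to work from, which is exactly the gap I was worried about.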
I actually came away from reading this report more disturbed by its limitations than impressed by any conclusions it was able to draw. Again, I hasten to add that Elicit would probably have performed much better with a real-world use case that fits it better (a systematic review of clinical medical trials, for example). It might just be that my admittedly fuzzy subject area doesn’t fit the way Elicit works, at all. And that’s fine.
However, what bothered me most was that somebody without my 30-plus years of academic library experience could run this report, read it, nod, and think that this was a good response. Even worse, an in-depth response. When, in fact, a more traditional search against a library database (perhaps executed with the expertise of a professional librarian) would give much better and more thorough search results.
Even worse, how many of those Elicit users would stop here, run with this summary, and not actually go and read the full text of the 25 papers that were selected for the report, not to mention the countless papers NOT included? I suspect it’s more than a few. So yeah, this academic librarian does have some reservations about where all this is headed. However, I can also confess that the report did give me a few new ideas to think about, and some possible new directions to follow in my own academic research, which I might not have found otherwise.
NotebookLM
Now, I turn to NotebookLM, Google’s tool (the “LM” in the product name stands for Language Model) which tries to do with your personal research library the same sort of thing that Google Gemini tries to do with—well, with an infinitely larger library of millions and millions of documents. The idea is the same, though: you feed a (much smaller) set of documents, audio, video, etc. into a service which then allows you to use a chatbot-type interface to ask it questions, and (hopefully) get some useful answers back. But, again, how useful NotebookLM will be to you depends entirely upon what you feed it! In computer science we have a saying, with the acronym GIGO: Garbage In, Garbage Out. If you fill NotebookLM with crappy sources, don’t be too surprised if you get crappy answers back!
I have a Google AI Pro plan, with 2 terabytes of storage, which includes access to Google Gemini 3.1 Pro. This costs me CA$26.99 per month (approximately US$20), and frankly, I’m pretty sure I am not getting my money’s worth out of it. With that plan, my NotebookLM service is rated at the Pro level, which means I can have up to 500 notebooks, with each notebook holding up to 300 sources. (NotebookLM Standard, the free service, lets you have up to 100 notebooks, with up to 50 sources each. You can compare the various levels of plans here.)
I have uploaded 103 documents (mostly PDFs of journal articles from my personal Zotero research library) into NotebookLM. Again, some of them are probably of lower quality than others, so the GIGO rule applies. For example, the notebook summary it seems to have automatically created veers alarmingly close to gobbledygook, and there’s even a mention of (gasp!) blockchain, with the audacity to call it a “primary pillar necessary to facilitate real-time, multisensory interactions between users.” (WHAT THE ACTUAL FUCK?!?? Okay, I take it back, it is gobbledygook, a Frankenstein-like creation stitched together from bits and pieces of the documents I had uploaded. I actually created this monster.)

There’s absolutely no explanation of how or why this summary was generated. In fact, I found the whole NotebookLM user interface extremely confusing. I had to dig through the product’s Frequently Asked Questions list to find out why some things wouldn’t load: any uploaded file over 200 MB in size, any source with over 500,000 words, and any copy-protected PDF will not load, but you don’t get any sort of error message if you try. In my limited testing thus far, you get…no response.
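Since NotebookLM fails silently, I could imagine running a quick pre-flight check over a folder of Zotero-exported PDFs before uploading. The script below is purely my own hypothetical workaround (using the pypdf library and a made-up folder name), not anything NotebookLM itself provides, and it only checks the three limits mentioned in the FAQ.

```python
# Hypothetical pre-flight check for a folder of PDFs destined for NotebookLM:
# flags files over 200 MB, over 500,000 words, or copy-protected (encrypted),
# since NotebookLM rejects these without showing any error message.

from pathlib import Path
from pypdf import PdfReader

MAX_BYTES = 200 * 1024 * 1024   # 200 MB per-file limit
MAX_WORDS = 500_000             # 500,000-word per-source limit

def preflight(folder: str) -> None:
    for pdf in sorted(Path(folder).glob("*.pdf")):
        problems = []
        if pdf.stat().st_size > MAX_BYTES:
            problems.append("larger than 200 MB")
        try:
            reader = PdfReader(pdf)
            if reader.is_encrypted:
                problems.append("copy-protected (encrypted)")
            else:
                words = sum(len((page.extract_text() or "").split())
                            for page in reader.pages)
                if words > MAX_WORDS:
                    problems.append(f"roughly {words:,} words (over the 500,000 limit)")
        except Exception as exc:
            problems.append(f"could not be read ({exc})")
        if problems:
            print(f"{pdf.name}: " + "; ".join(problems))

preflight("zotero_pdfs")  # hypothetical folder of exported PDFs
```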
Even worse, this feels like a product that Google has just sorta dropped on us, with only the previously-mentioned FAQ and an email address for product support (yes, even for Pro users). I shouldn’t be surprised, I suppose. Just as I wouldn’t be surprised if Google is silently compiling notes on how people use NotebookLM, or decides to yank it away, as it has with so many previous Google products and services.
Honestly, I need to spend some more time playing around with NotebookLM before I issue any final judgement on the product. In particular, I get the feeling that the GIGO rule really applies to NotebookLM! Google themselves, in their NotebookLM FAQ, state:
Sometimes NotebookLM can’t answer your question because of…
- Information not in sources: NotebookLM answers questions based on the information provided in your uploaded sources. If the answer isn’t in the source material, it won’t provide a response.
I had a very interesting day playing with these GenAI tools, and I learned a few things. I’ll keep you posted on how things go!