Role-playing with AI will be a powerful tool for writers and educators (resobscura.substack.com)
154 points by benbreen on Dec 12, 2023 | 101 comments


The author sort of hints at this in a roundabout way, but the problem for education is that until the systems stop lying about things they don't actually know, or citing "facts" that are wildly inaccurate or out of date, they are not reputable/reliable enough to use at all.

Right now it's like we're in 2001-2005 for wikipedia. It's probably a good tool for basic, basic, basic beginnings of your work in the classroom, but you're going to get things wrong in surprising ways if you use it without verifying.


Schoolbooks, for example, have wildly out-of-date facts, or are brief to the point of inaccuracy. To say nothing of history channel shows, which tend to suck badly.

One of the things AIs commonly allow is for people to ask questions without fear of judgement. I've not seen one scowl and call a kid an idiot yet (though I'm sure someone has a jailbreak for that). Having an AI trained to take these kids' off-the-wall (and often historically inaccurate) questions would be a really interesting tool.

Just not the only tool, and not really one that should be fully authoritative in itself.


> One of the things AIs commonly allow is for people to ask questions without fear of judgement.

Search engines have been around for almost 30 years now and they do this job better than spicy autocomplete. I type stupid questions into Google all the time and get good answers. The "AI" version of this involves strapping a search engine onto a language model, ostensibly to summarize results, but in practice there are examples of the language model just lying instead of doing the actual search for you.


>I type stupid questions into Google all the time and get good answers.

I type questions into Google and frequently get misleading or outright incorrect answers directly in their BS summaries.


> I type questions into Google and frequently get misleading or outright incorrect answers directly in their BS summaries.

This is a good point. If you treat a search engine as a language model, and vice versa, you will run into issues. These issues compound even further for users who decline to scroll down and click the links that search engines return.


I started doing both.

Some results or tasks are better suited to an LLM. Or you want a bullet-point response, which Google can't do anymore.


Right, but when a schoolbook-writer doesn’t know something, they don’t just make up an outrageous lie. Not saying LLMs don’t have a place in education, but they have a long way to go.


Feynman would have disagreed with you

https://www.rangevoting.org/FeynTexts.html

>The reason was that the books were so lousy. They were false. They were hurried. They would try to be rigorous, but they would use examples (like automobiles in the street for "sets") which were almost OK, but in which there were always some subtleties. The definitions weren't accurate. Everything was a little bit ambiguous – they weren't smart enough to understand what was meant by "rigor." They were faking it. They were teaching something they didn't understand, and which was, in fact, useless, at that time, for the child.


Gonna adopt this paragraph for describing LLMs


[1964]

Not to be snarky, but education has come quite far since the '60s.


Has it? From what I’ve heard about the textbook industry, I feel like it hasn’t.


Oh sure, a human might hurt your feels so you should only talk to a mechanical turk.

We are all clearly going mad.


Based on the number of replies you've made that have been downvoted, I think maybe as a child you are one of those people that would have benefitted from a mechanical turk guiding you how to behave.

Maybe you should reflect on the abuses some humans have received from other humans. For example like your^H their own parents and teachers?

Not all of us have had great lives. People that have been abused tend to abuse others unless someone/something breaks that cycle (and it's almost never about pulling up your own bootstraps).


This was a very rude response coming from a person feeling superior enough to lecture someone else on proper behavior. You could have set an example of the behavior you'd like to see instead of being a jerk.


Oh sure, someone talking to a mechanical turk might hurt your feels so you should only talk to a human.

We are all clearly going mad.


Lol thanks I feel you helped my point


As a large language model, I cannot help points as they may be harmful to humans.


Remember, LLMs are purely language AIs. They can and will lie, because "giving an answer" has been weighted highly enough that they'll do anything to satisfy that requirement. Their purpose is to create natural-sounding language in response to what they parse the input to be. That's it.

It's similar with what I call "action AIs", where an AI tries to learn to walk, or race a car optimally through a track. It will often repeat mistakes, because in the short term it gains a higher score, and it takes time to learn that short-term gains, in some cases, harm long-term gains.
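To make that concrete, here is a toy sketch (plain Python, not any particular RL library) of how a myopic discount factor makes an agent keep taking the quick reward even when waiting is worth more:

    # Toy illustration: a low discount factor favors the quick payoff
    # even though the patient strategy is worth more overall.
    def discounted_return(rewards, gamma):
        return sum(r * gamma**t for t, r in enumerate(rewards))

    greedy  = [5, 0, 0, 0]   # grab a small reward immediately
    patient = [0, 0, 0, 10]  # wait for a bigger reward later

    for gamma in (0.5, 0.99):
        print(gamma, discounted_return(greedy, gamma),
              discounted_return(patient, gamma))
    # gamma=0.5 : greedy 5.0 vs patient 1.25 -> keeps repeating the "mistake"
    # gamma=0.99: greedy 5.0 vs patient ~9.7 -> eventually learns to wait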

People are using a hammer to install screws. Technically it works, but that's not the droid they were really looking for.


Which puts pathological liars into a new light, at least for me. They’re compulsively story completing.


This isn’t the problem people think it is.

Chain-of-Verification, process supervision, encoder/decoder architectures, and a plethora of other approaches are quickly maturing. AND it's important to remember the current systems are NOT particularly optimized for objective accuracy; they're optimized to carry on conversation. It's a conversation bot.
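For anyone who hasn't seen it, Chain-of-Verification is roughly: draft an answer, independently fact-check your own draft, then revise. A minimal sketch, assuming a hypothetical llm(prompt) helper that returns the model's reply as text:

    # Minimal Chain-of-Verification sketch; llm(prompt) is a
    # hypothetical helper, not any vendor's actual API.
    def chain_of_verification(question, llm):
        draft = llm(f"Answer this question: {question}")
        # Plan verification questions that could expose errors in the draft.
        plan = llm(f"List fact-check questions for this answer:\n{draft}")
        # Answer each check on its own, without showing the draft.
        findings = [llm(q) for q in plan.splitlines() if q.strip()]
        # Revise the draft against the independent findings.
        return llm(f"Question: {question}\nDraft answer: {draft}\n"
                   f"Verification findings: {findings}\n"
                   "Rewrite the draft, fixing anything contradicted above.")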

It’s also important to understand that most systems out there are building on top of the same AI APIs. There isn’t actually that much diversity in the ecosystem that’s broadly deployed yet, so problems with ChatGPT and Bard are “AI problems” and not limited in scope.

As the market matures, solution diversity will increase dramatically, and the systems that solve these major challenges will emerge, not necessarily from the incumbents. That's been the pattern in past tech waves, and it's why Apple so often takes the wait-and-see approach and then implements the winning solution in its products.

These are early days, and it's important not to see the current generation as anything greatly exceeding the first demonstration-level wave of a technology, hard as that is to imagine. It's 2000, and we are looking at something like a Nokia 3210 (GPT-3.5) and 3310 (GPT-4), talking about how they have problems.

Yep. It’s not a iPhone yet. And we still think Nokia is going to dominate. And the current systems just have problems that are kinda holding them back… but this is the way technology waves break…


>2001-2005 for wikipedia

I was there [still am]. Recently, I provided an update to "transistor density," which was then cited by Perplexity.AI when I asked about a specific new processor type (eerie, having been an early adopter for both wiki and LLMs, from a user-perspective).

I'm left wondering how much an old wiki handle [account] might be worth, if it's so readily cited as a "leading authority" (when in reality I was just a curious teenager, trying to figure out what made encyclopedias "so special," when Wikipedia provides all these linkages FOR FREE).

Half a lifetime ago, and I'm still curious how this whole "open source thing" is going to play out...


> use it without verifying

There isn't a single general source of truth that we can use without verifying. Human teachers especially aren't such sources.


This isn't as big a problem as you seem to think it is. Even if people need to verify facts, maybe it will give them better judgement for when other humans lie to them.


Yeah, I always gaslight the kids so that they'll be prepared for the real world.


gpt4 needs better marketing. it's not perfect but it's without a doubt the best learning tool humans have achieved. it's better than books, wikipedia, libraries, etc. it's basically your personal college professor with 24/7 office hours on practically any subject. using it in combination with other tools is the best approach, but this has always been the best approach for every learning tool.


As a teacher, I agree.


So if you're teaching, I dunno, Introduction to Physics, your claim is you'd rather assign students GPT 4 than a physics textbook if you could only assign them one educational tool? Because it's the best tool?

If you were teaching fourth grade math, instead of assigning a workbook of math problems you'd prefer to tell the kids "Ask GPT to make up math problems" because it's the best tool, so if you could only pick one tool you'd go with that?

If you were teaching history and had the choice of sending kids to the university library to write research papers or having them ask GPT 4 about history, you'd have them just ask GPT 4 about history, because it's the best educational tool?

Bold claim.


A physics textbook is a great start, but it has one major drawback: it's linear, and can't adapt to the reader. Maybe I already understand half of the textbook, but I can't figure out this one section. I'm out of luck, because the textbook won't expand on the part I struggle with.

One major value of the teacher in the classroom is being able to sense when students are getting lost, and having ways to slow down and re-explain the concepts they missed.

A basic AI sounds like it could soon do a great job of providing the content of a physics textbook, with the adaptability of a one on one teacher.


Sans neuralink or equivalent tech, I’m skeptical that an AI will soon develop the input mechanisms to make those judgements on par with an experienced human tutor. Students often (inadvertently) mischaracterize what is blocking their progress - and blockers are often nonacademic. What I would like to see, and work on, is augmenting the human instruction rather than replace it.


A physics textbook? For intro to physics? Instead of Khan Academy? I wouldn't be surprised if a double-digit percentage of students never even open their 30-year-old decaying textbook in their high school physics class.


no one made the claim that gpt4 is the only tool that should be used


The claim was GPT4 was the best education tool. I think that's absurd.

Forcing an educator to choose it or a book demonstrates the book is in fact the best education tool unless the educator is going to double down and say they'd teach from a chatbot over a textbook.


I'm curious how many students would agree that textbooks are useful at all. My memories of college include actively finding ways to avoid having to buy textbooks and never opening a textbook if lectures were recorded. The best classes by far were seminar classes where the entire text basis for the class was reading publications in the field; cell signaling and microbiology and others didn't have any recommended books at all. I don't remember ever using a textbook in a CS class either. I'm pretty sure GPT-4 would be infinitely more useful than a textbook in those classes, considering we didn't use textbooks at all.


it's not absurd at all. if i want to learn what an atom is, why on earth would i open a book when i can just have an hour-long conversation with an AI that can explain the concepts to me at any level and answer my questions? as a learning tool, why is a book better?


If I wanted to know what an atom is I wouldn't sign up for a college course.

But if I did sign up for a college course i would expect a more systematic presentation of the material than "whatever random thing I thought to ask an AI."

Not to mention just the other day I asked Chatgpt 4 accounting questions and it gave the wrong answer, and only interrogating it prompted it to correct itself.


you seem to think "gpt4 is the best learning tool" means "gpt4 is the only tool that should be used in education". i literally said in my original reply that it should be used with other learning tools. your example is the perfect use case. the course is a great tool for providing an outline of what to learn. if i follow that outline with gpt4 i'm going to learn the concepts much faster than i would with a book, and since i'm learning them faster i will have more time (and motivation) to learn more concepts.


I'm saying GPT 4 isn't the best learning tool, books are, and GPT 4 can be useful without exaggerating its usefulness and declaring it "the best tool".


That's debatable.

The hardest thing about learning a new concept/idea is to get started. GPT lowers the barrier of entry, and you can use the knowledge you got to tackle other learning tools and fix the parts GPT got wrong.


if i had this when i was in school im certain i would have been an A student. my biggest hangup has always been fear of looking dumb, asking dumb questions, holding up the class to clarify things...gpt4 eliminates all these problems


They're good for at least exploring the vocabulary of a new domain you're not familiar with. You can really dig into a topic and ask systematic things while jotting down new terms to look up and verify. It can help get you past the "don't know what you don't know" stage.


We're at a strange place in computing right now.

At one end of the spectrum we have machines that are fast and efficient. Able to store data, search for data and retrieve data as accurately as it was originally stored.

On the other end of the spectrum we have machines so creative that we can't control the creativity, so they lie and are inaccurate.

What's missing is the machine in the center of these two extremes: a machine that can be creative and factually exact at the same time. We're slowly converging on that goal right now, as ChatGPT can now use Bing to look things up.


Altman has been recently talking about (basically) that continuum and the intention of making it configurable. Seems like an obvious goal to strive for.


Early wikipedia was flawed but c'mon, hoping that a probabilistic text generator happens to string together a series of correct statements is a fundamentally unserious way to gather information


Did you know that those text generators can invoke tools like web search, retrieval, and more to pull in external truth?
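The plumbing is less magic than it sounds: the model emits a tool request, the harness actually runs it, and the result is fed back into the context. A hand-wavy sketch (llm() and web_search() are hypothetical stand-ins, not any vendor's real API):

    # Sketch of an LLM tool-use loop; llm() and web_search() are
    # hypothetical stand-ins for a model call and a search API.
    def web_search(query):
        ...  # call a real search API here

    TOOLS = {"web_search": web_search}

    def answer_with_tools(question, llm):
        response = llm(question)               # either a final answer...
        while response.startswith("TOOL|"):    # ...or "TOOL|name|args"
            _, name, args = response.split("|", 2)
            result = TOOLS[name](args)         # actually run the tool
            response = llm(f"{question}\nTool result: {result}")
        return response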


Using chatGPT's web search is the worst of both worlds. You get all the inefficiency of google with the potential of your text generation fucking up what you searched.


ChatGPT's search is powered by Bing.


Even worse


This actually has made me less happy with chatgpt.

If I wanted to google something (or bing, whatever), I would have done that. A major draw for me has been that chatgpt was providing a much better experience than search engines.

Now it sometimes feels like a fancy lmgtfy.


use chatgpt classic. It's faster, too. It's my default.


Awesome suggestion. I just asked a question that chatgpt failed to answer because it tried to bing it. Finally dug through the menu and found chatgpt classic and it answered it just fine (verified myself).


> use chatgpt classic

We're really speed running development cycles nowadays aren't we?


> tools like web search, retrieval, and more to pull in external truth

When did those tools start outputting truth?


And yet it works. I asked Google Bard five multiple-choice questions, each with four options, and it got them all correct. I'm sure you can figure out the odds of that happening by chance (one in 4^5 = 1,024).

It makes no sense to say something can't work when it clearly and obviously does. Most humans would do worse.


Nothing about the LLM experience is deterministic, so these anecdotal experiments mean nothing. Your experiment led the witness by feeding it possible answers. Small wonder it got them all right. Try this: give it four wrong answers for each question and see how it fares. In my experience it will pick one and convincingly rationalize why it's right, unless you question it, at which point you're leading it again.

Anecdotes are so worthless, in fact, here's mine-- I asked Azure's GPT for Powershell help. After seven regens in which it tried to include a different fictional library, I gave up. So which of us had the "real" LLM experience?

These things are storytellers, not teachers. Sometimes it gets it right. Maybe most of the time. It's convincing enough that unless you're an expert, you'll never guess when it's wrong, and the lies are bespoke for every user so there's never going to be an errata page to document its failures. It will always appear reliable.


> These things are storytellers, not teachers.

I really liked how a recent paper from DeepMind put it - LLMs are just role-playing: https://arxiv.org/abs/2305.16367. This explains so much.


> Your experiment led the witness by feeding it possible answers. Small wonder it got them all right.

Are you claiming you've always scored 100% on every multiple choice test because you've been "fed the answers"?

What kind of dumb response is that?


> Are you claiming you've always scored 100% on every multiple choice test because you've been "fed the answers"? What kind of dumb response is that?

In giving it answers, you gave it context to infer the right one. You narrowed the search domain. It does have the same effect on humans, which is why multiple choice tests are easier than others-- when you show up to take the test wholly unprepared, you're looking for the most plausible answers based on the context you're given. Fortune tellers and Clever Hans work the same through interactive reactions.

You take this as an insult, but I'll challenge you again to run your experiment and provide only wrong answers for all of the questions. Bullshit your fortune teller and see what answers it comes back with in the impossible situation you create.


Fortune tellers and Clever Hans do not score perfectly on multiple choice tests and neither do humans.

I don't have any evidence of how humans perform on multiple choice tests where all the answers are incorrect and they are not given that option. Do you? Or are you just assuming that they would challenge the context?


There are a couple good examples from the SAT. Veritasium did a video on it a couple weeks ago.

https://www.youtube.com/watch?v=FUHkTs-Ipfg

Of the 300,000 students who took the test, only three reported a problem with the question. So at least in this case, about 1 in 100,000 were able to identify the problem with the question and report it.


Did you stop reading the comment there?


This is such a tiring comment. Might I suggest you actually try GPT-4? It will do amazingly well at most tasks you throw at it, especially if you're decent enough at the task to course-correct it.


Given the load of bollocks teachers propagate themselves, I suspect the first problem after automated homework will be students calling out the BS. I recall the system really didn't like it.


AI as a tool to kickstart your own imagination is fine; but despite the author's claims that he's not excited about AI-as-author, it seems like they're mostly just reading fan-fiction and fan-art created by an AI.

Even the old timey doctor sections, the author immediately admits are mostly factually wrong. What good does that do?


Not factually wrong at all - the dosages, ingredients, diagnosis and even the language are all strikingly accurate. Naturally, the "fake" 1680s doctor didn't write the same prescription as the real one (Sydenham). But a different real life doctor would've disagreed with Sydenham, too. In other words, if you had 100 physicians in the 1680s write out a prescription for hysteria and "hypochondriacal passion," this would be (IMO) indistinguishable from the real ones. What's different is that this is interactive, so you can change elements at will. Again, this isn't reflective of historical fact. It also isn't simply fan fiction. This text isn't an end point, but a starting point for jumpstarting your own thinking about the affordances of a past world.

In other words, I think this is a new method for thinking creatively about history -- one among many that already existed, like historical fiction, historical re-enactment, various forms of experiential learning like debates and roleplaying, etc. But it's cool that there's a new one!


I'm actually building a role-play training tool at the moment - https://Solidroad.com . The AI plays the part of a fake customer so sales and support reps can practice before talking to real customers. We've built a pretty low-latency voice-to-voice conversation simulator. It helps people build confidence on the phone, etc., and we've found that accuracy isn't as important in this context (i.e. it's OK if the AI says something weird every once in a while) as long as the results are by and large believable.

We have a demo on our website where you can try to sell a smartphone to Michael Scott from The Office.


> It’s ok if the AI says something weird every once in a while

If anything, I think having the occasional 'complete weirdo' interaction would be good training for the real world. Most people never get to practice how to handle the truly strange cases before the first time they have to deal with a customer who wants to return a case of half-eaten chocolate bars.



I've been a little disappointed with role playing with ChatGPT 4. The AI starts out really close to the behavior that you would expect for the proposed role, but as the conversation gets longer, it starts to forget the role and becomes more generic, like you'd expect normal ChatGPT to be.

I hope this is an implementation detail and that the technology can, in the future, maintain the role playing across really long conversations.


Forgetting context is a problem, for sure. One thing I've found that works fairly well is to include a request for a "status bar" in your prompt. I.e., you ask it to remind itself with each response 1) who it is pretending to be, 2) what the date is, 3) what the setting is, and 4) what is in the NPC's "inventory" (which it intuitively understands, because LLMs seem to have a natural affinity with MUDs). You can even have it track its mood and variables like weather.
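In practice the instruction can be as simple as this (my paraphrase of the idea, not the actual prompt, which is linked below):

    # A minimal "status bar" instruction, embedded as a system prompt string.
    STATUS_BAR = """
    End every response with a status bar in exactly this format:
    [Role: <who you are playing> | Date: <in-world date> |
    Setting: <current location> | Inventory: <what the NPC carries> |
    Mood: <one word> | Weather: <one word>]
    """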

As the context windows of Claude/GPT-4 etc increase, I think this will be less of an issue, but for now it's a pretty effective workaround.

Here's an example of the prompts I'm using (from an activity I just did with my world history class): https://docs.google.com/document/d/1sLRsUVJ_KSPtjrO83ko2MSFf...

And my writeup of an earlier version: https://resobscura.substack.com/p/simulating-history-with-ch...


I played through the "fall of the ming dynasty" one when I saw your submission and it was pretty fun. I do find they are a little too easy to persuade of heroic things. It's easy to lead a heroic resistance and win the fight within a turn or two if you provide a justification, no matter how tenuous or unlikely. Especially if it's "feelgood" like you stand on a table and make a speech about how the wealthy kleptocrat emperor should agree to give the peasants their freedom if they resist the invaders or whatever (even if that makes no sense and would never work)


I'm not sure how exactly ChatGPT uses its own responses in the context, but I've found that it can ignore the instructions you've made it repeat over and over, even though it just did it successfully in the last response. I've had better results making it produce something to paste in my next prompt.

Relatedly, when editing past prompts, I've found that it sometimes reuses things that were in the original branch of the conversation, even though they're not in the logical flow of the current conversation.


NovelAI uses something like this with their home-rolled models: you can put in some generic 'memory' as well as keyword-based notes and it will automatically get included up to a certain token limit, including some prioritization handling for cases where many different notes are triggered at once.
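Mechanically that's something like the sketch below (imagined from the description above, not NovelAI's actual code; count_tokens is a stand-in tokenizer):

    # Keyword-triggered "lorebook" memory: include notes whose keywords
    # appear in the recent text, highest priority first, until the
    # token budget runs out.
    def build_memory(recent_text, notes, budget, count_tokens):
        picked, used = [], 0
        hits = [n for n in notes if any(k in recent_text for k in n["keys"])]
        for note in sorted(hits, key=lambda n: -n["priority"]):
            cost = count_tokens(note["text"])
            if used + cost <= budget:
                picked.append(note["text"])
                used += cost
        return "\n".join(picked)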


A useful addition (call it 1a) for ChatGPT specifically is a BabyAGI-style "what the current/next objective is."

There are vector database solutions that address the forgetting of context. Oobabooga has superbooga (chromadb), SillyTavern requires the headache of the Extras service but will let you use chromadb, Google or OpenAI. Koboldcpp doesn't do vectorization, but does have a clunky autogenerate-summary feature that injects a summary of events so far into the prompt.
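Under the hood, those vector-database approaches are all the same trick: embed past messages, then at each turn retrieve the most similar ones and stuff them back into the prompt. A bare-bones sketch (embed(text) is a hypothetical embedding call; numpy only):

    # Bare-bones vector memory; embed(text) -> vector is a hypothetical
    # embedding call (roughly what chromadb wraps for you).
    import numpy as np

    class VectorMemory:
        def __init__(self):
            self.items = []  # (vector, text) pairs

        def add(self, text):
            self.items.append((embed(text), text))

        def recall(self, query, k=3):
            q = embed(query)
            scored = [(float(np.dot(q, v)) /
                       (np.linalg.norm(q) * np.linalg.norm(v)), t)
                      for v, t in self.items]
            return [t for _, t in sorted(scored, reverse=True)[:k]]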


Look at MemGPT


When I write now, I will sometimes load a page or two into a LLM and ask for an ‘opinion’, and also suggestions for more sub-topics. This is sort-of like bouncing ideas off a human collaborator but also very different.

More to the point of this article, I have a few test prompts that define a bizarre remote planet and culture and ask for a story to be written using this context as a kick-off point. GPT-4 and Claude 2 generally do a good job of 'creatively' generating a story. Some smaller LLMs that I run locally on a 32G Mac Mini do a poor job, and others are OK. Related: when using smaller local LLMs, I find it important to have many models installed and keep notes on which smaller models are good or bad at specific tasks.


At some point the Danielle Steels of the world will use models trained on their body of work to generate new novels from whole cloth, pumping out content with little effort. The only constraint would be not releasing them more quickly than human-written books could plausibly appear.


I don't think this will be a viable option. After the first 4-5 books created like that, they'll get stale. The AI won't get any new training data (as Ms. Steel is presumably not writing a book while the AI writes one for her), and even if the prompts vary, the outputs will feel more similar to each other than an average author's latest 4-5 books do.


I dunno man, I've seen some uncensored LLMs come up with some pretty wild shit. This timeline is about to get a lot weirder.


But that wild stuff will become bland after the gold rush I think. You still need to do something different with the same AI that all the other artists / authors have.


I've tried a few things in ChatGPT, it's going to be a real uphill battle to get a story >1,000 words that's not bizarrely generic. What you can get now is more like madlibs 2.0 than actual story generation.


This is a cool project that implements role-playing AI:

https://github.com/joonspk-research/generative_agents


Another one (inspired by the above) that doesn't rely on OpenAI servers:

https://github.com/a16z-infra/ai-town


> Setting Up the Environment

> To set up your environment, you will need to generate a utils.py file that contains your OpenAI API key and download the necessary packages.

> Step 1. Generate Utils File

> In the reverie/backend_server folder (where reverie.py is located), create a new file titled utils.py and copy and paste the content below into the file:

    # Copy and paste your OpenAI API Key
    openai_api_key = "<Your OpenAI API>"
I guess it still relies on OpenAI


Right, sorry, I forgot to add that you can override the URL with `OPENAI_API_BASE` and point it to a text-generation-webui OpenAI API[0] compliant model.

0: https://github.com/oobabooga/text-generation-webui/discussio...


This is amazing, thanks for sharing it! I’ve been thinking about building something along these lines, so it’s great to see a working model.


We're working on AI role-playing for sales training and coaching. But as we've been validating it we've been learning about a lot of other industries that could use something similar.

https://quick.live


As the pendulum has shifted way (way) over to art as commerce, as a product, AIs are plausibly useful. Why not make the 'product' more efficiently and cheaply?

But art for me is self-expression; I'm encountering another human being or I'm expressing myself. AI creating that art (really the only "art" IMHO) is as useful as an AI writing memoirs or a love letter or a condolence note.

In general perception, art always has been some mix of those two and people seem to unconsciously conflate them, to slip from one to the other. If AI - plus the current socio-political madness of dismissing all humanities, all compassion, and all humans as anything but economic devices - effectively wipes out art, what have we wrought?

What has the IT revolution, what has Silicon Valley wrought? Look at the world we are creating. And for what? So a few people can make lots of money?


AI generation is like any tool. Once it's available to everyone, you have to do something unique with it to have something worth grabbing people's attention. If AI can illustrate and flesh out the plot of a comic book from a simple prompt, then we'll quickly get bored of those comic books. When someone takes the energy and time to create a unique plot, setting, rich characters, timely themes, etc., and then feeds that to the AI, the result will be interesting to people. People who put less effort in will get bland, unsellable output out of the AI.


And video game creators. True open world games with infinite choices that affect things down the line.


Definitely not -- at least, not yet. Current generation language models are spectacularly bad in this application. It's very difficult to get them to role-play a character in a universe which differs substantially from the real world (since that's where all their training data came from), and they're overly credulous when responding to player input. As a result, they're likely to rapidly go "off the rails" when interacting with players -- they're likely to act unaware of details about the world they're in, to fabricate details about that world or mix in details from other fantasy universes or the real world, or to allow the player to introduce incongruous elements without being challenged.


Maybe not for live interaction with a player, but for seeding otherwise procedurally generated content absolutely. I look at what a game like Dwarf Fortress does with their procedurally generated world history and narrative and see so many ways it could be enhanced with an LLM. And this would only be during world generation or potentially for major world events, but not in a real-time user triggered response.


Yeah, I meant infinite figuratively. LLMs should at least enable some middle ground between selecting between A, B, C, D and anything goes.


This is merely another way in which AI will be cast as a co-creator. Especially given the limitations in copyright law in the US I think many early systems will operate like this, just to secure the output as intellectual property…


Question is... would using an AI as a muse lead to more creativity, or dull personal creativity? It is one thing when people hang out and spitball ideas that lead to inspiration. But I wonder if people in the future will become too dependent on AI muses. Don't forget how we now have generations of math illiterates since the dawn of calculators.


I've tried something similar with a little side project, https://catchingkillers.com. The witnesses (ChatGPT chat bots) wouldn't have much of an attention span and would start making things up. I think this added to the fun of the game. You had to question each witness to get verification of what another chat bot said.


Can confirm, here's a 600 page story written with it starting in 2015. https://archiveofourown.org/works/46518058/chapters/11713547...


Honestly as a writer I agree that it could be a useful tool, but it would require a lot of UX and UI work to make it truly usable. I don't know if it's worth the effort with the technology as it is. Maybe in 2 months it will be.


Netwrck.com has crazy good AI and image generation right now.

If you say "appears" it will generate art that's like midjourney level it's crazy.

But the art is described by the AI first.


That has been a thing long before now.


[flagged]


or not. It's interesting to me.


Downvotes without comments are petty. Can you suggest my error at least?



