Hilarious! (For comparison, here's GPT-4 getting it on first try: https://chat.o...

kevinmchugh · on Feb 8, 2024

My understanding is that gpt4 is better at this than 3.5 and it seems to get it pretty reliably. One thing that's interesting to do is to imply the answer is incorrect and see if you can get it to change its answer. If you let it stop answering when it's correct, you get the Clever Hans effect.

whimsicalism · on Feb 8, 2024

yes, although gpt-4 has been finetuned on this one