Could you expand on your point re more sophisticated prompting?
I have found it hard to replicate high quality human-written prose and was a bit surprised by the results of this test. To me, AI fiction (and most AI writing in general) has a certain “smell” that becomes obvious after enough exposure to it. And yet I scored worse than you did on the test, so what do I know…
For flash you can get much better results by asking the system to first generate a detailed scaffold. Here's an example of the metadata you might have it produce before actually writing the story: the genres the story should fit into; the POV; the high-level structure; a list of characters along with significant details; the themes and topics present; and detailed style notes.
From there, you use a second prompt to generate a story that follows those details. You can also generate many candidates and have another model instance rate them on both general literary criteria and how well they fit the prompt, so you only read the best.
This has produced some work I've been reasonably impressed by, though it's not at the level of the best human flash writers.
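If it helps, here's a minimal sketch of that scaffold-then-write-then-rate pipeline, assuming the OpenAI Python client; the model name, premise, prompts, and candidate count are all placeholders rather than the exact ones I use.

```python
# Minimal sketch of the scaffold -> story -> rate pipeline described above.
# Assumes the OpenAI Python client; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-5"  # placeholder; substitute whatever model you're using


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


premise = "A lighthouse keeper discovers the light has started answering back."

# Stage 1: generate the scaffold (genres, POV, structure, characters, themes, style notes).
scaffold = ask(
    "Before writing any prose, produce a detailed scaffold for a flash fiction piece "
    f"based on this premise: {premise}\n"
    "Include: genres; POV; high-level structure; characters with significant details; "
    "themes and topics; detailed style notes."
)

# Stage 2: generate several candidate stories from the scaffold.
candidates = [
    ask(
        "Write a piece of flash fiction (under 1000 words) that follows this scaffold "
        f"exactly:\n{scaffold}"
    )
    for _ in range(5)
]


# Stage 3: have a fresh model instance score each candidate on literary quality
# and on how faithfully it follows the scaffold, then keep only the best.
def score(story: str) -> float:
    reply = ask(
        "Rate this flash fiction from 1-10 on general literary quality and on how well it "
        f"follows the scaffold below. Reply with a single number.\n\n"
        f"Scaffold:\n{scaffold}\n\nStory:\n{story}"
    )
    try:
        return float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0


best = max(candidates, key=score)
print(best)
```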
Also, one easy way to get output that completely avoids the "smell" you're talking about is to give specific guidance on style and perspective (e.g., GPT-5 Thinking can do "literary stream-of-consciousness first-person teenage perspective" reasonably well and won't sound at all like typical model writing).
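Concretely, that just means putting the style directive in the prompt itself; a placeholder example (same client setup as the sketch above):

```python
# Placeholder example of a style-specific prompt; the point is to be concrete about voice and POV.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-5",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Write a piece of literary flash fiction as a first-person, "
                   "stream-of-consciousness narration from a teenager's perspective.",
    }],
)
print(resp.choices[0].message.content)
```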