AI may have been used to pick from a repertoire of stock responses, but not to generate (hallucinate) responses. Thus you may have gotten a response that fails to address your request, but not a response with false information.
You asked why they would start adding human checks with the “way better” tech. That tech gives false information where the previous tech didn’t, therefore requiring human checks.