
Why are there more completion tokens with image prompts when the text output is about the same length?


Some multimodal models may run a hidden captioning step that is counted as completion tokens, others work on a fully native image representation, and I think some do both.
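One rough way to check for a hidden step like that is to compare the completion-token count the API reports with the number of tokens in the text you actually got back. A minimal sketch, assuming you already have both counts; the function name and the numbers are made up for illustration and don't come from any particular provider:

```python
def hidden_completion_tokens(reported_completion_tokens: int,
                             visible_output_tokens: int) -> int:
    """Estimate completion tokens that never show up in the visible text.

    Both arguments are assumed to be counted with the same tokenizer.
    """
    if reported_completion_tokens < 0 or visible_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Clamp at zero: small tokenizer mismatches can make the difference negative.
    return max(reported_completion_tokens - visible_output_tokens, 0)


# Hypothetical counts, for illustration only.
text_only = hidden_completion_tokens(120, 118)
with_image = hidden_completion_tokens(410, 121)
print(text_only, with_image)
```

A large gap that appears only on image prompts would be consistent with a captioning or reasoning pass being counted as completion tokens, though it wouldn't prove which one.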


"Thinking" Mode


It doesn't say that anywhere.



