The key takeaway for me is that there is a decent improvement in all categories - about 10% on average, with a few outliers. However, the footprint of this model is much larger, so the performance bump ends up being underwhelming in my opinion. I would expect about the same improvement if they released a 13B version without the MoE. It may be too early to definitively say that MoE is not the whole secret sauce behind GPT4, but at least with this implementation it does not seem to lift performance dramatically.
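To make the footprint point concrete, here is a rough back-of-the-envelope sketch of why a sparse-MoE model's memory footprint grows much faster than its per-token compute. All dimensions and the 8-expert/top-2 routing are illustrative assumptions (roughly Mistral-7B-like numbers), not the released model's actual architecture:

```python
# Rough sketch: sparse MoE stores all experts but only runs top-k per token.
# Every number here is an assumption for illustration, not the real model.

def ffn_params(d_model, d_ff):
    # A plain two-matrix feed-forward block: up- and down-projection.
    return 2 * d_model * d_ff

d_model, d_ff = 4096, 14336   # assumed hidden/FFN dims
n_experts, top_k = 8, 2       # assumed: 8 experts, 2 routed per token

dense = ffn_params(d_model, d_ff)   # one FFN in a dense model
total = n_experts * dense           # all experts must sit in memory
active = top_k * dense              # only top-k experts run per token

print(f"stored FFN params per layer: {total / 1e6:.0f}M")
print(f"active FFN params per layer: {active / 1e6:.0f}M")
print(f"memory-to-compute ratio: {total / active:.1f}x")
```

Under these assumptions you pay for all eight experts in RAM while only two do work per token, which is why the model can feel closer to 13B in quality while costing far more to host.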
Good question. If you believe the results on the HuggingFace leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...), which I find very hard to navigate, Mistral was not even the best 7B model there, and there is huge variance as well. For comparisons I prefer to rely on benchmarks run by the same group of known individuals over time, as I think it's still too easy to game benchmark results - especially if you are just releasing something anonymously.
You are right - upon closer inspection, even models that were not previously Mistral finetunes are now using Mistral in their later versions. I wasn't aware of this before because I could not filter results in the leaderboard (it doesn't even load at all for me now).