I’ve taught LLMs imaginary words and their meanings with minute amounts of data (two or three examples) via full fine-tuning, LoRA and QLoRA.
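If you want to try it yourself, here's a minimal sketch of the kind of run I mean, assuming Hugging Face `transformers` and `peft`. The base model, the invented word "florbit", and the hyperparameters are illustrative placeholders, not my exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative choice: any small open causal LM works.
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Two or three definition/usage examples of the made-up word.
examples = [
    "A florbit is a small ceramic tool used to smooth pottery seams.",
    "She reached for her florbit to even out the clay joint.",
    "Potters keep a florbit nearby when joining slab edges.",
]

# Wrap the model with LoRA adapters on the attention projections.
lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=2e-4)

# A few epochs of plain next-token-prediction loss over the tiny dataset.
model.train()
for epoch in range(20):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Afterwards, prompt "What is a florbit?" with no context and the model
# answers from its updated weights. That is new knowledge.
```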
I have no idea where the myth of ‘can’t add new knowledge via fine-tuning’ came from. It’s a sticky meme that makes no sense.
Pretraining obviously adds knowledge to a model, and fine-tuning runs the exact same next-token-prediction objective on the exact same weights. The only differences are the number of tokens and the learning rate. That’s it.