
I’ve taught LLMs imaginary words and their meanings with minute amounts of data (two or three examples) via full fine-tuning, LoRA and QLoRA.

I have no idea where the myth of ‘can’t add new knowledge via fine-tuning’ came from. It’s a sticky meme that makes no sense.

Pretraining obviously adds knowledge to a model. The difference between pretraining and fine-tuning is the number of tokens and the learning rate. That's it.
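To make the point concrete, here's a minimal sketch (not the commenter's actual setup, which used real LLMs with LoRA/QLoRA): a toy next-token model, an embedding table plus a linear softmax head, first "pretrained" on one word association and then "fine-tuned" on a handful of examples of an invented word. The vocabulary, the word "florb", and all hyperparameters are illustrative assumptions; the only claim is that fine-tuning is the same gradient-descent process as pretraining, just with fewer tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["rex", "dog", "florb", "cat"]
idx = {w: i for i, w in enumerate(vocab)}
V, d = len(vocab), 8

E = rng.normal(0, 0.1, (V, d))   # token embeddings
W = rng.normal(0, 0.1, (d, V))   # output projection

def step(x, y, lr):
    """One gradient step of cross-entropy loss on a (word -> word) pair."""
    e = E[x].copy()
    logits = e @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    g = p.copy()
    g[y] -= 1.0                  # dL/dlogits for softmax cross-entropy
    gE = W @ g                   # grad w.r.t. the embedding (pre-update W)
    W[:, :] -= lr * np.outer(e, g)
    E[x] -= lr * gE

# "Pretraining": the model learns rex -> dog from many tokens.
for _ in range(300):
    step(idx["rex"], idx["dog"], lr=0.5)

# "Fine-tuning": a minute amount of data — repeated examples of the
# imaginary word florb -> cat. Same update rule, fewer tokens.
for _ in range(100):
    step(idx["florb"], idx["cat"], lr=0.5)

def predict(w):
    return vocab[int(np.argmax(E[idx[w]] @ W))]

print(predict("florb"))  # "cat": the new association was injected
print(predict("rex"))    # check whether the pretrained association survived
```

There is no mechanism in the update rule that distinguishes "pretraining tokens" from "fine-tuning tokens"; the second loop adds the new word's meaning exactly the way the first loop added the old one.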


