lossolo 7 months ago | on: Does the Bitter Lesson Have Limits?

> Could you train a small language model with a big one?

Yes, it's called distillation.
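For anyone wondering what that looks like in practice, here is a minimal sketch of the classic soft-target recipe (Hinton, Vinyals & Dean, 2015). It is not necessarily the variant the comment has in mind. The big "teacher" model's logits are softened with a temperature, and the small "student" is trained to match that distribution while also fitting the hard label. The function names and the `temperature`/`alpha` defaults are illustrative assumptions. For a language model you would apply this at every token position over the vocabulary logits.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    # Divide by the temperature, then subtract log-sum-exp (max-shifted for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # assumed default; higher T = softer targets
    alpha: float = 0.5,        # assumed default; weight on the soft-target term
) -> float:
    # Softened log-probabilities from teacher and student.
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    # Ordinary cross-entropy against the hard label at temperature 1.
    hard = -log_softmax(student_logits)[label]
    # Scaling by T^2 keeps the soft-target gradients comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

If the student already reproduces the teacher's logits, the soft term is zero and only the hard-label loss remains. In a real training loop you would compute this with a framework's autograd and backpropagate through the student only, keeping the teacher frozen.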