Hacker News

They add new data to the existing base model via continual pre-training. You save on the pre-training stage (the next-token prediction task), but still have to re-run the mid- and post-training stages: context length extension, supervised fine-tuning, reinforcement learning, safety alignment ...


Continual pre-training has issues because the model starts forgetting older knowledge (catastrophic forgetting). There is ongoing research into other approaches.
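One common mitigation for that forgetting is replay: mix a fraction of the original pre-training corpus back into every continual-training batch. A minimal sketch of that mixing step (all names and the 25% ratio are illustrative, not from any specific system):

```python
import random

def mix_replay_batch(new_docs, old_docs, replay_ratio=0.25, batch_size=8, rng=None):
    """Build one training batch mixing fresh documents with a replayed
    sample of the original pre-training corpus, a common heuristic to
    soften catastrophic forgetting during continual pre-training."""
    rng = rng or random.Random(0)
    n_old = int(batch_size * replay_ratio)   # slots for replayed old data
    n_new = batch_size - n_old               # slots for new data
    batch = rng.sample(old_docs, n_old) + rng.sample(new_docs, n_new)
    rng.shuffle(batch)                       # interleave old and new docs
    return batch

old_corpus = [f"old_doc_{i}" for i in range(100)]
new_corpus = [f"new_doc_{i}" for i in range(100)]
batch = mix_replay_batch(new_corpus, old_corpus)
```

With `replay_ratio=0.25` and `batch_size=8`, each batch carries 2 old documents and 6 new ones; tuning that ratio trades forgetting against how fast the new data is absorbed.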



