
They are likely doing some interpolation for the 200B-token checkpoint, or benchmarking it the wrong way. For example, HellaSwag accuracy for LLaMA 7B is 0.76 [1], but the repo lists 0.56. Even at 200B tokens, LLaMA scores higher than 0.56 judging from the charts.

[1]: https://arxiv.org/pdf/2302.13971.pdf



They ran lm-evaluation-harness on both this model and the original LLaMA weights, which is the correct way to do it: the same harness and prompts for both models, so the comparison is apples-to-apples even if the absolute numbers differ from the paper's.

Many people have been struggling to reproduce the benchmark numbers included in the original llama paper.
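For reference, a comparison like this is typically run with EleutherAI's lm-evaluation-harness. A sketch of the kind of invocation involved, assuming the 2023-era `main.py` CLI and a Hugging Face checkpoint name (the model path here is illustrative, not from the thread):

```shell
# Evaluate a HF causal-LM checkpoint on HellaSwag with lm-evaluation-harness.
# "openlm-research/open_llama_7b" is an example model id, not taken from the thread.
python main.py \
  --model hf-causal \
  --model_args pretrained=openlm-research/open_llama_7b \
  --tasks hellaswag
```

Running the identical command against both sets of weights is what makes the numbers directly comparable, whereas scores copied from different papers often use different prompting and normalization.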



