A simulation of God's decision tree for the Serenity Prayer: "God, grant me the serenity to accept the things I cannot change, the courage to change the things I can, and the wisdom to know the difference."
I wonder if a language model can be treated as a policy over token-level actions instead of full response actions. Then each response from the language model is a full rollout of the policy. In math and coding, the reward for the response can be evaluated. This is not how DeepSeek works now, right? It treats full responses from the language model as the action if I understand correctly.
Larger brains don't mean a whole lot. Look at crows. If you do question they do have relatively big brains for their size... That's completely fair. Still, current research still hasn't proven if it's indeed causal, and not just correlated.