Did you set them to use the same memory budget? Adam holds more state. They do s...

Did you set them to use the same memory budget? Adam holds more state.

They do say it consistently matches or outperforms despite simplicity, and I think that statement means at the lower budget for their approach, but a fair comparison fk seems if it is at least promising would be take advantage of the lower memory read to add more params in their version in the comparison.

Also the paper says slow initial convergence, under limitations:

> More- over, our methods ensure a steady and stable update during training, allowing the model to converge better in a given task with sufficient training steps. Thus, we might observe that the convergence speed is relatively lower than Adam’s in the early stage of training; as our primary focus is to investigate the effectiveness of the SaI approach, we left the acceleration of convergence speed in future work.