What I like about your approach is that it circumvents the data problem. You don't need a dataset with review histories and flashcard content in order to train a model.
GPT-4 can probably estimate whether two flashcards are functionally equivalent
https://notes.andymatuschak.org/zJ7PMGzjcgBUoPjLUHBF9jn
GPT-4 can probably estimate whether one prompt will spoil retrieval of another
https://notes.andymatuschak.org/zK9Y15pCnRMLoxUahLCzdyc
What I like about your approach is that it circumvents the data problem. You don't need a dataset with review histories and flashcard content in order to train a model.