
Have you considered training a shallow MLP autoencoder, perhaps with tied weights between the encoder and decoder, to reduce the dimensionality of the embeddings? Another (IMO better) approach off the top of my head would be semi-supervised contrastive learning, with labelled similar and dissimilar video pairs, like in this notebook[1] from OpenAI.

[1]: https://github.com/openai/openai-cookbook/blob/main/examples...
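To make the tied-weights idea concrete, here is a minimal NumPy sketch of a linear autoencoder where the decoder reuses the encoder matrix transposed (so there is only one weight matrix to learn). All dimensions and hyperparameters are illustrative, not from the parent comment:

```python
import numpy as np

# Tied-weight linear autoencoder sketch: encode with W, decode with W.T.
# Hypothetical sizes: 64-d embeddings compressed to 8-d codes.
rng = np.random.default_rng(0)
n, d_in, d_code = 128, 64, 8
X = rng.normal(size=(n, d_in))                    # stand-in for video embeddings
W = rng.normal(scale=0.01, size=(d_in, d_code))   # single shared weight matrix

mse0 = np.mean((X @ W @ W.T - X) ** 2)            # reconstruction error before training

lr = 1e-2
for _ in range(300):
    Z = X @ W               # encode: (n, d_code)
    X_hat = Z @ W.T         # decode with the *same* weights, transposed
    err = X_hat - X
    # gradient of the reconstruction loss w.r.t. W (constant factor folded into lr)
    grad = (X.T @ err @ W + err.T @ X @ W) / n
    W -= lr * grad

mse1 = np.mean((X @ W @ W.T - X) ** 2)            # error after training
codes = X @ W                                     # the reduced 8-d embeddings
```

With tied weights and a linear model this ends up learning something PCA-like; the point is just that the parameter count is halved and the decoder can't drift away from the encoder.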



I originally started with a triplet loss for video similarity, similar to https://keras.io/examples/vision/siamese_network/ but for video instead of images.
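For readers unfamiliar with that Keras example, the triplet setup boils down to a margin loss over (anchor, positive, negative) embedding triples. A small NumPy sketch (the margin value and embedding sizes here are illustrative, not from the linked tutorial):

```python
import numpy as np

# Triplet margin loss over embedding vectors, as used in Siamese training:
# push d(anchor, positive) below d(anchor, negative) by at least `margin`.
def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, d(a,p) - d(a,n) + margin) with squared L2 distances."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 128))              # anchor video embeddings
p = a + 0.01 * rng.normal(size=(4, 128))   # similar videos: near their anchors
n = rng.normal(size=(4, 128))              # dissimilar videos: unrelated

losses = triplet_loss(a, p, n)             # ~zero for these easy triplets
```

Training a Siamese network then just means minimizing this loss over mined triplets, with the same encoder producing all three embeddings.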

But I'm sort of hoping to avoid training a model.



