[*] The actual model is trained on clip-ViT-L-14 and is around 1.5 GB. I couldn't get it to load on Streamlit Cloud, so I used the lighter clip-ViT-B-16 to run the comparisons. So if the scores look a little off, now you know why :)