I use an on-premise installation of an LLM (specifically glm-4.6) and came across the embedding model option while setting the API path. It looks like there are thousands of models (okay, about 300…) to choose from, and I also found out that there are apparently specialized models for scientific literature, for example SPECTER (its successor SPECTER2 does not seem to be in the list).
From what I have read so far, the answer to my question is probably “it depends”, but I will ask anyway: which embedding model do you use?
I am currently using sentence-transformers-multilingual-e5-large, but new models come out every so often. If you are looking for something specific, I recommend checking the MTEB Leaderboard - a Hugging Face Space by mteb. You will notice that not all models there are supported by JabRef. If you want a model to be supported, you will need to convert it to a format supported by DJL and ask them to host it, making it available for us to query. It might also work to place the model in the appropriate DJL folders on your computer; that would need some experimentation.
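If you want to compare candidate models on your own library before settling on one, a simple approach is to embed a few of your abstracts with each model and check whether the cosine-similarity rankings differ. Here is a minimal sketch of the ranking part, assuming you fetch the actual embedding vectors from your own API endpoint (the vectors below are toy stand-ins, not real model output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted by similarity to the query, best first."""
    sims = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)

# Toy vectors standing in for embeddings; in practice you would call your
# embedding endpoint once per abstract, for each model you want to compare.
query = np.array([0.9, 0.1, 0.0])
docs = [
    np.array([1.0, 0.0, 0.0]),  # nearly parallel to the query
    np.array([0.0, 1.0, 0.0]),  # orthogonal to the query
    np.array([0.7, 0.7, 0.0]),  # in between
]

print(rank_by_similarity(query, docs))  # → [0, 2, 1]
```

If two models produce very similar rankings on documents you know well, switching between them is unlikely to change retrieval results much.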
I see! I tested allenai-specter and the results were not that different from the default model. To be honest, though, I am not sure what to expect; I think the most limiting factor in my case is actually glm-4.6.