Extract information from PDF import

I don’t know whether JabRef can really extract information from PDFs; from what I have seen, it is not usable. I dragged some PDF files into JabRef and read part of the guide on PDF import, but nothing was extracted correctly.


We have good news for you. This year’s GSoC student @btut is already working on improved PDF importing and extraction (using GROBID under the hood).

Indeed this is a feature I was very much looking forward to myself, that’s why I implemented it :wink:
I am happy to report that most things are done. I am working on some details and waiting for some changes in Grobid to be accepted; then we will have a much more comprehensive PDF import.
You can track progress here, but as it depends on Grobid, which has not been updated yet, it cannot be tested easily (you would need to build your own Grobid server from my Grobid PR and point JabRef to your server). Expect the feature to be in the main branch in the coming weeks and in the next release!

I hope I can use this feature as soon as possible. It’s very helpful, thanks.

Hi @malacology! The new PDF import features are now available in the main branch, but not in the latest release. If you want to try them out already, you can use the builds here.

We use multiple ways of extracting metadata from PDFs now! One of them is Grobid, a deep-learning approach. JabRef now runs a Grobid server for that purpose. The first time you try to import a PDF, you will be prompted to allow JabRef to send your PDFs to that service. Allow it for best results; deny it if you don’t want to transmit your files.

I hope this new feature is helpful to you!
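
For anyone curious what the Grobid step roughly looks like from the outside, here is a minimal sketch of asking a Grobid server for the header metadata of a PDF as BibTeX. This is not JabRef's own code. It assumes a Grobid server you run yourself at `http://localhost:8070` (an assumed address, adjust it to your setup) and uses Grobid's `/api/processHeaderDocument` endpoint with the file in the `input` form field. The function names are made up for this example.

```python
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field_name: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_header_request(
    pdf_bytes: bytes,
    filename: str,
    server: str = "http://localhost:8070",  # assumption: a locally running Grobid server
) -> urllib.request.Request:
    """Prepare (but do not send) a request for header metadata in BibTeX form."""
    body, content_type = build_multipart("input", filename, pdf_bytes)
    return urllib.request.Request(
        url=f"{server}/api/processHeaderDocument",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/x-bibtex"},
        method="POST",
    )


def extract_bibtex(pdf_path: str, server: str = "http://localhost:8070") -> str:
    """Send the PDF to the Grobid server and return whatever BibTeX it answers with."""
    with open(pdf_path, "rb") as handle:
        request = build_header_request(handle.read(), os.path.basename(pdf_path), server)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Calling `extract_bibtex("paper.pdf")` uploads the whole file to the server you point it at, which is the same privacy trade-off the JabRef prompt asks you about.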