Parsing references from the PDF

JabRef can extract references from the bibliography of a PDF using GROBID. This works reasonably well, but has one major drawback: data is sent from JabRef to a service hosted by the JabRef developers. When reviewing papers, one does not want to share the PDFs with third parties.

Therefore, we added an option for offline parsing to JabRef. It currently works well for IEEE papers, but not for others.
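To give an idea of why IEEE-style bibliographies are comparatively easy to parse offline, here is a minimal Python sketch. It is not JabRef's implementation (JabRef is written in Java, and its parser is not shown in this post). The function names, the regular expressions, and the two sample references are invented for illustration. The sketch only assumes the usual IEEE layout: a bracketed number, the authors, and then the title in double quotes.

```python
import re

# Assumption: every reference starts with a bracketed number such as "[12]".
MARKER = re.compile(r"\[(\d+)\]\s*")


def split_ieee_references(text):
    """Split a bibliography text block into (number, reference text) pairs."""
    # With one capture group, re.split returns
    # [preamble, num1, body1, num2, body2, ...].
    parts = MARKER.split(text)
    refs = []
    for i in range(1, len(parts) - 1, 2):
        # Collapse line breaks and repeated spaces left over from the PDF text.
        body = " ".join(parts[i + 1].split())
        refs.append((int(parts[i]), body))
    return refs


def parse_reference(body):
    """Extract authors, title and year from one IEEE-style reference (best effort)."""
    fields = {}
    # Assumption: the title is the first quoted span; IEEE style puts the
    # trailing comma inside the closing quote, so it is dropped here.
    title = re.search(r'[“"](.+?),?[”"]', body)
    if title:
        fields["title"] = title.group(1)
        fields["author"] = body[:title.start()].rstrip(" ,")
    # Assumption: the last four-digit number starting with 19 or 20 is the year.
    years = re.findall(r"\b(?:19|20)\d{2}\b", body)
    if years:
        fields["year"] = years[-1]
    return fields


# Invented sample data, not taken from a real paper.
sample = """References
[1] A. Author and B. Writer, "A study of things," in Proc. Example Conf.,
2019, pp. 1-10.
[2] C. Person, "Another title," Example Journal, vol. 3, no. 2, pp. 11-20, 2021.
"""

for number, body in split_ieee_references(sample):
    print(number, parse_reference(body))
```

Reference styles without numbered markers or quoted titles break both assumptions, which hints at why other formats need their own handling.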

You can try whether it works for you, too: download the latest development version from https://builds.jabref.org/main/.

Ensure that online services are disabled

To find the GROBID setting, scroll down in the “Web search” preferences.

Start parser

Have the PDF attached to an entry.
Right-click on the entry in the entry table.
Select the item “Extract references from file (offline)”.

Make sure the menu item says “offline”. If it says “online”, you still have GROBID services enabled.

There will be a popup with all parsed references

First, click “Select all new entries”, then deselect “Download linked online files”. Finally, press “Import entries” to add the entries to your library.

Result

The entries are now imported into the main table.

The original entry has links to the cited entries in the “Other fields” section:

Links

You can find the PDF used for testing at jabref/src/test/resources/org/jabref/logic/importer/fileformat/tua3i2refpage.pdf on the main branch of the JabRef/jabref repository on GitHub.

The older functionality is discussed in the thread “Import from Reference text”.

Outlook

We are happy to hear feedback on this functionality and whether it works for your use case, too. If it does not, please search for an open-access PDF that shows the problem, or create a new example PDF for us. Then we can work on parsing your variant of references, similar to what was done in pull request 11156.