PDF metadata extraction quality

InAnYan · February 5, 2025, 2:54pm

Hi! Thanks for checking out JabRef.

I haven’t used Zotero much, but here is what I can tell you from knowing the internals of JabRef:

JabRef has several methods (heuristics) for extracting metadata from PDFs. These include:

Finding embedded .bib source code (or a whole .bib file) in the PDF.
Finding XMP metadata in the PDF file (XMP is a general format for metadata).
If the PDF “came” from Springer or IEEE, JabRef can handle it easily, as the format used by those publishers is “well-defined”. In other words, it can parse files of specific styles, though not all of them.
JabRef can also send files to Grobid (a well-known service for analyzing documents); however, this currently doesn’t work.

JabRef applies all of those methods and forms a final .bib entry.
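To illustrate the idea of combining several heuristics into one entry, here is a minimal sketch in Python. It is not JabRef’s actual code (JabRef is written in Java, and its merging rules are not described here); the names `extract_entry` and `merge_candidates`, and the "earlier extractor wins" rule, are assumptions made up for this example.

```python
from typing import Callable, Dict, List, Optional

# A candidate entry is just a mapping of BibTeX field names to values.
Fields = Dict[str, str]
# Hypothetical extractor signature: takes raw PDF bytes, returns fields or None.
Extractor = Callable[[bytes], Optional[Fields]]


def merge_candidates(candidates: List[Fields]) -> Fields:
    """Merge field dicts; when two candidates disagree, the earlier one wins."""
    merged: Fields = {}
    for fields in candidates:
        for key, value in fields.items():
            # Skip empty values and never overwrite a field already set.
            if value and key not in merged:
                merged[key] = value
    return merged


def extract_entry(pdf_bytes: bytes, extractors: List[Extractor]) -> Fields:
    """Run every extractor and merge whatever each one managed to find."""
    candidates = []
    for extractor in extractors:
        result = extractor(pdf_bytes)
        if result:
            candidates.append(result)
    return merge_candidates(candidates)
```

The order of the `extractors` list acts as a priority: in this sketch you would put the most trustworthy source (for example, an embedded .bib entry) first and the weakest heuristic last.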

Looking at the Zotero docs (retrieve_pdf_metadata [Zotero Documentation]), it seems that Zotero sends a couple of pages from the PDF (the first few) to an external online service. I think it’s probably Grobid as well (or Grobid is one of the services used). It also fetches information from DOIs or ISBNs.

JabRef can also extract metadata from DOIs and ISBNs, but it doesn’t do this automatically.
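As a rough sketch of what an automatic DOI lookup could look like (again in Python, and not how JabRef or Zotero actually implement it): find something DOI-shaped in the text extracted from the PDF, then ask doi.org for BibTeX via content negotiation. The regular expression is a simplified heuristic, the function names are hypothetical, and the `application/x-bibtex` response depends on the DOI’s registration agency supporting it (Crossref and DataCite do).

```python
import re
import urllib.request
from typing import Optional

# Heuristic pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# Real DOIs can contain characters this simplified pattern mishandles.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop trailing punctuation that usually belongs to the sentence.
    # (Simplification: a DOI that really ends in ")" would be truncated.)
    return match.group(0).rstrip(".,;)")


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Request a BibTeX record for the DOI via doi.org content negotiation."""
    request = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

In practice you would call `find_doi` on the text of the first page or two and call `fetch_bibtex` only when it returns a value; the network call can fail or time out, so it should be wrapped in error handling.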

To conclude, both JabRef and Zotero rely on external services to retrieve metadata for PDFs, but:

Currently, JabRef’s Grobid instance is down.
JabRef’s algorithms for handling PDFs could be refined a bit (e.g., to automatically fetch metadata when a DOI is found inside the PDF).

Thus, JabRef might be a little behind Zotero in this functionality.
