Hi all,
I come from Zotero (which is nice but limiting for group use) and my first impression of JabRef is really good!
However, I noticed that Zotero's metadata extraction from PDFs is superb, while JabRef has quite a few problems with it. I saw that some improvements have been made in the unreleased 6.0 version.
In general, will the JabRef parser be on par with Zotero? What are the bottlenecks?
I haven’t used Zotero much, but here is what I can tell you from knowing the internals of JabRef:
JabRef has several methods (heuristics) for extracting metadata from PDFs. These include:
Finding embedded .bib source code (or a whole .bib file) in the PDF.
Finding XMP metadata in the PDF file (XMP is a general format for metadata).
If the PDF “came” from Springer or IEEE, JabRef can handle it easily, as the format of those publishers is “well-defined”. In other words, it can parse files of specific styles, though not all of them.
JabRef can also send files to Grobid (a well-known service for analyzing documents); however, this currently doesn’t work.
JabRef applies all of those methods and forms a final .bib entry.
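To illustrate the idea of running several heuristics and combining their results, here is a minimal sketch in Python. It is not JabRef's actual implementation (JabRef is written in Java); the names `merge_entries` and `extract_metadata`, the dictionary-based entry format, and the "earlier extractor wins" rule are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical types for this sketch: an entry is a plain field -> value map,
# and an extractor is any function that turns PDF bytes into such a map.
Entry = Dict[str, str]
Extractor = Callable[[bytes], Entry]


def merge_entries(candidates: List[Entry]) -> Entry:
    """Merge candidate entries; earlier candidates win on conflicting fields."""
    merged: Entry = {}
    for candidate in candidates:
        for field, value in candidate.items():
            # Skip empty values so a later extractor can still fill the field.
            if value and field not in merged:
                merged[field] = value
    return merged


def extract_metadata(pdf_bytes: bytes, extractors: List[Extractor]) -> Entry:
    """Run every extractor and merge whatever each one manages to find."""
    candidates: List[Entry] = []
    for extractor in extractors:
        try:
            candidates.append(extractor(pdf_bytes))
        except Exception:
            # One failing heuristic (assumed to raise) should not stop the others.
            continue
    return merge_entries(candidates)
```

The order of the extractor list acts as a priority: in this sketch, a field found by an earlier heuristic (say, embedded .bib data) is never overwritten by a later one.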
Looking at the Zotero docs (retrieve_pdf_metadata [Zotero Documentation]), it seems that Zotero sends a couple of pages from the PDF (the first several pages) to some external online service. I think it’s probably Grobid as well (or Grobid is one of the services it uses). It also fetches information from DOIs or ISBNs.
JabRef can also extract metadata from DOIs and ISBNs, but it doesn’t do this automatically.
To conclude, both JabRef and Zotero rely on external services to retrieve PDF metadata, but:
Currently, JabRef’s Grobid instance is down.
JabRef’s algorithms for handling PDFs could be refined a bit (e.g. to automatically fetch metadata if a DOI is found inside the PDF).
And thus, JabRef might be a little bit behind Zotero in this functionality.
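As an illustration of the "DOI found inside the PDF" refinement, here is a small Python sketch that looks for a DOI in text that has already been extracted from the first pages of a PDF. The function name `find_first_doi` is hypothetical, and the pattern is a commonly used approximation that covers most modern DOIs, not a complete grammar and not what JabRef actually uses.

```python
import re
from typing import Optional

# Approximate pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_first_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop trailing punctuation that usually belongs to the sentence, not the DOI.
    # Caveat: this would also cut a DOI that legitimately ends in one of these.
    return match.group(0).rstrip(".,;:)")
```

For example, `find_first_doi("See https://doi.org/10.1000/xyz123. for details")` returns `"10.1000/xyz123"`, which could then be passed to a DOI-based fetcher.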
Here are some good points of JabRef regarding PDF import that you might be interested in:
JabRef has strict policies for connecting to Internet services. It never does this (or at least it never should) without explicit user permission. If needed, you can switch from Internet-based services to ones in your local network (for example, you can run your own Grobid instance locally and, if I remember correctly, set up JabRef to use it instead of a service on the Internet).
Even without access to external services, JabRef can still import some metadata from PDFs.
So, JabRef has some compelling advantages for this task. If you are doing private research (or concerned about privacy), JabRef will be a good reference management system.
It’s not the best idea to import articles/entries from PDFs, as storing metadata in PDFs (especially bibliographic metadata for research/library purposes) is not well-defined. Sometimes a PDF has metadata, sometimes not. Some authors call \hypersetup (or use pdfx) in their LaTeX documents, others don’t.
It’s always better to rely on identifiers like DOI.
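To show why an identifier is the more reliable route, here is a hedged Python sketch of looking up a DOI through doi.org content negotiation, where the `Accept: application/x-bibtex` header asks the resolver for a BibTeX record. This is not how JabRef does it internally; the function names are hypothetical, and whether a particular DOI returns BibTeX depends on its registration agency.

```python
import urllib.parse
import urllib.request


def build_bibtex_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request asking doi.org for a BibTeX record."""
    # Percent-encode unusual characters in the DOI but keep the "/" separator.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Send the request; needs network access and raises on HTTP errors."""
    with urllib.request.urlopen(build_bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it keeps the network call explicit, which fits the "no connection without user permission" policy mentioned above.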
Thank you for the detailed answers!
I guess it really is the use of an external service (like Grobid) that makes Zotero so good at importing metadata. It’s sad to hear that JabRef has this option but that it currently does not work. Is there a way to work around this without setting up a local instance of such a service?