Creating BibTeX or DOI list from bibliography

Ernst_Ungenuss · December 12, 2023, 4:15pm

Hi,

I’ve got a very big PDF file full of bibliographies that I copied from many different papers. Is it possible to automatically create a BibTeX file, or at least batch-find the DOIs or other identifiers of all sources in this PDF?
Of course, not every source in this PDF is a published work with a DOI, but automatically extracting most of them would already help a lot.
I originally thought JabRef’s Plain References Parser would be able to do something similar, but it seems to be really unreliable for my use case.
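For the "at least batch-find all DOIs" part, a plain pattern match over the text can already get you a long way. Below is a minimal sketch, assuming the bibliography text has already been pulled out of the PDF (for example with Poppler's `pdftotext`) into a UTF-8 text file. It uses the regular expression Crossref suggests for modern DOIs; it will miss some legacy DOIs and any DOI that the PDF extraction split across two lines, so treat the output as a first pass. The script name `find_dois.py` is just an illustration.

```python
import re
import sys

# Crossref's suggested pattern for modern DOIs ("10." + registrant + "/" + suffix).
# It covers most DOIs but not every legacy form.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> list[str]:
    """Return unique DOIs found in `text`, in order of first appearance."""
    seen: dict[str, str] = {}
    for match in DOI_PATTERN.finditer(text):
        # Drop punctuation that usually ends the reference rather than the DOI.
        doi = match.group(0).rstrip(".,;)")
        # DOIs are case-insensitive, so deduplicate on the lower-cased form
        # but keep the spelling of the first occurrence.
        seen.setdefault(doi.lower(), doi)
    return list(seen.values())


if __name__ == "__main__":
    # Usage: pdftotext bibliographies.pdf bib.txt && python find_dois.py bib.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        for doi in extract_dois(fh.read()):
            print(doi)
```

The resulting one-DOI-per-line list is something JabRef (or any other tool) can then look up, while the references without a DOI still need manual work.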

Cheers!

ThiloteE · December 12, 2023, 5:00pm

In theory, what you want is to import the PDF together with all the references it contains.
Try File > Import > Import into current library > Filetype: PDFcontent. If that does not work, which I fear, you can simply drag and drop the file into JabRef, which will use our Grobid feature (Extract information from PDF import). Grobid uses AI technologies that are inherently probabilistic, so at some point you WILL end up with hallucinations. This approach only gives you a first lead, and you will have to cross-check or update the information with correct data from the net.

Have a look at the "Update references" help page too, which explains how to update references.
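If you already have a list of DOIs, the "correct data from the net" can also be fetched outside JabRef through DOI content negotiation: `https://doi.org/<DOI>` returns a BibTeX record when asked for `application/x-bibtex`. The following is a minimal sketch using only the Python standard library, assuming one DOI per line on standard input; the script name `dois_to_bib.py` is hypothetical. Not every registration agency serves BibTeX, so failures are reported and skipped rather than treated as fatal.

```python
import sys
import time
import urllib.error
import urllib.parse
import urllib.request


def bibtex_request(doi: str) -> urllib.request.Request:
    """Build a doi.org content-negotiation request that asks for BibTeX."""
    # Percent-encode characters such as ';' or '(' that appear in some DOIs.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 30.0) -> str:
    """Return the BibTeX record served for `doi` (urllib follows the redirect)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Usage: python dois_to_bib.py < dois.txt > references.bib
    for line in sys.stdin:
        doi = line.strip()
        if not doi:
            continue
        try:
            print(fetch_bibtex(doi), end="\n\n")
        except urllib.error.URLError as err:  # HTTPError is a subclass
            # Unknown DOIs (404) or agencies without BibTeX support end up here.
            print(f"% could not fetch {doi}: {err}", file=sys.stderr)
        time.sleep(1)  # be polite to the public resolver
```

The resulting `.bib` file can be opened in JabRef, but the entries still deserve the same cross-check as anything Grobid produces.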

JabRef cannot update references in bulk yet, and I am not sure how Grobid handles multiple references. I think it ignores the references in the text and bibliography and only parses the “main” reference of the PDF, i.e. the author, editor, journal, title, etc. of the PDF itself.
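For what it's worth, GROBID's own REST service has a separate `processReferences` endpoint aimed at the bibliography section rather than the header, and its `consolidateCitations` option lets GROBID match each parsed reference against external metadata, which is how DOIs can be added. Below is a rough sketch, assuming a GROBID server running locally on its default port 8070 and the third-party `requests` package being installed; the script name `grobid_refs.py` is hypothetical. How well this works on a PDF that consists only of copied reference lists is untested here, and the output is just as probabilistic as the drag-and-drop import.

```python
import sys
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def references_from_tei(tei_xml) -> list:
    """Return title and DOI (if any) for each entry of the TEI <listBibl>."""
    root = ET.fromstring(tei_xml)  # accepts str or bytes
    refs = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        # Prefer the main (article) title, fall back to any title (book, journal).
        title = bibl.find(".//tei:title[@type='main']", TEI_NS)
        if title is None:
            title = bibl.find(".//tei:title", TEI_NS)
        doi = bibl.find(".//tei:idno[@type='DOI']", TEI_NS)
        refs.append({
            "title": "".join(title.itertext()).strip() if title is not None else None,
            "doi": doi.text.strip() if doi is not None and doi.text else None,
        })
    return refs


def process_references(pdf_path: str, grobid_url: str = "http://localhost:8070") -> bytes:
    """Send a PDF to a running GROBID server and return its TEI XML."""
    import requests  # third-party; imported lazily so the parser works without it

    with open(pdf_path, "rb") as fh:
        resp = requests.post(
            f"{grobid_url}/api/processReferences",
            files={"input": fh},
            # Ask GROBID to consolidate each reference against external metadata.
            data={"consolidateCitations": "1"},
            timeout=300,
        )
    resp.raise_for_status()
    return resp.content


if __name__ == "__main__":
    # Usage: python grobid_refs.py bibliographies.pdf
    for ref in references_from_tei(process_references(sys.argv[1])):
        print(ref["doi"] or "-", ref["title"] or "(no title)", sep="\t")
```

Splitting the TEI parsing from the HTTP call keeps the parser usable on TEI files saved from earlier runs.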

Ernst_Ungenuss · December 13, 2023, 3:05pm

I tried it, but it only extracts the first reference it finds. Still, thanks!

ThiloteE · December 18, 2023, 10:12pm

This pull request tried to implement this: [WIP] Extract PDF References by aqurilla · Pull Request #10437 · JabRef/jabref · GitHub
Unfortunately, it seems to be stale.

koppor · March 12, 2024, 6:23am

@Ernst_Ungenuss The latest build available at https://builds.jabref.org/main/ includes that feature. A short usage description is given at [WIP] Extract PDF References by aqurilla · Pull Request #10437 · JabRef/jabref · GitHub. You need to have GROBID enabled in your preferences for this to work.

Happy to hear feedback!

Related topics

Topic | Category | Replies | Views | Activity
Parsing references from the PDF | Beta Testing | 1 | 292 | February 7, 2025
How to read bib data for/from a PDF file | Help (fetcher, entry-editor) | 8 | 3196 | April 24, 2021
Table of recognised sources from pdf entry | Features | 8 | 653 | April 8, 2024
Extract information from PDF import | Features | 14 | 1964 | December 22, 2021
Download citations from DOIs en masse | Help | 5 | 3608 | October 17, 2017