Best way to merge two databases with large overlaps?

I imported my reference database from Zotero to JabRef and cleaned it up. In the last few days I found that some references are missing from JabRef (several hundred records out of 3000+). This is more than the number of duplicates I removed in JabRef.

The best idea I have is to import the references from the old Zotero set into a new JabRef database, generate a fresh set of BibTeX keys based on my existing template, and then merge the two databases. The difference between the original and merged JabRef databases should only be the newly imported records plus the tens of records that I edited manually (and which likely have different BibTeX keys).

Any ideas?

Just to start:

  1. Which version of JabRef are you using? You can find this information under Help>About JabRef.
  2. Which format did you export to in Zotero? (e.g. .bib or .ris)
  3. Which method did you use to import into JabRef? There is File>Open and File>Import.

Your method looks good on paper, go for it, although it might be a little tedious with hundreds of entries.

Whatever you do: Have backups!

I would also check whether the new file you imported from Zotero has the same number of entries as your Zotero library.
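One way to do that check outside of either program is to count the entry headers in the exported .bib file directly. This is only a minimal sketch: it assumes every entry starts with `@type{` at the beginning of a line, it skips `@comment`, `@preamble`, and `@string` blocks, and the function name and sample text are made up for illustration.

```python
import re
from collections import Counter

# Start of a BibTeX block such as "@article{" or "@Book{" at the beginning of a line.
ENTRY_START = re.compile(r"^\s*@(\w+)\s*[{(]", re.MULTILINE)

# Block types that are not bibliographic entries.
NON_ENTRY_TYPES = {"comment", "preamble", "string"}


def count_entries(bib_text: str) -> Counter:
    """Count bibliographic entries per entry type in BibTeX source text."""
    counts = Counter()
    for match in ENTRY_START.finditer(bib_text):
        entry_type = match.group(1).lower()
        if entry_type not in NON_ENTRY_TYPES:
            counts[entry_type] += 1
    return counts


# Hypothetical sample; in practice read the exported file instead, e.g.
# bib_text = open("zotero_export.bib", encoding="utf-8").read()
SAMPLE = """
@article{smith2020,
  author = {Smith, Jane},
  title = {A Title},
  year = {2020}
}

@Comment{jabref-meta: databaseType:bibtex;}

@book{doe2019,
  title = {Another},
}
"""

counts = count_entries(SAMPLE)
print(sum(counts.values()), dict(counts))
```

If the total matches neither the Zotero count nor the JabRef count, the per-type breakdown may hint at which kind of item is being dropped.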

Version of JabRef 5.5–2022-01-17–27a05c7.

I exported to Bib from Zotero.

I think I used file>open because it is native for JabRef. How would that have been different from file>import?

Only a few tens will need to be manually reconciled I hope - the rest should be the new records…

For the actual difference between import file and open file, one of the more experienced maintainers would need to chip in here.

The reason I think it is relevant to be precise here is that in JabRef version 5.3, importing entries via XMP metadata attached to PDF files was not working well, and what you wrote reminded me of that problem. Back then, it was specific to importing via XMP. The other methods of importing bibliographic data were not affected.

The cleaned bib file versus the Zotero library: 3488 vs 3816 entries, after removing no more than 50 duplicates from the bib file and deleting maybe 50 more irrelevant entries.

On your last point: yes, the few new PDF files that I imported had very, very little metadata come through. Even on recent (2018 onwards) publications.

I hope that is fixed soon. Otherwise I’m very pleased with JabRef.


What I meant to say: can you confirm that when you export your Zotero library and then open that library with JabRef, there are 3816 entries within JabRef (before you clean up duplicates or delete anything)? If yes, that would mean it has nothing to do with the import, and the cause lies in something you did AFTER importing. I am just trying to narrow down the possible causes by ruling out that it has something to do with opening the file or importing the entries.

I’ve checked the Zotero library: there are 3816 entries. I then exported to bib format.

When I File>Open the bib file in JabRef, the new library reports 3790 entries.

I also tried File>Import, and the Import dialogue reported 3790 entries in the bib file. After Select All and Import, JabRef became unresponsive, and after 15 minutes I cancelled the process.

Import took longer than expected, but I assume it was running a lot of checks. Is there a log of the issues it found? It wasn’t clear where to look.

EDIT: I left it running and the successful import took 25 minutes, and reported 3790 entries too.
The cleaned JabRef database has 3488 entries.

I may not actually need to do a full merge. A quick comparison of entries might be enough: export author, year, and title to flat files (I would love to be able to do that), import them into a conventional database, and then run a join.
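The comparison idea could be sketched roughly as below, skipping the flat-file step and loading author, year, and title straight into an in-memory SQLite database. Treat it as a rough sketch under assumptions, not a full BibTeX parser: the field reader only handles `{braced}`, `"quoted"`, and bare numeric values, matching is on normalized author + year + title (so manually edited records will show up as missing), and all function, table, and sample names are invented for illustration.

```python
import re
import sqlite3

# Entry header such as "@article{smith2020," capturing the type and the citation key.
ENTRY_START = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]*)\s*,", re.MULTILINE)
SKIP_TYPES = {"comment", "preamble", "string"}


def normalize(value: str) -> str:
    """Lowercase and replace braces/punctuation with spaces so formatting differences do not block a match."""
    return " ".join(re.sub(r"[\W_]+", " ", value.lower()).split())


def read_field(body: str, name: str) -> str:
    """Return one field value; handles {nested {braces}}, "quotes", and bare numbers."""
    match = re.search(rf"(?:^|[,\s]){name}\s*=\s*", body, re.IGNORECASE)
    if not match:
        return ""
    rest = body[match.end():]
    if rest.startswith("{"):
        depth = 0
        for i, ch in enumerate(rest):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return rest[1:i]
        return rest[1:]  # unbalanced braces: take what is there
    if rest.startswith('"'):
        end = rest.find('"', 1)
        return rest[1:end] if end != -1 else rest[1:]
    return re.match(r"[^,}\s]*", rest).group(0)


def parse_entries(bib_text: str) -> list:
    """Return (citekey, author, year, title) rows with normalized author and title."""
    matches = list(ENTRY_START.finditer(bib_text))
    rows = []
    for i, m in enumerate(matches):
        if m.group(1).lower() in SKIP_TYPES:
            continue
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bib_text)
        body = bib_text[m.end():end]
        rows.append((m.group(2), normalize(read_field(body, "author")),
                     read_field(body, "year").strip(), normalize(read_field(body, "title"))))
    return rows


def missing_from(zotero_rows: list, jabref_rows: list) -> list:
    """Rows from the Zotero export with no author/year/title match in the JabRef file."""
    con = sqlite3.connect(":memory:")
    for table in ("zotero", "jabref"):
        con.execute(f"CREATE TABLE {table} (citekey TEXT, author TEXT, year TEXT, title TEXT)")
    con.executemany("INSERT INTO zotero VALUES (?, ?, ?, ?)", zotero_rows)
    con.executemany("INSERT INTO jabref VALUES (?, ?, ?, ?)", jabref_rows)
    result = con.execute(
        """
        SELECT z.citekey, z.author, z.year, z.title
        FROM zotero AS z
        LEFT JOIN jabref AS j
          ON z.author = j.author AND z.year = j.year AND z.title = j.title
        WHERE j.citekey IS NULL
        ORDER BY z.author, z.year
        """
    ).fetchall()
    con.close()
    return result


# Hypothetical samples standing in for the two .bib files.
ZOTERO_SAMPLE = """
@article{smith_2020,
  author = {Smith, Jane},
  title = {A {Study} of Things},
  year = {2020}
}
@book{doe_2019,
  author = {Doe, John},
  title = {Another Book},
  year = 2019
}
@article{lee_2018,
  author = {Lee, Ann},
  title = {Missing Paper},
  year = {2018}
}
"""

JABREF_SAMPLE = """
@Article{Smith2020,
  author = {Smith, Jane},
  title  = {A Study of Things},
  year   = {2020},
}
@Book{Doe2019,
  author = {Doe, John},
  title  = {Another book},
  year   = {2019},
}
"""

for row in missing_from(parse_entries(ZOTERO_SAMPLE), parse_entries(JABREF_SAMPLE)):
    print(row)
```

Because the join ignores citation keys, regenerating keys first is not needed for this comparison. Only the rows it reports would have to be brought across or reconciled by hand.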

I had found that entries from one author present in previous libraries (Zotero and Mendeley) were not found when I searched in JabRef (“No entries present”). This seems to be an intermittent issue that occurs when I have hundreds of megabytes of swap memory under Ubuntu. I’ve also seen it under Windows.

My browser keeps chewing up more and more memory over time. If I don’t keep it open, I don’t have this problem in JabRef.

I have 8GB of physical memory and a similar amount of swap under Ubuntu, so it isn’t a hardware issue.

For better performance, try disabling Autocomplete in the preferences.

I would suggest keeping this thread for the original question: “What is the best way to merge two databases with large overlaps?” It is a good question, and somebody else may come up with an even better answer than what you found, so let’s leave it open and discuss the other issues in different threads.

Best idea so far about merging two databases with large overlaps: