I have a BibTeX file originally generated in BibDesk that I’ve converted for JabRef, changing some tags, etc. The file has almost 12,000 bibliography records and is about 20 MB in size.
However, the resulting file always fails to load into JabRef, failing at the same file line with an ‘unexpected EOF’ error. Cleaning up non-printing and non-ASCII characters gives the same result.
Is there some type of limit on file size, number of records, or size of fields (I have some very long ‘annote’ entries)?
Not that I know of, but huge libraries will currently (unfortunately) run into severe performance problems at some point, as some users have reported. I believe 12,000 records should still be manageable, though.
With regard to the EOF error, it is more likely that there is indeed a faulty line in your library rather than that you have reached a maximum file size. It would be immensely helpful if you could post the full entry that contains the line which triggers the error. You can also try to remove that line completely and see if you can import the rest.
If there are multiple errors in the file, bisecting it (repeatedly splitting it in half and testing each half) usually yields results very quickly.
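In case it helps, here is a minimal sketch of that bisection idea in Python. It assumes every entry starts with ‘@’ at the beginning of a line, and the file names are just placeholders:

```python
# Sketch: split a .bib file into two halves at an entry boundary,
# so each half can be imported separately to narrow down the faulty entry.
def split_bib(path="library.bib"):
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

    # Indices of lines that start a new entry (assumes '@' in the first column)
    starts = [i for i, line in enumerate(lines) if line.lstrip().startswith("@")]
    middle = starts[len(starts) // 2]  # entry boundary closest to the middle

    with open("first_half.bib", "w", encoding="utf-8") as f:
        f.writelines(lines[:middle])
    with open("second_half.bib", "w", encoding="utf-8") as f:
        f.writelines(lines[middle:])

if __name__ == "__main__":
    split_bib()
```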
Also, as general advice, please make sure to keep a backup of your library before you use cleanup functions and advanced features of JabRef. Depending on what exactly you did, some of this is hard to reverse automatically and may require manual intervention, and you might only notice a problem much later.
I’m using the Python bibtexparser library to map tags from BibDesk to JabRef. I think I’ve successfully identified multi-byte Unicode characters, converted them to LaTeX commands, and written the output as UTF-8. Grepping the resulting BibTeX file for ‘unprintable’ characters comes back clean.
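Roughly, the mapping step looks like the sketch below (bibtexparser 1.x API; the specific field renames shown are only illustrative examples, not my full mapping):

```python
import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.bwriter import BibTexWriter

# Illustrative only: which BibDesk fields get renamed to which JabRef fields
# depends on the library; these two are just examples.
FIELD_MAP = {
    "local-url": "file",
    "date-added": "creationdate",
}

def convert(src="bibdesk.bib", dst="jabref.bib"):
    parser = BibTexParser(common_strings=True)
    with open(src, encoding="utf-8") as f:
        db = bibtexparser.load(f, parser=parser)

    # Entries are plain dicts in bibtexparser 1.x, with lowercased field names
    for entry in db.entries:
        for old, new in FIELD_MAP.items():
            if old in entry:
                entry[new] = entry.pop(old)

    with open(dst, "w", encoding="utf-8") as f:
        bibtexparser.dump(db, f, writer=BibTexWriter())

if __name__ == "__main__":
    convert()
```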
Viewing the file at the reported EOF line shows it is the actual end of the file, but JabRef reports about 400 fewer entries than were in the original input file and processed by bibtexparser.
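(For what it’s worth, I’m counting entries on each side with something like the snippet below; it just counts lines that start with ‘@’, so @string/@comment blocks are included, but it is close enough for a sanity check.)

```python
# Quick sanity check: count entry starts ('@' in the first column) in a .bib file.
def count_entries(path):
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.lstrip().startswith("@"))

print(count_entries("bibdesk.bib"), count_entries("jabref.bib"))
```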
Thanks for the info. I’ll take a crack at splitting the file to see if that provides more insight.
Thinking about this a little more, I can recommend Meld, with which you can compare both files side by side. 400 entries is a lot, and while JabRef already has tools to compare and fix this, it may be more convenient to use a program that is specialized in comparing files.
I had tried Meld, but it is only effective if my preprocessing does not alter the order of the entries.
However, a new version of bibtexparser recently came out; it handled the syntax problems more robustly and provided enough information for me to track down the problem.
My next big challenge is to restore the links from the BibTeX record to the PDF files in my archive.
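If it’s useful to anyone else, my current plan is along these lines: BibDesk stores its file links in Bdsk-File-N fields as base64-encoded binary plists, so the relative path can (as far as I understand the format) be pulled out and written into JabRef’s file field. A rough, untested sketch of that idea, with all paths and field names as placeholders:

```python
import base64
import plistlib

# Sketch only: pull the relative PDF path out of a BibDesk 'bdsk-file-1' value
# and turn it into a JabRef-style 'file' field (":path:PDF").
# Assumes the Bdsk-File value is a base64-encoded NSKeyedArchiver plist whose
# '$objects' list contains the relative path as a string; adjust if your
# BibDesk version stores it differently.
def bdsk_file_to_path(value):
    plist = plistlib.loads(base64.b64decode(value))
    for obj in plist.get("$objects", []):
        if isinstance(obj, str) and obj.lower().endswith(".pdf"):
            return obj
    return None

def add_file_field(entry):
    # 'entry' is a bibtexparser-style dict with lowercased field names.
    raw = entry.get("bdsk-file-1")
    path = bdsk_file_to_path(raw) if raw else None
    if path:
        entry["file"] = f":{path}:PDF"
    return entry
```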