Unlinked files importer: Fallback for files that failed to be processed

Since the unlinked files importer has problems with many of my PDF files, I would be very happy if the importer would not only continue to work when some file cannot be processed (see the GitHub issue "Unlinked files Importer: Continue (fall-through) when individual files fail to be processed", #7206; yes, I used parts of that issue's name to formulate the name of this one), but would also create, for each of these files, a new entry in the database with a link to the file and a title like
"File No. 01 of run of search for unlinked local files of 2021-12-06, 1:23:13 p.m. that could not be processed".
Thanks to this feature, one could find these entries in the database and start filling in the missing information for these files manually. One would not need to keep track of the names of these files during the import cycle and link them afterwards by hand.
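To make the proposal concrete, here is a minimal sketch in Python of what such fallback entries could look like. It is not JabRef code: the function name `placeholder_entries`, the `@Misc` entry type, the 24-hour timestamp, and the POSIX-style paths are all assumptions made for illustration.

```python
from datetime import datetime


def placeholder_entries(failed_files, run_started):
    """Build one minimal BibTeX entry per file that could not be processed.

    Hypothetical helper: it only illustrates the proposed title scheme.
    """
    # Assumption: a 24-hour timestamp instead of the "1:23:13 p.m." form.
    stamp = run_started.strftime("%Y-%m-%d, %H:%M:%S")
    entries = []
    for number, path in enumerate(failed_files, start=1):
        title = (
            f"File No. {number:02d} of run of search for unlinked local files "
            f"of {stamp} that could not be processed"
        )
        # File field written as ":<path>:<type>", as in the entries JabRef exports.
        # Assumption: POSIX-style paths that need no escaping.
        file_field = f":{path}:PDF"
        entries.append(
            "@Misc{,\n"
            f"  title = {{{title}}},\n"
            f"  file  = {{{file_field}}},\n"
            "}"
        )
    return entries


if __name__ == "__main__":
    run = datetime(2021, 12, 6, 13, 23, 13)
    for entry in placeholder_entries(["/tmp/a.pdf", "/tmp/b.pdf"], run):
        print(entry)
```

Because the title has a fixed wording around the running number and the timestamp, such entries stay easy to find later with an ordinary search.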


Please vote if you support this proposal:

  • I would like to have this feature, too!
  • I don’t care.


  1. Are you saying there is no fallback even after #7206? Sorry, I have not tested this, as I currently don’t know how to reproduce an error while importing.

    Maybe it would be good if you were to post the errors you encounter while importing. If the file is not of a sensitive nature, you could post it here as well.

  2. It was hard for me to understand this proposal initially, but after thinking about it I think it’s a partly good proposal. The only problem I have is what happens if an entry that you would want to be created (e.g. "File No. 01 of run of search for unlinked local files of 2021-12-06, 1:23:13 p.m.") is a duplicate of an entry that is already in your library. You will then have two entries in your library that relate to the same bibliographic data: one with a link to the file, and one without the link (but maybe already with proper bibliographic data you got from a DOI or by other means). You will want to delete one of the two afterwards, which is one of the major concerns that issue #7206 tried to address.

    • I would suggest not creating these new entries automatically, but rather making it an option that can be ticked in the error message of the failed-import dialogue. This would also give the user the chance to move the successfully imported and linked files to a different folder before handling the failed files.
    • Another option would be to create these entries automatically, but also to add a cleanup action that can remove them automatically.

Hi,

I don’t know which version of JabRef you are using, but I suggest trying out the latest development version
https://builds.jabref.org/main/
because there is already a fall-through. Files that can and cannot be imported will be shown in the "Import results" table in the lower part of the dialog.
There is also a button to export the list of selected files.

For further UI ideas see this issue


Dear Thiote E,

  1. I am using
    JabRef 5.3–2021-07-05–50c96a2
    Linux 5.3.18-lp152.106-default amd64
    Java 16.0.1
    JavaFX 16+8
    and I must admit that some error messages show up when I start JabRef. So maybe this creates additional problems.
    I would need to check where to find information on these problems, since currently I can only observe that the import does not work, without any further messages. And I would have to check which files I could send in view of copyright issues.

  2. So, if one starts to deal with one of the entries created, and starts to add
    the information to the emergency entry, one should check at some point (e.g. by searching for the title) whether this entry already exists. Of course, one should then merge these entries, or copy the file link from the emergency entry to the already existing entry.
    But this is not only a problem for my emergency entries, but also for those entries that could be parsed somehow by the file importer, but with an incorrect result. (I have observed for many preprints from the arXiv server that the large arXiv information that is “printed” on the first page of the PDF file becomes the title in the BibTeX entry.)
    Hence, already in this situation one may have two entries for one paper, and has to deal with this situation somehow. And is the importer able to deal with the situation that there is an unlinked file and some entry in the database with the corresponding bibliographic data?
    I am not sure, but I believe the major concern in the issue "Unlinked files Importer: Continue (fall-through) when individual files fail to be processed" (#7206) was that after one incomplete import run some files were already imported, and if these files are selected again in the next run, there may be several entries for the same file.

But, in view of your remark: if some failed-import dialogue is implemented, one can also add the possibility to tick/untick the creation of these emergency entries. In my suggestion I had in mind the situation that there may be an implementation that simply allows the import to continue after an error, without an elaborate failed-import menu.
If one would like to get rid of the (unchanged) emergency entries, one could find them with a search for the fixed part of the title. Or maybe the imported/failed entries could be somehow marked or grouped by JabRef; see my feature suggestion "Automatic marking/grouping entries after actions like merging duplicates/searching for unlinked files".
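As a rough illustration of finding the unchanged emergency entries by the fixed part of their title, here is a small Python sketch. It assumes the title wording proposed above and a hypothetical in-memory representation of the library as a list of dictionaries; neither is an existing JabRef feature or format.

```python
import re

# Fixed wording of the proposed placeholder title; the running number and
# the timestamp in the middle are the only variable parts.
PLACEHOLDER_TITLE = re.compile(
    r"^File No\. \d+ of run of search for unlinked local files of .+ "
    r"that could not be processed$"
)


def split_placeholders(entries):
    """Separate untouched placeholder entries from all other entries.

    `entries` is a list of dicts such as {"title": ..., "file": ...};
    this representation is an assumption made for the example.
    """
    placeholders, others = [], []
    for entry in entries:
        title = entry.get("title", "").strip()
        if PLACEHOLDER_TITLE.match(title):
            placeholders.append(entry)
        else:
            others.append(entry)
    return placeholders, others


if __name__ == "__main__":
    library = [
        {"title": "File No. 01 of run of search for unlinked local files of "
                  "2021-12-06, 1:23:13 p.m. that could not be processed",
         "file": ":/tmp/a.pdf:PDF"},
        {"title": "{BV}-norm continuity of sweeping processes driven by a set "
                  "with constant shape"},
    ]
    untouched, rest = split_placeholders(library)
    print(len(untouched), "placeholder(s),", len(rest), "other entries")
```

An entry whose title has already been edited by hand no longer matches the pattern, so only the untouched placeholders would be offered for cleanup.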

Before we try to come up with further features, I would suggest you take Christoph’s advice and check out the development version. Seems like the problem is already solved :slight_smile:

Hello,

I have now tested the development version, and it seems that it behaves worse.

I downloaded the preprint "BV-NORM CONTINUITY OF SWEEPING PROCESSES DRIVEN BY A SET WITH CONSTANT SHAPE" and renamed the file from 1512.08711v1.pdf to 1512.08711v1-myjabref-test.pdf.
Afterwards, I started the latest development version of JabRef, i.e.
JabRef 5.4--2021-12-07--b1338b1 Linux 5.3.18-lp152.106-default amd64 Java 16.0.2 JavaFX 17.0.1+1
Then I started the search for unlinked files, selected only 1512.08711v1-myjabref-test.pdf in the search results and pressed ‘import’, and JabRef does not seem to do anything. … :shushing_face:

If I try the same with my JabRef 5.3 version (details in my post above), JabRef produces an entry
with the following contents
@Misc{,
title = {arXiv:1512.08711v1 [math.DS] 29 Dec 2015},
creationdate = {2021-12-08T18:12:41},
file = {:/home/klein/artikel-vor-other/sorted/1512.08711v1-myjabref-test.pdf:PDF},
modificationdate = {2021-12-08T18:14:40},
owner = {klein},
}

Okay, the file loader was fooled by the large arXiv information on the first page of the preprint and used it as the title, but I can correct this by inspecting the PDF file.
I must admit that I would like to use the search for unlinked files to find those files in my directories that I have not linked until now. Since there are almost as many unlinked files as linked ones, my main problem is to get entries with links to the unlinked files. It is nice if the entry is already filled with some data, but I do not mind filling it in by copy-pasting, preferably some DOI information, from the PDF file into the BibTeX entry. It may take some minutes, but afterwards I will need the same amount of time to check which of my groups I have to activate for the paper/preprint under consideration.
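The arXiv stamp that ended up as the title in the entry above follows a recognizable pattern, so entries that need a manual title correction could be flagged automatically. The following Python sketch shows one way to do that; the function name `arxiv_stamp` and the regular expression are my own assumptions, not part of JabRef, and only new-style arXiv identifiers such as 1512.08711v1 are covered.

```python
import re

# Matches stamps like "arXiv:1512.08711v1 [math.DS] 29 Dec 2015".
# Assumption: new-style identifiers (YYMM.NNNNN, optional version) only.
ARXIV_STAMP = re.compile(
    r"^arXiv:(?P<id>\d{4}\.\d{4,5}(?:v\d+)?)\s+"
    r"\[(?P<category>[^\]]+)\]\s+"
    r"(?P<date>\d{1,2} [A-Z][a-z]{2} \d{4})$"
)


def arxiv_stamp(title):
    """Return the parts of an arXiv stamp if the title is one, else None."""
    match = ARXIV_STAMP.match(title.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(arxiv_stamp("arXiv:1512.08711v1 [math.DS] 29 Dec 2015"))
    print(arxiv_stamp("BV-norm continuity of sweeping processes"))
```

The extracted identifier could then serve as a starting point for looking up the correct bibliographic data by hand.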

Error message at start of JabRef

Maybe the problem with the data extraction is partially related to the errors shown at the start of JabRef:

Starting my JabRef 5.4 version I see:

Unexpected problem occured during version sanity check
Reported exception:
java.lang.AbstractMethodError: Receiver class org.apache.logging.slf4j.SLF4JServiceProvider does not define or inherit an implementation of the resolved method ‘abstract java.lang.String getRequestedApiVersion()’ of interface org.slf4j.spi.SLF4JServiceProvider.
at org.slf4j@2.0.0-alpha5/org.slf4j.LoggerFactory.versionSanityCheck(Unknown Source)
at org.slf4j@2.0.0-alpha5/org.slf4j.LoggerFactory.performInitialization(Unknown Source)
at org.slf4j@2.0.0-alpha5/org.slf4j.LoggerFactory.getProvider(Unknown Source)
at org.slf4j@2.0.0-alpha5/org.slf4j.LoggerFactory.getILoggerFactory(Unknown Source)
at org.slf4j@2.0.0-alpha5/org.slf4j.LoggerFactory.getLogger(Unknown Source)
at org.slf4j@2.0.0-alpha5/org.slf4j.LoggerFactory.getLogger(Unknown Source)
at org.jabref@5.4.540/org.jabref.gui.JabRefMain.(Unknown Source)
at org.jabref@5.4.540/org.jabref.gui.JabRefLauncher.main(Unknown Source)
ERROR StatusLogger Unrecognized format specifier [d]
ERROR StatusLogger Unrecognized conversion specifier [d] starting at position 16 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [thread]
ERROR StatusLogger Unrecognized conversion specifier [thread] starting at position 25 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [level]
ERROR StatusLogger Unrecognized conversion specifier [level] starting at position 35 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [logger]
ERROR StatusLogger Unrecognized conversion specifier [logger] starting at position 47 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [d]
ERROR StatusLogger Unrecognized conversion specifier [d] starting at position 16 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [thread]
ERROR StatusLogger Unrecognized conversion specifier [thread] starting at position 25 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [level]
ERROR StatusLogger Unrecognized conversion specifier [level] starting at position 35 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [logger]
ERROR StatusLogger Unrecognized conversion specifier [logger] starting at position 47 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.
Dec 08, 2021 6:39:28 PM com.sun.javafx.application.PlatformImpl startup
WARNING: Unsupported JavaFX configuration: classes were loaded from ‘module org.jabref.merged.module’, isAutomatic: false, isOpen: true

(JabRef:3820): Gdk-WARNING **: 18:39:31.480: XSetErrorHandler() called with a GDK error trap pushed. Don’t do that.

Starting my JabRef 5.3 version I see:

ERROR StatusLogger Unrecognized format specifier [d]
ERROR StatusLogger Unrecognized conversion specifier [d] starting at position 16 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [thread]
ERROR StatusLogger Unrecognized conversion specifier [thread] starting at position 25 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [level]
ERROR StatusLogger Unrecognized conversion specifier [level] starting at position 35 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [logger]
ERROR StatusLogger Unrecognized conversion specifier [logger] starting at position 47 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [d]
ERROR StatusLogger Unrecognized conversion specifier [d] starting at position 16 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [thread]
ERROR StatusLogger Unrecognized conversion specifier [thread] starting at position 25 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [level]
ERROR StatusLogger Unrecognized conversion specifier [level] starting at position 35 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [logger]
ERROR StatusLogger Unrecognized conversion specifier [logger] starting at position 47 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.
Dec 08, 2021 6:43:56 PM com.sun.javafx.application.PlatformImpl startup
WARNING: Unsupported JavaFX configuration: classes were loaded from ‘module org.jabref.merged.module’, isAutomatic: false, isOpen: true

(jabref5x3:4019): Gdk-WARNING **: 18:43:59.178: XSetErrorHandler() called with a GDK error trap pushed. Don’t do that.

JabRef 5.4–2021-12-07–b1338b1
Windows 10 10.0 amd64
Java 16.0.2
JavaFX 17.0.1+1

I cannot reproduce the pdf import problem you report.

I disabled grobid (options>preferences>import and export) for the following tests.

  • When I use ‘file>import>import into current directory’, it finds the file:

  • When I use search for unlinked files, then select the file and press ‘import’, a new entry with the following data is created:

    @InProceedings{DRIVENEtAl1512a2f,
    author   = {DRIVEN BY and A SET and WITH CONSTANT and SHAPE },
    date     = {1512},
    title    = {arXiv:1512.08711v1 [math.DS] 29 Dec 2015},
    abstract = {We prove the BV-norm well posedness of sweeping processes driven by a moving
    convex set with constant shape, namely the BV -norm continuity of the so called play operator
    of elasto-plasticity.},
    file     = {:C\:/Users/Thilo/Desktop/test/1512.08711v1.pdf:PDF},
    }
    
    
  • When I drag the file from my folder into JabRef with the mouse, an entry is created that holds the same bibliographic data as shown above.

  1. If you cannot reproduce my main problem with your Windows version of JabRef, then this is good news for Windows users…
  2. You have reproduced my minor PDF import problem in a modified form, and in my humble opinion this example shows that checking the created entries should be supported somehow by JabRef, or at least by pointing out in the documentation how one can keep track of these new entries.
  • The title in the entry is again "arXiv:1512.08711v1 [math.DS] 29 Dec 2015" but should be "BV-NORM CONTINUITY OF SWEEPING PROCESSES DRIVEN BY A SET WITH CONSTANT SHAPE". Okay, my manual correction would produce "{BV}-norm continuity of sweeping processes driven by a set with constant shape", but realizing that BV, as the abbreviation of bounded variation, is capitalized in this paper and in the related publications is beyond what one should expect from software that is not a full AI.
  • The authors in the entry are "DRIVEN BY and A SET and WITH CONSTANT and SHAPE" but should be "JANA KOPFOVÁ and PAVEL KREJČÍ and VINCENZO RECUPERO" or
    "Jana Kopfová and Pavel Krejčí and Vincenzo Recupero" or "Jana Kopfov{\'{a}} and Pavel Krej{\v{c}}{\'{i}} and Vincenzo Recupero"