Help with search expression for 'Automatically set file links' (F7); Attach files to entry

I have a lot of old PDF files with filenames like these:

  • author (date) title
  • author & author (date) title
  • author, author & author (date) title
  • author et al. (date) title

The style roughly follows APA 6th edition (APA style - Wikipedia) without the second name. In JabRef this would roughly correspond to [authEtAl], but my filenames have spaces between the authors when there is more than one, as well as a space inside "et al.". Also, "et al." is not capitalized, and, as you can see, the date is in parentheses. Finally, there can be commas, and there is an & when there are two or more authors.
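For reference, the naming convention above can be written down as a regular expression. The following is a minimal Python sketch, not a JabRef search expression; the author names Smith, Jones and Lee are hypothetical, and it assumes the date is YYYY, YYYY-MM or YYYY-MM-DD and that the title follows the closing parenthesis after a single space.

```python
import re

# Sketch of the filename convention: "<authors> (<date>) <title>.<ext>".
# Assumption: the date is YYYY, YYYY-MM or YYYY-MM-DD.
FILENAME = re.compile(
    r"^(?P<authors>.+?) \((?P<date>\d{4}(?:-\d{2}){0,2})\) (?P<title>.+)\.(?P<ext>\w+)$"
)

examples = [
    "Smith (2001) A title.pdf",
    "Smith & Jones (2001-05) A title.pdf",
    "Smith, Jones & Lee (2001-05-17) A title.pdf",
    "Smith et al. (2001) A title.pdf",
]

for name in examples:
    m = FILENAME.match(name)
    # The lazy author group stops at the first " (" that is followed by a date.
    print(m.group("authors"), "|", m.group("date"))
```

The author part is everything before " (date)", which is exactly the piece a JabRef expression has to reproduce from the entry's author field.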

I would like to automatically attach them to the entries in my JabRef library.

Under Options > Preferences > Linked files, one can find these options:

How can I use the search expressions in JabRef to match the above filenames?

I have been following these docs but could not get it to work:

What I have tried so far:

  • JabRef works perfectly with its standard settings if I rename one of my PDFs to its BibTeX key, but I have too many PDF files that are not named after the BibTeX key. I would have to rename every single PDF manually for JabRef to be able to attach them automatically. I also have PDFs by authors who published in several different years, so if I just use [auth], JabRef attaches multiple wrong files to an entry.
  • Something like **/.*[auth][date].*\\.[extension] does not work. I suspect the reason is that my filenames have a space between the author and the date, and the date is enclosed in parentheses (), which the expression does not capture either.
  • I don’t understand what the ‘linked file name conventions’ are for. Are they for files downloaded by JabRef? When I enter something like [auth] ([date]), it does not work either.

I would be happy if I could attach 90% of the files automatically and add the ones with very unusual filenames by hand. Am I doing something wrong, or is this just not possible in JabRef?

Hi @ThiloteE,
AFAIK the field markers (like authEtAl) are mainly used for citation key generation. That’s why they avoid whitespace.
There is another marker you could use, auth.etal. It should be very similar with the exception that periods are inserted between names (if there is more than one author). You could then use a modifier to replace periods with spaces. You can read about modifiers here.
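For the space between the authors and the date and the parentheses around the date, you should be able to simply require those characters in the regular expression (the parentheses need to be escaped, since they have a special meaning in regular expressions).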
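To see why the first attempt fails and what requiring those characters looks like, here is a small Python sketch standing in for JabRef's matcher. It assumes that the part after **/ is applied as a regular expression to the whole filename, and that [auth], [date] and [extension] expand to Smith, 2001 and pdf for a hypothetical entry.

```python
import re

# Hypothetical expansions for an entry by "Smith" dated 2001,
# assuming [extension] resolves to "pdf".
filename = "Smith (2001) A title.pdf"

glued = r".*Smith2001.*\.pdf"           # .*[auth][date].*\\.[extension]
required = r".*Smith \(2001\).*\.pdf"   # space and escaped parentheses added

print(bool(re.fullmatch(glued, filename)))     # False
print(bool(re.fullmatch(required, filename)))  # True
```

The two markers in the original attempt are glued together, so the pattern expects "Smith2001" with nothing in between. In a JabRef expression the escaped parentheses would presumably be written as \\( and \\), following the \\. convention the thread already uses; this is an assumption, not something tested here.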

The linked file name conventions are for files downloaded by JabRef. You can also right-click a linked file and decide to move and rename it according to this pattern. So once you are happy with the linking, you could use this field to make sure all the files you download follow your naming convention.


Thanks for the response! It took me a while, but your tip led to some nice improvements!

These are search expressions that at least partly work (without date):

**/.*[auth.etal:regex("\\."," & "):regex("\\etal","et al")].*\\.[extension]		Captures: Author; Author & Author, but nothing with et al.

**/.*[auth.etal:regex("\\."," & "):regex("\\etal","et al.")].*\\.[extension]	Captures: Author; Author & Author, but nothing with et al.

**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*\\.[extension]	Captures: Author; Author & Author, but nothing with et al.
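Why none of these catch "et al." can be traced by applying the modifiers in order. The sketch below uses Python's re.sub as a stand-in for JabRef's regex modifier (assumed to replace all matches) and assumes, per JabRef's citation-key pattern documentation as I understand it, that [auth.etal] yields "Smith.etal" for a hypothetical entry with three or more authors.

```python
import re

# Assumption: for three or more authors, [auth.etal] yields the first
# author's last name followed by ".etal".
value = "Smith.etal"

# Modifiers run left to right, each on the output of the previous one.
step1 = re.sub(r"\.", " & ", value)         # regex("\\."," & ")
step2 = re.sub(r"\.etal", "et al.", step1)  # regex("\\.etal","et al.")

print(step1)  # Smith & etal
print(step2)  # Smith & etal  (no period left for "\.etal" to match)
```

Because the first modifier already turned the period into " & ", a second pattern starting with \\. has nothing left to match, so the expanded text stays "Smith & etal" and never matches a filename containing "Smith et al.". The \\etal variant fails for a related reason: the backslash stops it from matching the plain text etal.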

Things that still do not work:

1. [date].
In this screenshot you can see that the above search expressions attach ANY file whose name includes the author part from above.

Hence my need to include the date or year. How can I do that?

Expressions that would allow me to enter the date:

**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*[DATE].*\\.[extension]
**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*\\([DATE].*\\.[extension]
**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*\\([DATE]\\).*\\.[extension]

If I replace [DATE] with [YEAR], it also works.

Why these expressions still do not work:

  • Unfortunately, [DATE] does not seem to handle filenames with YYYY and YYYY-MM very well. It mostly leads to files not being found, or worse, sometimes to errors.
  • [YEAR] works a lot better, as it only takes YYYY into account. Unfortunately, it cannot tell apart multiple articles that one author wrote in the same year.
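The same-year problem is easy to reproduce: once the filename carries only the year, the author and year parts are identical for two papers, and only the title differs. A minimal Python sketch, assuming a hypothetical author Smith with two 2001 papers and [extension] resolving to pdf:

```python
import re

# Hypothetical expansion of ".*[auth...].*\\([YEAR]\\).*\\.[extension]"
# for an entry by Smith from 2001 (extension assumed to be "pdf").
pattern = r".*Smith.*\(2001\).*\.pdf"

files = [
    "Smith (2001) First paper.pdf",
    "Smith (2001) Second paper.pdf",
    "Smith (2003) Later paper.pdf",
]

matches = [f for f in files if re.fullmatch(pattern, f)]
print(matches)  # both 2001 files match, so each 2001 entry gets both PDFs
```

Any entry by Smith from 2001 produces the same pattern, so both of Smith's 2001 entries would be linked to both files.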

What I want but don’t know how to get (probably an enhancement for JabRef):

  • What I would really like is a [DATE] linked-file search that honours all three formats: YYYY-MM-DD, YYYY-MM and YYYY.
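The requested behaviour can be expressed as a regular expression with nested optional groups. This is not something JabRef's [DATE] marker does, as far as this thread shows; the Python helper date_pattern below is hypothetical and only illustrates the idea: an entry date of 2001-05-17 should match (2001), (2001-05) and (2001-05-17) in a filename, but not a different month or year.

```python
import re


def date_pattern(entry_date: str) -> str:
    # Hypothetical helper: turn an entry date such as "2001-05-17" into a regex
    # for the "(date)" part of a filename that accepts the date at year,
    # year-month or full precision: (2001), (2001-05) or (2001-05-17).
    year, *rest = entry_date.split("-")
    optional = ""
    # Nest the optional parts from the most specific component outwards,
    # so a day can only appear after a month.
    for part in reversed(rest):
        optional = f"(?:-{re.escape(part)}{optional})?"
    return rf"\({re.escape(year)}{optional}\)"


pattern = date_pattern("2001-05-17")
for candidate in ["(2001)", "(2001-05)", "(2001-05-17)", "(2001-06)", "(2002)"]:
    print(candidate, bool(re.fullmatch(pattern, candidate)))
```

The generated pattern avoids character classes such as [0-9], since JabRef uses square brackets for field markers and might try to interpret them.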

2. [pdf] instead of [extension].
I would like to exclude other file types and only attach PDFs, but [pdf] does not seem to work as a replacement for [extension].
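[pdf] is not a field marker, so it cannot stand in for [extension]. Since the rest of the expression is a regular expression, writing the extension literally (for example ending the expression with \\.pdf instead of \\.[extension]) should restrict matches to PDFs; this is an untested assumption. The Python sketch below models [extension] as an alternation of known file-type extensions (three example extensions stand in for JabRef's actual list) and uses a hypothetical Smith (2001) entry.

```python
import re

# Assumption: [extension] expands to an alternation of all known file-type
# extensions; three example extensions stand in for that list here.
any_known = r".*Smith \(2001\).*\.(?:pdf|djvu|txt)"
# Writing the extension literally, instead of using [extension], allows only PDFs.
pdf_only = r".*Smith \(2001\).*\.pdf"

files = ["Smith (2001) Paper.pdf", "Smith (2001) Paper.txt"]
print([f for f in files if re.fullmatch(any_known, f)])  # both files
print([f for f in files if re.fullmatch(pdf_only, f)])   # only the .pdf
```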

3. et al.

None of the above search expressions capture “et al.”.
I have no idea why.

I guess [DATE] only handles YYYY-MM-DD, but you could combine [YEAR] and [MONTH] separately.
For et al., maybe @k3KAW8Pnf7mkmdSMPHz2 has an idea.

I tried:

**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*[YEAR].*\\-[MONTH].*\\.[extension]
--> Works for files named with YYYY-MM, but ignores all files that only have YYYY. Not good: my bib entry can have YYYY-MM while the PDF filename only has YYYY, and then the file is not found.

I also tried using :(x)

:(x): The string between the parentheses will be inserted if the field marker preceding this modifier resolves to an empty value. The placeholder x may be any string. For instance, the marker [VOLUME:(unknown)] will return the entry’s volume if set, and the string unknown if the entry’s VOLUME field is not set

Unfortunately, it doesn’t work, because :(x) only inserts a literal string and does not resolve bib entry fields such as [YEAR] or [DATE].
Otherwise, I could probably build expressions such as these:

**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*[YEAR].*\\-[MONTH:([YEAR]).*\\.[extension]
**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*[DATE:(YEAR)].*\\.[extension]
**/.*[auth.etal:regex("\\."," & "):regex("\\.etal","et al.")].*[DATE:([YEAR].*\\-[MONTH])].*\\.[extension]

But they are all slightly faulty and are very limited in their ability to scale.

A proper solution would probably be a modifier that reacts to an empty value, allowing an IF/THEN statement: if the preceding marker resolves to an empty value, then the next specified modifier (or search expression) is used. Adding Boolean expressions (AND and OR) could be a solution too, unless one wants to change or extend the current modifiers.
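The difference between :(x) and the proposed fallback can be shown in a few lines: :(x) substitutes a fixed string, while the proposal would substitute another field's value. The Python function first_non_empty below is a hypothetical model of that IF/THEN behaviour, not an existing JabRef modifier; the entries are plain dictionaries standing in for bib entries.

```python
from typing import Mapping


def first_non_empty(entry: Mapping[str, str], *fields: str) -> str:
    # Hypothetical fallback: return the value of the first listed field that
    # is set, trying the fields in order (e.g. DATE, then YEAR).
    for field in fields:
        value = entry.get(field, "").strip()
        if value:
            return value
    return ""


entry_with_date = {"date": "2001-05", "year": "2001"}
entry_year_only = {"year": "2001"}

print(first_non_empty(entry_with_date, "date", "year"))  # 2001-05
print(first_non_empty(entry_year_only, "date", "year"))  # 2001
```

A fallback like this chooses which field to use per entry; it does not by itself solve the case where the entry has YYYY-MM but the filename only has YYYY, which needs the date pattern itself to accept less precision.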

I found a partial workaround outside of Jabref.

Linux Mint 20.2 includes a tool called “Bulk File Renamer”. There are similar tools for other operating systems. For now, I can manage with this. Of course, it takes a little time to find the files in all folders and subfolders and rename all the PDFs before attaching them to JabRef entries with the existing JabRef search expressions, but it is probably faster than doing it by hand.

If anybody wants to improve this process, making it faster and removing the dependence on external programs, then the proposals above may be a way forward.

Edit: This only solves the et al. problem.

What it does not solve:

  • JabRef still automatically attaches ALL file types, instead of just PDFs, with the [extension] marker.
  • My own mistake of naming my files with only the year (YYYY) for a long time, instead of the YYYY-MM-DD format or another more precise identifier.

**/.*[authEtAl:regex("EtAl"," et al.")].*[YEAR].*\\.[extension]

works with et al. 😅👌☺️

Edit: This does not capture the & between the first and second author. So it is good in general, but not for my specific use case.
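The missing & follows from what [authEtAl] produces. The Python sketch below assumes, per JabRef's citation-key pattern documentation as I understand it, that [authEtAl] runs two last names together and appends "EtAl" for three or more authors; the names are hypothetical.

```python
import re

# Assumed [authEtAl] values: two authors -> both last names run together,
# three or more -> first last name followed by "EtAl".
values = {"one author": "Smith", "two authors": "SmithJones", "three or more": "SmithEtAl"}

# The modifier regex("EtAl"," et al.") only rewrites the "EtAl" suffix.
expanded = {label: re.sub("EtAl", " et al.", value) for label, value in values.items()}
print(expanded)
```

The two-author value contains no separator for a modifier to turn into " & ", which is why files named "Smith & Jones (...)" are missed.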

Helloooooo, I found the best search expression!!

**/.*[auth.etal:regex("\\."," & "):regex("\\& etal","et al.")].*[Organization].*[YEAR].*\\.[extension]

It also honours the &, when there are only two authors!
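Tracing the modifier chain shows why this version handles all three author cases. The Python sketch assumes [auth.etal] yields "Smith", "Smith.Jones" and "Smith.etal" for one, two, and three or more authors (per JabRef's citation-key pattern documentation as I understand it), uses re.sub as a stand-in for the regex modifier, assumes [Organization] is empty for these entries (so it is left out), and assumes [extension] resolves to pdf.

```python
import re


def expand_authors(value: str) -> str:
    # Mirrors :regex("\\."," & "):regex("\\& etal","et al."), applied in order.
    value = re.sub(r"\.", " & ", value)        # "Smith.etal" -> "Smith & etal"
    return re.sub(r"& etal", "et al.", value)  # "Smith & etal" -> "Smith et al."


# Assumed [auth.etal] values for one, two, and three or more authors.
for raw in ["Smith", "Smith.Jones", "Smith.etal"]:
    print(raw, "->", expand_authors(raw))

# Hypothetical filename check with the expanded author part and the year.
pattern = rf".*{expand_authors('Smith.etal')}.*2001.*\.pdf"
print(bool(re.fullmatch(pattern, "Smith et al. (2001) A title.pdf")))
```

The key difference from the earlier attempts is that the second modifier looks for "& etal", which is what the first modifier leaves behind. The trailing period of the inserted "et al." acts as a regex wildcard here, but it still matches the literal period in the filename.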


For anybody with similar problems: nowadays, I would probably use a bulk file renamer to simplify the PDF filenames first, before trying to come up with a regex…