Regex not working as expected - year and month

I want to create a group that filters for a particular author, starting from the year their research group was founded. Using the documentation, I can get something close: author = name and year = 201[4-9]|202[0-9], which should find all years from 2014 until the present (technically this will break in 2030, but that’s a problem for later). A test on regex101 shows the regex working as I want, but it does not work within JabRef itself. Entries whose year field includes a month are not captured: 2015 is captured, but 2015-03 is not, although both are captured in the regex tester.

Am I missing something?

Your regex was slightly incomplete. Using the documentation for JabRef’s search (Searching within the library | v5 | JabRef) as a reference, I modified your search term to the following, which should also find entries with months:

author = name and year = ^201[4-9]-?.?.?$|^202[0-9]-?.?.?$

RegEx:

^201[4-9]-?.?.?|202[0-9]-?.?.?$

Explanation:

1st Alternative ^201[4-9]-?.?.?

  • ^ asserts position at start of a line
  • 201 matches the characters 201 literally (case sensitive)
  • [4-9] matches a single character present in the list [4-9]
  • 4-9 matches a single character in the range between 4 (index 52) and 9 (index 57) (case sensitive)
  • - matches the character - with index 45 in decimal (2D in hexadecimal, 55 in octal) literally (case sensitive)
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • . matches any character (except for line terminators)
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • . matches any character (except for line terminators)
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • | denotes a boolean OR

2nd Alternative 202[0-9]-?.?.?$

  • 202 matches the characters 202 literally (case sensitive)
  • [0-9] matches a single character present in the list [0-9]
  • 0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
  • - matches the character - with index 45 in decimal (2D in hexadecimal, 55 in octal) literally (case sensitive)
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • . matches any character (except for line terminators)
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • . matches any character (except for line terminators)
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • $ asserts position at the end of a line
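
As a rough check of the full search term given above (the variant with both ^ and $ on each alternative), here is a minimal sketch in Python. Python's `re` module is only a stand-in here; JabRef uses Java's regex engine, and the sample field values and the helper name `year_field_matches` are made up for illustration.

```python
import re

# The year part of the search term proposed above.
# Assumption: Python's `re` stands in for JabRef's Java regex engine.
PATTERN = re.compile(r"^201[4-9]-?.?.?$|^202[0-9]-?.?.?$")


def year_field_matches(value: str) -> bool:
    """Return True if the pattern is found in the given field value."""
    return PATTERN.search(value) is not None


# Hypothetical field values, not taken from a real library.
samples = ["2013", "2015", "2015-03", "2024-12", "2030", "2015-03-14"]
results = {value: year_field_matches(value) for value in samples}

for value, matched in results.items():
    print(f"{value!r}: {matched}")
```

In this sketch, 2015, 2015-03 and 2024-12 match, while 2013 and 2030 do not. A full date such as 2015-03-14 does not match either, because at most three characters are allowed between the year and the end of the value.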

I think the intended meaning was that the expression should match the appropriate years, regardless of the month, not that the expression needed to capture the year and month. The expression should not have to match the entire year/date field to match the target years.

Am I missing something?

The original expression date = 201[4-9]|202[0-9] (changing year to date for my biblatex library) matches the target years in my biblatex library, whether or not the field includes months.
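
The difference between finding the expression somewhere in the field and requiring it to cover the whole field can be sketched as follows. This is an illustration in Python, not a statement about how JabRef evaluates searches; the function names and sample values are assumptions.

```python
import re

# The unanchored expression from the original question.
YEARS = re.compile(r"201[4-9]|202[0-9]")


def found_anywhere(value: str) -> bool:
    """Substring-style search: the expression may match any part of the value."""
    return YEARS.search(value) is not None


def matches_whole_field(value: str) -> bool:
    """Whole-field match: the expression must cover the entire value."""
    return YEARS.fullmatch(value) is not None


# Hypothetical field values for illustration.
for value in ["2015", "2015-03", "2013-05"]:
    print(value, found_anywhere(value), matches_whole_field(value))
```

With a substring-style search, the month suffix does not matter, so no extra -?.?.? tail is needed to match 2015-03.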

However, the pipe character does not work entirely as I expect in JabRef, and I am not sure if this is a JabRef thing, Java regex syntax, or something else. In my case, there is clearly something wrong because the toolbar says “No results found” even though the entry table is filtered to the matching entries. This happens with literal or regex searches.

Here are some examples of my results using the search field.

  1. date=^20(1[4-9]|[2-9]\d)
    Line starting with 20 followed by 14-19 or 20-99 => No match

  2. date=^20((1[4-9])|([2-9]\d))
    Same as #1 with extra parentheses for good measure => No match

  3. date=^20\d\d(?<=(1[4-9]|[2-9]\d))
    Line starting with 20 followed by two digits, then look behind for 14-19 or 20-99 as the two digits => No match

  4. date=^20(?=(1[4-9])|([2-9]\d))\d\d
    Line starting with 20, then looking ahead for 14-19 or 20-99 and if positive match the next two digits => No match

  5. date=^201[4-9] or date=^20[2-9]\d
    Same as #1, but split in two expressions => Matches years 2014 through 2099

Edit: Clarification – I expect all of these expressions to match the target years if this syntax is compatible with Java/JabRef. All match when I test in VSCode, but only the last one does in JabRef’s search field in the toolbar.
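
For comparison outside JabRef, the five expressions can be run against a few sample values. The sketch below uses Python's `re` module as a stand-in engine (an assumption; it is neither Java's engine nor JabRef's search parser), with the date= prefix removed and made-up field values. The `or` in expression 5 is modelled with `any`.

```python
import re

# Expressions 1-4 from the list above, without the "date=" prefix.
EXPRESSIONS = {
    "1": r"^20(1[4-9]|[2-9]\d)",
    "2": r"^20((1[4-9])|([2-9]\d))",
    "3": r"^20\d\d(?<=(1[4-9]|[2-9]\d))",
    "4": r"^20(?=(1[4-9])|([2-9]\d))\d\d",
}
# Expression 5 is two separate expressions joined by "or".
SPLIT = [r"^201[4-9]", r"^20[2-9]\d"]

# Hypothetical field values for illustration.
IN_RANGE = ["2014", "2019-05", "2020", "2099-12-31"]
OUT_OF_RANGE = ["1999", "2009", "2013-11"]


def matches(pattern: str, value: str) -> bool:
    """Return True if the pattern is found in the value."""
    return re.search(pattern, value) is not None


def matches_split(value: str) -> bool:
    """Expression 5: either of the two sub-expressions may match."""
    return any(matches(pattern, value) for pattern in SPLIT)


for label, pattern in EXPRESSIONS.items():
    hits = [value for value in IN_RANGE + OUT_OF_RANGE if matches(pattern, value)]
    print(f"#{label}: {hits}")
print("#5:", [value for value in IN_RANGE + OUT_OF_RANGE if matches_split(value)])
```

In this stand-in engine all five variants select the same values, which is consistent with the expectation stated above that the patterns themselves are equivalent.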

JabRef 5.13–2024-04-01–6bdcf63
Linux 6.8.8-1-default amd64
Java 21.0.2
JavaFX 22+30

You are correct, that was my intention. I wanted 2015-03 to be captured because of the year, but originally it was not.

That’s actually a very good observation. I had not noticed that my library had mixed year and date fields (I think I switched from BibTeX to biblatex at some point and did not update the library accordingly). After updating my library so that all entries were consistent, my original expression worked as I expected based on the link I shared. That’s the solution I was looking for!

This one matches something if you remove date=, so there seem to be issues with limiting certain regexes to a specific field. JabRef’s regex implementation is not 100% compatible with the regex101 website, as you have found out.
Maybe you could create a GitHub issue for those, but I don’t think they are a high priority, as we have found some workarounds.

And yes: if you have mixed date and year fields in your library, you are better off standardizing it. As a reminder, BibTeX is an old, unmaintained standard that requires the year field, whereas the newer, maintained biblatex standard introduced the date field, which follows ISO 8601.
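
One way to picture such a standardization is a small script that merges year and month into an ISO 8601 style date. This is a hypothetical sketch, not a JabRef feature: entries are modelled as plain dictionaries, and the helper `standardize` and its month handling are assumptions.

```python
# Three-letter month abbreviations mapped to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def standardize(entry: dict) -> dict:
    """Return a copy of the entry with year/month merged into a date field.

    Hypothetical helper: entries that already have a date, or that have
    no year, are returned unchanged.
    """
    result = dict(entry)
    if "date" in result or "year" not in result:
        return result
    year = result.pop("year").strip()
    month = result.get("month", "").strip().lower()
    number = int(month) if month.isdigit() else MONTHS.get(month[:3])
    if number is not None and 1 <= number <= 12:
        result.pop("month")
        result["date"] = f"{year}-{number:02d}"
    else:
        # No month, or month text that is not recognized: keep the year only
        # and leave any month field in place.
        result["date"] = year
    return result


print(standardize({"author": "name", "year": "2015", "month": "mar"}))
print(standardize({"author": "name", "year": "2015"}))
```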

Internally, JabRef treats date and year as alias fields, so it can take the month and year from the date field or, the other way round, use the year field. This might explain some of the observations.
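
The idea of alias fields can be illustrated with a small lookup that falls back from year to date. This is only a sketch of the concept under stated assumptions; it is not JabRef's actual implementation, and `resolve_year` and the dictionary-based entries are hypothetical.

```python
import re
from typing import Optional


def resolve_year(entry: dict) -> Optional[str]:
    """Return the year of an entry, falling back to the date field.

    Hypothetical helper: prefers an explicit year field, otherwise takes
    the leading four digits of date (for example "2015-03" gives "2015").
    """
    if "year" in entry:
        return entry["year"].strip()
    found = re.match(r"\d{4}", entry.get("date", "").strip())
    return found.group(0) if found else None


# Made-up entries with mixed year and date fields.
library = [
    {"title": "A", "year": "2015"},
    {"title": "B", "date": "2015-03"},
    {"title": "C", "date": "2012-11"},
]

target = re.compile(r"201[4-9]|202[0-9]")
selected = [
    entry["title"]
    for entry in library
    if (year := resolve_year(entry)) is not None and target.fullmatch(year)
]
print(selected)
```

With a lookup like this, a search on year would also reach entries that only carry a date field, which is the kind of behavior that mixed libraries can make hard to reason about.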