We had a long and difficult discussion about the search syntax: should we keep the JabRef-custom search syntax or migrate to a more common one? We opted for Apache Lucene syntax, which we already use for the Web search.
Now we are discussing how to migrate groups that are based on a free-form search expression. We think that everything can be migrated, but we would like an opinion grounded in actual usage.
We found out that the current search does not cover all the possibilities described in the search help.
Call for action: Open your bib file with a text editor and search for SearchGroup. What do you find there?
@mlep Thank you for the fast reply! Are we right in guessing that you are searching for substrings only? We had a long discussion about “choc” matching “chocolate” and “apple” also matching “pineapple”. In other words: exact versus inexact matches, and how to guide the user. Currently, we think it is OK for the 6.0-alpha (and maybe also 6.0) to support inexact matches only. What’s your take on this?
I was searching for the EXACT word, otherwise I would have used a regular expression such as Built .
In my example, it does not make much difference. But, anyway, I would have expected that “choc” would not report “chocolate”. But I may not be a good tester for this.
I guess JabRef should do what most users would expect.
About this:
In Linux Mint Cinnamon, the file manager (Nemo) shows “chocolate” if you search for “choc”.
The Windows file manager works the same way.
I do not know about Mac.
So, a default behaviour with “choc” giving “chocolate” seems the way to go.
I agree that substring matching is what most users would probably expect, not only because this is typical in applications, but also because it is common to all the search interfaces I use for research (PubMed, ProQuest, Ovid, Embase, Cochrane, etc).
Another reason to maintain this behaviour in the long run is that the results make the syntax self-evident. If searching for apple matches “apple”, “apples”, and “pineapple”, then I see immediately that substring matching is in effect. In contrast, if a search for “apple” fails to return “apples”, then the user has to also search for “apples” to find out if the results change.
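To make the difference concrete, here is a minimal sketch in plain Python (not JabRef code) that contrasts the two behaviours on the apple example. The function names and the sample list are made up for illustration only.

```python
import re

# Made-up sample values, standing in for a field such as "title".
titles = ["apple", "apples", "pineapple", "banana"]

def substring_match(term, text):
    # Inexact: the term may appear anywhere, even inside a longer word.
    return term.lower() in text.lower()

def whole_word_match(term, text):
    # Exact: the term must sit between word boundaries on both sides.
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

substring_hits = [t for t in titles if substring_match("apple", t)]
whole_word_hits = [t for t in titles if whole_word_match("apple", t)]

print(substring_hits)   # apple, apples and pineapple
print(whole_word_hits)  # apple only
```

With substring matching, one look at the result list shows which mode is active; with whole-word matching, the missing “apples” gives no such hint.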
Regarding JabRef groups, I use substring matching very often, and design my static groups to take advantage of this.
For example, if an entry has the group one/included and another entry has the group two/included then I can search for
groups=/included
to find entries included in one or two.
I have been avoiding regex for “performance” optimisation, but otherwise, regex is a suitable alternative for me and converting existing groups to regex is not too much trouble.
I have several groups like this. Here is an example:
2 SearchGroup:beaming\;2\;title|abstract|groups|keywords:/beam/ AND -groups:/demo/exclude|:not/\;0\;1\;1\;0xb3661aff\;\;\;;
beam is intended to capture variations such as beams or beaming. demo/exclude and :not are prefixes intended to capture various subgroups.
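For anyone who wants to pull the search expressions out of their own bib file, here is a small Python sketch. It assumes that the fields are separated by an escaped semicolon and that the first field is the group name and the third is the expression; that layout is only inferred from the single line above, not taken from JabRef's file format documentation.

```python
line = r"2 SearchGroup:beaming\;2\;title|abstract|groups|keywords:/beam/ AND -groups:/demo/exclude|:not/\;0\;1\;1\;0xb3661aff\;\;\;;"

def split_search_group(raw):
    # Everything after the "SearchGroup:" marker holds the group's settings.
    _, _, payload = raw.partition("SearchGroup:")
    # Drop the single unescaped ";" that terminates the line.
    if payload.endswith(";"):
        payload = payload[:-1]
    # Fields are separated by an escaped semicolon, written as \; in the file.
    return payload.split("\\;")

fields = split_search_group(line)
# Assumption: position 0 is the group name, position 2 the search expression.
name, expression = fields[0], fields[2]
print(name)
print(expression)
```

The remaining fields are left uninterpreted here, since their meaning cannot be read off this one example.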
Updates:
I installed the latest development version and migrated groups to the new syntax.
Several of my search groups, including those like the example above, no longer worked correctly after conversion.
I rewrote multi-field searches to OR statements (is this documentation helpful?)
Some expressions that JabRef converted automatically to regex could be expressed in Lucene syntax without regex.
Strings that JabRef converted to regular expressions were missing escape characters before : and / (incorrect)
Searching for fieldname:something can return false negatives if the target is preceded by a non-word character, e.g. “hyphenated-something” or “:something”. I needed to use fieldname:*something to get the desired results.
The wildcard * seems to work in the first position, even though this is said to be unsupported by Lucene.
I noticed that the regex option has been removed from JabRef’s search bar and from “Free search expressions”, but is still present for groups created by “Searching for a keyword”.
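Two of the manual fixes in this list, rewriting multi-field searches as OR statements and adding the missing escape characters, can be sketched as small Python helpers. The set of special characters follows the list in Lucene's classic query parser documentation; the helper names are made up, and whether JabRef's parser accepts exactly this output is an assumption that I have not checked against the development version.

```python
# Characters listed as special in Lucene's classic query parser syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_term(term):
    # Prefix every special character with a backslash.
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)

def expand_fields(fields, term):
    # Turn "title|abstract" plus one term into one clause per field,
    # joined with OR. The term is inserted as given (no escaping here).
    clauses = [f"{field}:{term}" for field in fields.split("|")]
    return "(" + " OR ".join(clauses) + ")"

escaped = escape_term(":not/")
query = expand_fields("title|abstract", "beam*")
print(escaped)
print(query)
```

The term is passed to expand_fields unescaped on purpose, so that wildcards such as * keep their meaning; literal text would go through escape_term first.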