Feedback on search groups - syntax update

Some of you might have noticed that we are working on a new, improved search for JabRef. (For insiders, this is the pull request: Lucene search by LoayGhreeb · Pull Request #11542 · JabRef/jabref · GitHub)

We had a long discussion about the search syntax: whether to keep the JabRef-specific search syntax or to migrate to a more common one. We opted for the Apache Lucene syntax, which we already use for the Web search.

We are now discussing how to migrate groups that are based on a free-form search expression. We think that everything can be migrated, but we would like to base that on how groups are actually used.

We also found out that the current search does not cover all the possibilities described in the search help.

Call for action: Open your bib file with a text editor and search for SearchGroup. What do your lines look like?

Example:

1 SearchGroup:Example\;2\;Example Search\;0\;0\;1\;0x4a797fff\;MDI_ACCOUNT_CARD_DETAILS\;Description\;;

We are especially curious whether you have something like journal|booktitle:something as a search term.
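If you have many groups, a small script may be quicker than a text editor. The following is only a sketch, not JabRef's own parser. It assumes each SearchGroup entry sits on a single line, that fields are separated by `\;`, and that the name is the first field and the search expression the third, as in the example above. The helper names are made up for illustration.

```python
import re

# Matches e.g. "1 SearchGroup:Name\;2\;expression\;...\;;"
# Group 1 is the tree depth, group 2 the payload before the final ";".
SEARCH_GROUP_LINE = re.compile(r"^\s*(\d+) SearchGroup:(.*);\s*$")


def parse_search_group(line):
    """Return depth, name and expression of a SearchGroup line, or None."""
    match = SEARCH_GROUP_LINE.match(line)
    if match is None:
        return None
    # Assumption: fields are separated by the two characters "\;".
    # Nested escapes inside an expression are not handled here.
    fields = match.group(2).split("\\;")
    if len(fields) < 3:
        return None
    return {
        "depth": int(match.group(1)),
        "name": fields[0],
        "expression": fields[2],
    }


def search_groups_in(text):
    """Collect all SearchGroup entries found in the text of a bib file."""
    parsed = (parse_search_group(line) for line in text.splitlines())
    return [group for group in parsed if group is not None]


sample = r"""0 AllEntriesGroup:;
1 SearchGroup:Example\;2\;Example Search\;0\;0\;1\;0x4a797fff\;MDI_ACCOUNT_CARD_DETAILS\;Description\;;
"""
for group in search_groups_in(sample):
    print(group["name"], "->", group["expression"])
```

To run it on a real library, pass the file content instead of `sample`, for example `search_groups_in(open("library.bib", encoding="utf-8").read())`, where `library.bib` stands for your own file.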

My line looks quite simple (too simple to be of interest?):

1 SearchGroup:Built\;0\;built0\;0\;1\;\;\;\;;

@mlep Thank you for the fast reply! Are we guessing right that you are searching for substrings only? We had a long discussion about “choc” matching “chocolate” and “apple” also matching “pineapple”. In other words: exact versus inexact matches, and how to guide the user. Currently, we think that it is OK for 6.0-alpha (and maybe also for 6.0) to offer inexact matches only. What’s your take on this?
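To make the difference concrete, here is a small sketch of three possible matching behaviours. It is plain Python and not how JabRef or Lucene implement search; the function names are made up for illustration.

```python
import re


def substring_match(term, text):
    """Inexact: the term may appear anywhere, even inside a word."""
    return term.lower() in text.lower()


def prefix_match(term, text):
    """The term must start at a word boundary."""
    pattern = rf"\b{re.escape(term)}"
    return re.search(pattern, text, re.IGNORECASE) is not None


def whole_word_match(term, text):
    """Exact: the term must be a complete word."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


for term, text in [("choc", "chocolate"), ("apple", "pineapple"), ("apple", "apple")]:
    print(
        term, "in", text,
        "| substring:", substring_match(term, text),
        "| prefix:", prefix_match(term, text),
        "| whole word:", whole_word_match(term, text),
    )
```

With substring matching, “choc” finds “chocolate” and “apple” finds “pineapple”. With whole-word matching, neither does.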

I have

@Comment{jabref-meta: databaseType:biblatex;}

@Comment{jabref-meta: grouping:
0 AllEntriesGroup:;
1 SearchGroup:Entries without a group\;0\;groups != .+\;0\;1\;1\;\;\;\;;
1 SearchGroup:Entries without a linked file\;0\;file != .+\;0\;1\;1\;\;\;\;;
1 SearchGroup:To read\;0\;groups != .+ and readstatus != .+\;0\;1\;1\;0x008080ff\;\;\;;
1 KeywordGroup:Skimmed\;0\;readstatus\;skimmed\;0\;0\;1\;0xffff00ff\;\;\;;
1 KeywordGroup:Read\;0\;readstatus\;read\;0\;0\;0\;0x00ff00ff\;\;\;;
1 StaticGroup:Used\;0\;1\;0x0000ffff\;\;\;;
}

(JabRef 5.16–2024-08-24–f6ea6a9)

I was searching for the EXACT word, otherwise I would have used a regular expression such as Built .

In my example, it does not make much difference. Still, I would have expected “choc” not to match “chocolate”. But I may not be a good tester for this.

I guess JabRef should do what most users would expect.
About this:

  • In Linux Mint Cinnamon, the file manager (Nemo) will show “chocolate” if you look for “choc”.
  • The Windows file manager works the same way.
  • I do not know about Mac.

So, a default behaviour where “choc” matches “chocolate” seems the way to go.

Yeah, on Mac, Spotlight by default matches anything that starts with the search word, or maybe even anything that contains it.

Here are my SearchGroup lines:

1 SearchGroup:Project\;2\;Project =\;0\;0\;1\;\;FILE_DOCUMENT_MULTIPLE\;\;;
3 SearchGroup:CyberNatures\;2\;project=CyberNatures\;0\;0\;1\;\;\;\;;
3 SearchGroup:2012 Susi Workshop\;2\;project="2012 Susi Workshop"\;0\;0\;1\;\;\;\;;
3 SearchGroup:EMS 2012 Workshop\;2\;project="EMS 2012 Workshop"\;0\;0\;1\;\;\;\;;
1 SearchGroup:Teaching Modules\;2\;teachingmodules =\;0\;0\;1\;\;TEACH\;\;;
2 SearchGroup:Skimmed\;0\;readstatus=skimmed\;0\;0\;1\;\;\;\;;
2 SearchGroup:To read\;2\;priority=prio*\;0\;1\;1\;\;\;\;;
2 SearchGroup:Read\;0\;readstatus=read\;0\;0\;1\;\;\;\;;
1 SearchGroup:LocationHardCopy\;2\;locationhardcopy=*\;0\;0\;1\;\;BOOKSHELF\;\;;

I agree that substring matching is what most users would probably expect, not only because this is typical in applications, but also because it is common to all the search interfaces I use for research (PubMed, ProQuest, Ovid, Embase, Cochrane, etc.).

Another reason to maintain this behaviour in the long run is that the results make the syntax self-evident. If searching for apple matches “apple”, “apples”, and “pineapple”, then I see immediately that substring matching is in effect. In contrast, if a search for “apple” fails to return “apples”, then the user has to also search for “apples” to find out if the results change.

Regarding JabRef groups, I use substring matching very often, and design my static groups to take advantage of this.

For example, if an entry has the group one/included and another entry has the group two/included then I can search for

groups=/included

to find entries included in one or two.

I have been avoiding regex for “performance” optimisation, but otherwise, regex is a suitable alternative for me and converting existing groups to regex is not too much trouble.
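As a rough illustration of the group design described above, this sketch compares a substring test on the groups field with an equivalent regular expression. The entries and the helper names are invented for the example. It is plain Python, not JabRef's search.

```python
import re

# Hypothetical entries; only the "groups" field matters here.
entries = [
    {"key": "a", "groups": "one/included"},
    {"key": "b", "groups": "two/included"},
    {"key": "c", "groups": "one/excluded"},
]


def keys_by_substring(entries, fragment):
    """Keys of entries whose groups field contains the fragment."""
    return [e["key"] for e in entries if fragment in e.get("groups", "")]


def keys_by_regex(entries, pattern):
    """Keys of entries whose groups field matches the regular expression."""
    compiled = re.compile(pattern)
    return [e["key"] for e in entries if compiled.search(e.get("groups", ""))]


print(keys_by_substring(entries, "/included"))
print(keys_by_regex(entries, r"(one|two)/included"))
```

Both calls select the entries in one/included and two/included and skip one/excluded. The regular expression has to name the parent groups or use a wildcard, which is why the substring form is shorter to write.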

At the most basic level, it would be helpful to have an indication in the GUI of which groups are search-groups.

I have several groups like this. Here is an example:

2 SearchGroup:beaming\;2\;title|abstract|groups|keywords:/beam/ AND -groups:/demo/exclude|:not/\;0\;1\;1\;0xb3661aff\;\;\;;

beam is intended to capture variations such as beams or beaming. demo/exclude and :not are prefixes intended to capture various subgroups.

Updates:

  • I installed the latest development version and migrated groups to the new syntax.
  • Several of my search groups, including those like the example above, no longer worked correctly after conversion.
  • I rewrote multi-field searches to OR statements (is this documentation helpful?)
  • Some expressions that JabRef converted automatically to regex could be expressed in Lucene syntax without regex.
  • Strings that JabRef converted to regular expressions were missing escape characters before : and /, so the converted expressions were incorrect.
  • Searching for fieldname:something can return false negatives if the target is preceded by a non-word character, e.g. “hyphenated-something” or “:something”. I needed to use fieldname:*something to get the desired results.
  • The wildcard * seems to work in the first position, even though this is said to be unsupported by Lucene.
  • I noticed that the regex option has been removed from JabRef’s search bar and from “Free search expressions”, but is still present for groups created by “Searching for a keyword”.
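The rewrite of multi-field searches to OR statements mentioned in the list above can be sketched as follows. This is only an illustration for a single clause of the form field1|field2:term, not the conversion JabRef performs. The function name is made up, and whether the resulting query behaves as intended in the new search should be checked in JabRef itself.

```python
def expand_multi_field(clause):
    """Rewrite "a|b:term" as "(a:term OR b:term)".

    Clauses naming only one field, or without a colon, are returned unchanged.
    Only the text before the first colon is treated as the field list.
    """
    fields, separator, term = clause.partition(":")
    names = [name for name in fields.split("|") if name]
    if not separator or len(names) < 2:
        return clause
    return "(" + " OR ".join(f"{name}:{term}" for name in names) + ")"


print(expand_multi_field("journal|booktitle:something"))
print(expand_multi_field("title|abstract|groups|keywords:/beam/"))
print(expand_multi_field("-groups:/demo/exclude|:not/"))
```

The last call shows a limitation: the negated clause is left as it is, because the `|` there belongs to the regular expression and not to the field list. A complete expression with AND and several clauses would need to be split into clauses first.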