Parse medical subject headings (MeSH) when importing PubMed text files

The keyword structure should be changed so that the import takes the delimiters into account and splits each term at the slash.

Yes. Headings and subheadings have a many-to-many relationship, represented in PubMed format as one heading followed by zero or more subheadings (one heading per line).

Here is a basic example of terms that I modified before importing.

In its original form, the record listed all of the terms on a single line:

MH  - Ankle Joint/innervation/physiology

I usually split a line like this into three terms. While it is redundant to include the heading as a term of its own, doing so puts all “Ankle Joint” entries in one group regardless of subheading, without the need for subgroups or searches. I find this convenient for working with the references.
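The split described above can be sketched roughly as follows. This is only an illustration, not JabRef code: the function name split_mesh_line is hypothetical, and the output form (bare heading first, then one heading/subheading pair per subheading) is my assumption about what "split into three" looks like. It also assumes the slash appears only as the heading/subheading separator.

```python
def split_mesh_line(line):
    """Split one PubMed 'MH  - ' line into a heading term plus one term per subheading."""
    tag, _, value = line.partition("- ")
    if tag.strip() != "MH":
        raise ValueError(f"not an MH line: {line!r}")
    # Assumption: '/' only ever separates the heading from its subheadings.
    heading, *subheadings = value.strip().split("/")
    # The bare heading comes first, then one heading/subheading pair per subheading.
    return [heading] + [f"{heading}/{sub}" for sub in subheadings]


print(split_mesh_line("MH  - Ankle Joint/innervation/physiology"))
# ['Ankle Joint', 'Ankle Joint/innervation', 'Ankle Joint/physiology']
```

A heading with no subheadings simply yields a single term.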

Here is another example, this time including major topics, as indicated by an asterisk.

In the original, each record had one of the following forms:

MH  - Imaging, Three-Dimensional
MH  - *Imaging, Three-Dimensional
MH  - *Imaging, Three-Dimensional/methods
MH  - Imaging, Three-Dimensional/*methods

In the transformation, I

  • retained commas from the original terms, and set JabRef’s delimiter to ;
  • moved asterisks to the end of the heading and/or subheading, to keep like terms adjacent alphabetically
  • marked each keyword with [mh], because the abbreviated syntax for searching subject headings in PubMed is "Some Heading"[mh], and brackets are not used in MEDLINE subject headings (as far as I know)
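The three steps above could be sketched like this. It is a rough illustration under stated assumptions, not the transformation JabRef performs: the names move_asterisk and mesh_keywords are hypothetical, and the exact keyword layout (the heading repeated in front of each subheading, with [mh] appended to every term) is one possible reading of the list.

```python
DELIMITER = ";"  # JabRef keyword delimiter, set to ';' so commas inside terms survive


def move_asterisk(part):
    """Move a leading major-topic asterisk to the end of a heading or subheading."""
    return part[1:] + "*" if part.startswith("*") else part


def mesh_keywords(value):
    """Turn the text after 'MH  - ' into a list of [mh]-marked keywords."""
    heading, *subheadings = [move_asterisk(p) for p in value.split("/")]
    # Assumed layout: the bare heading, then one heading/subheading pair each.
    terms = [heading] + [f"{heading}/{sub}" for sub in subheadings]
    return [f"{term}[mh]" for term in terms]


for value in [
    "Imaging, Three-Dimensional",
    "*Imaging, Three-Dimensional",
    "*Imaging, Three-Dimensional/methods",
    "Imaging, Three-Dimensional/*methods",
]:
    print(DELIMITER.join(mesh_keywords(value)))
```

Because the asterisk is moved to the end, starred and unstarred variants of the same heading sort next to each other instead of being separated by the leading symbol.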

Notice that all of the entries in the second example deal with three-dimensional imaging. If I am studying imaging methods, however, I am particularly interested in the last group, where “methods” is a major topic and not merely incidental. A heading, a subheading, or both can be a major topic.

More info:

  • The label OT - maps to keywords in JabRef. These terms need no special treatment.
  • The label RN - , used for registry numbers of named substances, also maps to keywords in JabRef. These terms often include characters that can be incorrectly interpreted as delimiters. I have not dealt with this recently, and I am not sure whether JabRef has any difficulty with it.

Here is an example of a named substance registry number (no delimiter-like characters in this case).

RN  - 0 (Chromatin)
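One way to spot troublesome RN terms before importing is to check the substance name for delimiter-like characters. The sketch below assumes that an RN line has the form "number (name)", as in the example above. The pattern, the function name parse_rn_line, and the default set of delimiter characters are all hypothetical, not taken from JabRef.

```python
import re

# Assumed layout: 'RN  - <registry number> (<substance name>)'
RN_PATTERN = re.compile(r"^RN\s+-\s+(.+?)\s+\((.*)\)\s*$")


def parse_rn_line(line, delimiters=";,"):
    """Return (registry number, name, delimiter-like characters found in the name)."""
    match = RN_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not an RN line: {line!r}")
    number, name = match.groups()
    # Characters in the name that a keyword importer might treat as delimiters.
    risky = sorted(set(name) & set(delimiters))
    return number, name, risky


print(parse_rn_line("RN  - 0 (Chromatin)"))
# ('0', 'Chromatin', [])
```

An empty third element means the name contains none of the listed characters; a non-empty one flags a term worth checking by hand.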