Hi. I came here because I have been looking for a way to disambiguate author names in my database. For example, suppose I have the following names in my database:
- John Smith
- J. Smith
- John R. Smith
- J. Roosevelt Smith
I should be able to select, from some kind of list, which names belong to the same person, and the disambiguation tool should then replace all occurrences of those names in my bib file, keeping only the most complete version of the name (e.g. John Roosevelt Smith). Some instances of J. Smith may refer to a Janet Smith, for example, so I should also be able to choose in which entries the name is changed and in which ones it is kept.
I find this important because bib files found in the wild rarely get author names right, whether because of accents, abbreviations or missing middle names. Some styles depend on an author name being written in exactly the same way in every entry in order to apply a formatting rule, so I believe this problem is relevant.
I naively tried writing a Python script to solve this issue automatically, but I now know it cannot be solved without human intervention. The script combined the initials of a person’s name with their last name (without accents) to detect different spellings that might belong to the same individual.
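A minimal sketch of that kind of heuristic (not the original script) could look like the following. It makes several assumptions: names are non-empty and written in "First Last" order rather than "Last, First", only the first initial is combined with the accent-stripped last name (a simplification, since middle names are often missing), and the function names `strip_accents`, `name_key` and `candidate_groups` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text):
    """Remove combining marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(full_name):
    """Build a coarse key: (first initial, last name), lowercased, no accents.

    Assumes a non-empty name in 'First [Middle] Last' order.
    """
    parts = full_name.replace(".", " ").split()
    first_initial = strip_accents(parts[0])[0].lower()
    last_name = strip_accents(parts[-1]).lower()
    return (first_initial, last_name)


def candidate_groups(names):
    """Group names sharing a key; keep only groups with more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


names = [
    "John Smith",
    "J. Smith",
    "John R. Smith",
    "J. Roosevelt Smith",
    "Janet Smith",
    "José Álvarez",
    "Jose Alvarez",
]

# Each group is only a list of candidates for a human to review.
for key, members in candidate_groups(names).items():
    print(key, members)
```

Note that "Janet Smith" lands in the same group as the John Smith variants. The key can only propose candidates, so the final decision about which names are the same person still has to be made by the user.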
I believe a feature like this would be a good fit for JabRef. It already takes care to normalize name fields during cleanups, and it has a powerful feature for fetching author names from CrossRef via an entry’s DOI, which could serve as an additional source of information when deciding on an author’s full name.
I also found another post related to finding author information online which may be helpful in tackling this.
Right now, I am busy with my own research, but I’ve developed Java applications and libraries before, so I might be able to help in the future.