Universal Citekey Generator

The universal citekey generator from Papers 2 and Papers 3 tries to generate unique letters for the citekey in a deterministic and consistent way, using either the title or the DOI.

For example, consider the paper below:
Nick Bostrom, Are We Living in a Computer Simulation?, The Philosophical Quarterly, Volume 53, Issue 211, April 2003, Pages 243–255, https://doi.org/10.1111/1467-9213.00309

Using the universal citekey generator we get the following citekeys:

Universal Citekey From DOI: Bostrom:2003bq
Universal Citekey From Title: Bostrom:2003tn

The first one uses the DOI and the second one uses the title to generate the citekey. The last two letters are derived deterministically from that input, so they are consistent across users and independent of the order in which the articles were added.
The code is available in JavaScript here:

Thus this functionality, besides enabling a desirable standardization of citekeys, should be easy to implement, since code for it already exists.

The functionality could be added as a “special field”, e.g., [PapersCitationKey]. However, unless the linked source code is the actual code used by Papers 2/3, it might produce different results (depending on how they deal with Unicode).
If you are referring to https://www.papersapp.com/ , perhaps it would be possible to ask them what is being used?

This “special field” would be perfect.
And yes, this is indeed the method used in Papers 2 and Papers 3. I am referring to the Papers app, versions 2 and 3.

My question is if it is the code they are using, not just the method =/
It is indeed possible to implement the Java equivalent version of the linked JavaScript, but I don’t know if it would produce the same citation keys as Papers 2/3 without knowing more about their implementation.

The issue is that a title containing {\o} might be interpreted as any of the following:

  1. o
  2. {\o}
  3. ø (and its different unicode variants)
  4. oe (unlikely)

which would generate different citation keys. See https://github.com/retorquere/universal-citekey-js#open-problems-discussed-here for some additional issues.
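To make the concern concrete, here is a minimal Python sketch. It assumes the key is derived from a CRC-32 checksum of the UTF-8 encoded title; how Papers 2/3 actually preprocess titles is not known here, and the sample title is hypothetical.

```python
import zlib

def crc32_of(text: str) -> int:
    """CRC-32 of the UTF-8 bytes of text (the encoding is an assumption)."""
    return zlib.crc32(text.encode("utf-8"))

# Hypothetical title; each entry is one way the same {\o} could be read.
readings = ["o", r"{\o}", "\u00f8", "oe"]
titles = [f"Notes on the Str{r}mgren sphere" for r in readings]
checksums = [crc32_of(t) for t in titles]

# Print one checksum per reading so the differences are visible.
for title, checksum in zip(titles, checksums):
    print(f"{checksum:08x}  {title}")
```

Each reading hashes to a different value, so any letters derived from the checksum would differ as well unless both tools normalize the title in the same way first.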

They are no longer using this method? Perhaps [Papers3CitationKey] would be better in that case.

In fact, the company that developed the Papers application (versions 2 and 3) was acquired by Readcube, which introduced the Readcube Papers application in which this universal citekey function has not yet been implemented.

Being more rigorous in the analysis, a counting argument also shows that no scheme with just two letters can be collision-proof: two lowercase letters allow only 26 × 26 = 676 distinct suffixes.
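As a rough illustration of that counting argument, the sketch below estimates how quickly collisions become likely. It assumes suffixes are spread uniformly over the 676 two-letter combinations (aa to zz) and only considers entries that already share the same author and year prefix; the function name is made up for this example.

```python
def collision_probability(n_items: int, n_suffixes: int = 26 ** 2) -> float:
    """Chance that at least two of n_items get the same suffix,
    assuming each suffix is drawn uniformly and independently."""
    if n_items > n_suffixes:
        return 1.0  # pigeonhole: more items than available suffixes
    p_all_distinct = 1.0
    for i in range(n_items):
        p_all_distinct *= (n_suffixes - i) / n_suffixes
    return 1.0 - p_all_distinct

for n in (2, 10, 31, 100):
    print(n, round(collision_probability(n), 3))
```

Under these assumptions the odds of a clash are roughly even once about thirty entries share a prefix, and a clash is certain beyond 676. In practice few authors publish that much in one year, which is presumably why two letters work well enough.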

I see, hum… I think the discussion deviated from where I wanted it to go :stuck_out_tongue:
I guess it boils down to the following,

  1. It might be possible to make a citation key generator that generates the exact same key as Papers 2/3, but I can’t. I would need more information for that. If Readcube Papers does not currently support the feature, I assume they can’t help out.
  2. Making a “universal” citation key generator similar to the linked one is possible, e.g., a [Mekentosj] “special field” based on the link with some changes.
  3. Making a “checksum” modifier (e.g., [title:checksum:truncate2]), in which case the user could decide themselves what probability of collision they care about, could be an interesting addition to the current “Letters after duplicate generated keys” scheme. Are you interested in implementing this?

Hi K3,
I think 2 is a good alternative.

Regarding 3, I am not qualified to implement this. Could you elaborate on the idea behind 3? Is the number of letters up to the user?

A Universal Citekey Generator that is not universal would be strange, no?
Maybe contacting Readcube Papers to develop and use the same algorithm would make sense.

Basically, the idea behind 3 is that it might be easier to make something from scratch that is more native to Java. It might also be useful if it is intentionally not universal, but user configurable. If/when https://github.com/JabRef/jabref/issues/7111 gets implemented, a user can decide to use more “stable” fields, if they have those in their database, e.g. [DOI:{[EPRINT:{[TITLE]}]}:checksum:truncate2] to generate a key between aa and zz or ...:truncate3] for a key between aaa and zzz.
It would mostly be useful between libraries or within a group if everyone uses JabRef (you can share cite key patterns with the .bib file).
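A minimal sketch of what such a modifier could compute, written in Python for brevity. Everything here is hypothetical: the function names, the choice of CRC-32 as the checksum, and the base-26 mapping are assumptions for illustration, not the Papers 2/3 algorithm and not an existing JabRef feature.

```python
import string
import zlib

def first_available(entry: dict, fields=("doi", "eprint", "title")) -> str:
    """Return the first non-empty field, mimicking a DOI > EPRINT > TITLE fallback."""
    for name in fields:
        value = entry.get(name, "").strip()
        if value:
            return value
    return ""

def checksum_letters(text: str, length: int = 2) -> str:
    """Map the CRC-32 of text to `length` lowercase letters (base-26 digits).

    Intended for short lengths: 26**7 exceeds 2**32, so the extra
    leading letters would always be 'a' from length 7 onward.
    """
    value = zlib.crc32(text.encode("utf-8"))
    letters = []
    for _ in range(length):
        value, digit = divmod(value, 26)
        letters.append(string.ascii_lowercase[digit])
    return "".join(reversed(letters))

entry = {"doi": "10.1111/1467-9213.00309",
         "title": "Are We Living in a Computer Simulation?"}
print(checksum_letters(first_available(entry), 2))  # two letters, aa..zz
print(checksum_letters(first_available(entry), 3))  # three letters, aaa..zzz
```

The same input always yields the same letters, so two people with the same pattern and the same DOI would get the same suffix. The letters will not match those produced by Papers 2/3.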

Regarding option 2, as @mlep says, it might be possible to contact the developers at Readcube Papers. There is some support in Java for running JavaScript using Nashorn/GraalVM, so it might be possible to run exactly the same code. Since questions about it have been asked on the Zotero forum as well, perhaps it would be useful to someone even without a 1:1 correspondence in cite keys, as long as they are “close enough”.

Regarding implementing this, I will not have time for quite a long while, hence I felt I should ask :stuck_out_tongue:

Hi! Original writer of the universal citekey implementation here. It’s been 2 years, but in case this is ever useful: the JavaScript code is exactly equivalent to what we shipped with Papers 2 and 3. The hashing function, CRC-32, is easy to implement in any language (and in many, it’s part of the standard library).
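For reference, a bit-by-bit CRC-32 fits in a dozen lines. The Python sketch below assumes the standard CRC-32 variant (reflected polynomial 0xEDB88320, as used by zlib) and checks it against the standard library. It covers only the hashing step; how the checksum is turned into the citekey letters is defined by the JavaScript code and is not reproduced here.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 with the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF  # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion

doi = b"10.1111/1467-9213.00309"
# The hand-written version and the standard library agree.
print(crc32_bitwise(doi) == zlib.crc32(doi))
```

The well-known check value for the ASCII string "123456789" is 0xCBF43926, which is a convenient sanity test when porting the hash to another language.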