Universal Citekey Generator

sdotvdot · December 1, 2020, 1:30pm

The universal citekey generator from Papers 2 and Papers 3 tries to generate unique letters for the citekey in a deterministic and consistent way, using either the title or the DOI.

For example, consider the paper below:
Nick Bostrom, Are We Living in a Computer Simulation?, The Philosophical Quarterly, Volume 53, Issue 211, April 2003, Pages 243–255, https://doi.org/10.1111/1467-9213.00309

Using the universal citekey generator we get the following citekeys:

Universal Citekey From DOI: Bostrom:2003bq
Universal Citekey From Title: Bostrom:2003tn

The first one uses the DOI and the second one uses the title to generate the citekey. The last two letters are derived deterministically from the DOI or the title, so they are consistent across users and independent of the order in which the articles were added.
The code is available in JavaScript here:

Thus this functionality, besides allowing a desirable standardization of citekeys, should be easy to implement, since code for it is already available.

k3KAW8Pnf7mkmdSMPHz2 · December 1, 2020, 3:06pm

The functionality could be added as a “special field”, e.g., [PapersCitationKey]. However, unless the linked source-code is the actual code used by Papers 2/3 it might produce different results (depending on how they deal with unicode).
If you are referring to https://www.papersapp.com/ , perhaps it would be possible to ask them what is being used?

sdotvdot · December 1, 2020, 3:22pm

This “special field” would be perfect.
And yes, this is indeed the method used in Papers 2 and Papers 3. I am referring to versions 2 and 3 of the Papers app.

k3KAW8Pnf7mkmdSMPHz2 · December 1, 2020, 6:29pm

My question is if it is the code they are using, not just the method =/
It is indeed possible to implement the Java equivalent version of the linked JavaScript, but I don’t know if it would produce the same citation keys as Papers 2/3 without knowing more about their implementation.

The issue is that a title containing {\o} might be interpreted as any of:

o
{\o}
ø (and its different unicode variants)
oe (unlikely)

which would generate different citation keys. See retorquere/universal-citekey-js on GitHub (a JavaScript implementation of the universal cite key) for some additional issues.
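To make the point concrete, here is a minimal Python sketch. It assumes the checksum is a CRC-32 over the UTF-8 bytes of the text; how Papers 2/3 actually preprocesses titles is not known here, and the names `checksum`, `normalized_checksum` and `VARIANTS` are made up for illustration. It prints the bytes and checksum for each reading of {\o}, then shows that normalizing Unicode before hashing removes one source of disagreement (é is used for that part because it has both a composed and a decomposed form).

```python
import unicodedata
import zlib


def checksum(text: str) -> int:
    """CRC-32 of the UTF-8 bytes of `text` (assumed hashing step)."""
    return zlib.crc32(text.encode("utf-8"))


def normalized_checksum(text: str) -> int:
    """Same checksum, but after Unicode NFC normalization."""
    return checksum(unicodedata.normalize("NFC", text))


# The four readings of {\o} listed above.
VARIANTS = ["o", r"{\o}", "\u00f8", "oe"]

for variant in VARIANTS:
    data = variant.encode("utf-8")
    print(f"{variant!r:10} bytes={data!r:14} crc32={checksum(variant):08x}")

# One visible character, two Unicode encodings.
composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

print(composed.encode("utf-8"), decomposed.encode("utf-8"))
print(normalized_checksum(composed) == normalized_checksum(decomposed))
```

Every variant hashes a different byte sequence, so two tools only agree on a key if they also agree on this preprocessing step.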

They are no longer using this method? Perhaps [Papers3CitationKey] would be better in that case.

sdotvdot · December 1, 2020, 7:21pm

In fact, the company that developed the Papers application (versions 2 and 3) was acquired by Readcube, which introduced the Readcube Papers application in which this universal citekey function has not yet been implemented.

Being more rigorous in the analysis, it is also clear that due to a counting argument, it is not possible to make a collision-proof scheme with just two letters.
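The counting argument can be sketched in a few lines of Python. Two lowercase letters give at most 26 × 26 = 676 distinct suffixes, so any 677 papers sharing the same author and year must contain a collision, and collisions become likely long before that. The sketch assumes every suffix is equally likely, which is an idealization; `collision_probability` is a name invented for this example.

```python
def collision_probability(papers: int, keys: int = 26 ** 2) -> float:
    """Chance that at least two of `papers` entries share a suffix,
    assuming each of the `keys` suffixes is equally likely."""
    if papers > keys:
        return 1.0  # pigeonhole: more entries than available suffixes
    no_collision = 1.0
    for already_used in range(papers):
        no_collision *= (keys - already_used) / keys
    return 1.0 - no_collision


for n in (2, 10, 30, 100, 677):
    print(n, round(collision_probability(n), 3))
```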

k3KAW8Pnf7mkmdSMPHz2 · December 2, 2020, 12:10am

I see, hum… I think the discussion deviated from where I wanted it to go
I guess it boils down to the following,

1. It might be possible to make a citation key generator that generates exactly the same keys as Papers 2/3, but I can’t. I would need more information for that. If Readcube Papers does not currently support the feature, I assume they can’t help out.
2. Making a “universal” citation key generator similar to the linked one is possible, e.g., a [Mekentosj] “special field” based on the link with some changes.
3. Making a “checksum” modifier (e.g., [title:checksum:truncate2]), in which case users could decide for themselves what probability of collision they care about, could be an interesting addition to the current “Letters after duplicate generated keys” scheme. Are you interested in implementing this?

sdotvdot · December 9, 2020, 10:55pm

Hi K3,
I think 2 is a good alternative.

Regarding 3, I am not qualified to implement this issue. Could you elaborate more about the idea behind 3? The number of letters is up to the user?

mlep · December 10, 2020, 4:10pm

A Universal Citekey Generator that is not universal would be strange, no?
Maybe contacting Readcube Papers to develop and use the same algorithm would make sense.

k3KAW8Pnf7mkmdSMPHz2 · December 15, 2020, 12:43am

Basically, the idea behind 3 is that it might be easier to make something from scratch that is more native to Java. It might also be useful if it is intentionally not universal, but user configurable. If/when https://github.com/JabRef/jabref/issues/7111 gets implemented, a user can decide to use more “stable” fields, if they have those in their database, e.g. [DOI:{[EPRINT:{[TITLE]}]}:checksum:truncate2] to generate a key between aa and zz or ...:truncate3] for a key between aaa and zzz.
It would mostly be useful between libraries or within a group if everyone uses JabRef (you can share cite key patterns with the .bib file).
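A rough Python sketch of that idea, for illustration only: it is not JabRef code, and the `checksum` and `truncateN` modifiers are proposals from this thread rather than existing features. It assumes CRC-32 as the checksum and a plain base-26 mapping to letters; `first_available` and `checksum_letters` are hypothetical helper names.

```python
import zlib


def first_available(entry: dict, fields=("doi", "eprint", "title")) -> str:
    """Return the first non-empty field, mimicking the fallback chain
    [DOI:{[EPRINT:{[TITLE]}]}] from the pattern above."""
    for name in fields:
        value = entry.get(name, "").strip()
        if value:
            return value
    return ""


def checksum_letters(value: str, length: int = 2) -> str:
    """Map a CRC-32 checksum to `length` letters, 'a'*length .. 'z'*length."""
    number = zlib.crc32(value.encode("utf-8")) % (26 ** length)
    letters = []
    for _ in range(length):
        number, remainder = divmod(number, 26)
        letters.append(chr(ord("a") + remainder))
    return "".join(reversed(letters))


entry = {"doi": "10.1111/1467-9213.00309",
         "title": "Are We Living in a Computer Simulation?"}
print(checksum_letters(first_available(entry), 2))  # two letters
print(checksum_letters(first_available(entry), 3))  # three letters
```

Because the DOI is preferred over the title, two users with the same DOI get the same suffix even if their title fields differ slightly. The letters produced here are not expected to match the Papers 2/3 keys.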

Regarding option 2, as @mlep says, it might be possible to contact the developers at Readcube Papers. There is some support in Java for running JavaScript using Nashorn/GraalVM, so it might be possible to run exactly the same code. Since questions about it have been asked on the Zotero forum as well, perhaps it would be useful to someone even without a 1:1 correspondence in cite keys, as long as they are “close enough”.

Regarding implementing this, I will not have time for quite a long while, hence I felt I should ask

cparnot · December 23, 2022, 9:29pm

Hi! Original writer of the universal citekey implementation here. It’s been 2 years, but in case this is ever useful: the JavaScript code is exactly equivalent to what we shipped with Papers 2 and 3. The hashing function, CRC-32, is easy to implement in any language (and in many, it’s part of the standard library).
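As an illustration of how little code CRC-32 needs, here is a bit-by-bit Python version checked against the standard library. It assumes the common CRC-32 variant (reflected polynomial 0xEDB88320, as used by zlib); the function name `crc32_bitwise` is made up for this sketch, and the step that turns the checksum into citekey letters is not shown.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Table-free CRC-32 (zlib/PNG variant), processed one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


doi = "10.1111/1467-9213.00309".encode("utf-8")
print(hex(crc32_bitwise(doi)), hex(zlib.crc32(doi)))  # the two values agree
```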
