Integrating related articles from MrDlib into JabRef

Thirdly, I suggest showing off some of the intelligent parts
of Mr. DLib, such as the degree of relation to the selected article. Also,
provide options for how the user can respond to the provided list (View,
Download, Cite).

Probably something like:

I think overall it may look like:
https://postimg.org/image/3q3csdb2d/

However, if the Mr. DLib tab is not consistent with the rest
of the program, it may feel like an ads section and will probably be neglected.

I am confused about how the user can reach the recommended
articles if they start from scratch. Or should the user collect some initial
entries in the database to be able to use Mr. DLib?

Another question, is it possible to choose multiple entries
to calibrate the parameters of the recommendation?

These are my initial thoughts based on a first impression; I would
be glad to share more if that is useful to you.

@joerg.lenhard we are primarily interested in a code review, as we are aware that the user interface itself is not yet ideal ;-).

I would also like to point out that the current version is certainly not yet ready for a release. For instance, the current recommender system does not yet request related articles for the currently selected article. Instead, we hardcoded the request so that related articles are always fetched for the same input document. Hence, the current version is just intended to show you our progress and to get feedback on the code.

Stefan will create a proper pull request next week.

@Sara_Yousif Thank you very much for your thoughts. I really like your illustrations/drafts and believe that this is certainly how the recommendations should be displayed - in the long run.

Let me explain some of my ideas and how I see the integration of Mr. DLib in JabRef.

One of the main reasons for me to create Mr. DLib was to have an application that allows me to conduct research in the field of recommender systems. My goal is that, in the long run, many dozens of partners use Mr. DLib. To make my research as easy as possible and to prevent duplicate implementations (and hence wasted development time), I want to implement everything only once, i.e. on the server of Mr. DLib. This means I would prefer that as much of the presentation as possible is specified by Mr. DLib, so we do not have to implement the display of recommendations in the clients of each partner.

Therefore, Mr. DLib delivers recommendations in formatted HTML like this:

<a href='https://api-dev.mr-dlib.org/v1/recommendations/3336751/original_url?access_key=ec5946b1645539faeccb2d331d69ae58&format=direct_url_forward'><font color='#000000' size='5' face='Arial, Helvetica, sans-serif'>The mind and the machine: philosophical aspects of artificial intelligence.</font></a><font color='#000000' size='5' face='Arial, Helvetica, sans-serif'>. <i>Ellis Horwood series in artificial intelligence</i>. 1984.

This way, the client (JabRef) only needs to be able to display HTML code, and we can easily change on our server how recommendations are displayed (without needing to adjust JabRef).

Of course, the current presentation is not ideal, but it will only be the very first version. Once everything is running, we will improve the presentation. Actually, we want to research which presentation is ideal. So we will have e.g. 100 variations and display variation A to some users, variation B to some other users, variation C, and then see which one performs best. You are very welcome to support us in this, and I will send you an email with more details later.

sorry for the multiple postings, but this forum only allows adding two URLs in one post

I am confused about how the user can reach the recommended
articles if they start from scratch.

Assuming that we integrate the arxiv.org articles, we would have two options. We could link a recommendation to the article’s overview page, e.g. [1602.02842] Collaborative filtering via sparse Markov random fields, and/or link directly to the PDF: https://arxiv.org/pdf/1602.02842v1.pdf

See also: From which source would JabRef users like to receive article-recommendations?

is it possible to choose multiple entries to calibrate the parameters of the recommendation?

Not yet. The first version will only be able to display related articles for a single input document. However, after that we will implement more sophisticated solutions, see also https://isgroup.atlassian.net/browse/MDL-51

@joeran I’m a bit skeptical that returning HTML from Mr. DLib is the best way to integrate it into JabRef. You definitely have a good point regarding maintenance, which is a lot easier. However, you are sacrificing how deeply the recommendation system can be integrated into JabRef. For example, it will be very hard to interact with the user’s existing db or to use JabRef’s features. Just to give you an idea of what I would like to have in the future:

  • Clicking on a recommended item that is already in the db should highlight this item in JabRef itself instead of opening a new website.
  • Import recommended item(s) into JabRef with a single click (I don’t want to open a website, search for the bib-data there and then come back to JabRef)
  • Preview the pdf-file of a recommended article (we are currently considering implementing a pdf reader in JabRef)

All of these features are hard to implement on the server side since they require calling JabRef’s internal code.
I would really like to see Mr. DLib provide an interface (REST?) where I send an article and get a list of recommended items in the form of structured data (BibTeX, XML, JSON). Then it’s JabRef’s job to display the items.
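
To make the structured-data idea more concrete, here is a minimal sketch of how a client could parse such a response and decide on the display itself. It is only an illustration: the element names (`recommendations`, `recommendation`, `title`, `year`, `url`), the class names and the example URL are assumptions made for this sketch and are not taken from the actual Mr. DLib format or from JabRef's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: element and class names are assumptions,
// not the real Mr. DLib response format or JabRef code.
class StructuredRecommendations {

    // Plain data holder for one recommended item.
    static final class Recommendation {
        final String title;
        final String year;
        final String url;

        Recommendation(String title, String year, String url) {
            this.title = title;
            this.year = year;
            this.url = url;
        }
    }

    // Made-up sample response used by main() below.
    static String sampleXml() {
        return "<recommendations>"
                + "<recommendation><title>The mind and the machine</title>"
                + "<year>1984</year><url>https://example.org/doc/1</url></recommendation>"
                + "<recommendation><title>Untitled draft</title></recommendation>"
                + "</recommendations>";
    }

    // Parses the assumed XML layout into a list of Recommendation objects.
    static List<Recommendation> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so external entities cannot be loaded.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("recommendation");
            List<Recommendation> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element item = (Element) nodes.item(i);
                result.add(new Recommendation(
                        text(item, "title"), text(item, "year"), text(item, "url")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse recommendations", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or "".
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // The client, not the server, decides how each item is shown.
        for (Recommendation r : parse(sampleXml())) {
            System.out.println(r.title + " (" + r.year + ") " + r.url);
        }
    }
}
```

With data in this shape, features such as "import with a single click" or "highlight the entry if it is already in the db" would operate on fields rather than on rendered markup.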

@joeran @tobiasdiez I agree with tobias. If Mr. DLib only returns HTML that is displayed in a tab in JabRef, then it will be hard to have a decent integration with the remaining functionality. Also meaningful error handling will be difficult.

On top of that, it will mean that Mr.DLib will determine formatting, layout and colors and that is something that I do not like. This means that it will not be possible for us at JabRef to make sure that there is a consistent layout and formatting. Also you could change the formatting at your end any time and thus inject a new look into JabRef that we have no control over. Moreover, various ways of shooting down JabRef with invalid HTML might become feasible.

I certainly get your point regarding maintenance for you, but you surely understand as well that our priorities are different (a nice looking, consistent, stable JabRef). I think the proper way for an implementation would be an interface as tobias suggests.

@joeran I agree with @tobiasdiez that returning HTML is not a good idea, as it will end up as an isolated section with no interaction between Mr. DLib and the reference management system.

My question is: which is the main application and which is the helper, the reference manager or the recommender system? I mean, in Zotero for example, the browser is the main one (containing the searched content and the list of articles from different sources), and the Zotero tab takes up only one third of the page to show the reference database.
To see the big picture, I think we should look from the user's point of view first. What is the scenario for Mr. DLib from the user's point of view? I will give my understanding; please correct me if I got something wrong.

First of all, the user's goal is to get appropriate literature for their research. This literature must meet criteria such as quality, a recognized publisher, known authors, and recent publication. Personally, I start from Google Scholar by entering my keywords; then I get a list of results, from which I choose a suitable article and add it to the reference database to be used in my editor. However, when I open Google Scholar’s main page, I get some articles from Google’s recommender system, which knows my research interests from previous searches and keywords. (If they are appropriate for me, this gives positive feedback to the learning results; otherwise it will adjust the parameters.) As a result, the recommender system produces suggested articles that could be better than the results from the user’s keywords.
The drawback could be that the user is looking for different details each time but under the same umbrella. For example, looking for IoT applications, next looking for IoT architectures (different branches of the mind map).

What will the scenario be for JabRef and Mr. DLib (from the user's point of view)?

And one additional thought to the points @joerg.lenhard and @tobiasdiez already mentioned:

While it might even be acceptable for us to show the HTML version provided by Mr. DLib, this will be unacceptable for other potential partners, for the reasons Jörg pointed out.
So I think providing a REST API returning XML/JSON would be preferable not only for us, but also for other partners who want to be in control of what and how it is displayed.

The return format doesn’t even need to be XML/JSON. It could just be plain BibTeX :wink:

@Sara_Yousif: In this case, the scenario from the user point of view will definitely be the reference manager first. JabRef is not an addon to a recommender system, but a recommender system might be an addon to the reference manager. At least this is how I understand it, why I contribute to JabRef, and what I will try to safeguard :slight_smile:

Which would of course be easy to integrate for us - but not necessarily for other potential partners :wink:

However, exporting some already existing format might ease the integration as often some importer code is already available.

JabRef is not an addon to a recommender system, but a recommender system might be an addon to the reference manager.

I can definitely agree to this point of view :wink:

However, IMHO there are many more issues to consider than Functionality and Maintainability. I will explain all issues in detail, but first let me mention that Mr. DLib does not only deliver HTML code but XML with HTML embedded. See, for instance, here https://api-dev.mr-dlib.org/v1/documents/ubk-opac-HL000670928/related_documents/. So, there is the option to deliver some structured information in addition to the HTML if needed.

Functionality
I agree with you that some functions will be more difficult to implement when Mr. DLib delivers (primarily) HTML (and a few things might even be impossible). Nevertheless, most functions should still be rather easy to implement. For instance, selecting an article in JabRef after clicking a recommendation should be possible with a URL handler like jabref://article/#bibtexkey instead of opening a URL like http://arxiv.org/.
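
As a rough sketch of the parsing side of such a link, assuming the jabref://article/#bibtexkey form mentioned above: the class and method names are hypothetical, and whether JabRef could register or dispatch such a scheme at all is a separate question that this example does not address.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper; not part of JabRef.
class JabRefLinks {

    // Extracts the BibTeX key from a link such as jabref://article/#Smith2016.
    // Returns an empty Optional for any other kind of link.
    static Optional<String> bibtexKey(String link) {
        URI uri;
        try {
            uri = URI.create(link);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not a syntactically valid URI
        }
        boolean isArticleLink = "jabref".equalsIgnoreCase(uri.getScheme())
                && "article".equals(uri.getHost());
        if (!isArticleLink) {
            return Optional.empty();
        }
        String key = uri.getFragment(); // the part after '#'
        return (key == null || key.isEmpty()) ? Optional.empty() : Optional.of(key);
    }

    public static void main(String[] args) {
        System.out.println(bibtexKey("jabref://article/#Smith2016").isPresent());
        System.out.println(bibtexKey("http://arxiv.org/").isPresent());
    }
}
```

A client using this would fall back to opening the link in a browser whenever the result is empty.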


“Researchability”
Our goal is to make our recommender system as effective as possible for every partner. This also includes an optimal presentation of the recommendations – and the presentation may have a really big effect on user satisfaction with a recommender system. However, finding the ideal presentation is not trivial. We need to do many experiments for this in which we vary all kinds of variables. If we generate the presentation on our server, it will be very easy to change variables and analyze the effect. We would not have the manpower to implement those experiments separately for each partner. In other words: If you want the presentation primarily to be implemented in JabRef then we would have to take just one presentation and stick to it.
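
For readers unfamiliar with this kind of experiment, one common way to split users across presentation variants is a stable hash of some user identifier. The sketch below only illustrates that general idea; it is not a description of how Mr. DLib assigns variations, and the class name, method name and identifier format are assumptions.

```java
// Hypothetical sketch of deterministic variant assignment.
class VariationAssigner {

    // Maps a user identifier to a variant index in [0, variationCount).
    // The same identifier always yields the same variant.
    static int variationFor(String userId, int variationCount) {
        if (variationCount <= 0) {
            throw new IllegalArgumentException("variationCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(userId.hashCode(), variationCount);
    }

    public static void main(String[] args) {
        int variant = variationFor("user-42", 100);
        System.out.println("user-42 sees variation " + variant);
    }
}
```

Because the assignment is deterministic, a user keeps seeing the same presentation across sessions, which makes the measured effect easier to attribute to the variant.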

Compatibility
Mr. DLib is a new project, and most likely there will be some changes in the XML format. The more of the presentation is implemented in JabRef (based on the XML or whatever), the higher the probability that one day a new version of Mr. DLib is not compatible any more with an old version of JabRef.

Long-term Sustainability
As mentioned, we do not have the resources to implement an individual presentation for each partner. Consequently, the JabRef developers would have to support us here. I know that you are currently quite active, but this has not always been the case in past years. If in e.g. one or two years the development resources of JabRef were limited again, probably no one could take care of required changes in the presentation.

Maintainability / Ease of bug fixing / need for organization
The more recommendation functionality is implemented in JabRef, the more dependent we become in our release cycles on JabRef. For instance, if we want to test a new way of presenting recommendations, we would have to wait until a new version of JabRef is released. This is not ideal. Similarly, as mentioned, bug fixing would take much longer.

Also you could change the formatting at your end any time and thus inject a new look into JabRef that we have no control over.

Even if we did this (which we won’t), would this really be so bad? I mean, the integration of e.g. MathSciNet also doesn’t exactly look like the rest of the JabRef layout.

Moreover, various ways of shooting down JabRef with invalid HTML might become feasible

I am sure there are good Java libraries to validate HTML code, so this risk would be very low. In addition, by sending XML, the same problem would theoretically exist.

While it might even be acceptable for us to show the HTML version provided by Mr. DLib, this will be unacceptable for other potential partners

I am optimistic that I will convince other partners to use HTML, too :slight_smile:

My suggestion is that we try the HTML way (especially since it is already implemented). And when we start to think about the “advanced” JabRef recommender system, in which users should also be able to register an account and receive personalized recommendations, we can discuss again whether to continue using HTML.

@joeran: Apologies if my post sounds rather negative, it is really meant constructively.

We just had a devcall and decided that we will not support an integration of Mr. DLib purely with embedded HTML. Ultimately, if you want integration with JabRef, and thus get access to a potentially large user base, you will have to be ready to invest significant effort into JabRef development itself.

An API-based solution (which we prefer) would also have many advantages regarding maintainability and sustainability for you. We should probably have another telco together, since all these things can be clarified more easily in direct communication.

As for your points:

Functionality
From my experience in programming JabRef, I disagree with what you write. Integration will not be rather easy. We have no support for URL handlers or similar stuff at the moment (as far as I know).

Researchability
Sorry, but recommender system researchability is not the purpose of JabRef and will not be a consideration for us when integrating features.

Compatibility
Sure, you would have to make sure that this goes as smoothly as possible. External tool integration often breaks for older releases, most prominently with the google scholar fetcher.

Long-term Sustainability
You can easily turn that argument around: If Mr. DLib has limited resources, then all the maintenance effort will be left with the JabRef team. This is not what we want. To make this entirely clear: Long-term sustainability of Mr. DLib will be up to you; we will not implement the presentation in JabRef for you, and you should be willing to maintain it. Minor changes (one-liners) can probably be done by us, but if the presentation breaks in the future and no one is ready to fix it, then we have a problem.

Maintainability / Ease of bug fixing / need for organization
If you want JabRef integration, you will have to live with the release cycles of JabRef. Since bug fixing regarding Mr. DLib will have to be done by your team, you have full control over its speed. Finally, as you point out, the MathSciNet tab is one of the many ugly parts of JabRef, which really needs improvement.

I am sure there are good Java libraries to validate HTML code, so
this risk would be very low. In addition, by sending XML, the same
problem would theoretically exist.

What if your server is hacked? Injection attacks are certainly always possible, but there exist much more sophisticated HTML-based injection attacks than, for instance, BibTeX-based injection attacks. A tunnel for arbitrary HTML dynamically loaded at runtime from some server into JabRef is really not what we want.
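
To illustrate the difference, here is a minimal sketch of the alternative: the client receives plain fields and builds its own markup, escaping everything that came from the server. The class and method names are hypothetical and this is not JabRef code; it only shows why server-supplied text cannot introduce tags or scripts when it is handled this way.

```java
// Hypothetical sketch of client-side rendering with escaping.
class SafeRenderer {

    // Replaces the characters that have special meaning in HTML.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // The markup is fixed by the client; only escaped text is inserted into it.
    static String renderItem(String title, String year) {
        return "<li><b>" + escapeHtml(title) + "</b> (" + escapeHtml(year) + ")</li>";
    }

    public static void main(String[] args) {
        // A title containing markup is displayed as text, not interpreted.
        System.out.println(renderItem("Minds <and> machines", "1984"));
    }
}
```

The layout stays under the client's control, and a compromised server can at worst deliver misleading text rather than active content.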

My comments may sound quite negative now, but they are really not meant that way. We are of course interested in getting recommender system support for JabRef, we just have to be clear about the interaction: We do not have the capacity in the JabRef team to develop or maintain the integration for you. Nevertheless, you will have to follow the style and quality constraints we set. Also our time is very very limited, so we cannot pick up even more tasks from Mr.DLib.

All the discussion we are having now is still quite unclear, since we do not really know how things are going to work. We would need you to provide an implementation at which we can look (in a PR) to see what problems may arise. Then we can have a more meaningful discussion on how things could work. Just be prepared that this requires time and effort.

hi joerg, thank you for the clarification. i suggest we upload the source code of what we did so far next week, and then - as you suggested - we schedule another phone call to talk about the next steps.

hi joeran, thank you for your understanding! This sounds fine, let us proceed that way.

Current pull request: https://github.com/JabRef/jabref/pull/2189

Encountering this feature in 4.0.0 beta, I wonder whether the user could be given the option not to show and load this tab?

I think there is an option in the preferences to disable the recommendation feature.

@i_ngli @tobiasdiez Yes, this has been requested before and is implemented. It is explained here.

thanks, ok. I would say: the shift of formulations is confusing. better to show coherence. thus I suggest a change here https://github.com/JabRef/jabref/issues/2805

Hi All

I am working on a research project on recommender system interfaces. I wanted to add some features to the related articles presentation tab. I am looking for some help from this forum.

Best Regards
Ahmad