Relevant articles


Recently, together with Jose Eliel Camargo, I have been exploring a very nice and simple idea. When writing scientific articles, researchers put a lot of effort into collecting relevant references and placing them within their text for different purposes: to give credit, to guide the reader to other points of view, to support some statement, etc. This means that looking for papers which tend to be cited close to each other in a collection of scientific articles should be a good way to extract a group of similar or relevant articles.

With this in mind, we extracted reference lists from inspirehep using the inspirehep python wrapper. For us, each reference list is just a list of inspire article ids. We then trained a Skip-Gram model using the gensim library implementation, treating each reference list as a sentence and each article id as a word. We end up with a dense vector representation of the inspirehep article ids, from which we can extract similar items using cosine similarity. Very simple!

Let's look at some of the results. I will start with one of my favourites:

I retrieve the three closest articles by cosine similarity to the following classic article:

I get the following results:

These results are very good: these articles developed the concept of dimensional regularization at the same time as the article by Gerard 't Hooft and M.J.G. Veltman.

Let's look at another article, starting with

we predict the following three most similar articles:

This again looks quite good, taking into account the Nobel Prize in Physics awarded in connection with the Higgs boson discovery. We are very happy with the results obtained so far and continue to work on the topic.