INSPIRE API

Q: I want to do automated searching and receive machine readable responses. Do you have an API?

A: YES, Inspire has a feature-rich programmatic query interface for third-party tools and overlays.

Inspire exposes an API (Application Programming Interface) for querying most aspects of its holdings and provides responses in XML, Enhanced MARCXML, or JSON. This is described below.

Note that the API is intended for real-time querying and interactive tools. For large-scale bulk download the OAI-PMH interface is much more suitable, in particular because of its incremental update functionality, which tracks the ever-changing information in Inspire. A periodically updated snapshot of HEP record metadata in JSON format is also available at https://inspirehep.net/hep_records.json.gz (400 MB) with checksum https://inspirehep.net/hep_records.json.gz.md5 (61 B).
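A downloaded snapshot can be checked against the published checksum before use. The following is a minimal sketch, not an official tool. It assumes the .md5 file uses the common md5sum layout (hex digest, whitespace, file name), which this page does not confirm, and the function names are chosen here for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(checksum_text):
    """Extract the digest from md5sum-style text: '<hex digest>  <file name>'.

    Assumption: the published .md5 file follows this layout.
    """
    return checksum_text.split()[0].lower()


def snapshot_is_intact(snapshot_path, checksum_path):
    """Compare the digest of the local snapshot with the published one."""
    with open(checksum_path, "r", encoding="ascii") as handle:
        return md5_of_file(snapshot_path) == expected_md5(handle.read())
```

Typical use, after saving both files locally: snapshot_is_intact("hep_records.json.gz", "hep_records.json.gz.md5").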

An additional, monthly updated Enhanced MARCXML dump of INSPIRE metadata is available at https://inspirehep.net/dumps/inspire-dump.html. It is useful for bootstrapping services that want to harvest INSPIRE regularly via MARCXML.

Inspire API query format

The API requests have the following general form:

GET /search?p=...&of=...&ot=...&jrec=...&rg=...

To return information on a single record, replace /search with /record/ followed by the record ID (recid), as in the examples further down.
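Request URLs of this form can be assembled with the Python standard library. This is only a sketch: the host is taken from the links on this page, the helper names search_url and record_url are made up for illustration, and the characters left unescaped (comma, colon, square brackets) are a readability choice rather than a requirement stated here.

```python
from urllib.parse import urlencode

# Host taken from the links on this page; the entry point may change.
BASE_URL = "https://inspirehep.net"


def _query_string(params):
    # Keep ",", ":" and "[]" readable; everything else is percent-encoded.
    return urlencode(params, safe=",:[]")


def search_url(pattern, output_format="recjson", output_tags=None,
               jump_to_record=None, records_per_chunk=None):
    """Build a /search URL from the p, of, ot, jrec and rg parameters."""
    params = {"p": pattern, "of": output_format}
    if output_tags:
        params["ot"] = ",".join(output_tags)
    if jump_to_record is not None:
        params["jrec"] = jump_to_record
    if records_per_chunk is not None:
        params["rg"] = records_per_chunk
    return BASE_URL + "/search?" + _query_string(params)


def record_url(recid, output_format="recjson", output_tags=None):
    """Build a /record/<recid> URL for a single record."""
    params = {"of": output_format}
    if output_tags:
        params["ot"] = ",".join(output_tags)
    return "{}/record/{}?{}".format(BASE_URL, recid, _query_string(params))


print(record_url(451647, "xm", ["100", "245"]))
print(search_url("refersto:author:maldacena",
                 output_tags=["recid", "authors[0]"], records_per_chunk=25))
```

The resulting strings can be passed to any HTTP client, for example urllib.request.urlopen.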

p = pattern (query)

This is the query in the Inspire search syntax. All search features and operators familiar from the Inspire web interface and documented in the online help are supported and complex queries are possible.

of = output format

The format of the response sent back to the client. There are two choices: of=xm for (MARC-) XML or of=recjson for JSON. The XML response format is MARCXML, or fragments thereof when individual fields are selected via the ot parameter.

ot = output tags

Select (filter) specific tags from the MARCXML response. This option takes a comma separated list of MARC tags. Valid MARC tags for Inspire records can be found here.

For JSON responses, ot selects specific named fields. This is similar to selecting MARC tags, but by name instead of numerical value. In addition, the JSON response can contain derived or dynamically calculated values which are not available in MARC. See below for more information on JSON field names.

rg = records in groups of (25)

This parameter specifies the number of records per chunk for long responses. Note that the default setting is 25 records per chunk. The maximum number is 250.

jrec = jump to record (123)

Long responses are split into several chunks. To access subsequent chunks specify the record offset with this parameter.
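Walking through a long response with rg and jrec might look like the sketch below. It assumes that jrec is 1-based (the first chunk starts at jrec=1) and that a chunk shorter than rg marks the end of the result set; neither is stated on this page. The fetch_chunk callable is a placeholder for whatever HTTP code issues the request and decodes the response.

```python
def iter_records(fetch_chunk, records_per_chunk=250):
    """Yield records chunk by chunk until a short or empty chunk arrives.

    fetch_chunk(jrec, rg) is supplied by the caller and must return a list
    of records for the chunk starting at offset jrec.
    """
    if not 1 <= records_per_chunk <= 250:
        raise ValueError("rg must be between 1 and 250")
    jrec = 1  # assumption: offsets are 1-based
    while True:
        chunk = fetch_chunk(jrec, records_per_chunk)
        yield from chunk
        if len(chunk) < records_per_chunk:
            break
        jrec += records_per_chunk


# Stand-in for a real request, so the loop can be tried offline.
fake_results = list(range(60))


def fake_fetch(jrec, rg):
    return fake_results[jrec - 1:jrec - 1 + rg]


print(len(list(iter_records(fake_fetch, records_per_chunk=25))))
```

Using the maximum chunk size of 250 keeps the number of requests low, which is easier on the servers than many small chunks.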

There are several other options; see https://inspirehep.net/help/hacking/search-engine-api for a full list of possible parameters (key/value pairs).

MARCXML Example

As mentioned above, the XML API uses MARC tags. Here is an example of a request for the first author and title of a particular record:

GET /record/451647?of=xm&ot=100,245

returns



<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
     <controlfield tag="001">451647</controlfield>
     <datafield ind1=" " ind2=" " tag="100">
          <subfield code="a">Maldacena, Juan Martin</subfield>
          <subfield code="i">INSPIRE-00304313</subfield>
          <subfield code="u">Harvard U.</subfield>
     </datafield>
     <datafield ind1=" " ind2=" " tag="245">
          <subfield code="a">The Large N limit of..</subfield>
     </datafield>
</record>
</collection>
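A response like the one above can be read with Python's xml.etree.ElementTree. This is a sketch rather than a reference client; the helper name subfield_values is made up for illustration, and the sample is the response shown above without its XML declaration.

```python
import xml.etree.ElementTree as ET

# MARCXML elements live in this namespace (see the xmlns attribute above).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
  <controlfield tag="001">451647</controlfield>
  <datafield ind1=" " ind2=" " tag="100">
    <subfield code="a">Maldacena, Juan Martin</subfield>
    <subfield code="i">INSPIRE-00304313</subfield>
    <subfield code="u">Harvard U.</subfield>
  </datafield>
  <datafield ind1=" " ind2=" " tag="245">
    <subfield code="a">The Large N limit of..</subfield>
  </datafield>
</record>
</collection>"""


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = "marc:datafield[@tag='{}']/marc:subfield[@code='{}']".format(tag, code)
    return [node.text for node in record.findall(path, MARC_NS)]


root = ET.fromstring(SAMPLE)
record = root.find("marc:record", MARC_NS)
recid = record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS)
first_author = subfield_values(record, "100", "a")
titles = subfield_values(record, "245", "a")
print(recid, first_author, titles)
```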

 

JSON Examples

The JSON API operates similarly, with named fields instead of MARC tags. The same request as above would look like:

GET /record/451647?of=recjson&ot=recid,authors,title

Since the field names are evolving, a comprehensive list is currently best found in the source: https://github.com/inspirehep/invenio/blob/prod/modules/bibfield/etc/atlantis.cfg

Note that the XML API only covers MARC metadata. If you are interested in information such as citation counts or who is citing you, try using the JSON API.

An example for getting citation counts:

GET /record/451647?of=recjson&ot=recid,number_of_citations,authors,title

returns

{
     "recid": 451647,
     "number_of_citations": 9739,
     "authors": [{
          "INSPIRE_number": "INSPIRE-00304313",
          "affiliation": "Harvard U.",
          "first_name": "Juan Martin",
          "last_name": "Maldacena",
          "full_name": "Maldacena, Juan Martin"
     }],
     "title": {
          "title": "The Large N limit of ... ity"
     }
}
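Reading such a JSON response in Python could look like the sketch below. The sample is the response shown above, with the title left truncated as it is there. The code accepts either a bare object or a one-element list, since the exact wrapping of a single-record response is an assumption here.

```python
import json

SAMPLE = """
{
  "recid": 451647,
  "number_of_citations": 9739,
  "authors": [{
    "INSPIRE_number": "INSPIRE-00304313",
    "affiliation": "Harvard U.",
    "first_name": "Juan Martin",
    "last_name": "Maldacena",
    "full_name": "Maldacena, Juan Martin"
  }],
  "title": {"title": "The Large N limit of ... ity"}
}
"""


def citation_summary(response_text):
    """Return a one-line summary of a single-record recjson response."""
    data = json.loads(response_text)
    if isinstance(data, list):  # assumption: the record may arrive wrapped in a list
        data = data[0]
    first_author = data["authors"][0]["full_name"]
    return "{} (recid {}): {} citations".format(
        first_author, data["recid"], data["number_of_citations"])


print(citation_summary(SAMPLE))
```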

An example for finding out who cites you:

GET /search?p=refersto:author:maldacena&of=recjson&ot=recid,creation_date,authors[0],number_of_authors,system_control_number

returns

[{
     "authors": [{
          "INSPIRE_number": "INSPIRE-00117035",
          "affiliation": "Santa Barbara, KITP",
          "first_name": "Joseph",
          "last_name": "Polchinski",
          "full_name": "Polchinski, Joseph"}],
     "recid": 37688,
     "number_of_authors": 2,
     "system_control_number": [{
          "institute": "arXiv",
               "canceled": "oai:arXiv.org:hep-th/9404008",
               "value": "oai:arXiv.org:hep-th/9404008"},
          {"institute": "DESY",
               "canceled": "D94-08493"},
          {"institute": "DESY",
               "canceled": "D94-18453"},
          {"institute": "SPIRESTeX",
               "canceled": "Polchinski:1994my"},
          {"institute": "KEKSCAN",
               "value": "2000-33-672"},
          {"institute": "CDS",
               "value": "261256"}],
     "creation_date": "1994-04-06T00:00:00"}

...

]

When searching large collaborations using JSON, you may want to use authors[0],number_of_authors to retrieve just the first author instead of a complete list, as in the example above.
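A list response of this kind can be summarised as in the following sketch. The sample is an abridged copy of the record shown above; the helper names arxiv_identifier and describe are made up for illustration.

```python
import json
from datetime import datetime

SAMPLE = """
[{
  "authors": [{"INSPIRE_number": "INSPIRE-00117035",
               "affiliation": "Santa Barbara, KITP",
               "first_name": "Joseph",
               "last_name": "Polchinski",
               "full_name": "Polchinski, Joseph"}],
  "recid": 37688,
  "number_of_authors": 2,
  "system_control_number": [
    {"institute": "arXiv",
     "canceled": "oai:arXiv.org:hep-th/9404008",
     "value": "oai:arXiv.org:hep-th/9404008"},
    {"institute": "SPIRESTeX", "canceled": "Polchinski:1994my"},
    {"institute": "CDS", "value": "261256"}],
  "creation_date": "1994-04-06T00:00:00"
}]
"""


def arxiv_identifier(record):
    """Return the arXiv OAI identifier of a record, or None if it has none."""
    for entry in record.get("system_control_number", []):
        if entry.get("institute") == "arXiv" and "value" in entry:
            return entry["value"]
    return None


def describe(record):
    """Format a record as 'First Author [et al.] (year), recid N'."""
    first_author = record["authors"][0]["full_name"]
    suffix = " et al." if record["number_of_authors"] > 1 else ""
    year = datetime.strptime(record["creation_date"], "%Y-%m-%dT%H:%M:%S").year
    return "{}{} ({}), recid {}".format(first_author, suffix, year, record["recid"])


for citing in json.loads(SAMPLE):
    print(describe(citing), arxiv_identifier(citing))
```

Because only authors[0] and number_of_authors are requested, the "et al." suffix can be derived without downloading the full author list.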

MARCXML Vs. Enhanced MARCXML

MARCXML is the native format used to store metadata in INSPIRE. All the bibliographic metadata that could be hand-curated are stored in the MARCXML format.

Links between authors in papers and the corresponding authors in HepNames (author disambiguation), and links between references and the cited records, are available in the Enhanced MARCXML format. This is based on the original MARCXML, but with additional subfields that express relations across records.

E.g. 100__ $$x in a HEP record contains the recid of the corresponding HepNames author record. 100__ $$y contains 0 if this relation was guessed by our algorithm, or 1 if it was claimed by the author or a curator.

E.g. 999C5 $$0 contains the recid of the corresponding cited paper.
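The two relations above could be read as in the sketch below. The record fragment is made up purely for illustration: the author name and recid values are placeholders, not real INSPIRE data, and only the subfields described above are shown.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical fragment; all values are placeholders.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Author, Example</subfield>
    <subfield code="x">1000001</subfield>
    <subfield code="y">1</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="0">2000002</subfield>
  </datafield>
</record>"""


def first_author_link(record):
    """Return (HepNames recid, 'claimed' or 'guessed') from 100__ $$x and $$y."""
    field = record.find("marc:datafield[@tag='100']", MARC_NS)
    if field is None:
        return None
    hepnames_recid = field.findtext("marc:subfield[@code='x']", namespaces=MARC_NS)
    flag = field.findtext("marc:subfield[@code='y']", namespaces=MARC_NS)
    if hepnames_recid is None:
        return None
    return hepnames_recid, "claimed" if flag == "1" else "guessed"


def cited_recids(record):
    """Return the recids found in 999C5 $$0 (one per resolved reference)."""
    recids = []
    for field in record.findall("marc:datafield[@tag='999']", MARC_NS):
        if field.get("ind1") == "C" and field.get("ind2") == "5":
            for sub in field.findall("marc:subfield[@code='0']", MARC_NS):
                recids.append(sub.text)
    return recids


record = ET.fromstring(SAMPLE)
print(first_author_link(record), cited_recids(record))
```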

For more information on the additional relations see the detailed MARCXML description in:

(the additional relations are tagged as "Only in XME format").

The Enhanced MARCXML format is available:

Note: please use the Enhanced MARCXML format only if needed (in particular via OAI-PMH/API), since it is currently computed at run time and is quite computationally intensive to generate.

Note that the API is still under development and things may change without warning. In particular, the entry point for API requests will be moved to /api, which allows better traffic separation, sandboxing, and throttling of API requests where necessary. For now, we ask you to be mindful of expensive queries that would put a heavy load on the servers. Check back here and follow @inspirehep on Twitter for updates.

If you have questions, please send them to feedback@inspirehep.net

Last modified: 2016-01-11