TODO: Add an abstract!

Introduction

Good descriptions of data are essential to finding, understanding and ultimately reusing data. However, data is all too often published without adequate descriptions of its provenance.

Here, we describe nanopublications, a community-driven approach to representing structured data together with its provenance as a single publishable and citable entity. A nanopublication minimally consists of an assertion, the provenance of the assertion, and the provenance of the nanopublication. A nanopublication is represented with, and may be queried using, Semantic Web technologies (RDF, OWL, SPARQL). Nanopublications also feature a mechanism to ensure the integrity of data and its provenance. Access to provenance enables users to assess the trustworthiness of data and provides a mechanism by which authors and institutions may be acknowledged for their contribution to the global knowledge graph. Nanopublications may be used to expose quantitative and qualitative data, as well as hypotheses, claims, and negative results that usually go unpublished. With nanopublications, it is possible to disseminate individual data as independent publications, with or without an accompanying research article.

This document describes the structure of a nanopublication and offers guidelines for composing, implementing and using nanopublications. It was produced by members of the Concept Web Alliance (CWA), an open collaborative community that is actively addressing the challenges associated with the production, management, interoperability and analysis of unprecedented volumes of data. We collaborate via GitHub.

Primer

A nanopublication combines an assertion, the provenance of the assertion, and the provenance of the nanopublication into a single publishable and citable entity. Both the assertion and the provenance are represented as RDF graphs.

The assertion graph of the nanopublication contains an assertion that is composed of one or more RDF triples (subject-predicate-object tuples). Examples of assertions include:

:assertion {
    ex:trastuzumab ex:is-indicated-for ex:breast-cancer .
}
:assertion {
    ex:BRCA1-gene ex:is-involved-in ex:breast-cancer .
    ex:BRCA1-gene ex:encodes ex:BRCA1-protein .
    ex:BRCA1-protein ex:is-expressed-in ex:breast .
}

The provenance graph of the nanopublication contains one or more RDF triples that provide information about the assertion. A nanopublication MUST have a provenance graph identifier linked to the assertion graph identifier. Provenance means ‘how this came to be’ and may include any statement about how the assertion was generated, who generated it, when it was generated, where it was obtained from, and similar information. Examples of assertion provenance include:

:provenance {
    :assertion prov:generatedAtTime "2012-02-03T14:38:00Z"^^xsd:dateTime .
    :assertion prov:wasDerivedFrom :experiment .
    :assertion prov:wasAttributedTo :experimentScientist .
}

The publication information graph contains one or more RDF triples that offer provenance information regarding the nanopublication itself. The subject of the triples in the publication information graph MUST be the nanopublication URI, and the graph SHOULD contain an attribution and a timestamp. Examples of nanopublication provenance include:

:pubInfo {
    ex:pub1 prov:wasAttributedTo ex:paul .
    ex:pub1 prov:generatedAtTime "2012-10-26T12:45:00Z"^^xsd:dateTime .
}

Where applicable, the provenance and publication information graphs SHOULD use the provenance ontology PROV-O and/or ontologies that have a mapping to it, such as Dublin Core and PAV.

The nanopublication itself receives its own identifier and refers to its parts via triples in the head graph:

:head {
    ex:pub1 a np:Nanopublication .
    ex:pub1 np:hasAssertion :assertion .
    ex:pub1 np:hasProvenance :provenance .
    ex:pub1 np:hasPublicationInfo :pubInfo .
}

Note that the head URI is different from the nanopublication URI. The head URI stands for the four triples shown above, whereas the nanopublication URI stands for the four graphs taken together (including the head graph).

Nanopublication Ontology

The structure of a nanopublication is defined by the following ontology, written in the Web Ontology Language (OWL). The namespace is http://www.nanopub.org/nschema. Our namespace policy is that the current version of the ontology is always available at this URL. We suggest using TriG syntax for writing nanopublications.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix np: <http://www.nanopub.org/nschema#>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

<http://www.nanopub.org/nschema> a owl:Ontology ;
	dc:created "2012-10-26"^^xsd:date;
	dc:modified "2013-10-23"^^xsd:date;
	dc:description <http://www.nanopub.org/2013/WD-guidelines-20131215/> ;
	owl:priorVersion <http://www.nanopub.org/nschema-1.9> .

## A nanopublication should be associated with at most one assertion,
## provenance of assertion, and provenance of the nanopublication (publicationInfo).
## We use rdfg:Graph to denote a graph as used in the RDF 1.1 TriG syntax.
## Each subclass of rdfg:Graph should be a separate named graph.

np:Nanopublication rdf:type owl:Class.
np:Assertion rdfs:subClassOf rdfg:Graph.
np:Provenance rdfs:subClassOf rdfg:Graph.
np:PublicationInfo rdfs:subClassOf rdfg:Graph.

np:hasAssertion a owl:FunctionalProperty.
np:hasAssertion rdfs:domain np:Nanopublication.
np:hasAssertion rdfs:range np:Assertion.

np:hasProvenance a owl:FunctionalProperty.
np:hasProvenance rdfs:domain np:Nanopublication.
np:hasProvenance rdfs:range np:Provenance.

np:hasPublicationInfo a owl:FunctionalProperty.
np:hasPublicationInfo rdfs:domain np:Nanopublication.
np:hasPublicationInfo rdfs:range np:PublicationInfo. 

Well-formed Nanopublications

The current ontology is deliberately loose in its specification to encourage adoption. We therefore define an additional set of criteria to further constrain the usage of the ontology. These criteria may be adopted into future versions of the ontology.

A nanopublication MUST comply with all of the following criteria to be considered well-formed:

This is an example of a well-formed nanopublication in TriG notation:

@prefix : <http://example.org/pub1#> .
@prefix ex: <http://example.org/> .
@prefix np:  <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> . 
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

:head {
    ex:pub1 a np:Nanopublication .
    ex:pub1 np:hasAssertion :assertion .
    ex:pub1 np:hasProvenance :provenance .
    ex:pub1 np:hasPublicationInfo :pubInfo .
}

:assertion {
    ex:trastuzumab ex:is-indicated-for ex:breast-cancer .
}

:provenance {
    :assertion prov:generatedAtTime "2012-02-03T14:38:00Z"^^xsd:dateTime .
    :assertion prov:wasDerivedFrom :experiment .
    :assertion prov:wasAttributedTo :experimentScientist .
}

:pubInfo {
    ex:pub1 prov:wasAttributedTo ex:paul .
    ex:pub1 prov:generatedAtTime "2012-10-26T12:45:00Z"^^xsd:dateTime .
}

And the same example in the N-Quads format:

<http://example.org/pub1> <http://www.nanopub.org/nschema#hasAssertion> <http://example.org/pub1#assertion> <http://example.org/pub1#head> .
<http://example.org/pub1> <http://www.nanopub.org/nschema#hasProvenance> <http://example.org/pub1#provenance> <http://example.org/pub1#head> .
<http://example.org/pub1> <http://www.nanopub.org/nschema#hasPublicationInfo> <http://example.org/pub1#pubInfo> <http://example.org/pub1#head> .
<http://example.org/pub1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.nanopub.org/nschema#Nanopublication> <http://example.org/pub1#head> .
<http://example.org/trastuzumab> <http://example.org/is-indicated-for> <http://example.org/breast-cancer> <http://example.org/pub1#assertion> .
<http://example.org/pub1#assertion> <http://www.w3.org/ns/prov#generatedAtTime> "2012-02-03T14:38:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <http://example.org/pub1#provenance> .
<http://example.org/pub1#assertion> <http://www.w3.org/ns/prov#wasAttributedTo> <http://example.org/pub1#experimentScientist> <http://example.org/pub1#provenance> .
<http://example.org/pub1#assertion> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://example.org/pub1#experiment> <http://example.org/pub1#provenance> .
<http://example.org/pub1> <http://www.w3.org/ns/prov#generatedAtTime> "2012-10-26T12:45:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <http://example.org/pub1#pubInfo> .
<http://example.org/pub1> <http://www.w3.org/ns/prov#wasAttributedTo> <http://example.org/paul> <http://example.org/pub1#pubInfo> .
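
The N-Quads form makes the four-graph structure easy to inspect mechanically. The following is a minimal sketch in Python (standard library only) that groups the quads by graph and checks a few structural facts visible in the example: one nanopublication declaration, one link each to an assertion, provenance and publication information graph, and non-empty, distinct graphs. The names parse_quads and check_structure are hypothetical, the parser handles only the term shapes used in this example, and the checks are assumptions drawn from the ontology and the example rather than an implementation of the full well-formedness criteria.

```python
import re
from collections import defaultdict

NP = "http://www.nanopub.org/nschema#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Handles only the term shapes used in the example: <IRI> and "text"^^<IRI>.
# Blank nodes, language tags and escaped quotes are deliberately unsupported.
TERM = r'(<[^>]*>|"[^"]*"\^\^<[^>]*>)'
QUAD = re.compile(rf"^{TERM}\s+{TERM}\s+{TERM}\s+(<[^>]*>)\s*\.$")

EXAMPLE = """\
<http://example.org/pub1> <http://www.nanopub.org/nschema#hasAssertion> <http://example.org/pub1#assertion> <http://example.org/pub1#head> .
<http://example.org/pub1> <http://www.nanopub.org/nschema#hasProvenance> <http://example.org/pub1#provenance> <http://example.org/pub1#head> .
<http://example.org/pub1> <http://www.nanopub.org/nschema#hasPublicationInfo> <http://example.org/pub1#pubInfo> <http://example.org/pub1#head> .
<http://example.org/pub1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.nanopub.org/nschema#Nanopublication> <http://example.org/pub1#head> .
<http://example.org/trastuzumab> <http://example.org/is-indicated-for> <http://example.org/breast-cancer> <http://example.org/pub1#assertion> .
<http://example.org/pub1#assertion> <http://www.w3.org/ns/prov#generatedAtTime> "2012-02-03T14:38:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <http://example.org/pub1#provenance> .
<http://example.org/pub1#assertion> <http://www.w3.org/ns/prov#wasAttributedTo> <http://example.org/pub1#experimentScientist> <http://example.org/pub1#provenance> .
<http://example.org/pub1#assertion> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://example.org/pub1#experiment> <http://example.org/pub1#provenance> .
<http://example.org/pub1> <http://www.w3.org/ns/prov#generatedAtTime> "2012-10-26T12:45:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <http://example.org/pub1#pubInfo> .
<http://example.org/pub1> <http://www.w3.org/ns/prov#wasAttributedTo> <http://example.org/paul> <http://example.org/pub1#pubInfo> .
"""


def parse_quads(text):
    """Group (subject, predicate, object) triples by graph IRI."""
    graphs = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = QUAD.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Quads line: {line!r}")
        s, p, o, g = match.groups()
        graphs[g].append((s, p, o))
    return dict(graphs)


def check_structure(graphs):
    """Return a list of problems; an empty list means all checks passed."""
    heads = [(g, s) for g, triples in graphs.items() for s, p, o in triples
             if p == RDF_TYPE and o == f"<{NP}Nanopublication>"]
    if len(heads) != 1:
        return [f"expected one nanopublication declaration, found {len(heads)}"]
    head_graph, nanopub = heads[0]
    problems, parts = [], []
    for name in ("hasAssertion", "hasProvenance", "hasPublicationInfo"):
        targets = [o for s, p, o in graphs[head_graph]
                   if s == nanopub and p == f"<{NP}{name}>"]
        if len(targets) != 1:
            problems.append(f"np:{name} should link to exactly one graph")
        elif not graphs.get(targets[0]):
            problems.append(f"graph {targets[0]} is empty or missing")
        else:
            parts.append(targets[0])
    if len(set(parts + [head_graph])) != len(parts) + 1:
        problems.append("head and part graphs should be distinct named graphs")
    return problems


print(check_structure(parse_quads(EXAMPLE)))
```

For real data, a full RDF parser should replace the regular expression used here, which is only adequate for the ten quads of this example.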

Query Template

To extract an entire nanopublication from a triple store, the following SPARQL query template can be used:

prefix np: <http://www.nanopub.org/nschema#>
prefix : <...>
select ?G ?S ?P ?O where {
  {graph ?G {: a np:Nanopublication}} union
  {graph ?H {: a np:Nanopublication {: np:hasAssertion ?G} union {: np:hasProvenance ?G} union {: np:hasPublicationInfo ?G}}}
  graph ?G {?S ?P ?O}
}
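
In the template, the empty prefix ":" is declared so that it expands to the URI of the nanopublication to extract; the "<...>" placeholder is where that URI goes. As a minimal sketch, the substitution could be done in Python as follows. The function name build_extraction_query is hypothetical, and the character check only mirrors the characters that SPARQL does not allow inside an IRI reference; it is not a full IRI validator.

```python
import re

# The query template from the text, copied verbatim; "<...>" is its placeholder.
QUERY_TEMPLATE = """\
prefix np: <http://www.nanopub.org/nschema#>
prefix : <...>
select ?G ?S ?P ?O where {
  {graph ?G {: a np:Nanopublication}} union
  {graph ?H {: a np:Nanopublication {: np:hasAssertion ?G} union {: np:hasProvenance ?G} union {: np:hasPublicationInfo ?G}}}
  graph ?G {?S ?P ?O}
}
"""

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def build_extraction_query(nanopub_uri):
    """Fill the template so that the empty prefix ':' expands to nanopub_uri."""
    if not nanopub_uri or _FORBIDDEN.search(nanopub_uri):
        raise ValueError(f"not usable as an IRI reference: {nanopub_uri!r}")
    return QUERY_TEMPLATE.replace("<...>", f"<{nanopub_uri}>", 1)


print(build_extraction_query("http://example.org/pub1"))
```

Rejecting unsafe characters before substitution avoids producing a syntactically broken or unintentionally altered query when the URI comes from user input.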

Integrity Key

The goal of the integrity key is to establish an identifier that can be used to check whether a nanopublication has changed, thus enforcing the immutability of nanopublications. Trusty URIs [10] are the recommended way of assigning integrity keys to nanopublications. This is the example nanopublication from Section 5 after generating and attaching a trusty URI:

@prefix this: <http://example.org/pub1.RAvVDzee5-fpWEFAvoa4Y3_7m9qIXJoKDTdBNbvWwnCiQ> .
@prefix sub: <http://example.org/pub1.RAvVDzee5-fpWEFAvoa4Y3_7m9qIXJoKDTdBNbvWwnCiQ#> .
@prefix ex: <http://example.org/> .
@prefix np:  <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> . 
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

sub:head {
	this: np:hasAssertion sub:assertion ;
		np:hasProvenance sub:provenance ;
		np:hasPublicationInfo sub:pubInfo ;
		a np:Nanopublication .
}

sub:assertion {
	ex:trastuzumab ex:is-indicated-for ex:breast-cancer .
}

sub:provenance {
	sub:assertion prov:generatedAtTime "2012-02-03T14:38:00Z"^^xsd:dateTime ;
		prov:wasAttributedTo sub:experimentScientist ;
		prov:wasDerivedFrom sub:experiment .
}

sub:pubInfo {
	this: prov:generatedAtTime "2012-10-26T12:45:00Z"^^xsd:dateTime ;
		prov:wasAttributedTo ex:paul .
}
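
The general idea behind a content-derived integrity key can be illustrated with a short Python sketch. This is explicitly not the trusty URI algorithm: trusty URIs define their own normalization and encoding, and the key is embedded in the URIs of the content itself, which this sketch does not attempt. Running it will therefore not reproduce the key shown above. The function name content_key and the choice of sorting N-Quads lines before hashing are assumptions made purely for illustration.

```python
import base64
import hashlib


def content_key(nquads_text):
    """Hash a set of N-Quads lines, independent of their order in the file.

    Simplified illustration only: real RDF normalization must also deal with
    blank nodes, literal forms and self-references, which are ignored here.
    """
    lines = sorted({line.strip() for line in nquads_text.splitlines() if line.strip()})
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).digest()
    # URL-safe Base64 without padding, so the key can be appended to a URI.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


original = (
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> <http://example.org/g> .\n"
    "<http://example.org/c> <http://example.org/p> <http://example.org/d> <http://example.org/g> .\n"
)
reordered = "\n".join(reversed(original.splitlines()))
modified = original.replace("http://example.org/d", "http://example.org/e")

print(content_key(original) == content_key(reordered))  # same content, same key
print(content_key(original) == content_key(modified))   # changed content, different key
```

Because the key is computed from the content, anyone holding the data can recompute it and compare it with the key in the identifier, which is what makes unnoticed modification difficult.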

Nanopublication Collections

(This is work in progress and under discussion; the schema has to be adapted to support this.)

In some cases, in particular when exporting from existing datasets, a large number of nanopublications may have exactly the same provenance and publication information. In such cases, they can be represented more concisely as a nanopublication collection. A nanopublication collection has the same general structure as a nanopublication but with the following differences:

This is an example of a nanopublication collection:

@prefix : <http://example.org/dataset1#> .
@prefix ex: <http://example.org/> .
@prefix np:  <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> . 
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

:head {
    ex:dataset1 a np:NanopublicationCollection ; np:hasAssertionSet :assertionSet ;
        np:hasCollectionProvenance :provenance ; np:hasCollectionPubInfo :pubInfo .
    :assertionSet np:hasMember :1a, :2a, :3a .
}

:1a { ex:thingA ex:is-related-to ex:thingX }

:2a { ex:thingB ex:is-related-to ex:thingX }

:3a { ex:thingC ex:is-related-to ex:thingX }

:provenance {
    :assertionSet prov:wasDerivedFrom :experimentXYZ .
}

:pubInfo {
    ex:dataset1 prov:wasAttributedTo ex:paul .
    ex:dataset1 prov:generatedAtTime "2015-02-03T12:14:00Z"^^xsd:dateTime .
}

Collections are essentially a shorthand for representing nanopublications. The individual nanopublications can be extracted from a collection by applying the following well-defined rules for each of the assertion URIs:

The second nanopublication of the above example would therefore look as follows:

:2head {
    :2 a np:Nanopublication ; np:hasAssertion :2a ;
        np:hasProvenance :2prov ; np:hasPublicationInfo :2info .
}

:2a { ex:thingB ex:is-related-to ex:thingX }

:2prov {
    :assertionSet prov:wasDerivedFrom :experimentXYZ .
    :assertionSet np:hasMember :2a .
}

:2info {
    ex:dataset1 prov:wasAttributedTo ex:paul .
    ex:dataset1 prov:generatedAtTime "2015-02-03T12:14:00Z"^^xsd:dateTime .
    ex:dataset1 np:hasMember :2 .
}
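
The expansion shown above can be sketched in Python. The sketch reproduces only what is visible in the two examples: the graph names are derived from the assertion URI (":2a" gives ":2", ":2head", ":2prov" and ":2info"), the collection provenance and publication information are copied, and one np:hasMember triple is added to each. This naming pattern is an assumption inferred from the example, the function name expand_member and the dictionary layout are hypothetical, and since collections are still under discussion the details may change.

```python
# The collection from the example, as plain Python data (prefixed names as strings).
COLLECTION = {
    "uri": "ex:dataset1",
    "assertion_set": ":assertionSet",
    "assertions": {
        ":1a": [("ex:thingA", "ex:is-related-to", "ex:thingX")],
        ":2a": [("ex:thingB", "ex:is-related-to", "ex:thingX")],
        ":3a": [("ex:thingC", "ex:is-related-to", "ex:thingX")],
    },
    "provenance": [(":assertionSet", "prov:wasDerivedFrom", ":experimentXYZ")],
    "pub_info": [
        ("ex:dataset1", "prov:wasAttributedTo", "ex:paul"),
        ("ex:dataset1", "prov:generatedAtTime", '"2015-02-03T12:14:00Z"^^xsd:dateTime'),
    ],
}


def expand_member(collection, member):
    """Build the four graphs of the nanopublication for one assertion URI."""
    if member not in collection["assertions"]:
        raise KeyError(member)
    if not member.endswith("a"):
        raise ValueError("this sketch assumes assertion URIs ending in 'a'")
    base = member[:-1]  # assumed naming pattern: ":2a" -> ":2"
    head, prov, info = base + "head", base + "prov", base + "info"
    return {
        head: [
            (base, "a", "np:Nanopublication"),
            (base, "np:hasAssertion", member),
            (base, "np:hasProvenance", prov),
            (base, "np:hasPublicationInfo", info),
        ],
        member: list(collection["assertions"][member]),
        # Collection provenance plus the membership of this assertion.
        prov: list(collection["provenance"])
        + [(collection["assertion_set"], "np:hasMember", member)],
        # Collection publication info plus the membership of this nanopublication.
        info: list(collection["pub_info"])
        + [(collection["uri"], "np:hasMember", base)],
    }


for graph, triples in expand_member(COLLECTION, ":2a").items():
    print(graph, triples)
```

Applying expand_member to each key of COLLECTION["assertions"] yields one nanopublication per assertion, so a collection with three members expands to three nanopublications.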

Further Information

Please check http://www.nanopub.org for further guides and community information.

References