Parsing GoPubMed RDF with Sparta

Brad Chapman bio photo By Brad Chapman Comment

GoPubMed provides a semantic layer on top of PubMed articles, linking them to GO and other ontologies and providing this metadata as RDF.

Starting out with RDF can be a bit hairy when dealing with the namespaces that prefix many of the URIs. Sparta is a python library that sits on top of rdflib providing nice shortcuts for accessing terms. For instance, if you have the following RDF namespace and term:

[sourcecode language="xml"]


Example
[/sourcecode]

Instead of having to access this attribute with http://purl.org/dc/elements/1.1/title in rdflib, sparta allows you to do so with the shortcut dc_title. Here is a short example with GoPubMed which prints out a PubMed record and its title:

[sourcecode language="python"]
import urllib
from rdflib import ConjunctiveGraph as Graph
import sparta

url = 'http://www.gopubmed.org/GoMeshPubMed/gomeshpubmed/Search/' + \
'RDF?q=18463287&type=RdfExportAll'
gopubmed_handle = urllib.urlopen(url)
graph = Graph()
graph.parse(gopubmed_handle)
gopubmed_handle.close()

graph_subjects = list(set(graph.subjects()))
sparta_factory = sparta.ThingFactory(graph)
for subject in graph_subjects:
sparta_graph = sparta_factory(subject)
print subject, [unicode(t) for t in sparta_graph.dc_title][0]
[/sourcecode]

The Sparta library had not yet been updated to work with rdflib version 2.4, and also needed a few smaller fixes in matching URIs to attribute names. These are fixed in an updated version available here.

comments powered by Disqus