An Empirical Survey of Linked Data Conformance

42 Pages. Posted: 23 Jun 2018. Publication Status: Accepted.

Aidan Hogan

University of Galway - Digital Enterprise Research Institute (DERI)

Jürgen Umbrich

University of Galway - Digital Enterprise Research Institute (DERI)

Andreas Harth

Karlsruhe Institute of Technology - Institute of Applied Informatics and Formal Description Methods (AIFB)

Richard Cyganiak

University of Galway - Digital Enterprise Research Institute (DERI)

Axel Polleres

University of Galway - Digital Enterprise Research Institute (DERI)

Stefan Decker

University of Galway - Digital Enterprise Research Institute (DERI)

Abstract

There has been a recent, tangible growth in RDF published on the Web in accordance with the Linked Data principles and best practices, the result of which has been dubbed the "Web of Data". Linked Data guidelines are designed to facilitate ad hoc re-use and integration of conformant structured data, across the Web, by consumer applications; thus far, however, no systems have emerged that convincingly demonstrate the potential of consuming currently available Linked Data. Herein, we compile a list of fourteen concrete guidelines as given in the "How to Publish Linked Data on the Web" tutorial. Thereafter, we evaluate the conformance of current RDF data providers with respect to these guidelines. Our evaluation is based on quantitative empirical analyses of a crawl of ~4 million RDF/XML documents constituting over 1 billion quadruples; we also look at the stability of hosted documents using a corpus of nine monthly snapshots from a sample of 151 thousand documents. Backed by our empirical survey, we provide insights into the current level of conformance with respect to various Linked Data guidelines, enumerating lists of the most (non-)conformant data providers. We show that certain guidelines are broadly adhered to (especially: use HTTP URIs, keep URIs stable), whilst others are commonly overlooked (especially: provide licensing and human-readable meta-data). We also compare PageRank scores for the data providers with their conformance to Linked Data guidelines, showing that the two correlate negatively for guidelines restricting the use of RDF features, and positively for guidelines encouraging external linkage and vocabulary re-use. Finally, we present a summary of conformance for the different guidelines, and present the top-ranked data providers in terms of a combined PageRank and Linked Data conformance score.
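As an illustration of what a per-provider conformance measure for one guideline might look like, the following is a minimal sketch for the "use HTTP URIs" guideline named in the abstract. It is not the authors' method: the function names, the convention that literals are quoted strings and blank nodes start with `_:`, the grouping of quadruples by the host of their context URI, and the `example` hosts in the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def is_uri(term: str) -> bool:
    # Assumed convention: literals are wrapped in double quotes,
    # blank nodes start with "_:"; everything else is treated as a URI.
    return not (term.startswith('"') or term.startswith("_:"))


def is_http_uri(term: str) -> bool:
    # An absolute URI with an http or https scheme and a host part.
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def http_uri_ratio(quads):
    """Per provider, the share of URI terms that are HTTP URIs.

    Each quad is (subject, predicate, object, context); the provider is
    taken to be the host of the context URI (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # host -> [HTTP URIs, all URIs]
    for s, p, o, c in quads:
        host = urlparse(c).netloc
        for term in (s, p, o):
            if is_uri(term):
                counts[host][1] += 1
                if is_http_uri(term):
                    counts[host][0] += 1
    return {h: http / total for h, (http, total) in counts.items() if total}


# Toy, made-up quadruples (not data from the survey).
quads = [
    ("http://a.example/alice", "http://xmlns.com/foaf/0.1/name",
     '"Alice"', "http://a.example/doc"),
    ("urn:isbn:0000000000", "http://purl.org/dc/terms/title",
     '"A title"', "http://b.example/doc"),
    ("_:b0", "http://xmlns.com/foaf/0.1/knows",
     "http://a.example/alice", "http://b.example/doc"),
]

ratios = http_uri_ratio(quads)
print(ratios)
```

A ratio of 1.0 would mean every URI a provider uses is an HTTP URI; lower values flag providers that mix in other schemes such as `urn:`.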

Keywords: semantic web, linked data, web of data, PageRank, web publishing, data quality, web science

Suggested Citation

Hogan, Aidan and Umbrich, Jürgen and Harth, Andreas and Cyganiak, Richard and Polleres, Axel and Decker, Stefan, An Empirical Survey of Linked Data Conformance (2012). Available at SSRN: https://ssrn.com/abstract=3198962 or http://dx.doi.org/10.2139/ssrn.3198962

Aidan Hogan (Contact Author)

University of Galway - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Jürgen Umbrich

University of Galway - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Andreas Harth

Karlsruhe Institute of Technology - Institute of Applied Informatics and Formal Description Methods (AIFB)

Kaiserstraße 12
Karlsruhe, Baden-Württemberg 76131
Germany

Richard Cyganiak

University of Galway - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Axel Polleres

University of Galway - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Stefan Decker

University of Galway - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland
