Dimensional Enrichment of Statistical Linked Open Data

46 Pages. Posted: 25 Jun 2018. Publication Status: Accepted

Jovan Varga

Polytechnic University of Catalonia (UPC) - BarcelonaTech

Alejandro Vaisman

Instituto Tecnologico de Buenos Aires (ITBA)

Oscar Romero

Polytechnic University of Catalonia (UPC) - BarcelonaTech

Lorena Etcheverry

University of the Republic (Uruguay) - Facultad de Ingenieria

Torben Pedersen

Aalborg University

Christian Thomsen

Aalborg University

Abstract

On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new, publicly available data to the web, and these data should be analyzed in the same way. The Linked Data initiative especially encourages the use of Semantic Web technologies in this context. A considerable number of statistical linked open data sets have already been published using the RDF Data Cube Vocabulary (QB), which is designed for these purposes. However, QB lacks some schema constructs that are essential for OLAP (e.g., dimension levels). Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and make it fully compliant with OLAP. In this paper, we focus on enriching an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics, the most important being the definition of aggregate functions and the detection of new concepts for constructing dimension hierarchies. Together, the proposed steps form a semi-automatic enrichment method, which is implemented in a tool that supports enrichment in an interactive and iterative fashion. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts automatically discovered by the proposed steps. Finally, we conduct experiments with 25 users and three real-world QB data sets to evaluate our approach. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates and speeds up the enrichment process and guarantees correct results.
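To give a rough feel for what "detecting new concepts" for a dimension hierarchy can involve, the sketch below assumes a simple criterion: a property is a candidate parent level if it maps every member of an existing level to exactly one value (a many-to-one rollup) and groups several members under the same value. This is an illustrative assumption, not the paper's tool or its exact algorithm; the function name `candidate_parent_levels`, the `ex:` IRIs, and the toy triples are all hypothetical.

```python
from collections import defaultdict

# Hypothetical triple shape: (subject, property, object), all as plain strings.
Triple = tuple[str, str, str]


def candidate_parent_levels(members: set[str], triples: list[Triple]) -> list[str]:
    """Return properties that could define a coarser level above `members`.

    Assumed criterion (illustrative only): the property
      1. is present for every member,
      2. is functional (each member has exactly one value), and
      3. is coarser (fewer distinct values than members), so it groups members.
    """
    # prop -> member -> set of values observed for that member
    values: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if s in members:
            values[p][s].add(o)

    candidates = []
    for prop, per_member in values.items():
        covers_all = set(per_member) == members
        functional = all(len(v) == 1 for v in per_member.values())
        # Each value set is non-empty, so next(iter(v)) is safe here.
        distinct = {next(iter(v)) for v in per_member.values()}
        coarser = len(distinct) < len(members)
        if covers_all and functional and coarser:
            candidates.append(prop)
    return sorted(candidates)


# Hypothetical toy data: cities as members of an existing level.
members = {"ex:Barcelona", "ex:Girona", "ex:Aalborg"}
triples = [
    ("ex:Barcelona", "ex:inCountry", "ex:Spain"),
    ("ex:Girona", "ex:inCountry", "ex:Spain"),
    ("ex:Aalborg", "ex:inCountry", "ex:Denmark"),
    # One-to-one label: functional but not coarser, so not a level.
    ("ex:Barcelona", "ex:label", "Barcelona"),
    ("ex:Girona", "ex:label", "Girona"),
    ("ex:Aalborg", "ex:label", "Aalborg"),
    # Partial and multi-valued: neither covers all members nor is functional.
    ("ex:Barcelona", "ex:twinnedWith", "ex:X"),
    ("ex:Barcelona", "ex:twinnedWith", "ex:Y"),
]

print(candidate_parent_levels(members, triples))  # ['ex:inCountry']
```

In an interactive setting like the one the abstract describes, such candidates would only be suggestions; the user still decides which ones become levels in the QB4OLAP hierarchy.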

Keywords: Linked Open Data, Multidimensional Data Modeling, OLAP, Semantic Web

Suggested Citation

Varga, Jovan and Vaisman, Alejandro and Romero, Oscar and Etcheverry, Lorena and Pedersen, Torben and Thomsen, Christian, Dimensional Enrichment of Statistical Linked Open Data (October 2016). Available at SSRN: https://ssrn.com/abstract=3199266 or http://dx.doi.org/10.2139/ssrn.3199266

Jovan Varga (Contact Author)

Polytechnic University of Catalonia (UPC) - BarcelonaTech

Jordi Girona
Barcelona
Spain

Alejandro Vaisman

Instituto Tecnologico de Buenos Aires (ITBA)

Av Eduardo Madero 399
Buenos Aires, 6393-4800
Argentina

Oscar Romero

Polytechnic University of Catalonia (UPC) - BarcelonaTech

Jordi Girona
Barcelona
Spain

Lorena Etcheverry

University of the Republic (Uruguay) - Facultad de Ingenieria

Av. 18 de Julio 1824-1850
11200 Montevideo
Uruguay

Torben Pedersen

Aalborg University

Fredrik Bajers Vej 7E
Aalborg, DK-9220
Denmark

Christian Thomsen

Aalborg University

Fredrik Bajers Vej 7E
Aalborg, DK-9220
Denmark