Q3-D3-LSA

SFB 649 Discussion Paper 2016-049

48 Pages Posted: 18 Nov 2016

Lukas Borke

Humboldt University of Berlin

Wolfgang Karl Härdle

Blockchain Research Center Humboldt-Universität zu Berlin; Charles University; National Yang Ming Chiao Tung University; Asian Competitiveness Institute

Date Written: November 17, 2016

Abstract

QuantNet is an integrated web-based environment consisting of different types of statistics-related documents and program codes. Its goal is to foster reproducibility and to offer a platform for sharing validated knowledge native to the social web. Increasing information retrieval (IR) efficiency requires incorporating semantic information. Three text mining models are examined: the vector space model (VSM), the generalized VSM (GVSM) and latent semantic analysis (LSA). LSA has been used successfully for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between documents. Our results show that different model configurations allow adapted similarity-based document clustering and knowledge discovery. In particular, different LSA configurations combined with hierarchical clustering yield good results under M3 evaluation. QuantNet and the corresponding Data-Driven Documents (D3) based visualization can be found and applied under quantlet. The driving technology behind it is Q3-D3-LSA, which combines the “GitHub API based QuantNet Mining infrastructure in R”, LSA and a D3 implementation.
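
To illustrate why semantic information matters for retrieval, the following minimal Python sketch contrasts plain VSM cosine similarity with one common GVSM formulation, in which document similarity is routed through a term-term correlation matrix built from the corpus itself. The toy corpus, the function names and the whitespace tokenizer are assumptions made for illustration and are not taken from the paper, whose exact formulation may differ. LSA, which additionally requires a truncated singular value decomposition, is not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper).
DOCS = [
    "option pricing black scholes",
    "option volatility smile",
    "volatility garch estimation",
]


def term_document_matrix(docs):
    """Return (vocabulary, columns); each column is a raw term-frequency vector."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    columns = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        columns.append([float(counts[word]) for word in vocab])
    return vocab, columns


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else v


def vsm_similarity(columns):
    """Plain VSM: cosine similarity between document vectors."""
    unit = [normalize(c) for c in columns]
    return [[dot(u, v) for v in unit] for u in unit]


def gvsm_similarity(columns):
    """GVSM sketch (assumed formulation): sim(i, j) = d_i^T (X X^T) d_j.

    With unit-length document columns X, this equals the squared Gram
    matrix (X^T X)^2, rescaled here so that every document has
    similarity 1 with itself.
    """
    gram = vsm_similarity(columns)
    n = len(gram)
    raw = [
        [sum(gram[i][k] * gram[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return [
        [raw[i][j] / math.sqrt(raw[i][i] * raw[j][j]) for j in range(n)]
        for i in range(n)
    ]
```

In this toy corpus the first and third documents share no terms, so their VSM similarity is zero, whereas the GVSM variant assigns them a positive similarity because both overlap with the second document.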

Keywords: QuantNet, D3, GitHub API, text mining, document clustering, similarity, semantic web, generalized vector space model, LSA, visualization

JEL Classification: C87, C88, G17

Suggested Citation

Borke, Lukas and Härdle, Wolfgang Karl, Q3-D3-LSA (November 17, 2016). SFB 649 Discussion Paper 2016-049, Available at SSRN: https://ssrn.com/abstract=2871111 or http://dx.doi.org/10.2139/ssrn.2871111

Lukas Borke

Humboldt University of Berlin

Unter den Linden 6
Berlin, AK Berlin 10099
Germany

Wolfgang Karl Härdle (Contact Author)

Blockchain Research Center Humboldt-Universität zu Berlin

Unter den Linden 6
Berlin, D-10099
Germany

Charles University

Celetná 13
Dept Math Physics
Praha 1, 116 36
Czech Republic

National Yang Ming Chiao Tung University

No. 1001, Daxue Rd. East Dist.
Hsinchu City 300093
Taiwan

Asian Competitiveness Institute

Singapore
