A Cross-Verified Database of Notable People, 3500BC-2018AD

62 Pages. Posted: 2 Mar 2021. Last revised: 31 Mar 2021

Morgane Laouenan

Université Paris I Panthéon-Sorbonne - Centre d'Economie de la Sorbonne (CES)

Jean-Benoît Eyméoud

Institut d'Etudes Politiques de Paris (Sciences Po); Banque de France

Olivier Gergaud

Kedge - Bordeaux Business School

Palaash Bhargava

New York University (NYU) - New York University, Abu Dhabi

Guillaume Plique

Sciences Po; Medialab

Etienne Wasmer

New York University (NYU) - New York University, Abu Dhabi; Centre for Economic Policy Research (CEPR)

Date Written: March 2021

Abstract

We add to the literature on notable individuals (famous, prominent, distinguished) by first collecting, with the help of deduplication techniques, a massive amount of data from various editions of Wikipedia and Wikidata, and then using these partially overlapping sources to cross-verify each piece of retrieved information. This strategy results in a cross-verified database of 2.2 million individuals, a third of whom are not present in the English edition of Wikipedia. An extension to 4.7 million entries is currently not recommended, given the inaccuracy of the information and the discrepancies between Wikidata and other sources. A non-negligible fraction of the newly added individuals were collected from non-English editions of Wikipedia. We adopt a social science approach: data collection is driven by specific social questions on gender, economic and cultural development, and the quantitative exploration of cultural trends, which we document in this paper. A sample of 100,000 individuals is available at http://medialab.github.io/bhht-datascape, together with the most recent version of this paper.

JEL Classification: N01, N9, R00

Suggested Citation

Laouenan, Morgane and Eyméoud, Jean-Benoît and Gergaud, Olivier and Bhargava, Palaash and Plique, Guillaume and Wasmer, Etienne, A Cross-Verified Database of Notable People, 3500BC-2018AD (March 2021). CEPR Discussion Paper No. DP15852, Available at SSRN: https://ssrn.com/abstract=3795248

Morgane Laouenan (Contact Author)

Université Paris I Panthéon-Sorbonne - Centre d'Economie de la Sorbonne (CES)

106-112 Boulevard de l'Hôpital
Paris Cedex 13, 75647
France

Jean-Benoît Eyméoud

Institut d'Etudes Politiques de Paris (Sciences Po)

27 rue Saint-Guillaume
Paris Cedex 07, 75337
France

Banque de France

Paris
France

Olivier Gergaud

Kedge - Bordeaux Business School

Domaine de Luminy - BP 921
Marseille, PACA 13288
France

Palaash Bhargava

New York University (NYU) - New York University, Abu Dhabi

PO Box 129188
Abu Dhabi
United Arab Emirates

Guillaume Plique

Sciences Po

28 Rue des Saint-Peres
Paris, Paris 75006
France

Medialab

Etienne Wasmer

New York University (NYU) - New York University, Abu Dhabi

PO Box 129188
Abu Dhabi
United Arab Emirates

Centre for Economic Policy Research (CEPR)

London
United Kingdom
