A Google trends spatial clustering approach for a worldwide Twitter user geolocation

Bibliographic details
Main author: Zola, Paola (author)
Other authors: Ragno, Costantino (author), Cortez, Paulo (author)
Format: article
Language: eng
Published: 2020
Full text: http://hdl.handle.net/1822/66815
Country: Portugal
OAI: oai:repositorium.sdum.uminho.pt:1822/66815
Description
Abstract: User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating the city-level location of Twitter users worldwide, using only their historical tweets. We propose a purely unsupervised approach that combines a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns with three clustering algorithms. The approach was validated empirically on a recently collected dataset of 3,268 worldwide city-level Twitter user locations, obtaining results competitive with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively.
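The reported figures are accuracies at a tolerance distance: a prediction counts as correct when it lies within the given number of kilometres of the user's true location. The following is a minimal sketch of how such a metric can be computed. It assumes great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km; this record does not state the exact distance formula used in the paper. The function names `haversine_km` and `accuracy_at_distance`, and the toy coordinates, are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence, Tuple

# (latitude, longitude) in decimal degrees
LatLon = Tuple[float, float]

# Assumption: spherical Earth with a mean radius of 6371 km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def accuracy_at_distance(predicted: Sequence[LatLon],
                         truth: Sequence[LatLon],
                         tolerance_km: float) -> float:
    """Fraction of users whose predicted location is within tolerance_km of the true one."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty prediction and truth lists")
    hits = sum(1 for p, t in zip(predicted, truth)
               if haversine_km(p, t) <= tolerance_km)
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical toy data: four users, all truly located at (0, 0).
    truth = [(0.0, 0.0)] * 4
    # Predictions 0, 1, 3 and 10 degrees of longitude away along the equator
    # (roughly 0, 111, 334 and 1112 km).
    predicted = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (0.0, 10.0)]
    for tol in (250, 500, 1000, 2000):
        print(tol, accuracy_at_distance(predicted, truth, tol))
    # prints: 250 0.5 / 500 0.75 / 1000 0.75 / 2000 1.0
```

Because the metric is cumulative, accuracy can only stay the same or grow as the tolerance increases, which matches the monotone 15% to 58% progression reported above.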