Location inference from social media data

The multifaceted nature of user-generated data, along with its geographic component, is now being exploited to better understand social dynamics and the propagation of information. Social media activities can carry both an explicit geographic component (e.g., in Twitter, metadata such as the user profile location and the GPS coordinates of the device from which the activity is performed) and an implicit one (i.e., inferred from the data, with a variable degree of confidence). Unfortunately, explicit tagging is present in only a small percentage of tweets, because the location services of mobile devices are often disabled or switched off to save battery. Inferring the position of additional tweets is therefore important for making microblog analysis more effective.

In the geospatial context, the geographical component alone is no longer sufficient to assess connections between elements. In particular, geographical objects can be aggregated by position, category, density, and many other dimensions.

Embeddings represent the content of geographical objects as vectors in a high-dimensional space, where the distance between vectors reflects the “semantic distance” between the objects. This novel representation opens the door to the integration of geospatial ontologies into machine learning algorithms. The aim of this project is to infer the location of microblog messages (i.e., the position of the user when the tweet was sent) rather than the user’s home location, investigating the coherence between geographical objects and embeddings. We claim that using embeddings to retrieve the semantically closest terms allows us to geolocate microblog messages at the sub-city level. To this end, we rely on several sources of information, including toponyms contained in texts, social relationships and interactions between users, and text content.
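The idea of retrieving the semantically closest terms can be illustrated with a minimal sketch in Python. This is not the project's actual method: the three-dimensional vectors, the gazetteer, the area names and the function names (cosine, message_vector, infer_area) are all hypothetical and chosen only for readability. The sketch averages the vectors of the known words in a message and returns the area of the gazetteer term with the highest cosine similarity.

```python
import math

# Hypothetical toy embeddings (3 dimensions for readability). Real
# embeddings would be learned from a corpus and be much larger.
EMBEDDINGS = {
    "duomo":     [0.9, 0.1, 0.0],
    "cathedral": [0.8, 0.2, 0.1],
    "stadium":   [0.1, 0.9, 0.1],
    "match":     [0.2, 0.8, 0.0],
    "aperitivo": [0.1, 0.1, 0.9],
    "canal":     [0.0, 0.2, 0.8],
}

# Hypothetical gazetteer: terms anchored to a sub-city area.
GAZETTEER = {
    "duomo": "area_centre",
    "stadium": "area_stadium",
    "canal": "area_canals",
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def message_vector(tokens):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def infer_area(text):
    """Return (area, similarity) for the closest gazetteer term, or None."""
    vector = message_vector(text.lower().split())
    if vector is None:
        return None
    best = max(GAZETTEER, key=lambda term: cosine(vector, EMBEDDINGS[term]))
    return GAZETTEER[best], cosine(vector, EMBEDDINGS[best])


if __name__ == "__main__":
    print(infer_area("watching the match tonight"))
    print(infer_area("visiting the cathedral"))
```

In this toy setting, a message that mentions "match" but no toponym is still assigned to the stadium area, because its vector lies closest to "stadium". The returned similarity can serve as a rough confidence for the inferred location, in line with the variable degree of confidence attached to implicit geographic information.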

Along with applications, we work on the theoretical side to gain a deeper understanding of embeddings and to propose alternative algorithms for embedding data.

 
 