As modern technologies gradually permeate our lives, using them becomes second nature, and the “real” world naturally extends to include the online one. The cognitive load of interacting with state-of-the-art technologies keeps decreasing, but the amount of information collected and processed in the background keeps increasing. These records offer a unique view of how we interact with these systems and, through them, with each other. In this tutorial we introduce students to tools and techniques for harnessing this wealth of data, with special emphasis on datasets that reflect human behavior, interactions and collaborations. In particular, we will cover online social networks such as Twitter and Foursquare, collaboration platforms such as Wikipedia and GitHub and, finally, how to scrape and extract information from generic webpages.
The mass adoption of smartphones with Internet and GPS capabilities has once again opened the floodgates of innovation. As a result, most of the datasets we consider are enriched with detailed location information, either as precise GPS coordinates or as free text. In the second part of this tutorial we will introduce participants to geolocation, spatial analysis and visualization techniques through practical examples and case studies using the collected data. Finally, we will describe in detail a full pipeline from data collection to result visualization.
All lectures will take place in the “Sala Riunioni” of the “Dipartimento di Informatica” of the “Università degli Studi di Torino” in Turin, Italy.
Bruno Gonçalves is a Data Science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. He has strong expertise in using large-scale datasets to analyze human behavior. After completing a joint PhD in Physics and MSc in Computer Science at Emory University in Atlanta, GA, in 2008, he joined the Center for Complex Networks and Systems Research at Indiana University as a Research Associate. From September 2011 until August 2012 he was an Associate Research Scientist at the Laboratory for the Modeling of Biological and Technical Systems at Northeastern University. Since 2008 he has been using Data Science and Machine Learning to study human behavior. By processing and analyzing large datasets from Twitter, Wikipedia, web access logs and Yahoo! Meme, he has studied how both large-scale and individual human behavior can be observed in an unobtrusive and widespread manner.