Data, Science, and Machine Learning

Description

Over the past decade we have witnessed an unprecedented increase in the amount of data being produced about every facet of human existence. However, the rise in data availability has not been matched by a similar increase in the ability of researchers and stakeholders to handle and extract information from that data. The goal of this course is to give an overview of the fundamentals of Data Science and Machine Learning as a first step towards building greater data literacy among the students.

Venue

All lectures will take place in Room 5 of the “Antoni M. Alcover Building” on the campus of the Universitat de les Illes Balears in Palma de Mallorca, Spain.

Schedule

  • Sept 4 - 12:15-13:10 - Basic Statistics and Probability
    • Big Data vs Data Science
    • Descriptive Statistics
    • Anscombe’s Quartet
    • Quantiles
    • Correlations
    • Definition of Probability
    • Bayes’ Theorem
  • Sept 5 - 12:15-13:10 - Bayesian and Maximum Likelihood Analyses
    • Naive Bayes Classifier
    • Language Detection
    • Central Limit Theorem
    • Maximum Likelihood Estimation
    • Binomial Distribution
    • Beta Distribution
    • A/B Testing
    • p-values
    • Bonferroni Correction
    • Simpson’s Paradox
  • Sept 6 - 12:15-13:10 - Unsupervised Learning
    • Types of Machine Learning
    • Data Normalization
    • Clustering
    • K-Means
    • Silhouette Analysis
    • Expectation Maximization
    • Gaussian Mixture Models
    • Principal Component Analysis
  • Sept 7 - 12:15-13:10 - Supervised Learning
    • Regression vs Classification
    • Overfitting
    • Bias-Variance Tradeoff
    • K-Nearest Neighbors
    • Validation Dataset
    • Perceptron
    • Activation Functions
    • Back Propagation
    • Support Vector Machines
  • Sept 8 - 12:15-13:10 - Neural Network Applications
    • MNIST Digit Recognition
    • Interpretability
    • word2vec
    • Word Analogies
    • Deep Learning Architectures

Instructor

Bruno Gonçalves is a Data Science fellow at NYU’s Center for Data Science, on leave from a tenured faculty position at Aix-Marseille Université. He has strong expertise in using large-scale datasets for the analysis of human behavior. After completing his joint PhD in Physics and MSc in Computer Science at Emory University in Atlanta, GA in 2008, he joined the Center for Complex Networks and Systems Research at Indiana University as a Research Associate. From September 2011 until August 2012 he was an Associate Research Scientist at the Laboratory for the Modeling of Biological and Technical Systems at Northeastern University. Since 2008 he has been pursuing the use of Data Science and Machine Learning to study human behavior. By processing and analyzing large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he has studied how both large-scale and individual human behavior can be observed in an unobtrusive and widespread manner.

[web] [twitter] [github] [LinkedIn]