April 6th 2017
Organizations (via Meetup)
Ways to enter into data science
Typical difficulties:
Reference: L. Breiman's 'Statistical Modeling: The Two Cultures'
Example: role of multicollinearity
Example: churn prediction
Validation of models via cross-validation: accuracy on unseen observations
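A minimal sketch of the cross-validation idea, using only the Python standard library. The synthetic data, the two toy models (a least-squares line and a 1-nearest-neighbour rule) and all function names are assumptions made for illustration; they are not from the talk. The point it tries to show is that error on the training data can look perfect for a flexible model, while the error on held-out folds tells a different story.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour: memorises the training set."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(predict, xs, ys):
    """Mean squared error of a predictor on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_mse(fit, xs, ys, k=5):
    """Average held-out MSE over k folds (every k-th point forms a fold)."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        model = fit(train_x, train_y)
        scores.append(mse(model, test_x, test_y))
    return sum(scores) / k

# Hypothetical data: a linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

train_line = mse(fit_line(xs, ys), xs, ys)
train_nn = mse(fit_nearest(xs, ys), xs, ys)
cv_line = cross_val_mse(fit_line, xs, ys)
cv_nn = cross_val_mse(fit_nearest, xs, ys)

print(f"line: train MSE {train_line:.2f}, cross-validated MSE {cv_line:.2f}")
print(f"1-NN: train MSE {train_nn:.2f}, cross-validated MSE {cv_nn:.2f}")
```

The nearest-neighbour rule reproduces its own training data exactly, so its training error is zero, yet on held-out folds it does worse than the simpler line. In practice one would shuffle the data before splitting; here the points are already generated in random order.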
"Other things being equal, simpler models are preferred"
Vapnik and Chervonenkis, 1970s-1990s: statistical learning theory. With high probability
\[\mathbb{E}[\mathrm{err}] \leq \widehat{\mathrm{err}} + I\] where \(\widehat{\mathrm{err}}\) is the error measured on the training sample and \(I\):
increases with complexity of the family used for modelling
For any given learning algorithm, we can construct a probability distribution on which it learns arbitrarily slowly.
Stats
ML
For instance:
Reference: 'The Elements of Statistical Learning', Hastie, Tibshirani and Friedman
'Bayesian' statistics:
Reference: 'Machine Learning: A Probabilistic Perspective', K. Murphy
For instance, LDA (Latent Dirichlet Allocation) for topic modelling of documents
Reference: 'Causality', Judea Pearl
Example: churn and selection bias
Objective: find patterns, understand causation
Team implication:
Reference: 'Statistical Engineering: An Idea Whose Time Has Come?', Hoerl and Snee
Increasing demand for scientific profiles: ability to deal with complex problems.
Complexity in definition of needs:
Type III error: giving the right answer to the wrong question.
Are all the statisticians working at Universitat Autonoma de Barcelona…
...vallesians?