
Gaussian Processes for Machine Learning (2006) [pdf]

This is the definitive reference on the topic! I also have some notes on it, if you want something concise that doesn't ignore the math [1].

[1] https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs#gaus...

4 days ago abhgh

These are very cool, thanks. Do you know what kind of jobs are more likely to require Gaussian process expertise? I have experience using GPs for surrogate modeling and will be on the job market soon.

Also a resource I enjoyed is the book by Bobby Gramacy [0] which, among other things, spends a good bit on local GP approximation [1] (and has fun exercises).

[0] https://bobby.gramacy.com/surrogates/surrogates.pdf

[1] https://arxiv.org/abs/1303.0383

4 days ago C-x_C-f

Aside from Secondmind [1], I don't know of any companies (only because I haven't looked)... But if I had to look for places with a strong research culture on GPs (I don't know if you're) I would find relevant papers on arXiv and Google Scholar and see if any of them come from industry labs. If I had to guess where Bayesian tools are used at work, the industries to look at might be advertising and healthcare. I would also look out for places that hire econometricians.

Also thank you for the book recommendation!

[1] https://www.secondmind.ai/

4 days ago abhgh

My take is that the Rasmussen book isn't especially approachable, and that this book has actually held back the wider adoption of GPs in the world.

The book has been seen as the authoritative source on the topic, so people were hesitant to write anything else. At the same time, the book borders on impenetrable.

3 days ago timdellinger

Why would you learn Gaussian processes today? Is there any application where they are still leading and have not been superseded by deep NNets?

3 days ago heinrichhartman

I would argue there are more applications overall where Gaussian processes are superior, as most scientific applications have smaller data sets. Not everything has enough data to take advantage of feature learning in NNs. GPs are generally reliable, interpretable, and provide excellent uncertainty estimates for free. They can be made multiscale, achieving higher precision as a function approximator than most other methods. Plus, they can exhibit reversion to the prior when you need that.
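To make the uncertainty estimates and the reversion to the prior concrete, here is a minimal NumPy sketch of exact GP regression on five points. The squared-exponential kernel, the hyperparameter values, the noise level, and the function names (`rbf_kernel`, `gp_posterior`) are all assumptions chosen for illustration, not anything taken from the book.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorization avoids forming an explicit inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# A deliberately tiny data set (illustrative values).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)

# One test point inside the data, one far outside it.
x_test = np.array([0.5, 10.0])
mean, var = gp_posterior(x_train, y_train, x_test)
print("mean:", mean)
print("variance:", var)
```

At the test point inside the data the posterior variance is small, while at the far-away point the prediction falls back to the prior: mean near zero and variance near the prior variance of 1.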

Another example is emulating the outputs of an agent-based model for sensitivity analysis.

3 days ago hodgehog11

Basically they're incredibly useful for any situation where you have "medium" data: not enough to properly train a NN (which is very data-hungry in practice), but enough that a more traditional approach wouldn't really exploit all the information.

GPs essentially allow you to get a lot of the power of a NN while also being able to encode a bunch of domain knowledge you have (which is necessary when you don't have enough data for the model to effectively learn that domain knowledge). On top of that, you get variance estimates which are very important for things like forecasting.

The only real drawback to GPs is that they absolutely do not fit into the "fit/predict" paradigm. Properly building a scalable GP takes a deeper understanding of the model than most cases require. The mathematical foundations required to really understand what's happening when you train a sparse GP greatly exceed what is required to understand a NN, and on top of that there is a fair amount of practical insight into kernel development that is required as well. But the payoff is fantastic.

It's worth recognizing that, once you realize that "attention" is really just kernel smoothing, transformers are essentially learning sophisticated stacked kernels, so they ultimately share a lot in common with GPs.
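A small numerical check of the kernel-smoothing view, as a sketch: single-head, unmasked softmax attention gives the same output as a Nadaraya-Watson smoother whose kernel is the exponential of the scaled dot product. The function names and the random shapes are assumptions for illustration, and real transformer layers add learned projections, masking, and multiple heads on top of this.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention (single head, no mask)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def kernel_smoother(Q, K, V):
    """Nadaraya-Watson estimate with kernel k(q, k) = exp(q . k / sqrt(d))."""
    d = Q.shape[-1]
    kern = np.exp(Q @ K.T / np.sqrt(d))
    # Kernel-weighted average of the values V at the "locations" K.
    return (kern @ V) / kern.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 queries
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 2))   # 5 values

out_attention = softmax_attention(Q, K, V)
out_smoother = kernel_smoother(Q, K, V)
print(np.allclose(out_attention, out_smoother))
```

The two functions differ only in the max-subtraction trick, which cancels in the normalization.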

3 days ago roadside_picnic

AFAIK the state of the art is still a mix of new DNNs and old-school techniques. Things like parameter efficiency, data efficiency, runtime performance, and understandability would factor into the decision-making process.

3 days ago cjbgkagh

Bayesian optimization of, say, hyperparameters is the canonical modern usage in my view, and there are other similar optimization problems where it's the preferred approach.
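For readers who haven't seen it, a minimal sketch of that loop: fit a GP to the evaluations so far, score candidate points with expected improvement, evaluate the best candidate, repeat. Everything here is an assumption for illustration: the 1-D `objective` is a hypothetical stand-in for an expensive validation loss, the kernel hyperparameters are fixed rather than learned, and candidates come from a fixed grid.

```python
import math
import numpy as np

def rbf(a, b, lengthscale=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_predict(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement over the best value seen so far (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def objective(x):
    # Hypothetical stand-in for an expensive-to-evaluate validation loss.
    return (x - 0.7) ** 2 + 0.1 * np.sin(15 * x)

grid = np.linspace(0.0, 1.0, 201)      # candidate hyperparameter values
x_obs = np.array([0.1, 0.5, 0.9])      # initial design
y_obs = objective(x_obs)

for _ in range(10):
    mean, std = gp_predict(x_obs, y_obs, grid)
    ei = expected_improvement(mean, std, y_obs.min())
    x_next = grid[np.argmax(ei)]       # most promising candidate
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmin(y_obs)]
print("best x:", best_x, "best value:", y_obs.min())
```

In practice you would also standardize the observations and fit the kernel hyperparameters at each step; both are left out to keep the sketch short.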

3 days ago timdellinger

To reduce the risk of being a lemming. It is in everyone's interests for some people not to follow the herd / join the plague of locusts.

3 days ago xpe

You can combine deep NNets with GPs, e.g. here: https://arxiv.org/abs/1511.02222

So it isn't a matter of which is better. If you ever need to imbue your deep nets with good confidence estimates, it is definitely worth checking out.

2 days ago ysaatchi

Stationary GPs are just stochastic linear dynamical systems. (Not just those with a Matérn covariance kernel.)
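The simplest instance of this can be checked numerically. The sketch below takes a first-order linear recursion x_k = a * x_{k-1} + e_k (an exactly discretized Ornstein-Uhlenbeck process) and compares the covariance it implies with the exponential (Matérn-1/2) kernel evaluated at the same time points. The parameter values are arbitrary assumptions, and this only illustrates the one-dimensional Matérn-1/2 case, not the broader claim.

```python
import numpy as np

lengthscale, variance = 0.5, 2.0   # kernel hyperparameters (illustrative)
dt, n = 0.1, 6                     # time step and number of time points

# Linear dynamical system: x_0 ~ N(0, variance), x_k = a * x_{k-1} + e_k,
# with e_k ~ N(0, q) independent of everything before it.
a = np.exp(-dt / lengthscale)
q = variance * (1.0 - a**2)

# Write the states as x = M @ e, where e stacks the independent noise terms
# (e_0 is the initial state), so that Cov(x) = M diag(Var(e)) M^T.
M = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        M[i, j] = a ** (i - j)
noise_var = np.full(n, q)
noise_var[0] = variance
cov_state_space = M @ np.diag(noise_var) @ M.T

# GP view: exponential kernel evaluated on the same time grid.
t = dt * np.arange(n)
cov_kernel = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)

print(np.allclose(cov_state_space, cov_kernel))
```

The practical payoff of the state-space form is that inference can run with a Kalman filter and smoother in time linear in the number of points, instead of the cubic cost of factorizing the full kernel matrix.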

3 days ago memming