Who wants to spend a million?
Okay, fine: $1,191,700.
I'm reading the paper What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation by Vitaly Feldman and Chiyuan Zhang (from Apple and Google Brain, respectively). The website for the project is here. There's a metric they wish to compute on each data point in a model's training set (memorization and influence). In its "pure" form, computing this metric requires training a model for every point: if you have 10,000 points and you'd like to compute memorization for each of them, you need to train 10,000 models.
For models that take any longer than half a second to train, and datasets with more than 50k points, this metric is computationally and financially infeasible to compute directly. However, in this paper they derive an estimator for memorization and influence (bottom of page 5). I was curious: just how much does their estimator cost to run?
Combining what they write on page 5 (experimental design) and in the appendix (page 17), they train:
4000 models on MNIST (NVidia® Tesla P100 GPU)
"For MNIST ... We train the models for 30 epochs"
"Our training time on MNIST is about 7 seconds per epoch."
4000 models on CIFAR-100 (NVidia® Tesla P100 GPU)
"On CIFAR-100, the training time per epoch is about: 1 minute and 30 seconds for ResNet50,..."
"for CIFAR-100 training... During the 160 training epochs,"
2000 models on Imagenet, ("8 P100 GPUs with single-node multi-GPU data parallelization. ")
"For ImageNet, ... During the 100 training epochs"
"Our ImageNet training jobs takes about half an hour for each training epoch."
Looking at Google Cloud's pricing, the price for a single P100 GPU is between $0.43 and $1.46 per GPU per hour (though I'd guess they were most likely paying something like $0.657 per GPU per hour).
This ends up being:

MNIST: (4,000 models × 30 epochs × 7 seconds/epoch) = 840,000 seconds ≈ 233 hours
Low end: $0.43 × 233 ≈ $100
High end: $1.46 × 233 ≈ $340

CIFAR-100: (4,000 models × 160 epochs × 90 seconds/epoch) = 57,600,000 seconds = 16,000 hours
Low end: $0.43 × 16,000 = $6,880
High end: $1.46 × 16,000 = $23,360

ImageNet: (2,000 models × 100 epochs × (60 × 30) seconds/epoch) = 360,000,000 seconds = 100,000 hours, each on 8 P100s
Low end: $0.43 × 100,000 hr × 8 GPUs = $344,000
High end: $1.46 × 100,000 hr × 8 GPUs = $1,168,000

Total, low end: $100 + $6,880 + $344,000 = $350,980
Total, high end: $340 + $23,360 + $1,168,000 = $1,191,700
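The arithmetic above can be sketched in a few lines of Python (the per-epoch timings and GPU counts are the ones quoted from the paper; the $0.43–$1.46 range is Google Cloud's P100 pricing spread):

```python
# Back-of-the-envelope cost of the paper's experiments:
# GPU-hours per experiment, times P100 price per GPU-hour.

LOW_RATE, HIGH_RATE = 0.43, 1.46  # $ per P100 GPU per hour

# (models, epochs, seconds per epoch, GPUs per training job)
experiments = {
    "MNIST":     (4000,  30,    7, 1),
    "CIFAR-100": (4000, 160,   90, 1),
    "ImageNet":  (2000, 100, 1800, 8),  # 30 min/epoch on 8 P100s
}

def cost(models, epochs, sec_per_epoch, gpus, rate):
    hours = models * epochs * sec_per_epoch / 3600
    return hours * gpus * rate

low_total = sum(cost(*spec, LOW_RATE) for spec in experiments.values())
high_total = sum(cost(*spec, HIGH_RATE) for spec in experiments.values())

print(f"Low end:  ${low_total:,.0f}")   # ≈ $350,980
print(f"High end: ${high_total:,.0f}")  # ≈ $1,191,701
```

(The high-end total lands a dollar off the figure above only because I rounded 233.3 MNIST hours down to 233 when doing it by hand.)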
In short, the experiments in this paper cost somewhere between $350k and $1.2M to run (not to mention any similar experiments that didn't make it into the paper).
Funding a PhD student for one year costs a university roughly $100k. In other words, this paper cost roughly 3 to 12 PhD-years. (Though, to be fair: that still wouldn't cover Theodore Streleski's 19-year PhD, which ended with him murdering his advisor for not letting him graduate.)