Image Recognition
The learning-rate schedule for LeNet-5 is η = 0.01, 0.005, and 0.001 for epochs in [0, 100), [100, 150), and [150, 200], respectively. For Tree-3 (K = 15, M = 80) and 10 Tree-3 (K = 15, M = 80), η decays by a factor of 0.6 every 20 epochs; this is the same schedule as used for Tree-3 (K = 15, M = 16) on the CIFAR-10 dataset. The gray squares in the first layer represent convolutional hidden units (σ_Conv) and max-pooling hidden units that are equal to zero, except for several non-zero units denoted by RGB dots. The non-zero tree output hidden units (σ_Tree) are denoted by black dots.
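The two learning-rate rules above can be sketched as plain functions of the epoch index. This is a minimal sketch under stated assumptions, not the authors' training code. The initial rate for Tree-3 is not given in the caption, so `eta0` is a hypothetical parameter. The sketch also assumes that the first decay takes effect at epoch 20, so epochs 0 to 19 use `eta0`.

```python
def lenet5_lr(epoch: int) -> float:
    """Piecewise-constant LeNet-5 schedule: 0.01 on [0, 100), 0.005 on [100, 150), 0.001 on [150, 200]."""
    if not 0 <= epoch <= 200:
        raise ValueError(f"epoch {epoch} is outside the 200-epoch schedule")
    if epoch < 100:
        return 0.01
    if epoch < 150:
        return 0.005
    return 0.001


def tree3_lr(epoch: int, eta0: float) -> float:
    """Step decay for Tree-3: eta is multiplied by 0.6 once every 20 epochs.

    eta0 is a hypothetical parameter because the caption does not state the initial rate.
    Assumes the first decay happens at epoch 20, so epochs 0-19 use eta0.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Integer division counts how many full 20-epoch blocks have elapsed.
    return eta0 * 0.6 ** (epoch // 20)


if __name__ == "__main__":
    # Print the rate at the boundaries of each LeNet-5 interval.
    for e in (0, 99, 100, 149, 150, 200):
        print(e, lenet5_lr(e))
    # Print the Tree-3 rate around the first two decay steps (eta0 chosen arbitrarily).
    for e in (0, 19, 20, 40):
        print(e, tree3_lr(e, eta0=0.1))
```

If training is done in PyTorch, the Tree-3 rule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.6)` stepped once per epoch. The LeNet-5 rule does not use a single constant factor (0.01 to 0.005 is a factor of 0.5, and 0.005 to 0.001 is a factor of 0.2), so a `LambdaLR` wrapping a function like `lenet5_lr` fits it more directly than `MultiStepLR`.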