It doesn't necessarily seem that surprising to me.

If I can be really hand-wavy about it: a lot of deep neural networks achieve their nonlinearity in a very constrained way. They stack linear models on top of each other, and the nonlinearity comes from passing each model's output through a relatively simple nonlinear function such as the logistic or tanh before feeding it into the next one. (Without that step, you'd just have a linear combination of linear functions, which would itself be linear.)
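A quick numpy sketch of that parenthetical (toy sizes, names are mine): two stacked weight matrices without an activation collapse to a single linear map, which you can check via additivity; squashing with tanh breaks that.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random "layers" as weight matrices (toy sizes).
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

x = rng.normal(size=3)
y = rng.normal(size=3)

# Without an activation, the stack is just the single linear map W2 @ W1,
# so it is additive: f(x + y) == f(x) + f(y).
def linear_stack(v):
    return W2 @ (W1 @ v)

assert np.allclose(linear_stack(x + y), linear_stack(x) + linear_stack(y))

# Squashing each layer's output with tanh breaks that collapse.
def tanh_stack(v):
    return W2 @ np.tanh(W1 @ v)

assert not np.allclose(tanh_stack(x + y), tanh_stack(x) + tanh_stack(y))
```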

That's a pretty constrained form of nonlinearity compared to polynomial regression, which tries to directly fit a high-order polynomial. I don't have anything like the math chops to prove this, but I believe it means the neural network will tend to favor a relatively smoother decision boundary, whereas polynomial regression is a naturally high-variance sort of affair.
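You can see the high-variance behavior in a toy fit (my own made-up example, not a proof of anything): a degree-15 polynomial necessarily matches noisy samples at least as closely in-sample as a nested degree-3 fit, but the curve it draws between the samples is much rougher.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a smooth function on [-1, 1].
x = np.linspace(-1, 1, 20)
y = np.tanh(3 * x) + rng.normal(scale=0.05, size=x.shape)

# Nested least-squares fits: the degree-15 model contains the degree-3
# model, so its in-sample residual can only be smaller or equal.
c3 = np.polyfit(x, y, 3)
c15 = np.polyfit(x, y, 15)
r3 = np.sum((np.polyval(c3, x) - y) ** 2)
r15 = np.sum((np.polyval(c15, x) - y) ** 2)
assert r15 <= r3 + 1e-9

# But on a dense grid, the degree-15 curve wiggles between the samples,
# chasing the noise; max second difference is a crude roughness measure.
grid = np.linspace(-1, 1, 1000)
rough3 = np.max(np.abs(np.diff(np.polyval(c3, grid), 2)))
rough15 = np.max(np.abs(np.diff(np.polyval(c15, grid), 2)))
print(rough3, rough15)  # the degree-15 fit comes out far rougher here
```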



I agree this is one of the reasons for the success of neural networks. But it was not obvious at first, and it is still quite hard to formalize and explain in mathematical terms. That's what I meant by "surprising".


No, it's not constrained at all. In fact, even single hidden layer networks with nearly arbitrary activation functions are universal approximators (Hornik et al. 1989). Polynomials are also universal approximators (Weierstrass).
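A cheap way to see the single-hidden-layer version in action (my own toy setup, with a random-feature shortcut standing in for actual training): one layer of tanh units with random weights plus a least-squares linear readout already approximates a smooth target well.

```python
import numpy as np

rng = np.random.default_rng(2)

# Target function on [-1, 1].
x = np.linspace(-1, 1, 200)
y = np.sin(4 * x)

# Single hidden layer: 100 tanh units with random weights and biases
# ("random features"), then a linear least-squares readout.
W = rng.normal(scale=4.0, size=(1, 100))
b = rng.normal(scale=2.0, size=100)
H = np.tanh(x[:, None] * W + b)          # hidden activations, shape (200, 100)
coef, *_ = np.linalg.lstsq(H, y, rcond=None)

err = np.max(np.abs(H @ coef - y))
print(err)  # small max error over the grid
```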


I'm not trying to say that neural networks are inherently constrained. I'm saying that, in typical usage, they tend to be used in a certain way that I believe introduces some useful constraints. You can use a single hidden layer and an arbitrary activation function, but, in practice, it's a heck of a lot more common to use multiple hidden layers and tanh.

It's worth noting that neural networks didn't take off with Hornik et al. style simple-topology-complex-activation-function universal approximators. They took off a decade or so later, with LeCun-style complex-topology-simple-activation-function networks.

That arguably suggests that the paper is of more theoretical than practical interest. It's also worth noting that one of the practical challenges with a single hidden layer and a complex activation function is that it's susceptible to high variance. Just like polynomial regression.


This kind of stuff is called inductive bias and is a sexy topic nowadays.



