And remember, optimization problems can be _incredibly_ high-dimensional. A 7B-parameter LLM is a 7-billion-dimensional optimization landscape. A grid search with a resolution of 10 (i.e. 10 samples for each dimension) would require evaluating the loss function 10^(7*10^9) times. That is, the number of evaluations is a number with about 7B digits.
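To sanity-check that count without ever building the number, here is a rough Python sketch. The helper name `grid_evaluations_digits` is made up for illustration; it just counts the decimal digits of resolution^dims using logarithms, and it relies on floating-point `log10`, so treat results right at an exact power of ten with some care.

```python
import math

def grid_evaluations_digits(num_dims: int, resolution: int) -> int:
    """Count the decimal digits of resolution**num_dims.

    Uses digits(n) = floor(log10(n)) + 1 with log10(r**d) = d * log10(r),
    so the huge number itself is never constructed.
    """
    return math.floor(num_dims * math.log10(resolution)) + 1

# 7B parameters, 10 samples per dimension
print(grid_evaluations_digits(7_000_000_000, 10))

# 1000 dimensions, only 2 samples per dimension
print(grid_evaluations_digits(1000, 2))
```

Even the second case, which is about as coarse as a grid can get, gives an evaluation count a few hundred digits long.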
What about sampling at low resolution? If the hills and valleys aren't too close together, this should give a good indication of where the global minimum is.
Yeah. The problem is that the number of samples needed is exponential in the dimension, so in a 1000-dimensional space, you won’t even be able to subdivide it into 2×…×2: that is already 2^1000 ≈ 10^301 cells.
For a loss function, the value at each point must be computed.
You can compute them all, "look at" the surface, and directly choose the lowest point; that is called a grid search.
For high dimensions, there are just way too many "points" to compute.
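For a feel of how fast the point count grows, here is a minimal grid search sketch in plain Python. The names `grid_search` and `bowl` are invented for this example, and the toy loss is just a quadratic bowl with its minimum at 0.5 in every coordinate, not anything resembling a real model's loss.

```python
import itertools

def grid_search(loss, bounds, resolution):
    """Evaluate `loss` at every point of a regular grid.

    bounds: list of (low, high) pairs, one per dimension.
    resolution: samples per dimension (assumed to be at least 2).
    Returns (best_point, best_value, number_of_evaluations).
    """
    # Evenly spaced sample positions along each axis, endpoints included.
    axes = [
        [lo + (hi - lo) * i / (resolution - 1) for i in range(resolution)]
        for lo, hi in bounds
    ]
    best_point, best_value, n_evals = None, float("inf"), 0
    # itertools.product walks all resolution**dims grid points.
    for point in itertools.product(*axes):
        value = loss(point)
        n_evals += 1
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value, n_evals

def bowl(p):
    """Toy loss: squared distance from the point (0.5, ..., 0.5)."""
    return sum((x - 0.5) ** 2 for x in p)

for dims in (1, 2, 3, 4):
    point, value, n = grid_search(bowl, [(0.0, 1.0)] * dims, resolution=11)
    print(dims, n, point)
```

Each extra dimension multiplies the number of evaluations by the resolution, which is why this stops being feasible after a handful of dimensions.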