
When you look at a 2D surface, you directly observe all the values on that surface.

For a loss-function, the value at each point must be computed.

You can compute them all and "look at" the surface and just directly choose the lowest - that is called a grid search.

For high dimensions, there are just way too many "points" to compute.
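To make "compute them all and pick the lowest" concrete, here is a minimal grid-search sketch in Python. The names `grid_search` and `toy_loss`, the bounds, and the resolution are all made up for illustration; it only shows the idea on a 2D toy problem.

```python
import itertools

def grid_search(loss, bounds, resolution):
    """Evaluate loss at every grid point; return (best_point, best_value, n_evals)."""
    axes = []
    for lo, hi in bounds:
        step = (hi - lo) / (resolution - 1)
        axes.append([lo + i * step for i in range(resolution)])

    best_point, best_value, n_evals = None, float("inf"), 0
    # itertools.product visits resolution ** len(bounds) points in total.
    for point in itertools.product(*axes):
        value = loss(point)
        n_evals += 1
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value, n_evals

def toy_loss(p):
    # Hypothetical 2D bowl with its minimum at (1, -2).
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

best_point, best_value, n_evals = grid_search(
    toy_loss, [(-5.0, 5.0), (-5.0, 5.0)], resolution=11
)
```

In 2D with 11 samples per axis that is only 121 evaluations. The loop count is `resolution ** len(bounds)`, which is what blows up as dimensions are added.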



And remember, optimization problems can be _incredibly_ high-dimensional. A 7B parameter LLM is a 7-billion-dimensional optimization landscape. A grid search with a resolution of 10 (i.e. 10 samples for each dimension) would require evaluating the loss function 10^(7*10^9) times. That is, the number of evaluations is a number with 7B digits.
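A quick way to sanity-check that digit count without ever building the number is to use logarithms. This is just a sketch: `grid_evaluations_digits` is a made-up helper, and it relies on floating-point `log10`, so treat it as approximate for resolutions that are not powers of 10.

```python
import math

def grid_evaluations_digits(n_dims, resolution):
    """Decimal digits in resolution ** n_dims, via floor(n_dims * log10(resolution)) + 1."""
    return math.floor(n_dims * math.log10(resolution)) + 1

# 7 billion dimensions, 10 samples per dimension.
digits = grid_evaluations_digits(7_000_000_000, 10)
```

Strictly speaking 10^(7*10^9) has 7*10^9 + 1 digits, which is the "7B digits" above up to rounding.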


What about sampling at low resolution? If the hills and valleys aren't too close together, this should give a good indication of where the global minimum is.


> If the hills and valleys aren't too close together

That’s a big “if”.


At least it will catch those valleys that are wider than the sampling resolution.


Yeah. The problem is that the number of samples needed is exponential in the dimension, so in a 1000-dimensional space, you won’t even be able to subdivide it into 2×…×2.
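For a sense of scale, here is a small sketch of the 2×…×2 case in Python. The function name is made up, and the 10^80 figure is only the commonly quoted rough estimate for the number of atoms in the observable universe, used here as a yardstick.

```python
def coarsest_grid_samples(n_dims, samples_per_dim=2):
    """Number of loss evaluations for a grid with samples_per_dim points per axis."""
    return samples_per_dim ** n_dims

# Python integers are arbitrary precision, so this is exact.
samples = coarsest_grid_samples(1000)
n_digits = len(str(samples))

# Rough, commonly quoted order-of-magnitude estimate (an assumption, not a measurement).
ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80
exceeds_atoms = samples > ATOMS_IN_OBSERVABLE_UNIVERSE
```

So even two samples per dimension in 1000 dimensions is a 302-digit number of evaluations.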


Damn.



