When you look at a 2D surface, you directly observe all the values on that surfa...

samsartor · 2025-08-17T16:34:37 1755448477

And remember, optimization problems can be _incredibly_ high-dimensional. A 7B parameter LLM is a 7-billion-dimensional optimization landscape. A grid search with a resolution of 10 (i.e., 10 samples for each dimension) would require evaluating the loss function 10^(7*10^9) times. That is, the number of evaluations is a number with about 7B digits.
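
To make the arithmetic concrete, here is a rough sketch that counts the digits of resolution^dims without ever building the number. The helper name `grid_eval_digits` is made up for illustration, and it goes through floating-point `log10`, so treat it as an estimate for inputs other than powers of ten.

```python
import math

def grid_eval_digits(resolution: int, dims: int) -> int:
    """Decimal digit count of resolution**dims, computed via logarithms.

    Hypothetical helper for illustration only. It relies on floating-point
    log10, so it can be off by one when resolution**dims is extremely
    close to a power of ten.
    """
    return math.floor(dims * math.log10(resolution)) + 1

# 10 samples per dimension, 7 billion dimensions
print(grid_eval_digits(10, 7_000_000_000))  # 7000000001
```

So "7B digits" is right up to the one leading digit.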

cubefox · 2025-08-18T11:14:26 1755515666

What about sampling at low resolution? If the hills and valleys aren't too close together, this should give a good indication of where the global minimum is.

xigoi · 2025-08-18T11:55:13 1755518113

> If the hills and valleys aren't too close together

That’s a big “if”.

cubefox · 2025-08-18T13:24:31 1755523471

At least it will catch those valleys that are wider than the sampling resolution.

xigoi · 2025-08-18T13:35:14 1755524114

Yeah. The problem is that the number of samples needed is exponential in the dimension, so in a 1000-dimensional space, you won’t even be able to subdivide it into 2×…×2.
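
A quick sanity check of how bad 2×…×2 already is in 1000 dimensions. The compute budget below is an assumption picked only to be absurdly generous (10^18 evaluations per second for roughly the age of the universe), not a real figure for any machine.

```python
# Coarsest possible grid: 2 samples per dimension in 1000 dimensions.
dims = 1000
samples = 2 ** dims  # Python ints are arbitrary precision, so this is exact

# Assumed, deliberately generous budget (illustrative numbers only):
ASSUMED_EVALS_PER_SECOND = 10 ** 18
ASSUMED_SECONDS = 4 * 10 ** 17  # roughly the age of the universe
budget = ASSUMED_EVALS_PER_SECOND * ASSUMED_SECONDS

print(len(str(samples)))  # 302 digits
print(len(str(budget)))   # 36 digits
print(samples > budget)   # True
```

Even with that budget you are short by more than 260 orders of magnitude.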

cubefox · 2025-08-18T14:46:16 1755528376

Damn.