> It's not a weird coincidence that helps ML; it's inherent in the problem.

This depends on the application. If you are trying to design new proteins for some purpose, unconstrained by evolution, you may want a method that does well on novel inputs.

> Same with drug design

Not by a long shot. There are maybe on the order of 10,000 known 3D protein-ligand structures. Meanwhile, when doing drug discovery, people scan drug libraries with millions to billions of molecules (using my software, oftentimes). These molecules will be very poorly represented in the training data.

The theoretical chemical space of interest to drug discovery is bigger still, with on the order of 1e60 molecules in it: https://en.wikipedia.org/wiki/Chemical_space
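
To put rough numbers on that gap, here is a back-of-the-envelope sketch. The figures are just the order-of-magnitude estimates from above (about 1e4 structures, 1e6 to 1e9 library molecules, 1e60 for chemical space), and the names in the code are made up for illustration. Comparing raw counts is a simplification; it says nothing about how similar the library molecules are to the known ligands.

```python
import math

# Order-of-magnitude estimates taken from the comment above; not measured values.
KNOWN_STRUCTURES = 1e4  # known 3D protein-ligand structures

CANDIDATE_SETS = {
    "screening library (millions)": 1e6,
    "screening library (billions)": 1e9,
    "theoretical chemical space": 1e60,
}


def orders_of_magnitude_gap(known: float, candidates: float) -> float:
    """Return log10(candidates / known): the powers of ten separating the two counts."""
    return math.log10(candidates / known)


gaps = {
    name: orders_of_magnitude_gap(KNOWN_STRUCTURES, size)
    for name, size in CANDIDATE_SETS.items()
}

for name, gap in gaps.items():
    print(f"{name}: about {gap:.0f} orders of magnitude more molecules than known structures")
```

Even the smallest case leaves the structural data outnumbered by a factor of about a hundred, and the gap only grows from there.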