
This isn't a criticism - I'm just curious to hear people's thoughts on this. When I look at this code, one of my initial reactions is that it does not seem to be very thoroughly tested. Sure, certain modules have been tested (e.g. `model.quat_affine`), but it's not clear how completely. Meanwhile, other modules, for example `model.folding`, have not been tested at all, despite containing large amounts of complex logic. Code like that, which manipulates arrays, is very easy to get wrong, and the bugs are difficult to spot.
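
To be concrete about what I mean, here's a rough sketch of the kind of unit test I'd want around geometry/array code. The function is a made-up stand-in, not something from the AlphaFold repo:

    # Hypothetical example: property-style unit tests for array code.
    # `apply_rigid_transform` is an illustrative stand-in, not from AlphaFold.
    import numpy as np

    def apply_rigid_transform(rotation, translation, points):
        """Apply a 3x3 rotation and a translation vector to (N, 3) points."""
        return points @ rotation.T + translation

    def test_identity_transform_is_a_no_op():
        points = np.random.RandomState(0).randn(10, 3)
        out = apply_rigid_transform(np.eye(3), np.zeros(3), points)
        np.testing.assert_allclose(out, points)

    def test_rotation_preserves_pairwise_distances():
        rng = np.random.RandomState(1)
        points = rng.randn(10, 3)
        q, _ = np.linalg.qr(rng.randn(3, 3))  # random orthogonal matrix
        out = apply_rigid_transform(q, np.zeros(3), points)
        dist = lambda p: np.linalg.norm(p[:, None] - p[None, :], axis=-1)
        np.testing.assert_allclose(dist(out), dist(points), atol=1e-8)

Tests like these are cheap to write and catch exactly the transposed-axis and sign errors that are otherwise invisible until the end results look slightly off.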

My experience working with code written by researchers is that it frequently contains a large number of bugs, which calls the whole project into question. I've also found that encouraging them to write tests greatly improves the situation. Additionally, when they get the hang of testing they often come to enjoy it, because it gives them a way to work on the code without running the entire pipeline (a very slow feedback loop). It also gives them confidence that a change hasn't led to a subtle bug somewhere.

Again, I'm not criticising. I am aware that there are many ways to produce high-quality software, and Google/DeepMind have a good reputation for their standards around code review, testing, etc. I am, however, interested to understand how the team that wrote this thinks about and ensures accuracy.

In general, I hope that testing and code review become a central part of the peer review process for this kind of work. Without it, I don't think we can trust results. We wouldn't accept mathematical proofs that contained errors, so why would we accept programs that are full of bugs?

edit: grammar



My understanding is that it has been manually tested, i.e. it has produced correct results for previously intractable problems. I'm not sure how much automated testing would add at that point.


Unit testing usually isn't easily replaced by manual testing. If you have, for example, 3 units that can each be in 2 different modes, that's 2^3 = 8 different combinations, but only 2*3 = 6 unit modes. Testing the end result is more work than testing the units.
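
A toy illustration of the arithmetic (the units and modes here are entirely hypothetical):

    from itertools import product

    # Three hypothetical units, each with two modes.
    units = {"parser": ["strict", "lenient"],
             "solver": ["exact", "approximate"],
             "writer": ["binary", "text"]}

    # Unit tests: one case per (unit, mode) pair -> 3 * 2 = 6 cases.
    unit_cases = [(name, mode) for name, modes in units.items() for mode in modes]
    print(len(unit_cases))  # 6

    # End-to-end tests: one case per combination of modes -> 2 ** 3 = 8 cases.
    e2e_cases = list(product(*units.values()))
    print(len(e2e_cases))   # 8

With ten such units the gap is 1024 end-to-end combinations versus 20 unit cases, so testing only the end result scales badly.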


Discovery science is different from web software engineering. Most discovery scientists use manual testing, not unit testing. Very few actually do integration tests or system tests (this is something I'm trying to change).

And, given the external results of the application, it's unclear to me how much additional value would come from a rigorous testing system.


> Very few actually do integration tests or system tests (this is something I'm trying to change).

Care to expand on what you're trying to do?


Sure, I'm trying to merge the idea of continuous integration with workflows/pipelines. It's all stuff that I learned at Google and is non-proprietary. The idea is to have presubmit checks that invoke a full instance of a complex pipeline, but on canned data (synthetic, pseudo-anonymized, or otherwise not directly connected to the prod system), as an integration test. This catches many errors that would be hard to debug later in a prod workflow.
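
As a sketch of what such a presubmit check might look like (assuming a pytest setup; the entry point, paths, and expected counts are illustrative, not from any real system):

    # Hypothetical presubmit integration test: run the full pipeline on a small,
    # canned dataset that is not connected to production, then sanity-check the
    # output. `run_pipeline` and the test data paths are illustrative.
    import json
    import pathlib

    import pytest

    from mypipeline import run_pipeline  # hypothetical entry point

    CANNED_INPUT = pathlib.Path("testdata/synthetic_batch.json")

    @pytest.mark.integration
    def test_pipeline_end_to_end_on_canned_data(tmp_path):
        output_dir = tmp_path / "out"  # hermetic: never touches prod storage
        run_pipeline(input_path=CANNED_INPUT, output_dir=output_dir)

        results = json.loads((output_dir / "results.json").read_text())
        # Loose sanity checks rather than exact golden values, so the test
        # survives benign changes to the pipeline internals.
        assert results["num_records"] == 128  # canned batch assumed to hold 128 records
        assert all(0.0 <= r["score"] <= 1.0 for r in results["records"])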

In a sense, I see software testing/big web data and modern large-scale data processing in science as a continuum, and I want to bring the practices from the big-web-data and testing fields to bear on science pipelines.


Apart from a shift in mental attitude, is it primarily about getting a dataset for the integration test?


Also making sure the testing is hermetic (i.e. it can't break prod) and that all the components are actually reproducible.
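
Roughly along these lines (the specifics are mine and purely illustrative):

    # Sketch: pin the sources of nondeterminism and keep all I/O inside a
    # temporary sandbox, so the test can neither read from nor write to prod.
    import os
    import random
    import tempfile

    import numpy as np

    def hermetic_run(pipeline_fn, canned_input, seed=0):
        random.seed(seed)                    # fix Python-level randomness
        np.random.seed(seed)                 # fix NumPy-level randomness
        os.environ["PIPELINE_ENV"] = "test"  # hypothetical flag: never talk to prod
        with tempfile.TemporaryDirectory() as sandbox:
            return pipeline_fn(canned_input, output_dir=sandbox)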


Prior to this model, protein folding hadn't seen significant advancements in a decade or more. Worrying about the lack of tests in a first-of-its-kind model is very much akin to complaining about the choice of font in the user manual for the world's first warp drive. I understand you're attempting to frame the problem in terms of things you know, but trying to weigh down pioneering research with professional development ceremony is very much counterproductive. The 'missing' ceremony would not have contributed to the strength of AlphaFold's result; the model's only purpose was to compete within the context of an existing validation framework.


Because it passes the huge number of integration tests.


Research code is highly volatile: the details and architecture change a lot. It is much more important to invest the time in writing more experimental code and validating it with e2e functional tests that don't need to change than to constantly rewrite both the code and the tests.
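
For example, an e2e functional test can pin down behaviour at the public boundary so the internals can be rewritten freely (the names and paths here are hypothetical):

    # Hypothetical end-to-end functional test: compare the model's output on a
    # fixed input against a stored reference, independently of how the
    # internals are organised. `predict_structure` and the paths are illustrative.
    import numpy as np

    from mymodel import predict_structure  # hypothetical public entry point

    def test_prediction_matches_reference():
        features = np.load("testdata/example_features.npz")
        reference = np.load("testdata/example_reference_coords.npy")

        coords = predict_structure(features)

        # The tolerance absorbs harmless numerical drift from refactoring,
        # while still catching genuine regressions.
        np.testing.assert_allclose(coords, reference, atol=1e-3)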



