The other day, a friend and I were adding tests to a library of utilities he is putting together. He was keen to get some experience with my testing framework, so we decided to work on it together. He does a lot of parallel programming, so some of his utilities involve parallel algorithms. As soon as we started writing a test for one of these parallel utilities, we ran smack into an almost philosophical question: what does it mean to test parallel code when the expected outcome of the test depends on which image (Fortran's term for one running copy of the program) is executing it?
First, some background. As of Fortran 2008 and 2018, parallelism is part of the language. The model of execution is that, when running parallel code, multiple copies of a program (images) are started, and any communication between images is done either through a facility called coarrays (added in Fortran 2008) or through specific intrinsic procedures (the collective subroutines added in Fortran 2018). This model of execution is called Single Program, Multiple Data, or SPMD, programming.
For example, when the program is executing, each image has its own private copy of every variable, so a variable can hold different values on different images. By declaring a variable as a coarray, however, one image can read or write that variable on any other image directly, without the other image having to take part in the transfer. The images still need to synchronize (for example with sync all) so that a value is not read before it has been written.
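To make the distinction between a private variable and a coarray concrete, here is a minimal sketch. The program and variable names are made up for illustration; only this_image, num_images, the [*] coarray syntax and sync all come from the language itself.

```fortran
program coarray_sketch
    implicit none

    integer :: local_value      ! ordinary variable: private to each image
    integer :: shared_value[*]  ! coarray: one copy per image, readable from other images
    integer :: neighbor

    ! Every image runs the same code, but this_image() differs,
    ! so the values diverge from here on.
    local_value = 10 * this_image()
    shared_value = local_value

    ! Wait until every image has defined its shared_value before reading any of them.
    sync all

    ! Each image reads the copy held by the next image, wrapping around at the end.
    neighbor = merge(1, this_image() + 1, this_image() == num_images())
    print *, "image", this_image(), "reads", shared_value[neighbor], "from image", neighbor
end program coarray_sketch
```

How this gets built and launched on several images depends on the compiler. With gfortran, for instance, -fcoarray=single builds a one-image version, and multi-image runs typically go through a coarray runtime such as OpenCoarrays.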
Another way in which images may communicate with each other is through some of the new intrinsic procedures. One example, which is relevant to the test we were writing, is the co_sum subroutine. Given that a numeric variable may take on different values on different images, at some point we might like to obtain the sum of that variable across all images. The co_sum subroutine gives us this answer, with the added capability that we may specify that only one specific image should actually obtain the result. (With the intrinsic itself, the argument becomes undefined on the other images when a result image is specified.) This is exactly the kind of functionality my friend was trying to test with regard to one of his utilities: on the specified image, the call should obtain the answer, but on all other images the call should leave the value unchanged. Therefore the test is checking different things depending on which image is executing it.
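Here is a rough sketch of what such a utility and its image-dependent check might look like. The module sum_utils and the subroutine sum_onto are hypothetical stand-ins, not my friend's actual code. The only real API used is the co_sum intrinsic with its result_image argument. Because the standard leaves the argument of co_sum undefined on the non-result images, the sketch sums a working copy and only assigns it back on the result image.

```fortran
module sum_utils  ! hypothetical module, for illustration only
    implicit none
contains
    subroutine sum_onto(value, root)
        ! Sums value across all images; only image root receives the total.
        integer, intent(inout) :: value
        integer, intent(in) :: root
        integer :: work

        work = value
        call co_sum(work, result_image=root)
        ! work is only defined on image root after the call, so only
        ! that image copies it back. Other images keep their input.
        if (this_image() == root) value = work
    end subroutine sum_onto
end module sum_utils

program co_sum_sketch
    use sum_utils, only: sum_onto
    implicit none

    integer, parameter :: root = 1
    integer :: value, expected

    value = this_image()
    call sum_onto(value, root)

    ! The expected answer depends on which image is asking:
    ! 1 + 2 + ... + n on the root image, the untouched input elsewhere.
    if (this_image() == root) then
        expected = num_images() * (num_images() + 1) / 2
    else
        expected = this_image()
    end if

    if (value == expected) then
        print *, "image", this_image(), "PASS"
    else
        print *, "image", this_image(), "FAIL: got", value, "expected", expected
    end if
end program co_sum_sketch
```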
Up to this point, I had not given much consideration to testing parallel code and the kinds of complexity that entails. The answer we came to kept things so simple, and fit so well with the parallel execution model of the language, that it required almost no change to my testing framework. The modifications needed to support it in a user-friendly way were small and simple.
The answer is this: up to the point of executing the tests, every image does exactly the same thing and arrives at exactly the same state, with each variable containing the same value on each image. The execution of the tests represents a branching point in the program, the point at which the images are allowed to diverge. But the model of parallel execution in Fortran supports this perfectly. The variables containing the results of the tests simply take on different values on different images, and each image progresses through the program reporting its own results.
The only change actually required is to ensure that every image has had a chance to finish executing the tests and reporting its results before the program stops. And to make the output more user-friendly, we report which tests will be run from a single image only (remember that up to this point every image is in exactly the same state), and we ensure that only one image reports its results at a time so that the results aren't mixed together. The design of my testing framework is such that this required fewer than a dozen changes in only one procedure, and absolutely no changes to any interfaces.
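As an illustration of that reporting pattern, here is a small sketch. It is not the framework's actual code: the program name and the stand-in test are invented, and using a critical construct for "one image at a time" is an assumption about one reasonable way to do it.

```fortran
program report_sketch
    use, intrinsic :: iso_fortran_env, only: output_unit
    implicit none

    integer :: total
    logical :: passed

    ! Every image is in the same state here, so one image can speak for all.
    if (this_image() == 1) then
        print *, "Running 1 test on", num_images(), "images"
        flush(output_unit)
    end if
    sync all

    ! Stand-in test: sum the image numbers everywhere and check the total.
    total = this_image()
    call co_sum(total)
    passed = total == num_images() * (num_images() + 1) / 2

    ! Only one image at a time may be inside a critical construct,
    ! so each image's report is written without another image interleaving.
    critical
        if (passed) then
            print *, "image", this_image(), "PASS"
        else
            print *, "image", this_image(), "FAIL: total =", total
        end if
        flush(output_unit)
    end critical

    ! Wait for every image to finish reporting. This matters because
    ! error stop terminates all images, not just the one that calls it.
    sync all
    if (.not. passed) error stop 1
end program report_sketch
```

The critical construct gives mutual exclusion but no particular order, so reports may appear in any image order. How the runtime merges output from different images is also implementation dependent, so the flush calls are a precaution rather than a guarantee.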
You can find his library here and my testing framework here. We’ve also teamed up to provide training courses together, so if you’re interested in learning about testing, parallelism, and Fortran 2018 please contact us here or here.