run.bash holds a persistent refcount file in the shared state directory
so multiple concurrent tests can share a single container. If a prior
test_all run is killed (e.g. with Ctrl-C), its references are never
released, so on the next run the count never reaches zero and the
container is never stopped, forcing manual cleanup.

Three fixes, all in fstest/testserver/init.d/run.bash:
- On start, if the refcount is non-zero but no container is running,
treat it as zero. This stops a stale count from leaking into future
runs.
- reset now runs rm -rf on RUN_ROOT (the per-server state) instead of
RUN_BASE (the shared parent). Removing RUN_BASE was clobbering the
state of sibling services.
- New force-stop verb unconditionally stops the container and zeroes
the refcount. This is the primitive that the Go-side cleanup sweep
will call at end-of-run.
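
The three fixes can be sketched roughly as follows. This is a
simplified illustration, not the actual contents of run.bash: the
function names, the refcount file layout and the "running" marker
file (which stands in for a real container liveness check) are all
assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names and layout are illustrative only.

RUN_ROOT="${RUN_ROOT:-$(mktemp -d)}"   # per-server state directory
REFCOUNT_FILE="${RUN_ROOT}/refcount"

# Stand-in for a real liveness check (assumption: a marker file
# replaces asking the container runtime).
container_running() {
    [ -e "${RUN_ROOT}/running" ]
}

# Print the effective refcount. A non-zero count with no running
# container is stale, so it is reported as zero.
read_refcount() {
    local n=0
    if [ -f "${REFCOUNT_FILE}" ]; then
        n=$(cat "${REFCOUNT_FILE}")
    fi
    if [ "${n}" -ne 0 ] && ! container_running; then
        n=0
    fi
    echo "${n}"
}

start() {
    local n
    n=$(read_refcount)
    if [ "${n}" -eq 0 ]; then
        touch "${RUN_ROOT}/running"    # stand-in for starting the container
    fi
    echo $((n + 1)) > "${REFCOUNT_FILE}"
}

stop() {
    local n
    n=$(read_refcount)
    if [ "${n}" -gt 0 ]; then
        n=$((n - 1))
    fi
    echo "${n}" > "${REFCOUNT_FILE}"
    if [ "${n}" -eq 0 ]; then
        rm -f "${RUN_ROOT}/running"    # stand-in for stopping the container
    fi
}

# Unconditionally stop the container and zero the refcount.
force_stop() {
    rm -f "${RUN_ROOT}/running"
    echo 0 > "${REFCOUNT_FILE}"
}

# Remove only this server's state, never the shared parent.
reset() {
    rm -rf "${RUN_ROOT:?}"
}
```

In this sketch a killed run leaves a non-zero refcount behind, but
the next start sees that nothing is running and begins again from
zero.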

Before this fix there were various issues with the test server
framework, most noticeably servers stopping when they shouldn't,
which caused timeouts. The cause was that the reference counting in
the Go code was not designed to work across multiple processes, so
it did not work properly.

This fix moves the reference counting logic to the start scripts and
in turn removes that logic from the Go code. This means that the
reference counting is now global and works correctly over multiple
processes.
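
As a rough illustration of why a file-based count works across
processes where an in-memory one cannot, here is a minimal sketch.
It is not taken from the start scripts: the COUNT_DIR layout, the
function names and the mkdir-based lock are all assumptions chosen
for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names and locking scheme are illustrative only.

COUNT_DIR="${COUNT_DIR:-$(mktemp -d)}"
COUNT_FILE="${COUNT_DIR}/refcount"
LOCK_DIR="${COUNT_DIR}/lock"

# Run a command while holding the lock. mkdir is atomic, so only one
# process at a time can create the lock directory.
with_lock() {
    until mkdir "${LOCK_DIR}" 2>/dev/null; do
        sleep 0.05
    done
    "$@"
    local rc=$?
    rmdir "${LOCK_DIR}"
    return "${rc}"
}

# Add delta to the count stored in the file. Call only under the lock.
adjust_unlocked() {
    local delta=$1 n=0
    if [ -f "${COUNT_FILE}" ]; then
        n=$(cat "${COUNT_FILE}")
    fi
    echo $((n + delta)) > "${COUNT_FILE}"
}

# Adjust the shared count safely from any process.
adjust() {
    with_lock adjust_unlocked "$1"
}
```

Every process that calls adjust sees and updates the same file, so
the count is shared by all of them rather than being private to one
process.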