Recent commits to derivepassphrase.git (f805c904589ffba6e0828c4aa5f596c56aba7e00)

Refactor the `exporter` tests

2025-08-17T16:42:24+02:00

(This is part 4 of a series of refactorings for the test suite.)

For the `export vault` command-line interface tests, split the tests
into tests for command-line argument support, tests for various
command-line or format-related errors, and the already existing groups
for "storeroom" and "vault v0.2"/"vault v0.3" format tests.  For the
former two groups, factor out the common test operation, which in both
cases is the whole test content, though with differing call conventions.
For the latter two groups, factor out the common environment setup
instead.

For the `exporter` subpackage tests, collect the data generation
strategies in a common class `Strategies` (similar to the `Parametrize`
class).  Split the existing `TestUtilities` class into
a `TestCLIUtilities` and a `TestExportVaultConfigDataHandlerRegistry`
class, and factor out the common environment setup for each respective
group.  Rename the `TestCLI` class to `TestGenericVaultCLIErrors`, and
factor out the common environment setup and the CLI call function.

Refactor the `vault` and `sequin` tests

2025-08-17T16:21:27+02:00

(This is part 3 of a series of refactorings for the test suite.)

For the `sequin` tests, merge the two tests for bit extraction from
a big-endian number.  They both test testing infrastructure, and they
already both had the same test parametrization.
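
As an illustration of the kind of helper those tests exercise, here is
a minimal sketch of bit extraction from a big-endian number.  The name
`bits_of_big_endian` and its signature are hypothetical and are not
taken from the test suite.

```python
def bits_of_big_endian(number: int, width: int) -> list[int]:
    """Return the `width` bits of `number`, most significant bit first."""
    if width < 0:
        raise ValueError("width must be non-negative")
    if number < 0 or number >= (1 << width):
        raise ValueError("number does not fit into the given width")
    # Shift the bit of interest down to position 0, then mask it out.
    return [(number >> shift) & 1 for shift in range(width - 1, -1, -1)]


bits = bits_of_big_endian(0b101, 8)
```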

For the `vault` tests, factor out common test operations for the
`TestPhraseDependence` and `TestInterchangablePhrases` classes (which in
both cases amounts to the whole test body).  Rewrite the
`TestStringAndBinaryExchangability` tests in a more uniform style,
parametrizing over the specific binary class.  Also, for
`test_binary_service_name` (now `test_binary_service_name_and_phrase`),
cross-check that all combinations of phrase and service name classes
lead to identical results.

Finally, fix the explicit forbidden patterns in
`test_only_numbers_and_very_high_repetition_limit`, which were one
character too short, as well as a typo in the
`test_arbitrary_repetition_limit` docstring.

Refactor the types tests

2025-08-17T16:01:53+02:00

(This is part 2 of a series of refactorings for the test suite.)

For the basic tests, collect all test data in a common class
`Strategies` (similar to the `Parametrize` class).  Split them into
groups for validity testing and validation/data cleaning.  For each
group, factor out the common test operation, which in both cases is the
whole test content, save for the different input data sets they run on.
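
A rough sketch of what "factoring out the common test operation" can
look like follows.  Everything in it is a hypothetical stand-in: the
function `is_valid_config`, the class `TestValidity`, and the sample
data are invented for illustration, and the real tests draw their data
from `hypothesis` strategies rather than from fixed lists.

```python
# Hypothetical stand-in for the function under test; not part of the
# real package.
def is_valid_config(obj: object) -> bool:
    return isinstance(obj, dict) and isinstance(obj.get("services"), dict)


class TestValidity:
    """Each test differs only in the input data set it runs on."""

    VALID = [{"services": {}}, {"services": {"example.com": {"length": 20}}}]
    INVALID = [None, [], {"services": None}, {}]

    def _test(self, obj: object, *, expected: bool) -> None:
        # The common test operation: this is the whole test content.
        assert is_valid_config(obj) is expected

    def test_valid_configs(self) -> None:
        for obj in self.VALID:
            self._test(obj, expected=True)

    def test_invalid_configs(self) -> None:
        for obj in self.INVALID:
            self._test(obj, expected=False)
```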

For the heavy-duty tests, there is only one data generation strategy,
which needn't artificially be wrapped in a class (yet).  There is also
only one test, so there is no need for further grouping either.
Instead, add some missing commentary on the one explicit example in that
test set.

Refactor the localization machinery tests

2025-08-17T15:58:47+02:00

(This is part 1 of a series of refactorings for the test suite.)

Collect all test data generation strategies in a common class
`Strategies` (similar to the `Parametrize` class).  Split the tests into
groups for debug translations, operations on translatable strings
(currently only the hashability check) and suppression of interpolation.
Within each group, attempt to factor out common operations, though at
the moment, only the debug translations group has code factored out.

Remove the remaining ordinals from the test names

2025-08-15T18:26:34+02:00

The existing system of ordinals – loosely modeled on HTTP status codes,
but subverted repeatedly because of ad hoc additions or `hypothesis`
reimplementations in the test suite – is hopelessly out of sync, and
quite an obstacle while refactoring the test suite.  Remove them, at
least until the test suite is more stable again.

On the other hand, there is some value in ordering tests so that –
absent any randomization or parallelization of the run order – the
"simpler" or "more fundamental" tests run first: this sometimes provides
the needed insight to track down a problem that only manifests in a more
cryptic way in the other, more complicated tests.  So in the long run,
some level of ordering will likely be introduced.  But whether this is
an ordinal in the test name, or perhaps something more complicated such
as a signalling test fixture, or mark, or a custom collection function,
or whatever, is still open.

Annotate all boolean parametrization with sensible test IDs

2025-08-15T17:59:56+02:00

Because the boolean value would otherwise become part of the `pytest`
test ID, and because the value is devoid of any context, give each
boolean parametrization an explicit, hopefully sensible test ID.
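
For illustration, a minimal sketch of such an annotation using
`pytest.param` with an explicit `id`.  The parameter name `keep_config`
and the ID strings are hypothetical.  Without the IDs, the generated
test IDs would end in `[False]` and `[True]`.

```python
import pytest

# Hypothetical boolean parametrization; each value carries an ID that
# supplies the context the bare boolean lacks.
KEEP_CONFIG_CASES = [
    pytest.param(False, id="discard-config"),
    pytest.param(True, id="keep-config"),
]


@pytest.mark.parametrize("keep_config", KEEP_CONFIG_CASES)
def test_keep_config_is_boolean(keep_config: bool) -> None:
    # Collected as test_keep_config_is_boolean[discard-config] and
    # test_keep_config_is_boolean[keep-config].
    assert isinstance(keep_config, bool)
```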

Refactor the hypothesis strategies for `vault` tests

2025-08-15T06:54:29+02:00

Collect, reorganize and reimplement the `hypothesis` strategies for
generating phrases and service names in the `vault` module tests.
Phrases come in three possible size ranges, and are usually binary but
sometimes textual (if short).  At times, we explicitly want pairs of
phrases that are not interchangeable under `vault`; at other times, we
want a second interchangeable phrase given a first one.  We reimplement
the (size-dependent) binary phrase and pair of phrases strategies with
`hypothesis.strategies.composite`, and add them to a new namespace of
strategies, similar to the `Parametrize` class.

I find the result very pleasing to read, and also much more amenable to
adjusting the strategies than when the definitions are always included
inline.
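
A minimal sketch of what such a namespace of composite strategies might
look like follows.  It is not the code from the test suite: the class
layout, the strategy names, the size limits, and the helper
`normalized_key` are all hypothetical.  In particular, the sketch
assumes that two phrases are interchangeable exactly when they
normalize to the same HMAC-SHA1 key (phrases shorter than the 64-byte
block size are zero-padded, longer ones are hashed first).

```python
import hashlib

from hypothesis import assume, strategies as st

BLOCK_SIZE = 64  # block size of SHA-1 in bytes, as used by HMAC-SHA1


def normalized_key(phrase: bytes) -> bytes:
    """Model HMAC-SHA1 key normalization (an assumption, see above)."""
    if len(phrase) > BLOCK_SIZE:
        phrase = hashlib.sha1(phrase).digest()
    return phrase.ljust(BLOCK_SIZE, b"\x00")


class Strategies:
    """Hypothetical namespace for the data generation strategies."""

    @staticmethod
    @st.composite
    def binary_phrase(draw, min_size=1, max_size=32):
        return draw(st.binary(min_size=min_size, max_size=max_size))

    @staticmethod
    @st.composite
    def non_interchangeable_pair(draw, max_size=32):
        first = draw(Strategies.binary_phrase(max_size=max_size))
        second = draw(Strategies.binary_phrase(max_size=max_size))
        # Reject pairs that would behave identically as HMAC keys.
        assume(normalized_key(first) != normalized_key(second))
        return first, second

    @staticmethod
    @st.composite
    def interchangeable_pair(draw, max_size=32):
        # Only valid for max_size < BLOCK_SIZE: zero-padding a short
        # key does not change its normalized form.
        first = draw(Strategies.binary_phrase(max_size=max_size))
        padding = draw(st.integers(1, BLOCK_SIZE - max_size))
        return first, first + b"\x00" * padding
```

A test then only needs `@given(Strategies.interchangeable_pair())`
instead of an inline strategy definition.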

Implement the TODO tests for the `vault` command-line interface

2025-08-15T06:51:10+02:00

Implement the tests for `derivepassphrase vault` that were alluded to
but missing: passphrase usage based on stored configuration, passphrase
usage based on the command-line, and exporting configurations that were
originally smudged upon import.

Replace the `vault` repetition tests with a faster version

2025-08-14T22:58:08+02:00

For asserting the correctness of `vault`'s repetition limitation
setting, we used to extract all length-`r` substrings of the derived
passphrase (where `r` is the repetition count that is *not* allowed
anymore) and test whether they contained more than one different
character (by building a set over the characters).  That works, but it
repeatedly builds sets, and scales badly with increased repetition
count.

Instead, we adopt the faster approach that examines the derived
passphrase once, character by character, keeping track of the longest
seen run of identical characters, and asserting that that run is within
the permitted repetition limit.  Although it consists of more
instructions, these are "simpler" instructions that do not involve set
object construction, and in particular, they are independent of the
repetition limit, leading to better scalability.  Sample runs with
Python's `timeit` module also indicate that for length-200 strings and
repetition limit 100, the set-building version takes 2-5 times as long
as the direct run counting version.  Given the nature of this code – it
runs in `hypothesis`, so is executed repeatedly and cannot afford to be
*too* slow – I posit that the speed gain is worth the slightly indirect
measurement style.
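
The two approaches can be sketched roughly as follows.  This is an
illustration only, not the code from the test suite; the function names
are hypothetical, and `limit` denotes the permitted repetition limit
(so `r = limit + 1` in the wording above).

```python
def has_forbidden_run_via_sets(passphrase: str, r: int) -> bool:
    """Old style: inspect every length-r substring by building a set."""
    return any(
        len(set(passphrase[i : i + r])) == 1
        for i in range(len(passphrase) - r + 1)
    )


def longest_run(passphrase: str) -> int:
    """New style: one pass, tracking the longest run of equal characters."""
    longest = current = 0
    previous = None
    for char in passphrase:
        # Extend the current run, or start a new run of length 1.
        current = current + 1 if char == previous else 1
        longest = max(longest, current)
        previous = char
    return longest


def respects_limit(passphrase: str, limit: int) -> bool:
    # The work done here does not depend on the value of `limit`.
    return longest_run(passphrase) <= limit
```

Both styles agree on whether a passphrase violates the limit; only the
amount of work per check differs.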

Remove hypothesis tests for config dependence of derived passphrases in `vault`

2025-08-14T22:47:35+02:00

Sadly, there exist configurations and pairs of master passphrases (and
presumably, pairs of service names as well) that lead to the same
derived passphrase.  (These are typically short-length derived
passphrases with strongly restricted character sets.)  Once `hypothesis`
has found such a set of inputs, its example database will cause it to
keep rediscovering that example.

Ideally, we want to express that given enough entropy through the
configuration, the chance of deriving the same passphrase with two
different master passphrases or two different service names becomes very
small.  However, this is a statement about the function's state space,
and I do not know how to sensibly express this statement in a unit-test
or `hypothesis`-test-compatible way, short of perhaps enumerating the
whole state space (which is computationally infeasible).  So, we remove
these tests of config dependence; they are clearly non-functional and
misleading.
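
To see why such collisions must exist for low-entropy configurations,
consider this toy sketch.  The function `toy_derive` is a hypothetical
stand-in built on SHA-256, not `vault`'s actual derivation scheme.
With a one-character passphrase over the ten digits there are only ten
possible outputs, so by the pigeonhole principle any eleven distinct
master passphrases contain a colliding pair.

```python
import hashlib


def toy_derive(
    phrase: bytes,
    service: bytes,
    *,
    length: int = 1,
    alphabet: str = "0123456789",
) -> str:
    """Hypothetical stand-in for a passphrase derivation function."""
    digest = hashlib.sha256(phrase + b"\x00" + service).digest()
    # Supports lengths up to the digest size (32 bytes) only.
    return "".join(alphabet[byte % len(alphabet)] for byte in digest[:length])


def find_collision(phrases, service, **config):
    """Return (first, second, derived) for the first collision, or None."""
    seen = {}
    for phrase in phrases:
        derived = toy_derive(phrase, service, **config)
        if derived in seen:
            return seen[derived], phrase, derived
        seen[derived] = phrase
    return None


# Eleven distinct phrases, ten possible outputs: a collision is certain.
phrases = [b"master-%d" % i for i in range(11)]
collision = find_collision(phrases, b"example.com")
```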