Replace the `vault` repetition tests with a faster version
Marco Ricci

Marco Ricci committed on 2025-08-14 22:58:08
Showing 1 changed file with 22 additions and 10 deletions.


To assert the correctness of `vault`'s repetition limit setting, we
used to extract all length-`r` substrings of the derived passphrase
(where `r` is the shortest run length that is *no longer* allowed,
i.e. the repetition limit plus one) and test that each contained more
than one distinct character (by building a set over its characters).
That works, but it repeatedly builds sets and scales badly as the
repetition count increases.
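
For illustration, the older check described above can be sketched as a standalone function. This is only a sketch: the name `satisfies_limit_via_sets` is hypothetical and not part of the test suite, and it assumes `repeat` is at least 1 (the diff guards one of the two call sites with `if repeat:`).

```python
from collections.abc import Iterator


def length_r_substrings(string: str, *, r: int) -> Iterator[str]:
    """Yield every contiguous substring of length r."""
    for i in range(len(string) - (r - 1)):
        yield string[i : i + r]


def satisfies_limit_via_sets(password: str, repeat: int) -> bool:
    """Return True if no character occurs more than `repeat` times in a row.

    Hypothetical helper.  Every window of length `repeat + 1` must
    contain at least two distinct characters.  One set is built per
    window, so the work grows with both the password length and the
    repetition limit.
    """
    return all(
        len(set(snippet)) > 1
        for snippet in length_r_substrings(password, r=repeat + 1)
    )
```

Each window costs time proportional to `repeat + 1`, which is why this variant slows down as the limit grows.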

Instead, we adopt a faster approach that examines the derived
passphrase once, character by character, keeping track of the longest
run of identical characters seen so far, and asserting that the run is
within the permitted repetition limit.  Although it consists of more
instructions, these are "simpler" instructions that do not involve
constructing set objects, and in particular, their cost is independent
of the repetition limit, leading to better scalability.  Sample runs
with Python's `timeit` module also indicate that for length-200 strings
and repetition limit 100, the set-building version takes 2 to 5 times
as long as the direct run-counting version.  Given the nature of this
code (it runs under `hypothesis`, so it is executed repeatedly and
cannot afford to be *too* slow), I posit that the speed gain is worth
the slightly indirect measurement style.
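
The single-pass idea, together with one way of timing it against the set-building variant, might look roughly like the sketch below. All function names here are hypothetical, and the sample string and iteration count are arbitrary assumptions; the timings depend on the machine and are not meant to reproduce the figures quoted above. Unlike the `count` variable in the diff, which starts at 0 on the first character of a run, the counter in this sketch starts at 1, so it holds the run length directly.

```python
import timeit


def longest_run(password: str) -> int:
    """Return the length of the longest run of identical characters."""
    longest = 0
    current = 0
    last_char = None
    for ch in password:
        if ch == last_char:
            current += 1  # still inside the same run
        else:
            last_char = ch
            current = 1  # a new run starts with this character
        longest = max(longest, current)
    return longest


def satisfies_limit_via_sets(password: str, repeat: int) -> bool:
    """Set-building variant: one set per window of length repeat + 1."""
    r = repeat + 1
    return all(
        len(set(password[i : i + r])) > 1
        for i in range(len(password) - (r - 1))
    )


def compare_timings(
    length: int = 200, repeat: int = 100, number: int = 1000
) -> tuple[float, float]:
    """Time both variants on an arbitrary sample string (an assumption)."""
    sample = "ab" * (length // 2)
    t_sets = timeit.timeit(
        lambda: satisfies_limit_via_sets(sample, repeat), number=number
    )
    t_scan = timeit.timeit(
        lambda: longest_run(sample) <= repeat, number=number
    )
    return t_sets, t_scan
```

Calling `compare_timings()` returns the total time in seconds for each variant, set-building first, so the ratio of the two values gives a rough slowdown factor.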
@@ -633,8 +633,17 @@ class TestConstraintSatisfactionThoroughness(TestVault):
         password = vault.Vault(
             phrase=phrase, length=length, repeat=repeat
         ).generate(service)
-        for i in range((length + 1) - (repeat + 1)):
-            assert len(set(password[i : i + repeat + 1])) > 1
+        last_char: str | int | None = None
+        highest_count = 0
+        count = 0
+        for ch in password:
+            if ch != last_char:
+                last_char = ch
+                count = 0
+            else:
+                count += 1
+                highest_count = max(highest_count, count)
+            assert count <= repeat
 
 
 class TestConstraintSatisfactionHeavyDuty(TestVault):
@@ -728,16 +737,19 @@ class TestConstraintSatisfactionHeavyDuty(TestVault):
                     sum(c in vault.Vault.CHARSETS[key] for c in password) == 0
                 ), "Password does not satisfy character ban constraints."
 
-        T = TypeVar("T", str, bytes)
-
-        def length_r_substrings(string: T, *, r: int) -> Iterator[T]:
-            for i in range(len(string) - (r - 1)):
-                yield string[i : i + r]
-
         repeat = config["repeat"]
         if repeat:
-            for snippet in length_r_substrings(password, r=(repeat + 1)):
-                assert len(set(snippet)) > 1, (
+            last_char: str | int | None = None
+            highest_count = 0
+            count = 0
+            for ch in password:
+                if ch != last_char:
+                    last_char = ch
+                    count = 0
+                else:
+                    count += 1
+                    highest_count = max(highest_count, count)
+                assert count <= repeat, (
                     "Password does not satisfy character repeat constraints."
                 )
 