Added two more test cases with the same dataset (region 4) but different base domains to ensure the shuffle is truly deterministic and that different seeds produce different results.

Co-authored-by: kradalby <98431+kradalby@users.noreply.github.com>