Commit Graph

356 Commits

Author SHA1 Message Date
Kristoffer Dalby
5a2ee0c391 db: add comment about removing migrations
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-10 17:32:39 +01:00
Andrey Bobelev
5cd15c3656 fix: make state cookies valid when client uses multiple login URLs
On Windows, if the user clicks the Tailscale icon in the system tray,
it opens a login URL in the browser.

When the login URL is opened, `state/nonce` cookies are set for that particular URL.

If the user clicks the icon again, a new login URL is opened in the browser,
and new cookies are set.

If the user proceeds with auth in the first tab,
the redirect results in a "state did not match" error.

This patch ensures that each opened login URL sets an individual cookie
that remains valid on the `/oidc/callback` page.

`TestOIDCMultipleOpenedLoginUrls` illustrates and tests this behavior.
2025-11-10 16:27:46 +01:00
Kristoffer Dalby
2024219bd1 types: Distinguish subnet and exit node access
When we fixed the issue of node visibility of nodes
that only had access to eachother because of a subnet
route, we gave all nodes access to all exit routes by
accident.

This commit splits exit nodes and subnet routes in the
access.

If a matcher indicates that the node should have access to
any part of the subnet routes, we do not remove it from the
node list.

If a matcher destination is equal to the internet, and the
target node is an exit node, we also do not remove the access.

Fixes #2784
Fixes #2788

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
d9c3eaf8c8 matcher: Add func for comparing Dests and TheInternet
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
bd9cf42b96 types: NodeView CanAccess uses internal
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
d7a43a7cf1 state: use AllApprovedRoutes instead of SubnetRoutes
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
1c0bb0338d types: split SubnetRoutes and ExitRoutes
There are situations where the subnet routes and exit nodes
must be treated differently. This splits it so SubnetRoutes
only returns routes that are not exit routes.

It adds `IsExitRoutes` and `AllApprovedRoutes` for convenience.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
c649c89e00 policy: Reproduce exit node visibility issues
Reproduces #2784 and #2788

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Vitalij Dovhanyc
af2de35b6c chore: fix autogroup:self with other acl rules (#2842) 2025-11-02 10:48:27 +00:00
Copilot
d23fa26395 Fix flaky TestShuffleDERPMapDeterministic by ensuring deterministic map iteration (#2848)
Co-authored-by: kradalby <98431+kradalby@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2025-11-02 10:05:23 +01:00
Andrey
f9bb88ad24 expire nodes with a custom timestamp (#2828) 2025-11-01 08:09:13 +01:00
Kristoffer Dalby
456a5d5cce db: ignore _litestream tables when validating (#2843) 2025-11-01 07:08:22 +00:00
Kristoffer Dalby
ddbd3e14ba db: remove all old, unused tables (#2844) 2025-11-01 08:03:37 +01:00
Kristoffer Dalby
52d27d58f0 hscontrol: add /version HTTP endpoint (#2821) 2025-10-27 10:41:34 +01:00
Kristoffer Dalby
2bf1200483 policy: fix autogroup:self propagation and optimize cache invalidation (#2807) 2025-10-23 17:57:41 +02:00
Kristoffer Dalby
66826232ff integration: add tests for api bypass (#2811) 2025-10-22 16:30:25 +02:00
Kristoffer Dalby
1cdea7ed9b stricter hostname validation and replace (#2383) 2025-10-22 13:50:39 +02:00
Elyas Asmad
2c9e98d3f5 fix: guard every error statement with early return (#2810) 2025-10-22 13:48:07 +02:00
Juanjo Presa
c97d0ff23d Fix fatal error on missing config file by handling viper.ConfigFileNotFoundError
Correctly identify Viper's ConfigFileNotFoundError in LoadConfig to log a warning and use defaults, unifying behavior with empty config files. Fixes fatal error when no config file is present for CLI commands relying on environment variables.
2025-10-19 15:29:47 +02:00
Florian Preinstorfer
46477b8021 Downgrade completed broadcast message to debug 2025-10-18 07:56:59 +02:00
Stavros Kois
c07cc491bf add health command (#2659)
* add health command
* update health check implementation to allow for more checks to added over time
* add change changelog entry
2025-10-16 12:00:11 +00:00
Vitalij Dovhanyc
c2a58a304d feat: add autogroup:self (#2789) 2025-10-16 12:59:52 +02:00
Kristoffer Dalby
fddc7117e4 stability and race conditions in auth and node store (#2781)
This PR addresses some consistency issues that was introduced or discovered with the nodestore.

nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.

auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.

integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)

This PR was assisted, particularly tests, by claude code.
2025-10-16 12:17:43 +02:00
Andrey Bobelev
c4a8c038cd fix: return valid AuthUrl in followup request on expired reg id
- tailscale client gets a new AuthUrl and sets entry in the regcache
- regcache entry expires
- client doesn't know about that
- client always polls followup request а gets error

When user clicks "Login" in the app (after cache expiry), they visit
invalid URL and get "node not found in registration cache". Some clients
on Windows for e.g. can't get a new AuthUrl without restart the app.

To fix that we can issue a new reg id and return user a new valid
AuthUrl.

RegisterNode is refactored to be created with NewRegisterNode() to
autocreate channel and other stuff.
2025-10-11 05:57:39 +02:00
Andrey Bobelev
022098fe4e chore: make reg cache expiry tunable
Mostly for the tests, opts:

- tuning.register_cache_expiration
- tuning.register_cache_cleanup
2025-10-11 05:57:39 +02:00
Kristoffer Dalby
ed3a9c8d6d mapper: send change instead of full update (#2775) 2025-09-17 14:23:21 +02:00
Kristoffer Dalby
2b30a15a68 cmd: add option to get and set policy directly from database (#2765) 2025-09-12 16:55:15 +02:00
Kristoffer Dalby
2938d03878 policy: reject unsupported fields (#2764) 2025-09-12 14:47:56 +02:00
Kristoffer Dalby
1b1c989268 {policy, node}: allow return paths in route reduction (#2767) 2025-09-12 11:47:51 +02:00
Kristoffer Dalby
3950f8f171 cli: use gobuild version handling (#2770) 2025-09-12 11:47:31 +02:00
Kristoffer Dalby
ee0ef396a2 policy: fix ssh usermap, fixing autogroup:nonroot (#2768) 2025-09-12 09:12:30 +02:00
Kristoffer Dalby
7056fbb63b derp: fix flaky shuffle test (#2772) 2025-09-11 13:49:02 +00:00
Kristoffer Dalby
c91b9fc761 poll: add missing godoc (#2763) 2025-09-11 14:15:19 +02:00
Kristoffer Dalby
d41fb4d540 app: fix sigint hanging
When the node notifier was replaced with batcher, we removed
its closing, but forgot to add the batchers so it was never
stopping node connections and waiting forever.

Fixes #2751

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-11 11:53:26 +02:00
Kristoffer Dalby
01c1f6f82a policy: validate error message for asterix in ssh (#2766) 2025-09-10 18:41:43 +02:00
Kristoffer Dalby
476f30ab20 state: ensure netinfo is preserved and not removed
the client will send a lot of fields as `nil` if they have
not changed. NetInfo, which is inside Hostinfo, is one of those
fields and we often would override the whole hostinfo meaning that
we would remove netinfo if it hadnt changed.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
233dffc186 lint and leftover
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
81b3e8f743 util: harden parsing of traceroute
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
50ed24847b debug: add json and improve
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
3b16b75fe6 integration: rework retry for waiting for node sync
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
9d236571f4 state/nodestore: in memory representation of nodes
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.

It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.

Writes will block until commited, while reads are never
blocked.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
38be30b6d4 derp: allow override to ip for debug
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
b6d5788231 mapper: produce map before poll
Before this patch, we would send a message to each "node stream"
that there is an update that needs to be turned into a mapresponse
and sent to a node.

Producing the mapresponse is a "costly" afair which means that while
a node was producing one, it might start blocking and creating full
queues from the poller and all the way up to where updates where sent.

This could cause updates to time out and being dropped as a bad node
going away or spending too time processing would cause all the other
nodes to not get any updates.

In addition, it contributed to "uncontrolled parallel processing" by
potentially doing too many expensive operations at the same time:

Each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 cpu cores, that will saturate all the cores immediately and cause
a lot of wasted switching between the processing.

Now, all the maps are processed by workers in the mapper, and the number
of workers are controlable. These would now be recommended to be a bit
less than number of CPU cores, allowing us to process them as fast as we
can, and then send them to the poll.

When the poll recieved the map, it is only responsible for taking it and
sending it to the node.

This might not directly improve the performance of Headscale, but it will
likely make the performance a lot more consistent. And I would argue the
design is a lot easier to reason about.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
8e25f7f9dd bunch of qol (#2748) 2025-08-27 17:09:13 +02:00
cuiweixie
a2a6d20218 Refactor to use reflect.TypeFor 2025-08-23 20:43:49 +02:00
Andrey Bobelev
d29feaef79 chore(derp): allow nil regions in DERPMaps
Previously, nil regions were not properly handled. This change allows users to disable regions in DERPMaps.

Particularly useful to disable some official regions.
2025-08-23 06:54:14 +02:00
Andrey Bobelev
630bfd265a chore(derp): prioritize loading DERP maps from URLs
This allows users to override default entries provided via URL
2025-08-23 06:54:14 +02:00
Kristoffer Dalby
b87567628a derp: increase update frequency and harden on failures (#2741) 2025-08-22 10:40:38 +02:00
Florian Preinstorfer
be337c6a33 Enable derp.server.verify_clients by default
This setting is already enabled in example-config.yaml but would default
to false if no key is set.
2025-08-19 11:30:44 +02:00
Shourya Gautam
086fcad7d9 Fix Internal server error on /verify (#2735)
* converted the returned error to an httpError
2025-08-18 14:39:42 +00:00