Fix races in IAM cache
Fixes#19344
On the top level we only grab a read lock, but we write to the cache if we manage to fetch it.
a03dac41eb/cmd/iam-store.go (L446) is also flipped to what it should be AFAICT.
Change the internal cache structure to a concurrency safe implementation.
Bonus: Also switch grid implementation.
* Remove lock for cached operations.
* Rename "Relax" to `ReturnLastGood`.
* Add `CacheError` to allow caching values even on errors.
* Add NoWait that will return current value with async fetching if within 2xTTL.
* Make benchmark somewhat representative.
```
Before: BenchmarkCache-12 16408370 63.12 ns/op 0 B/op
After: BenchmarkCache-12 428282187 2.789 ns/op 0 B/op
```
* Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning.
* Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics.
* Do async fetching of usage caches.
With this change, only a user with `UpdateServiceAccountAdminAction`
permission is able to edit access keys.
We would like to let a user edit their own access keys, however the
feature needs to be re-designed for better security and integration with
external systems like AD/LDAP and OpenID.
This change prevents privilege escalation via service accounts.
AccountInfo is quite frequently called by the Console UI
login attempts, when many users are logging in it is important
that we provide them with better responsiveness.
- ListBuckets information is cached every second
- Bucket usage info is cached for up to 10 seconds
- Prefix usage (optional) info is cached for up to 10 secs
Failure to update after cache expiration, would still
allow login which would end up providing information
previously cached.
This allows for seamless responsiveness for the Console UI
logins, and overall responsiveness on a heavily loaded
system.
A new middleware function is added for admin handlers, including options
for modifying certain behaviors. This admin middleware:
- sets the handler context via reflection in the request and sends AuditLog
- checks for object API availability (skipping it if a flag is passed)
- enables gzip compression (skipping it if a flag is passed)
- enables header tracing (adding body tracing if a flag is passed)
While the new function is a middleware, due to the flags used for
conditional behavior modification, which is used in each route registration
call.
To try to ensure that no regressions are introduced, the following
changes were done mechanically mostly with `sed` and regexp:
- Remove defer logger.AuditLog in admin handlers
- Replace newContext() calls with r.Context()
- Update admin routes registration calls
Bonus: remove unused NetSpeedtestHandler
Since the new adminMiddleware function checks for object layer presence
by default, we need to pass the `noObjLayerFlag` explicitly to admin
handlers that should work even when it is not available. The following
admin handlers do not require it:
- ServerInfoHandler
- StartProfilingHandler
- DownloadProfilingHandler
- ProfileHandler
- SiteReplicationDevNull
- SiteReplicationNetPerf
- TraceHandler
For these handlers adminMiddleware does not check for the object layer
presence (disabled by passing the `noObjLayerFlag`), and for all other
handlers, the pre-check ensures that the handler is not called when the
object layer is not available - the client would get a
ErrServerNotInitialized and can retry later.
This `noObjLayerFlag` is added based on existing behavior for these
handlers only.
This would better to record the correct API name so that
any verification around audit logs to figure out if required
APIs are called required no of times, would be correct.
Here in this case of policy attached, API `AttachDetachPolicyBuiltin`
would be called with `requestPath` as `/minio/admin/v3/idp/builtin/policy/attach`
and in case of detach policy the value would be `/minio/admin/v3/idp/builtin/policy/detach`
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
For policy attach/detach API to work correctly the server should hold a
lock before reading existing policy mapping and until after writing the
updated policy mapping. This is fixed in this change.
A site replication bug, where LDAP policy attach/detach were not
correctly propagated is also fixed in this change.
Bonus: Additionally, the server responds with the actual (or net)
changes performed in the attach/detach API call. For e.g. if a user
already has policy A applied, and a call to attach policies A and B is
performed, the server will respond that B was attached successfully.