Commit Graph

11527 Commits

Author SHA1 Message Date
Harshavardhana
997ba3a574
introduce reader deadlines for net.Conn (#19023)
Bonus: set "retry-after" header for AWS SDKs if possible to honor them.
2024-02-09 13:25:16 -08:00
Klaus Post
8e68ff9321
Add extra disconnect safety (#19022)
Fix reported races that are actually synchronized by network calls.

But this should add some extra safety for untimely disconnects.

Race reported:

```
WARNING: DATA RACE
Read at 0x00c00171c9c0 by goroutine 214:
  github.com/minio/minio/internal/grid.(*muxClient).addResponse()
      e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:519 +0x111
  github.com/minio/minio/internal/grid.(*muxClient).error()
      e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:470 +0x21d
  github.com/minio/minio/internal/grid.(*Connection).handleDisconnectClientMux()
      e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1391 +0x15b
  github.com/minio/minio/internal/grid.(*Connection).handleMsg()
      e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1190 +0x1ab
  github.com/minio/minio/internal/grid.(*Connection).handleMessages.func1()
      e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:981 +0x610

Previous write at 0x00c00171c9c0 by goroutine 1081:
  github.com/minio/minio/internal/grid.(*muxClient).roundtrip()
      e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:94 +0x324
  github.com/minio/minio/internal/grid.(*muxClient).traceRoundtrip()
      e:/gopath/src/github.com/minio/minio/internal/grid/trace.go:74 +0x10e4
  github.com/minio/minio/internal/grid.(*Subroute).Request()
      e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:366 +0x230
  github.com/minio/minio/internal/grid.(*SingleHandler[go.shape.*github.com/minio/minio/cmd.DiskInfoOptions,go.shape.*github.com/minio/minio/cmd.DiskInfo]).Call()
      e:/gopath/src/github.com/minio/minio/internal/grid/handlers.go:554 +0x3fd
  github.com/minio/minio/cmd.(*storageRESTClient).DiskInfo()
      e:/gopath/src/github.com/minio/minio/cmd/storage-rest-client.go:314 +0x270
  github.com/minio/minio/cmd.erasureObjects.getOnlineDisksWithHealingAndInfo.func1()
      e:/gopath/src/github.com/minio/minio/cmd/erasure.go:293 +0x171
```

This read will always happen after the write, since there is a network call in between.

However a disconnect could come in while we are setting up the call, so we protect against that with extra checks.
2024-02-09 08:43:38 -08:00
Harshavardhana
62761a23e6
remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Harshavardhana
404d8b3084
fix: dangling objects honor parityBlocks instead of dataBlocks (#19019)
Bonus: do not recreate buckets if NoRecreate is asked.
2024-02-08 15:22:16 -08:00
Klaus Post
6005ad3d48
Fix shared top locks client (#19018)
`client` is shared across goroutines.

Seen with `mc support top locks` on minio built with `-race`.
2024-02-08 12:28:05 -08:00
Harshavardhana
035a3ea4ae
optimize startup sequence performance (#19009)
- bucket metadata does not need to look for legacy things
  anymore if b.Created is non-zero

- stagger bucket metadata loads across lots of nodes to
  avoid the current thundering herd problem.

- Remove deadlines for RenameData, RenameFile - these
  calls should not ever be timed out and should wait
  until completion or wait for client timeout. Do not
  choose timeouts for applications during the WRITE phase.

- increase R/W buffer size, increase maxMergeMessages to 30
2024-02-08 11:21:21 -08:00
Klaus Post
7ec43bd177
Fix blocked streams blocking reconnects (#19017)
We have observed cases where a blocked stream will block for cancellations.

This happens when response channel is blocked and we want to push an error.
This will have the response mutex locked, which will prevent all other operations until upstream is unblocked.

Make this behavior non-blocking and if blocked spawn a goroutine that will send the response and close the output.

Still a lot of "dancing". Added a test for this and reviewed.
2024-02-08 10:15:27 -08:00
Aditya Manthramurthy
a29c66ed74
Update IAM access manager plugin demo (#19007)
Now prints the JSON payload for easier debugging.
2024-02-08 09:15:20 -08:00
Aditya Manthramurthy
e104b183d8
fix: skip policy usage validation for cache update (#19008)
When updating the policy cache, we do not need to validate policy usage
as the policy has already been deleted by the node sending the
notification.
2024-02-07 20:39:53 -08:00
Klaus Post
7e082f232e
Add GetBucketInfo toStorageErr conversion (#19005)
Convert error to storageError since it is used for quorum calculations here: ff80cfd83d/cmd/peer-s3-client.go (L339)
2024-02-07 14:24:24 -08:00
Harshavardhana
d28bf71f25
listing must return WalkDir() errors first (#19006) 2024-02-07 13:20:07 -08:00
Harshavardhana
5b1a74b6b2
do not block iam.store registration (#18999)
current implementation would quite simply
block the sys.store registration, making
sys.Initialized() call to be blocked.
2024-02-07 12:41:58 -08:00
Minio Trusted
eead4db1d2 Update yaml files to latest version RELEASE.2024-02-06T21-36-22Z 2024-02-07 12:11:36 +00:00
Shubhendu
980fb5e2ab
Enable expired-object-all-versions (#18954)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-02-06 13:36:22 -08:00
Klaus Post
9bcc46d93d
Fix second muxclient context leak (#18987)
Subrouted requests were also leaking contexts in mux clients.

Similar to #18956
2024-02-06 13:35:16 -08:00
Klaus Post
22687c1f50
Add websocket TCP write timeouts (#18988)
Add 3 second write timeout to writes.

This will make dead TCP connections terminate in a reasonable time.

Fixes writes blocking for reconnection.
2024-02-06 13:34:46 -08:00
Klaus Post
ebc6c9b498
Fix tracing send on closed channel (#18982)
Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing:

```
Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel
Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]:
Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON.func1()
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d
Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON in goroutine 378010884
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352
```

Wait explicitly for the goroutine to exit.

Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.
2024-02-06 08:57:30 -08:00
Harshavardhana
630963fa6b
protect tracker copy properly to avoid race (#18984)
```
WARNING: DATA RACE
Write at 0x00c000aac1e0 by goroutine 1133:
  github.com/minio/minio/cmd.(*healingTracker).updateProgress()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:183 +0x117
  github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5()
      github.com/minio/minio/cmd/global-heal.go:292 +0x1d3

Previous read at 0x00c000aac1e0 by goroutine 1003:
  github.com/minio/minio/cmd.(*allHealState).updateHealStatus()
      github.com/minio/minio/cmd/admin-heal-ops.go:136 +0xcb
  github.com/minio/minio/cmd.(*healingTracker).save()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:223 +0x424
```
2024-02-06 08:56:59 -08:00
Harshavardhana
f674168b8b
Add missing gob register for map[string]string{} (#18974)
```
minio[1303918]: API: SYSTEM()
minio[1303918]: Time: 02:04:28 UTC 02/05/2024
minio[1303918]: DeploymentID: 0972de33-2d17-4499-8967-aff6437dd9da
minio[1303918]: Error: gob: type not registered for interface: map[string]string (*errors.errorString)
minio[1303918]:        4: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
minio[1303918]:        3: internal/logger/logonce.go:149:logger.LogOnceIf()
minio[1303918]:        2: cmd/peer-rest-server.go:533:cmd.(*peerRESTServer).GetSysConfigHandler()
minio[1303918]:        1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
```
2024-02-06 08:23:23 -08:00
Harshavardhana
7e023f2d50 remove go.mod replace tag 2024-02-06 01:56:57 -08:00
Poorna
27d02ea6f7
metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana
794a7993cb
calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana
6f16d1cb2c
do not count context canceled as timeout errors (#18975) 2024-02-05 18:16:13 -08:00
Anis Eleuch
7aa00bff89
sts: Add support of AssumeRoleWithWebIdentity and DurationSeconds (#18835)
To force limit the duration of STS accounts, the user can create a new
policy, like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"NumericLessThanEquals": {"sts:DurationSeconds": "300"}}
  }]
}

And force binding the policy to all OpenID users, whether using a claim name or role
ARN.
2024-02-05 11:44:23 -08:00
Klaus Post
e046eb1d17
Disable Rename2 metrics on non-linux (#18970)
Logging a call that always fails is pointless.
2024-02-05 10:48:14 -08:00
Anis Eleuch
ba975ca320
Add defensive code to ignore checking parts with transitioned objects (#18973)
Though dataErrs are nil with transitioned objects, add a more defensive
code to ignore counting missing parts in that case
2024-02-05 10:48:03 -08:00
Harshavardhana
fec13b0ec1
remove unused DiskMTime (#18965) 2024-02-05 01:04:26 -08:00
Harshavardhana
100c35c281
avoid excessive logs when peer is down (#18969) 2024-02-04 23:25:42 -08:00
Minio Trusted
8414aff424 Update yaml files to latest version RELEASE.2024-02-04T22-36-13Z 2024-02-05 00:46:41 +00:00
Harshavardhana
f225ca3312
Add more advanced cases for dangling (#18968) 2024-02-04 14:36:13 -08:00
Frank Wessels
8b68e0bfdc
Fix typo in api-router.go (#18955) 2024-02-03 14:03:51 -08:00
Anis Eleuch
6ae97aedc9
xl: Disable rename2 in decommissioning/rebalance (#18964)
Always disable rename2 optimization in decom/rebalance
2024-02-03 14:03:30 -08:00
Harshavardhana
960d604013
disconnected returns, an unexpected error to List() returning 500s (#18959)
provide the error string appropriately so that the
matching of error types works.

Also add a string based fallback for the said error.
2024-02-03 01:04:33 -08:00
Klaus Post
63bf5f42a1
Fix mux client memory leak (#18956)
Add missing client cancellation, resulting in memory buildup tracing back to context.WithCancelCause/context.WithCancelDeadlineCause
2024-02-02 15:31:06 -08:00
Harshavardhana
ff80cfd83d
move Make,Delete,Head,Heal bucket calls to websockets (#18951) 2024-02-02 14:54:54 -08:00
Harshavardhana
99fde2ba85
deprecate disk tokens, instead rely on deadlines and active monitoring (#18947)
disk tokens usage is not necessary anymore with the implementation
of deadlines for storage calls and active monitoring of the drive
for I/O timeouts.

Functionality kicking off a bad drive is still supported, it's just that 
we do not have to serialize I/O in the manner tokens would do.
2024-02-02 10:10:54 -08:00
Klaus Post
ce0cb913bc
Fix ineffective recycling (#18952)
Recycle would always be called on the dummy value `any(newRT())` instead of the actual value given to the recycle function.

Caught by race tests, but mostly harmless, except for reduced perf.

Other minor cleanups. Introduced in #18940 (unreleased)
2024-02-02 08:48:12 -08:00
Harshavardhana
d99d16e8c3
simplify deadlineWriter, re-use WithDeadline (#18948) 2024-02-02 03:02:31 -08:00
Frank Wessels
31743789dc
Fix some leftover issues from PR 18936 (#18946) 2024-02-01 19:42:56 -08:00
Anis Eleuch
6fd63e920a
log: Use error log type instead of Application/MinIO type (#18930)
* log: Use error log type instead of Application/MinIO type

Also bump github.com/shirou/gopsutil version to address cross
compilation issues.

* Apply suggestions from code review

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>

---------

Co-authored-by: Anis Eleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2024-02-01 16:13:57 -08:00
Aditya Manthramurthy
59cc3e93d6
fix: null inline policy handling for access keys (#18945)
Interpret `null` inline policy for access keys as inheriting parent
policy. Since MinIO Console currently sends this value, we need to honor it
for now. A larger fix in Console and in the server are required.

Fixes #18939.
2024-02-01 14:45:03 -08:00
Anis Eleuch
61a4bb38cd
batch: Fix a typo while validating smallerThan field (#18942) 2024-02-01 13:53:26 -08:00
Klaus Post
b192bc348c
Improve object reuse for grid messages (#18940)
Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers.

This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool.

Add internal request reuse of internal types. Add for safe (pointerless) types explicitly.

Only log params for internal types. Doing Sprint(obj) is just a bit too messy.
2024-02-01 12:41:20 -08:00
Harshavardhana
6440d0fbf3
move a collection of peer APIs to websockets (#18936) 2024-02-01 10:47:20 -08:00
Minio Trusted
ee0055b929 Update yaml files to latest version RELEASE.2024-01-31T20-20-33Z 2024-01-31 20:35:33 +00:00
Anis Eleuch
24ecc44bac
Keep ServiceV1 admin stop/restart API and mark as deprecated (#18932) 2024-01-31 12:20:33 -08:00
Aditya Manthramurthy
0ae4915a93
fix: permission checks for editing access keys (#18928)
With this change, only a user with `UpdateServiceAccountAdminAction`
permission is able to edit access keys.

We would like to let a user edit their own access keys, however the
feature needs to be re-designed for better security and integration with
external systems like AD/LDAP and OpenID.

This change prevents privilege escalation via service accounts.
2024-01-31 10:56:45 -08:00
Frank Wessels
4cd777a5e0
Correct small typo in pubsub (#18923) 2024-01-31 01:01:53 -08:00
Aditya Manthramurthy
65028d4a35
Update service file version in makefile (#18925) 2024-01-31 01:01:30 -08:00
Harshavardhana
caac9d216e
remove all the frivolous logs, that may or may not be actionable (#18922)
for actionable, inspections we have `mc support inspect`

we do not need double logging, healing will report relevant
errors if any, in terms of quorum lost etc.
2024-01-30 18:11:45 -08:00