Commit Graph

11911 Commits

Author SHA1 Message Date
Klaus Post ebc6c9b498
Fix tracing send on closed channel (#18982)
Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing:

```
Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel
Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]:
Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON.func1()
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d
Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON in goroutine 378010884
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352
```

Wait explicitly for the goroutine to exit.

Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.
2024-02-06 08:57:30 -08:00
Harshavardhana 630963fa6b
protect tracker copy properly to avoid race (#18984)
```
WARNING: DATA RACE
Write at 0x00c000aac1e0 by goroutine 1133:
  github.com/minio/minio/cmd.(*healingTracker).updateProgress()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:183 +0x117
  github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5()
      github.com/minio/minio/cmd/global-heal.go:292 +0x1d3

Previous read at 0x00c000aac1e0 by goroutine 1003:
  github.com/minio/minio/cmd.(*allHealState).updateHealStatus()
      github.com/minio/minio/cmd/admin-heal-ops.go:136 +0xcb
  github.com/minio/minio/cmd.(*healingTracker).save()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:223 +0x424
```
2024-02-06 08:56:59 -08:00
Harshavardhana f674168b8b
Add missing gob register for map[string]string{} (#18974)
```
minio[1303918]: API: SYSTEM()
minio[1303918]: Time: 02:04:28 UTC 02/05/2024
minio[1303918]: DeploymentID: 0972de33-2d17-4499-8967-aff6437dd9da
minio[1303918]: Error: gob: type not registered for interface: map[string]string (*errors.errorString)
minio[1303918]:        4: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
minio[1303918]:        3: internal/logger/logonce.go:149:logger.LogOnceIf()
minio[1303918]:        2: cmd/peer-rest-server.go:533:cmd.(*peerRESTServer).GetSysConfigHandler()
minio[1303918]:        1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
```
2024-02-06 08:23:23 -08:00
Harshavardhana 7e023f2d50 remove go.mod replace tag 2024-02-06 01:56:57 -08:00
Poorna 27d02ea6f7
metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana 794a7993cb
calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana 6f16d1cb2c
do not count context canceled as timeout errors (#18975) 2024-02-05 18:16:13 -08:00
Anis Eleuch 7aa00bff89
sts: Add support of AssumeRoleWithWebIdentity and DurationSeconds (#18835)
To force limit the duration of STS accounts, the user can create a new
policy, like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"NumericLessThanEquals": {"sts:DurationSeconds": "300"}}
  }]
}

And force binding the policy to all OpenID users, whether using a claim name or role
ARN.
2024-02-05 11:44:23 -08:00
Klaus Post e046eb1d17
Disable Rename2 metrics on non-linux (#18970)
Logging a call that always fails is pointless.
2024-02-05 10:48:14 -08:00
Anis Eleuch ba975ca320
Add defensive code to ignore checking parts with transitioned objects (#18973)
Though dataErrs are nil with transitioned objects, add a more defensive
code to ignore counting missing parts in that case
2024-02-05 10:48:03 -08:00
Harshavardhana fec13b0ec1
remove unused DiskMTime (#18965) 2024-02-05 01:04:26 -08:00
Harshavardhana 100c35c281
avoid excessive logs when peer is down (#18969) 2024-02-04 23:25:42 -08:00
Minio Trusted 8414aff424 Update yaml files to latest version RELEASE.2024-02-04T22-36-13Z 2024-02-05 00:46:41 +00:00
Harshavardhana f225ca3312
Add more advanced cases for dangling (#18968) 2024-02-04 14:36:13 -08:00
Frank Wessels 8b68e0bfdc
Fix typo in api-router.go (#18955) 2024-02-03 14:03:51 -08:00
Anis Eleuch 6ae97aedc9
xl: Disable rename2 in decommissioning/rebalance (#18964)
Always disable rename2 optimization in decom/rebalance
2024-02-03 14:03:30 -08:00
Harshavardhana 960d604013
disconnected returns, an unexpected error to List() returning 500s (#18959)
provide the error string appropriately so that the
matching of error types works.

Also add a string based fallback for the said error.
2024-02-03 01:04:33 -08:00
Klaus Post 63bf5f42a1
Fix mux client memory leak (#18956)
Add missing client cancellation, resulting in memory buildup tracing back to context.WithCancelCause/context.WithCancelDeadlineCause
2024-02-02 15:31:06 -08:00
Harshavardhana ff80cfd83d
move Make,Delete,Head,Heal bucket calls to websockets (#18951) 2024-02-02 14:54:54 -08:00
Harshavardhana 99fde2ba85
deprecate disk tokens, instead rely on deadlines and active monitoring (#18947)
disk tokens usage is not necessary anymore with the implementation
of deadlines for storage calls and active monitoring of the drive
for I/O timeouts.

Functionality kicking off a bad drive is still supported, it's just that 
we do not have to serialize I/O in the manner tokens would do.
2024-02-02 10:10:54 -08:00
Klaus Post ce0cb913bc
Fix ineffective recycling (#18952)
Recycle would always be called on the dummy value `any(newRT())` instead of the actual value given to the recycle function.

Caught by race tests, but mostly harmless, except for reduced perf.

Other minor cleanups. Introduced in #18940 (unreleased)
2024-02-02 08:48:12 -08:00
Harshavardhana d99d16e8c3
simplify deadlineWriter, re-use WithDeadline (#18948) 2024-02-02 03:02:31 -08:00
Frank Wessels 31743789dc
Fix some leftover issues from PR 18936 (#18946) 2024-02-01 19:42:56 -08:00
Anis Eleuch 6fd63e920a
log: Use error log type instead of Application/MinIO type (#18930)
* log: Use error log type instead of Application/MinIO type

Also bump github.com/shirou/gopsutil version to address cross
compilation issues.

* Apply suggestions from code review

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>

---------

Co-authored-by: Anis Eleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2024-02-01 16:13:57 -08:00
Aditya Manthramurthy 59cc3e93d6
fix: `null` inline policy handling for access keys (#18945)
Interpret `null` inline policy for access keys as inheriting parent
policy. Since MinIO Console currently sends this value, we need to honor it
for now. A larger fix in Console and in the server are required.

Fixes #18939.
2024-02-01 14:45:03 -08:00
Anis Eleuch 61a4bb38cd
batch: Fix a typo while validating smallerThan field (#18942) 2024-02-01 13:53:26 -08:00
Klaus Post b192bc348c
Improve object reuse for grid messages (#18940)
Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers.

This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool.

Add internal request reuse of internal types. Add for safe (pointerless) types explicitly.

Only log params for internal types. Doing Sprint(obj) is just a bit too messy.
2024-02-01 12:41:20 -08:00
Harshavardhana 6440d0fbf3
move a collection of peer APIs to websockets (#18936) 2024-02-01 10:47:20 -08:00
Minio Trusted ee0055b929 Update yaml files to latest version RELEASE.2024-01-31T20-20-33Z 2024-01-31 20:35:33 +00:00
Anis Eleuch 24ecc44bac
Keep ServiceV1 admin stop/restart API and mark as deprecated (#18932) 2024-01-31 12:20:33 -08:00
Aditya Manthramurthy 0ae4915a93
fix: permission checks for editing access keys (#18928)
With this change, only a user with `UpdateServiceAccountAdminAction`
permission is able to edit access keys.

We would like to let a user edit their own access keys, however the
feature needs to be re-designed for better security and integration with
external systems like AD/LDAP and OpenID.

This change prevents privilege escalation via service accounts.
2024-01-31 10:56:45 -08:00
Frank Wessels 4cd777a5e0
Correct small typo in pubsub (#18923) 2024-01-31 01:01:53 -08:00
Aditya Manthramurthy 65028d4a35
Update service file version in makefile (#18925) 2024-01-31 01:01:30 -08:00
Harshavardhana caac9d216e
remove all the frivolous logs, that may or may not be actionable (#18922)
for actionable, inspections we have `mc support inspect`

we do not need double logging, healing will report relevant
errors if any, in terms of quorum lost etc.
2024-01-30 18:11:45 -08:00
Harshavardhana 057192913c
add total usable capacity, free and used to DataUsageInfo() (#18921) 2024-01-30 17:49:37 -08:00
Harshavardhana f25cbdf43c
use all the available nr_requests for NVMe (#18920) 2024-01-30 14:10:06 -08:00
Klaus Post 6da4a9c7bb
Improve tracing & notification scalability (#18903)
* Perform JSON encoding on remote machines and only forward byte slices.
* Migrate tracing & notification to WebSockets.
2024-01-30 12:49:02 -08:00
Harshavardhana 80ca120088
remove checkBucketExist check entirely to avoid fan-out calls (#18917)
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.

We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
2024-01-30 12:43:25 -08:00
Anis Eleuch a669946357
Add cgroup v2 support for memory limit (#18905) 2024-01-30 11:13:27 -08:00
Poorna 7ffc162ea8
exclude veeam virtual objects from replication (#18918)
Fixes: #18916
2024-01-30 10:43:58 -08:00
Poorna bcfd7fbbcf
reuse transports for callhome and remote tgt validation (#18912) 2024-01-29 23:05:39 -08:00
Harshavardhana 486e2e48ea
enable xattr capture by default (#18911)
- healing must not set the write xattr
  because that is the job of active healing
  to update. what we need to preserve is
  permanent deletes.

- remove older env for drive monitoring and
  enable it accordingly, as a global value.
2024-01-29 23:03:58 -08:00
Harshavardhana 2ddf2ca934
allow configuring maximum idle connections per host (#18908) 2024-01-29 16:50:37 -08:00
Daniel Valdivia 403ec7cf21
fix: metrics URI path in prometheus docs (#18907) 2024-01-29 14:34:21 -08:00
Poorna 29b1a29044
fix metrics panic in node metrics endpoint (#18894) 2024-01-29 12:32:44 -08:00
jiuker b4ab8e095a
fix: preserve bucket metric of data usage for replication info (#18895) 2024-01-29 08:54:20 -08:00
Minio Trusted ff4f4d4649 Update yaml files to latest version RELEASE.2024-01-29T03-56-32Z 2024-01-29 05:33:23 +00:00
Harshavardhana 9987ff570b avoid calling close for nil inbound/outblock channels 2024-01-28 19:56:32 -08:00
Harshavardhana cff8235068 remove getReplicationNodeMetrics() from peer metrics groups 2024-01-28 18:45:20 -08:00
Harshavardhana 9ef132c33b remove excessive logging due to runtime.debugStack 2024-01-28 18:10:42 -08:00