Commit Graph

5883 Commits

Author SHA1 Message Date
Harshavardhana
b6e98aed01 fix: found races in accessing globalLocalDrives (#19069)
make a copy before accessing globalLocalDrives

Bonus: update console v0.46.0

Signed-off-by: Harshavardhana <harsha@minio.io>
2024-02-16 17:15:57 -08:00
Anis Eleuch
00dcba9ddd Fix typo in jwt skewed date/time error (#19066) 2024-02-16 10:48:30 -08:00
Harshavardhana
607cafadbc converge clusterRead health into cluster health (#19063) 2024-02-15 16:48:36 -08:00
Anis Eleuch
68dde2359f log: Add logger.Event to send to console and other logger targets (#19060)
Add a new function logger.Event() to send the log to Console and
http/kafka log webhooks. This will include some internal events such as
disk healing and rebalance/decommissioning
2024-02-15 15:13:30 -08:00
Poorna
f9dbf41e27 sr: add validation to disallow updating bandwidth limit on self (#19062) 2024-02-15 13:03:40 -08:00
Krishnan Parthasarathi
7405760f44 Refresh tier config periodically (#19049)
- Increase the parity for tier-config.bin object
- Refresh globalTierConfigMgr cached value once every 15 mins
2024-02-15 11:52:44 -08:00
Harshavardhana
7e4a6b4bcd remove rename2 entirely, avoids the risk of moving data (#19058) 2024-02-14 17:09:38 -08:00
Harshavardhana
f961ec4aaf fix: revert allow offline disks on fresh start (#19052)
the PR in #16541 was incorrect and hand wrong assumptions
about the overall setup, revert this since this expectation
to have offline servers is wrong and we can end up with a
bigger chicken and egg problem.

This reverts commit 5996c8c4d5.

Bonus:

- preserve disk in globalLocalDrives properly upon connectDisks()
- do not return 'nil' from newXLStorage(), getting it ready for
  the next set of changes for 'format.json' loading.
2024-02-14 10:37:34 -08:00
Harshavardhana
134db72bb7 fix: reject service account access key same as root credentials (#19055) 2024-02-14 10:37:12 -08:00
Harshavardhana
effe21f3eb send correct objectname in audit events for DeleteAll ILM (#19053) 2024-02-14 08:07:58 -08:00
Praveen raj Mani
1118b285d3 fix: race in deleting objects during batch expiry (#19054) 2024-02-14 08:07:44 -08:00
Aditya Manthramurthy
a14e192376 fix: remove unnecessary panic in iam-store (#19050) 2024-02-13 19:29:36 -08:00
Minio Trusted
f8e15e7d09 Update yaml files to latest version RELEASE.2024-02-13T15-35-11Z 2024-02-13 16:01:38 +00:00
Shireesh Anjal
7b9f9e0628 fix incorrect disk io stats in k8s environment (#19016)
The previous logic of calculating per second values for disk io stats
divides the stats by the host uptime. This doesn't work in k8s
environment as the uptime is of the pod, but the stats (from
/proc/diskstats) are from the host.

Fix this by storing the initial values of uptime and the stats at the
timme of server startup, and using the difference between current and
initial values when calculating the per second values.
2024-02-13 07:35:11 -08:00
Praveen raj Mani
ac8e9ce04f Send a bucket notification event on DeleteObject() for non-existing object (#19037)
Send a bucket notification event on DeleteObject for non-existing objects
2024-02-13 07:34:17 -08:00
Praveen raj Mani
cfd8645843 fix: update batch replication stats for snowball uploads (#19045) 2024-02-13 07:33:27 -08:00
Harshavardhana
0c068b15c7 add missing handler for reloading site replication config on peers (#19042) 2024-02-13 06:55:54 -08:00
Anis Eleuch
30a466aa71 sts: Add test for DurationSeconds condition (#19044) 2024-02-13 06:55:37 -08:00
Taran Pelkey
4d94609c44 FIx unexpected behavior when creating service account (#19036) 2024-02-13 02:31:43 -08:00
Poorna
0cc9fb73e1 metrics: fix typo in namespace for proxy tagging metric (#19039)
Relevant PR introducing this metric: #18957
2024-02-12 13:02:27 -08:00
Harshavardhana
eac4e4b279 honor replaced disk properly by updating globalLocalDrives (#19038)
globalLocalDrives seem to be not updated during the
HealFormat() leads to a requirement where the server
needs to be restarted for the healing to continue.
2024-02-12 13:00:20 -08:00
Harshavardhana
6d381f7c0a relax pre-emptive GetBucketInfo() for multi-object delete (#19035) 2024-02-12 08:46:46 -08:00
Anis Eleuch
4fa06aefc6 Convert service account add/update expiration to cond values (#19024)
In order to force some users allowed to create or update a service
account to provide an expiration satifying the user policy conditions.
2024-02-12 08:36:16 -08:00
Harshavardhana
0e177a44e0 preserve conflicting objects when parent object is being deleted (#19034)
a/prefix
a/prefix/1.txt

where `a/prefix` is an object which does not have `/` at the end,
we do not have to aggressively recursively delete all the sub-folders
as well. Instead convert the call into self contained to deleting
'xl.meta' and then subsequently attempting to Remove the parent.
2024-02-12 08:30:40 -08:00
Harshavardhana
afd19de5a9 fix: allow configuring excess versions alerting (#19028)
Bonus: enable audit alerts for object versions
beyond the configured value, default is '100'
versions per object beyond which scanner will
alert for each such objects.
2024-02-11 23:41:53 -08:00
Harshavardhana
e3fbac9e24 do not have to use the same distributionAlgo as first pool (#19031)
when we expand via pools, there is no reason to stick
with the same distributionAlgo as the rest. Since the
algo only makes sense with-in a pool not across pools.

This allows for newer pools to use newer codepaths to
avoid legacy file lookups when they have a pre-existing
deployment from 2019, they can expand their new pool
to be of a newer distribution format, allowing the
pool to be more performant.
2024-02-11 23:21:56 -08:00
Poorna
a9cf32811c Fix panic in tagging request proxying (#19032) 2024-02-11 18:18:43 -08:00
Harshavardhana
53997ecc79 avoid excessive logging for objects that do not exist (#19030)
in replicated setups, that have proxying enabled for
replicated buckets.
2024-02-11 14:21:08 -08:00
Harshavardhana
997ba3a574 introduce reader deadlines for net.Conn (#19023)
Bonus: set "retry-after" header for AWS SDKs if possible to honor them.
2024-02-09 13:25:16 -08:00
Harshavardhana
62761a23e6 remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Harshavardhana
404d8b3084 fix: dangling objects honor parityBlocks instead of dataBlocks (#19019)
Bonus: do not recreate buckets if NoRecreate is asked.
2024-02-08 15:22:16 -08:00
Klaus Post
6005ad3d48 Fix shared top locks client (#19018)
`client` is shared across goroutines.

Seen with `mc support top locks` on minio built with `-race`.
2024-02-08 12:28:05 -08:00
Harshavardhana
035a3ea4ae optimize startup sequence performance (#19009)
- bucket metadata does not need to look for legacy things
  anymore if b.Created is non-zero

- stagger bucket metadata loads across lots of nodes to
  avoid the current thundering herd problem.

- Remove deadlines for RenameData, RenameFile - these
  calls should not ever be timed out and should wait
  until completion or wait for client timeout. Do not
  choose timeouts for applications during the WRITE phase.

- increase R/W buffer size, increase maxMergeMessages to 30
2024-02-08 11:21:21 -08:00
Aditya Manthramurthy
e104b183d8 fix: skip policy usage validation for cache update (#19008)
When updating the policy cache, we do not need to validate policy usage
as the policy has already been deleted by the node sending the
notification.
2024-02-07 20:39:53 -08:00
Klaus Post
7e082f232e Add GetBucketInfo toStorageErr conversion (#19005)
Convert error to storageError since it is used for quorum calculations here: ff80cfd83d/cmd/peer-s3-client.go (L339)
2024-02-07 14:24:24 -08:00
Harshavardhana
d28bf71f25 listing must return WalkDir() errors first (#19006) 2024-02-07 13:20:07 -08:00
Harshavardhana
5b1a74b6b2 do not block iam.store registration (#18999)
current implementation would quite simply
block the sys.store registration, making
sys.Initialized() call to be blocked.
2024-02-07 12:41:58 -08:00
Klaus Post
ebc6c9b498 Fix tracing send on closed channel (#18982)
Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing:

```
Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel
Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]:
Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON.func1()
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d
Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON in goroutine 378010884
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352
```

Wait explicitly for the goroutine to exit.

Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.
2024-02-06 08:57:30 -08:00
Harshavardhana
630963fa6b protect tracker copy properly to avoid race (#18984)
```
WARNING: DATA RACE
Write at 0x00c000aac1e0 by goroutine 1133:
  github.com/minio/minio/cmd.(*healingTracker).updateProgress()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:183 +0x117
  github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5()
      github.com/minio/minio/cmd/global-heal.go:292 +0x1d3

Previous read at 0x00c000aac1e0 by goroutine 1003:
  github.com/minio/minio/cmd.(*allHealState).updateHealStatus()
      github.com/minio/minio/cmd/admin-heal-ops.go:136 +0xcb
  github.com/minio/minio/cmd.(*healingTracker).save()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:223 +0x424
```
2024-02-06 08:56:59 -08:00
Harshavardhana
f674168b8b Add missing gob register for map[string]string{} (#18974)
```
minio[1303918]: API: SYSTEM()
minio[1303918]: Time: 02:04:28 UTC 02/05/2024
minio[1303918]: DeploymentID: 0972de33-2d17-4499-8967-aff6437dd9da
minio[1303918]: Error: gob: type not registered for interface: map[string]string (*errors.errorString)
minio[1303918]:        4: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
minio[1303918]:        3: internal/logger/logonce.go:149:logger.LogOnceIf()
minio[1303918]:        2: cmd/peer-rest-server.go:533:cmd.(*peerRESTServer).GetSysConfigHandler()
minio[1303918]:        1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
```
2024-02-06 08:23:23 -08:00
Poorna
27d02ea6f7 metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana
794a7993cb calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana
6f16d1cb2c do not count context canceled as timeout errors (#18975) 2024-02-05 18:16:13 -08:00
Anis Eleuch
7aa00bff89 sts: Add support of AssumeRoleWithWebIdentity and DurationSeconds (#18835)
To force limit the duration of STS accounts, the user can create a new
policy, like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"NumericLessThanEquals": {"sts:DurationSeconds": "300"}}
  }]
}

And force binding the policy to all OpenID users, whether using a claim name or role
ARN.
2024-02-05 11:44:23 -08:00
Klaus Post
e046eb1d17 Disable Rename2 metrics on non-linux (#18970)
Logging a call that always fails is pointless.
2024-02-05 10:48:14 -08:00
Anis Eleuch
ba975ca320 Add defensive code to ignore checking parts with transitioned objects (#18973)
Though dataErrs are nil with transitioned objects, add a more defensive
code to ignore counting missing parts in that case
2024-02-05 10:48:03 -08:00
Harshavardhana
fec13b0ec1 remove unused DiskMTime (#18965) 2024-02-05 01:04:26 -08:00
Harshavardhana
100c35c281 avoid excessive logs when peer is down (#18969) 2024-02-04 23:25:42 -08:00
Harshavardhana
f225ca3312 Add more advanced cases for dangling (#18968) 2024-02-04 14:36:13 -08:00
Frank Wessels
8b68e0bfdc Fix typo in api-router.go (#18955) 2024-02-03 14:03:51 -08:00