Commit Graph

5937 Commits

Author SHA1 Message Date
Harshavardhana
2cc4997d24
fix: crash on 32bit systems during pre-allocation (#19225) 2024-03-08 05:55:28 -08:00
Poorna
934f6cabf6
sr: use site replicator creds to verify temp user claims (#19224)
This PR continues #19209 which did not handle claims verification of
temporary users created by root in site replication scenario.

Fixes: #19217
2024-03-07 14:30:00 -08:00
Anis Eleuch
68dd74c5ab
batch: Separate batch job request and batch job stats (#19205)
Currently, the progress of the batch job is saved in inside the job
request object, which is normally not supported by MinIO. Though there
is no apparent bug, it is better to fix this now.

Batch progress is saved in .minio.sys/batch-jobs/reports/

Co-authored-by: Anis Eleuch <anis@min.io>
2024-03-07 10:58:22 -08:00
Harshavardhana
48b590e14b
fix: same server to be part of multiple pools (#19216)
our PoolNumber calculation was costly,
while we already had this information per
endpoint, we needed to deduce it appropriately.

This PR addresses this by assigning PoolNumbers
field that carries all the pool numbers that
belong to a server.

properties.PoolNumber still carries a valid value
only when len(properties.PoolNumbers) == 1, otherwise
properties.PoolNumber is set to math.MaxInt (indicating
that this value is undefined) and then one must rely
on properties.PoolNumbers for server participation
in multiple pools.

addresses the issue originating from #11327
2024-03-07 10:24:07 -08:00
Poorna
837a2a3d4b
sr: use service account cred for claims check (#19209)
PR #19111 overlaid service account secret with site replicator secret
during token claims check.

Fixes : #19206
2024-03-06 16:19:24 -08:00
Harshavardhana
74ccee6619
avoid too much auditing during decom/rebalance make it more robust (#19174)
there can be a sudden spike in tiny allocations,
due to too much auditing being done, also don't hang
on the

```
h.logCh <- entry
```

after initializing workers if you do not have a way to
dequeue for some reason.
2024-03-06 03:43:16 -08:00
Poorna
89f759566c
bucket import: avoid overwriting bucket creation date (#19207) 2024-03-05 16:05:28 -08:00
Harshavardhana
cd7551031b
fix: a regression in loading replication creds (#19204)
fixes #19200

generating STS credentials fail with site-replicated
setup, with this error on a fresh environment.
2024-03-05 11:06:17 -08:00
Praveen raj Mani
df57bfcd6c
fix: cluster read health check to return proper values (#19203)
Fixes #19202
2024-03-05 10:25:49 -08:00
Justin Griffin
dfb1f39b57
Support custom endpoint for Azure remote storage tier (#19188)
This commits adds support for using the `--endpoint` arg when creating a
tier of type `azure`. This is needed to connect to Azure's Gov Cloud
instance.  For example,

```
mc ilm tier add azure TARGET TIER_NAME \
   --account-name ACCOUNT \
   --account-key KEY \
   --bucket CONTAINER \
   --endpoint https://ACCOUNT.blob.core.usgovcloudapi.net
   --prefix PREFIX \
   --storage-class STORAGE_CLASS
```

Prior to this, the endpoint was hardcoded to `https://ACCOUNT.blob.core.windows.net`.
The docs were even explicit about this, stating that `--endpoint` is:

  "Required for `s3` or `minio` tier types. This option has no effect for any
  other value of `TIER_TYPE`."

Now, if the endpoint arg is present it will be used.  If not, it will
fall back to the same default behavior of `ACCOUNT.blob.core.windows.net`.
2024-03-05 08:44:08 -08:00
Harshavardhana
1b5f28e99b
fix: skip local disks properly in cluster health maintenance check (#19184) 2024-03-04 20:48:44 -08:00
Krishnan Parthasarathi
b69bcdcdc4
Fix ilm config at startup (#19189)
Remove api.expiration_workers config setting which was inadvertently left behind. Per review comment 

https://github.com/minio/minio/pull/18926, expiration_workers can be configured via ilm.expiration_workers.
2024-03-04 18:50:24 -08:00
Harshavardhana
e385f54185
fix: nLink is unreliable on all filesystems (#19187)
ext4, xfs support this behavior however
btrfs, nfs may not support it properly.

in-case when we see Nlink < 2 then we know
that we need to fallback on readdir()

fixes a regression from #19100

fixes #19181
2024-03-04 15:58:35 -08:00
Aditya Manthramurthy
9a4d003ac7
Add common middleware to S3 API handlers (#19171)
The middleware sets up tracing, throttling, gzipped responses and
collecting API stats.

Additionally, this change updates the names of handler functions in
metric labels to be the same as the name derived from Go lang reflection
on the handler name.

The metric api labels are now stored in memory the same as the handler
name - they will be camelcased, e.g. `GetObject` instead of `getobject`.

For compatibility, we lowercase the metric api label values when emitting the metrics.
2024-03-04 10:05:56 -08:00
Praveen raj Mani
d5656eeb65
fix: healthcheck to fail even if one erasure set doesn't have quorum (#19180)
fix: healthcheck to return false even if one erasure set doesn't have quorum
2024-03-04 08:34:14 -08:00
Harshavardhana
6d08af61a0
for root disks add additional information in the error log (#19177) 2024-03-02 23:45:39 -08:00
Krishnan Parthasarathi
a7577da768
Improve expiration of tiered objects (#18926)
- Use a shared worker pool for all ILM expiry tasks
- Free version cleanup executes in a separate goroutine
- Add a free version only if removing the remote object fails
- Add ILM expiry metrics to the node namespace
- Move tier journal tasks to expiryState
- Remove unused on-disk journal for tiered objects pending deletion
- Distribute expiry tasks across workers such that the expiry of versions of
  the same object serialized
- Ability to resize worker pool without server restart
- Make scaling down of expiryState workers' concurrency safe; Thanks
  @klauspost
- Add error logs when expiryState and transition state are not
  initialized (yet)
* metrics: Add missed tier journal entry tasks
* Initialize the ILM worker pool after the object layer
2024-03-01 21:11:03 -08:00
Harshavardhana
325fd80687
add retry logic upto 3 times for policy map and policy (#19173) 2024-03-01 16:21:34 -08:00
Andreas Auernhammer
09626d78ff
automatically generate root credentials with KMS (#19025)
With this commit, MinIO generates root credentials automatically
and deterministically if:

 - No root credentials have been set.
 - A KMS (KES) is configured.
 - API access for the root credentials is disabled (lockdown mode).

Before, MinIO defaults to `minioadmin` for both the access and
secret keys. Now, MinIO generates unique root credentials
automatically on startup using the KMS.

Therefore, it uses the KMS HMAC function to generate pseudo-random
values. These values never change as long as the KMS key remains
the same, and the KMS key must continue to exist since all IAM data
is encrypted with it.

Backward compatibility:

This commit should not cause existing deployments to break. It only
changes the root credentials of deployments that have a KMS configured
(KES, not a static key) but have not set any admin credentials. Such
implementations should be rare or not exist at all.

Even if the worst case would be updating root credentials in mc
or other clients used to administer the cluster. Root credentials
are anyway not intended for regular S3 operations.

Signed-off-by: Andreas Auernhammer <github@aead.dev>
2024-03-01 13:09:42 -08:00
Anis Eleuch
8f03c6e0db
xl: Avoid called getdents for folders in listing (#19100) 2024-03-01 08:01:28 -08:00
Harshavardhana
2c2f5d871c
debug: introduce support for configuring client connect WRITE deadline (#19170)
just like client-conn-read-deadline, added a new flag that does
client-conn-write-deadline as well.

Both are not configured by default, since we do not yet know
what is the right value. Allow this to be configurable if needed.
2024-03-01 08:00:42 -08:00
Harshavardhana
c599c11e70
fix: relax metadata checks for healing (#19165)
we should do this to ensure that we focus on
data healing as primary focus, fixing metadata
as part of healing must be done but making
data available is the main focus.

the main reason is metadata inconsistencies can
cause data availability issues, which must be
avoided at all cost.

will be bringing in an additional healing mechanism
that involves "metadata-only" heal, for now we do
not expect to have these checks.

continuation of #19154

Bonus: add a pro-active healthcheck to perform a connection
2024-02-29 22:49:01 -08:00
Aditya Manthramurthy
6769d4dd54
Update API label names for metrics (#19162)
This change makes the label names consistent with the handler names.
This is in preparation to use reflection based API handler function
names for the api labels so they will be the same as tracing, auditing
and logging names for these API calls.
2024-02-29 16:14:27 -08:00
Harshavardhana
d7520f0ae6
fix: make sure maintenance=true is honored properly (#19156)
fixes a regression from #18700
2024-02-29 08:37:57 -08:00
Harshavardhana
44b70eb646
allow creating missing parent folders during moveToTrash() (#19155) 2024-02-29 08:28:33 -08:00
Harshavardhana
467714f33b
ignore x-amz-storage-class when its set to STANDARD (#19154)
fixes #19135
2024-02-28 17:44:30 -08:00
Harshavardhana
f8696cc8f6 fallback to globalLocalDrives for non-distributed setups 2024-02-28 14:56:08 -08:00
Anis Eleuch
9a7c7ab2d0
fix: parsing v2 and v1 cgroup memory limit (#19153)
Trim the newline at the end of the sysfs memory limit.
2024-02-28 14:52:20 -08:00
Harshavardhana
51874a5776
fix: allow DNS disconnection events to happen in k8s (#19145)
in k8s things really do come online very asynchronously,
we need to use implementation that allows this randomness.

To facilitate this move WriteAll() as part of the
websocket layer instead.

Bonus: avoid instances of dnscache usage on k8s
2024-02-28 09:54:52 -08:00
Aditya Manthramurthy
62ce52c8fd
cachevalue: simplify exported interface (#19137)
- Also add cache options type
2024-02-28 09:09:09 -08:00
Anis Eleuch
2bdb9511bd
heal: Add skipped objects to the heal summary (#19142)
New disk healing code skips/expires objects that ILM supposed to expire.
Add more visibility to the user about this activity by calculating those
objects and print it at the end of healing activity.
2024-02-28 09:05:40 -08:00
Harshavardhana
9a012a53ef
initialize the disk healer early on (#19143)
This PR fixes a bug that perhaps has been long introduced,
with no visible workarounds. In any deployment, if an entire
erasure set is deleted, there is no way the cluster recovers.
2024-02-27 23:02:14 -08:00
Harshavardhana
1dd8ef09a6
remove unnecessary 'recreate' code (#19136) 2024-02-27 01:47:58 -08:00
Poorna
b1351e2dee sr: use site replicator svcacct to sign STS session tokens (#19111)
This change is to decouple need for root credentials to match between
 site replication deployments.

 Also ensuring site replication config initialization is re-tried until
 it succeeds, this deoendency is critical to STS flow in site replication
 scenario.
2024-02-26 13:30:28 -08:00
Praveen raj Mani
30c2596512
Read drive IO stats from sysfs instead of procfs (#19131)
Currently, we read from `/proc/diskstats` which is found to be
un-reliable in k8s environments. We can read from `sysfs` instead.

Also, cache the latest drive io stats to find the diff and update
the metrics.
2024-02-26 11:34:50 -08:00
Klaus Post
2b5e4b853c
Improve caching (#19130)
* Remove lock for cached operations.
* Rename "Relax" to `ReturnLastGood`.
* Add `CacheError` to allow caching values even on errors.
* Add NoWait that will return current value with async fetching if within 2xTTL.
* Make benchmark somewhat representative.

```
Before: BenchmarkCache-12       16408370                63.12 ns/op            0 B/op
After:  BenchmarkCache-12       428282187                2.789 ns/op           0 B/op
```

* Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning.
* Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics.
* Do async fetching of usage caches.
2024-02-26 10:49:19 -08:00
Harshavardhana
92788e4cf4
fix: re-arrange console-sys to log properly in k8s/docker (#19129)
fixes #19125
2024-02-26 01:33:48 -08:00
Harshavardhana
8a698fef71
fix: crash in ResourceMetrics RPC handling concurrent writers (#19123)
Continuation of #19103 that had fixed the crash in peer metrics for cluster endpoint.
2024-02-25 00:51:38 -08:00
Harshavardhana
c2b54d92f6
allow all disk full errors to be handled (#19117) 2024-02-24 09:11:14 -08:00
Harshavardhana
f965434022
fix: re-use endpoint strings to avoid allocation during audit (#19116) 2024-02-23 16:19:13 -08:00
Harshavardhana
a3ac62596c
move timedValue -> cachevalue package (#19114) 2024-02-23 13:28:14 -08:00
Harshavardhana
2faba02d6b
fix: allow diskInfo at storageRPC to be cached (#19112)
Bonus: convert timedValue into a typed implementation
2024-02-23 09:21:38 -08:00
Krishnan Parthasarathi
ee158e1610
ilm: Update action count only on success (#19093)
It also fixes a long-standing bug in expiring transitioned objects.
The expiration action was deleting the current version in the case'
of tiered objects instead of adding a delete marker.
2024-02-22 15:00:32 -08:00
Anis Eleuch
fa68efb1e7
s3: CopyObject to disallow invalid dest object names (#19110)
By not doing so, objects can risk being in a wrong erasure set if the
destination object name contains e.g. '//'
2024-02-22 10:05:17 -08:00
Anis Eleuch
8c53a4405a
Add audit for folder excess (#19109)
Also replace ilm:expiry with scanner to avoid user confusion
2024-02-22 08:18:13 -08:00
Harshavardhana
c32f699105
turn-off md5sum for SSE-KMS/SSE-C as optimization for multipart (#19106)
only enable md5sum if explicitly asked by the client, otherwise
its not necessary to compute md5sum when SSE-KMS/SSE-C is enabled.

this is continuation of #17958
2024-02-22 04:24:11 -08:00
Harshavardhana
53aa8f5650
use typos instead of codespell (#19088) 2024-02-21 22:26:06 -08:00
Klaus Post
92180bc793
Add array recycling safety (#19103)
Nil entries when recycling arrays.
2024-02-21 12:27:35 -08:00
Poorna
526b829a09
site replication: Disallow removal of site-replicator account (#19092) 2024-02-21 02:09:33 -08:00
Anis Eleuch
9ea5d08ecd
site-repl: Fix endpoint in the error with unexpected deployment-id (#19086) 2024-02-20 15:02:35 -08:00
Harshavardhana
35deb1a8e2
do not block on send channels under high load (#19090)
all send channels must compete with `ctx` if not
they will perpetually stay alive.
2024-02-20 15:00:35 -08:00
Harshavardhana
c7f7c47388
allow renames() for inlined writes without data-dir (#18801)
data-dir not being present is okay, however we can still
rely on the `rename()` atomic call instead of relying on
write xl.meta write which may truncate the io.EOF.
2024-02-20 07:05:57 -08:00
Klaus Post
e06168596f
Convert more peer <--> peer REST calls (#19004)
* Convert more peer <--> peer REST calls
* Clean up in general.
* Add JSON wrapper.
* Add slice wrapper.
* Add option to make handler return nil error if no connection is given, `IgnoreNilConn`.

Converts the following:

```
+	HandlerGetMetrics
+	HandlerGetResourceMetrics
+	HandlerGetMemInfo
+	HandlerGetProcInfo
+	HandlerGetOSInfo
+	HandlerGetPartitions
+	HandlerGetNetInfo
+	HandlerGetCPUs
+	HandlerServerInfo
+	HandlerGetSysConfig
+	HandlerGetSysServices
+	HandlerGetSysErrors
+	HandlerGetAllBucketStats
+	HandlerGetBucketStats
+	HandlerGetSRMetrics
+	HandlerGetPeerMetrics
+	HandlerGetMetacacheListing
+	HandlerUpdateMetacacheListing
+	HandlerGetPeerBucketMetrics
+	HandlerStorageInfo
+	HandlerGetLocks
+	HandlerBackgroundHealStatus
+	HandlerGetLastDayTierStats
+	HandlerSignalService
+	HandlerGetBandwidth
```
2024-02-19 14:54:46 -08:00
Harshavardhana
4c8197a119
reject expired STS credentials early without decoding sessionToken (#19072) 2024-02-19 07:34:10 -08:00
Harshavardhana
b6e98aed01
fix: found races in accessing globalLocalDrives (#19069)
make a copy before accessing globalLocalDrives

Bonus: update console v0.46.0

Signed-off-by: Harshavardhana <harsha@minio.io>
2024-02-16 17:15:57 -08:00
Anis Eleuch
00dcba9ddd
Fix typo in jwt skewed date/time error (#19066) 2024-02-16 10:48:30 -08:00
Harshavardhana
607cafadbc
converge clusterRead health into cluster health (#19063) 2024-02-15 16:48:36 -08:00
Anis Eleuch
68dde2359f
log: Add logger.Event to send to console and other logger targets (#19060)
Add a new function logger.Event() to send the log to Console and
http/kafka log webhooks. This will include some internal events such as
disk healing and rebalance/decommissioning
2024-02-15 15:13:30 -08:00
Poorna
f9dbf41e27
sr: add validation to disallow updating bandwidth limit on self (#19062) 2024-02-15 13:03:40 -08:00
Krishnan Parthasarathi
7405760f44
Refresh tier config periodically (#19049)
- Increase the parity for tier-config.bin object
- Refresh globalTierConfigMgr cached value once every 15 mins
2024-02-15 11:52:44 -08:00
Harshavardhana
7e4a6b4bcd
remove rename2 entirely, avoids the risk of moving data (#19058) 2024-02-14 17:09:38 -08:00
Harshavardhana
f961ec4aaf
fix: revert allow offline disks on fresh start (#19052)
the PR in #16541 was incorrect and hand wrong assumptions
about the overall setup, revert this since this expectation
to have offline servers is wrong and we can end up with a
bigger chicken and egg problem.

This reverts commit 5996c8c4d5.

Bonus:

- preserve disk in globalLocalDrives properly upon connectDisks()
- do not return 'nil' from newXLStorage(), getting it ready for
  the next set of changes for 'format.json' loading.
2024-02-14 10:37:34 -08:00
Harshavardhana
134db72bb7
fix: reject service account access key same as root credentials (#19055) 2024-02-14 10:37:12 -08:00
Harshavardhana
effe21f3eb
send correct objectname in audit events for DeleteAll ILM (#19053) 2024-02-14 08:07:58 -08:00
Praveen raj Mani
1118b285d3
fix: race in deleting objects during batch expiry (#19054) 2024-02-14 08:07:44 -08:00
Aditya Manthramurthy
a14e192376
fix: remove unnecessary panic in iam-store (#19050) 2024-02-13 19:29:36 -08:00
Minio Trusted
f8e15e7d09 Update yaml files to latest version RELEASE.2024-02-13T15-35-11Z 2024-02-13 16:01:38 +00:00
Shireesh Anjal
7b9f9e0628
fix incorrect disk io stats in k8s environment (#19016)
The previous logic of calculating per second values for disk io stats
divides the stats by the host uptime. This doesn't work in k8s
environment as the uptime is of the pod, but the stats (from
/proc/diskstats) are from the host.

Fix this by storing the initial values of uptime and the stats at the
timme of server startup, and using the difference between current and
initial values when calculating the per second values.
2024-02-13 07:35:11 -08:00
Praveen raj Mani
ac8e9ce04f
Send a bucket notification event on DeleteObject() for non-existing object (#19037)
Send a bucket notification event on DeleteObject for non-existing objects
2024-02-13 07:34:17 -08:00
Praveen raj Mani
cfd8645843
fix: update batch replication stats for snowball uploads (#19045) 2024-02-13 07:33:27 -08:00
Harshavardhana
0c068b15c7
add missing handler for reloading site replication config on peers (#19042) 2024-02-13 06:55:54 -08:00
Anis Eleuch
30a466aa71
sts: Add test for DurationSeconds condition (#19044) 2024-02-13 06:55:37 -08:00
Taran Pelkey
4d94609c44
FIx unexpected behavior when creating service account (#19036) 2024-02-13 02:31:43 -08:00
Poorna
0cc9fb73e1
metrics: fix typo in namespace for proxy tagging metric (#19039)
Relevant PR introducing this metric: #18957
2024-02-12 13:02:27 -08:00
Harshavardhana
eac4e4b279
honor replaced disk properly by updating globalLocalDrives (#19038)
globalLocalDrives seem to be not updated during the
HealFormat() leads to a requirement where the server
needs to be restarted for the healing to continue.
2024-02-12 13:00:20 -08:00
Harshavardhana
6d381f7c0a
relax pre-emptive GetBucketInfo() for multi-object delete (#19035) 2024-02-12 08:46:46 -08:00
Anis Eleuch
4fa06aefc6
Convert service account add/update expiration to cond values (#19024)
In order to force some users allowed to create or update a service
account to provide an expiration satifying the user policy conditions.
2024-02-12 08:36:16 -08:00
Harshavardhana
0e177a44e0
preserve conflicting objects when parent object is being deleted (#19034)
a/prefix
a/prefix/1.txt

where `a/prefix` is an object which does not have `/` at the end,
we do not have to aggressively recursively delete all the sub-folders
as well. Instead convert the call into self contained to deleting
'xl.meta' and then subsequently attempting to Remove the parent.
2024-02-12 08:30:40 -08:00
Harshavardhana
afd19de5a9
fix: allow configuring excess versions alerting (#19028)
Bonus: enable audit alerts for object versions
beyond the configured value, default is '100'
versions per object beyond which scanner will
alert for each such objects.
2024-02-11 23:41:53 -08:00
Harshavardhana
e3fbac9e24
do not have to use the same distributionAlgo as first pool (#19031)
when we expand via pools, there is no reason to stick
with the same distributionAlgo as the rest. Since the
algo only makes sense with-in a pool not across pools.

This allows for newer pools to use newer codepaths to
avoid legacy file lookups when they have a pre-existing
deployment from 2019, they can expand their new pool
to be of a newer distribution format, allowing the
pool to be more performant.
2024-02-11 23:21:56 -08:00
Poorna
a9cf32811c
Fix panic in tagging request proxying (#19032) 2024-02-11 18:18:43 -08:00
Harshavardhana
53997ecc79
avoid excessive logging for objects that do not exist (#19030)
in replicated setups, that have proxying enabled for
replicated buckets.
2024-02-11 14:21:08 -08:00
Harshavardhana
997ba3a574
introduce reader deadlines for net.Conn (#19023)
Bonus: set "retry-after" header for AWS SDKs if possible to honor them.
2024-02-09 13:25:16 -08:00
Harshavardhana
62761a23e6
remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Harshavardhana
404d8b3084
fix: dangling objects honor parityBlocks instead of dataBlocks (#19019)
Bonus: do not recreate buckets if NoRecreate is asked.
2024-02-08 15:22:16 -08:00
Klaus Post
6005ad3d48
Fix shared top locks client (#19018)
`client` is shared across goroutines.

Seen with `mc support top locks` on minio built with `-race`.
2024-02-08 12:28:05 -08:00
Harshavardhana
035a3ea4ae
optimize startup sequence performance (#19009)
- bucket metadata does not need to look for legacy things
  anymore if b.Created is non-zero

- stagger bucket metadata loads across lots of nodes to
  avoid the current thundering herd problem.

- Remove deadlines for RenameData, RenameFile - these
  calls should not ever be timed out and should wait
  until completion or wait for client timeout. Do not
  choose timeouts for applications during the WRITE phase.

- increase R/W buffer size, increase maxMergeMessages to 30
2024-02-08 11:21:21 -08:00
Aditya Manthramurthy
e104b183d8
fix: skip policy usage validation for cache update (#19008)
When updating the policy cache, we do not need to validate policy usage
as the policy has already been deleted by the node sending the
notification.
2024-02-07 20:39:53 -08:00
Klaus Post
7e082f232e
Add GetBucketInfo toStorageErr conversion (#19005)
Convert error to storageError since it is used for quorum calculations here: ff80cfd83d/cmd/peer-s3-client.go (L339)
2024-02-07 14:24:24 -08:00
Harshavardhana
d28bf71f25
listing must return WalkDir() errors first (#19006) 2024-02-07 13:20:07 -08:00
Harshavardhana
5b1a74b6b2
do not block iam.store registration (#18999)
current implementation would quite simply
block the sys.store registration, making
sys.Initialized() call to be blocked.
2024-02-07 12:41:58 -08:00
Klaus Post
ebc6c9b498
Fix tracing send on closed channel (#18982)
Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing:

```
Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel
Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]:
Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON.func1()
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d
Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON in goroutine 378010884
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352
```

Wait explicitly for the goroutine to exit.

Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.
2024-02-06 08:57:30 -08:00
Harshavardhana
630963fa6b
protect tracker copy properly to avoid race (#18984)
```
WARNING: DATA RACE
Write at 0x00c000aac1e0 by goroutine 1133:
  github.com/minio/minio/cmd.(*healingTracker).updateProgress()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:183 +0x117
  github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5()
      github.com/minio/minio/cmd/global-heal.go:292 +0x1d3

Previous read at 0x00c000aac1e0 by goroutine 1003:
  github.com/minio/minio/cmd.(*allHealState).updateHealStatus()
      github.com/minio/minio/cmd/admin-heal-ops.go:136 +0xcb
  github.com/minio/minio/cmd.(*healingTracker).save()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:223 +0x424
```
2024-02-06 08:56:59 -08:00
Harshavardhana
f674168b8b
Add missing gob register for map[string]string{} (#18974)
```
minio[1303918]: API: SYSTEM()
minio[1303918]: Time: 02:04:28 UTC 02/05/2024
minio[1303918]: DeploymentID: 0972de33-2d17-4499-8967-aff6437dd9da
minio[1303918]: Error: gob: type not registered for interface: map[string]string (*errors.errorString)
minio[1303918]:        4: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
minio[1303918]:        3: internal/logger/logonce.go:149:logger.LogOnceIf()
minio[1303918]:        2: cmd/peer-rest-server.go:533:cmd.(*peerRESTServer).GetSysConfigHandler()
minio[1303918]:        1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
```
2024-02-06 08:23:23 -08:00
Poorna
27d02ea6f7
metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana
794a7993cb
calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana
6f16d1cb2c
do not count context canceled as timeout errors (#18975) 2024-02-05 18:16:13 -08:00
Anis Eleuch
7aa00bff89
sts: Add support of AssumeRoleWithWebIdentity and DurationSeconds (#18835)
To force limit the duration of STS accounts, the user can create a new
policy, like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"NumericLessThanEquals": {"sts:DurationSeconds": "300"}}
  }]
}

And force binding the policy to all OpenID users, whether using a claim name or role
ARN.
2024-02-05 11:44:23 -08:00
Klaus Post
e046eb1d17
Disable Rename2 metrics on non-linux (#18970)
Logging a call that always fails is pointless.
2024-02-05 10:48:14 -08:00
Anis Eleuch
ba975ca320
Add defensive code to ignore checking parts with transitioned objects (#18973)
Though dataErrs are nil with transitioned objects, add a more defensive
code to ignore counting missing parts in that case
2024-02-05 10:48:03 -08:00