Commit Graph

5912 Commits

Author SHA1 Message Date
Harshavardhana 467714f33b
ignore x-amz-storage-class when its set to STANDARD (#19154)
fixes #19135
2024-02-28 17:44:30 -08:00
Harshavardhana f8696cc8f6 fallback to globalLocalDrives for non-distributed setups 2024-02-28 14:56:08 -08:00
Anis Eleuch 9a7c7ab2d0
fix: parsing v2 and v1 cgroup memory limit (#19153)
Trim the newline at the end of the sysfs memory limit.
2024-02-28 14:52:20 -08:00
Harshavardhana 51874a5776
fix: allow DNS disconnection events to happen in k8s (#19145)
in k8s things really do come online very asynchronously,
we need to use implementation that allows this randomness.

To facilitate this move WriteAll() as part of the
websocket layer instead.

Bonus: avoid instances of dnscache usage on k8s
2024-02-28 09:54:52 -08:00
Aditya Manthramurthy 62ce52c8fd
cachevalue: simplify exported interface (#19137)
- Also add cache options type
2024-02-28 09:09:09 -08:00
Anis Eleuch 2bdb9511bd
heal: Add skipped objects to the heal summary (#19142)
New disk healing code skips/expires objects that ILM supposed to expire.
Add more visibility to the user about this activity by calculating those
objects and print it at the end of healing activity.
2024-02-28 09:05:40 -08:00
Harshavardhana 9a012a53ef
initialize the disk healer early on (#19143)
This PR fixes a bug that perhaps has been long introduced,
with no visible workarounds. In any deployment, if an entire
erasure set is deleted, there is no way the cluster recovers.
2024-02-27 23:02:14 -08:00
Harshavardhana 1dd8ef09a6
remove unnecessary 'recreate' code (#19136) 2024-02-27 01:47:58 -08:00
Poorna b1351e2dee sr: use site replicator svcacct to sign STS session tokens (#19111)
This change is to decouple need for root credentials to match between
 site replication deployments.

 Also ensuring site replication config initialization is re-tried until
 it succeeds, this deoendency is critical to STS flow in site replication
 scenario.
2024-02-26 13:30:28 -08:00
Praveen raj Mani 30c2596512
Read drive IO stats from sysfs instead of procfs (#19131)
Currently, we read from `/proc/diskstats` which is found to be
un-reliable in k8s environments. We can read from `sysfs` instead.

Also, cache the latest drive io stats to find the diff and update
the metrics.
2024-02-26 11:34:50 -08:00
Klaus Post 2b5e4b853c
Improve caching (#19130)
* Remove lock for cached operations.
* Rename "Relax" to `ReturnLastGood`.
* Add `CacheError` to allow caching values even on errors.
* Add NoWait that will return current value with async fetching if within 2xTTL.
* Make benchmark somewhat representative.

```
Before: BenchmarkCache-12       16408370                63.12 ns/op            0 B/op
After:  BenchmarkCache-12       428282187                2.789 ns/op           0 B/op
```

* Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning.
* Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics.
* Do async fetching of usage caches.
2024-02-26 10:49:19 -08:00
Harshavardhana 92788e4cf4
fix: re-arrange console-sys to log properly in k8s/docker (#19129)
fixes #19125
2024-02-26 01:33:48 -08:00
Harshavardhana 8a698fef71
fix: crash in ResourceMetrics RPC handling concurrent writers (#19123)
Continuation of #19103 that had fixed the crash in peer metrics for cluster endpoint.
2024-02-25 00:51:38 -08:00
Harshavardhana c2b54d92f6
allow all disk full errors to be handled (#19117) 2024-02-24 09:11:14 -08:00
Harshavardhana f965434022
fix: re-use endpoint strings to avoid allocation during audit (#19116) 2024-02-23 16:19:13 -08:00
Harshavardhana a3ac62596c
move timedValue -> cachevalue package (#19114) 2024-02-23 13:28:14 -08:00
Harshavardhana 2faba02d6b
fix: allow diskInfo at storageRPC to be cached (#19112)
Bonus: convert timedValue into a typed implementation
2024-02-23 09:21:38 -08:00
Krishnan Parthasarathi ee158e1610
ilm: Update action count only on success (#19093)
It also fixes a long-standing bug in expiring transitioned objects.
The expiration action was deleting the current version in the case'
of tiered objects instead of adding a delete marker.
2024-02-22 15:00:32 -08:00
Anis Eleuch fa68efb1e7
s3: CopyObject to disallow invalid dest object names (#19110)
By not doing so, objects can risk being in a wrong erasure set if the
destination object name contains e.g. '//'
2024-02-22 10:05:17 -08:00
Anis Eleuch 8c53a4405a
Add audit for folder excess (#19109)
Also replace ilm:expiry with scanner to avoid user confusion
2024-02-22 08:18:13 -08:00
Harshavardhana c32f699105
turn-off md5sum for SSE-KMS/SSE-C as optimization for multipart (#19106)
only enable md5sum if explicitly asked by the client, otherwise
its not necessary to compute md5sum when SSE-KMS/SSE-C is enabled.

this is continuation of #17958
2024-02-22 04:24:11 -08:00
Harshavardhana 53aa8f5650
use typos instead of codespell (#19088) 2024-02-21 22:26:06 -08:00
Klaus Post 92180bc793
Add array recycling safety (#19103)
Nil entries when recycling arrays.
2024-02-21 12:27:35 -08:00
Poorna 526b829a09
site replication: Disallow removal of site-replicator account (#19092) 2024-02-21 02:09:33 -08:00
Anis Eleuch 9ea5d08ecd
site-repl: Fix endpoint in the error with unexpected deployment-id (#19086) 2024-02-20 15:02:35 -08:00
Harshavardhana 35deb1a8e2
do not block on send channels under high load (#19090)
all send channels must compete with `ctx` if not
they will perpetually stay alive.
2024-02-20 15:00:35 -08:00
Harshavardhana c7f7c47388
allow renames() for inlined writes without data-dir (#18801)
data-dir not being present is okay, however we can still
rely on the `rename()` atomic call instead of relying on
write xl.meta write which may truncate the io.EOF.
2024-02-20 07:05:57 -08:00
Klaus Post e06168596f
Convert more peer <--> peer REST calls (#19004)
* Convert more peer <--> peer REST calls
* Clean up in general.
* Add JSON wrapper.
* Add slice wrapper.
* Add option to make handler return nil error if no connection is given, `IgnoreNilConn`.

Converts the following:

```
+	HandlerGetMetrics
+	HandlerGetResourceMetrics
+	HandlerGetMemInfo
+	HandlerGetProcInfo
+	HandlerGetOSInfo
+	HandlerGetPartitions
+	HandlerGetNetInfo
+	HandlerGetCPUs
+	HandlerServerInfo
+	HandlerGetSysConfig
+	HandlerGetSysServices
+	HandlerGetSysErrors
+	HandlerGetAllBucketStats
+	HandlerGetBucketStats
+	HandlerGetSRMetrics
+	HandlerGetPeerMetrics
+	HandlerGetMetacacheListing
+	HandlerUpdateMetacacheListing
+	HandlerGetPeerBucketMetrics
+	HandlerStorageInfo
+	HandlerGetLocks
+	HandlerBackgroundHealStatus
+	HandlerGetLastDayTierStats
+	HandlerSignalService
+	HandlerGetBandwidth
```
2024-02-19 14:54:46 -08:00
Harshavardhana 4c8197a119
reject expired STS credentials early without decoding sessionToken (#19072) 2024-02-19 07:34:10 -08:00
Harshavardhana b6e98aed01
fix: found races in accessing globalLocalDrives (#19069)
make a copy before accessing globalLocalDrives

Bonus: update console v0.46.0

Signed-off-by: Harshavardhana <harsha@minio.io>
2024-02-16 17:15:57 -08:00
Anis Eleuch 00dcba9ddd
Fix typo in jwt skewed date/time error (#19066) 2024-02-16 10:48:30 -08:00
Harshavardhana 607cafadbc
converge clusterRead health into cluster health (#19063) 2024-02-15 16:48:36 -08:00
Anis Eleuch 68dde2359f
log: Add logger.Event to send to console and other logger targets (#19060)
Add a new function logger.Event() to send the log to Console and
http/kafka log webhooks. This will include some internal events such as
disk healing and rebalance/decommissioning
2024-02-15 15:13:30 -08:00
Poorna f9dbf41e27
sr: add validation to disallow updating bandwidth limit on self (#19062) 2024-02-15 13:03:40 -08:00
Krishnan Parthasarathi 7405760f44
Refresh tier config periodically (#19049)
- Increase the parity for tier-config.bin object
- Refresh globalTierConfigMgr cached value once every 15 mins
2024-02-15 11:52:44 -08:00
Harshavardhana 7e4a6b4bcd
remove rename2 entirely, avoids the risk of moving data (#19058) 2024-02-14 17:09:38 -08:00
Harshavardhana f961ec4aaf
fix: revert allow offline disks on fresh start (#19052)
the PR in #16541 was incorrect and hand wrong assumptions
about the overall setup, revert this since this expectation
to have offline servers is wrong and we can end up with a
bigger chicken and egg problem.

This reverts commit 5996c8c4d5.

Bonus:

- preserve disk in globalLocalDrives properly upon connectDisks()
- do not return 'nil' from newXLStorage(), getting it ready for
  the next set of changes for 'format.json' loading.
2024-02-14 10:37:34 -08:00
Harshavardhana 134db72bb7
fix: reject service account access key same as root credentials (#19055) 2024-02-14 10:37:12 -08:00
Harshavardhana effe21f3eb
send correct objectname in audit events for DeleteAll ILM (#19053) 2024-02-14 08:07:58 -08:00
Praveen raj Mani 1118b285d3
fix: race in deleting objects during batch expiry (#19054) 2024-02-14 08:07:44 -08:00
Aditya Manthramurthy a14e192376
fix: remove unnecessary panic in iam-store (#19050) 2024-02-13 19:29:36 -08:00
Minio Trusted f8e15e7d09 Update yaml files to latest version RELEASE.2024-02-13T15-35-11Z 2024-02-13 16:01:38 +00:00
Shireesh Anjal 7b9f9e0628
fix incorrect disk io stats in k8s environment (#19016)
The previous logic of calculating per second values for disk io stats
divides the stats by the host uptime. This doesn't work in k8s
environment as the uptime is of the pod, but the stats (from
/proc/diskstats) are from the host.

Fix this by storing the initial values of uptime and the stats at the
timme of server startup, and using the difference between current and
initial values when calculating the per second values.
2024-02-13 07:35:11 -08:00
Praveen raj Mani ac8e9ce04f
Send a bucket notification event on DeleteObject() for non-existing object (#19037)
Send a bucket notification event on DeleteObject for non-existing objects
2024-02-13 07:34:17 -08:00
Praveen raj Mani cfd8645843
fix: update batch replication stats for snowball uploads (#19045) 2024-02-13 07:33:27 -08:00
Harshavardhana 0c068b15c7
add missing handler for reloading site replication config on peers (#19042) 2024-02-13 06:55:54 -08:00
Anis Eleuch 30a466aa71
sts: Add test for DurationSeconds condition (#19044) 2024-02-13 06:55:37 -08:00
Taran Pelkey 4d94609c44
FIx unexpected behavior when creating service account (#19036) 2024-02-13 02:31:43 -08:00
Poorna 0cc9fb73e1
metrics: fix typo in namespace for proxy tagging metric (#19039)
Relevant PR introducing this metric: #18957
2024-02-12 13:02:27 -08:00
Harshavardhana eac4e4b279
honor replaced disk properly by updating globalLocalDrives (#19038)
globalLocalDrives seem to be not updated during the
HealFormat() leads to a requirement where the server
needs to be restarted for the healing to continue.
2024-02-12 13:00:20 -08:00
Harshavardhana 6d381f7c0a
relax pre-emptive GetBucketInfo() for multi-object delete (#19035) 2024-02-12 08:46:46 -08:00
Anis Eleuch 4fa06aefc6
Convert service account add/update expiration to cond values (#19024)
In order to force some users allowed to create or update a service
account to provide an expiration satifying the user policy conditions.
2024-02-12 08:36:16 -08:00
Harshavardhana 0e177a44e0
preserve conflicting objects when parent object is being deleted (#19034)
a/prefix
a/prefix/1.txt

where `a/prefix` is an object which does not have `/` at the end,
we do not have to aggressively recursively delete all the sub-folders
as well. Instead convert the call into self contained to deleting
'xl.meta' and then subsequently attempting to Remove the parent.
2024-02-12 08:30:40 -08:00
Harshavardhana afd19de5a9
fix: allow configuring excess versions alerting (#19028)
Bonus: enable audit alerts for object versions
beyond the configured value, default is '100'
versions per object beyond which scanner will
alert for each such objects.
2024-02-11 23:41:53 -08:00
Harshavardhana e3fbac9e24
do not have to use the same distributionAlgo as first pool (#19031)
when we expand via pools, there is no reason to stick
with the same distributionAlgo as the rest. Since the
algo only makes sense with-in a pool not across pools.

This allows for newer pools to use newer codepaths to
avoid legacy file lookups when they have a pre-existing
deployment from 2019, they can expand their new pool
to be of a newer distribution format, allowing the
pool to be more performant.
2024-02-11 23:21:56 -08:00
Poorna a9cf32811c
Fix panic in tagging request proxying (#19032) 2024-02-11 18:18:43 -08:00
Harshavardhana 53997ecc79
avoid excessive logging for objects that do not exist (#19030)
in replicated setups, that have proxying enabled for
replicated buckets.
2024-02-11 14:21:08 -08:00
Harshavardhana 997ba3a574
introduce reader deadlines for net.Conn (#19023)
Bonus: set "retry-after" header for AWS SDKs if possible to honor them.
2024-02-09 13:25:16 -08:00
Harshavardhana 62761a23e6
remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Harshavardhana 404d8b3084
fix: dangling objects honor parityBlocks instead of dataBlocks (#19019)
Bonus: do not recreate buckets if NoRecreate is asked.
2024-02-08 15:22:16 -08:00
Klaus Post 6005ad3d48
Fix shared top locks client (#19018)
`client` is shared across goroutines.

Seen with `mc support top locks` on minio built with `-race`.
2024-02-08 12:28:05 -08:00
Harshavardhana 035a3ea4ae
optimize startup sequence performance (#19009)
- bucket metadata does not need to look for legacy things
  anymore if b.Created is non-zero

- stagger bucket metadata loads across lots of nodes to
  avoid the current thundering herd problem.

- Remove deadlines for RenameData, RenameFile - these
  calls should not ever be timed out and should wait
  until completion or wait for client timeout. Do not
  choose timeouts for applications during the WRITE phase.

- increase R/W buffer size, increase maxMergeMessages to 30
2024-02-08 11:21:21 -08:00
Aditya Manthramurthy e104b183d8
fix: skip policy usage validation for cache update (#19008)
When updating the policy cache, we do not need to validate policy usage
as the policy has already been deleted by the node sending the
notification.
2024-02-07 20:39:53 -08:00
Klaus Post 7e082f232e
Add GetBucketInfo toStorageErr conversion (#19005)
Convert error to storageError since it is used for quorum calculations here: ff80cfd83d/cmd/peer-s3-client.go (L339)
2024-02-07 14:24:24 -08:00
Harshavardhana d28bf71f25
listing must return WalkDir() errors first (#19006) 2024-02-07 13:20:07 -08:00
Harshavardhana 5b1a74b6b2
do not block iam.store registration (#18999)
current implementation would quite simply
block the sys.store registration, making
sys.Initialized() call to be blocked.
2024-02-07 12:41:58 -08:00
Klaus Post ebc6c9b498
Fix tracing send on closed channel (#18982)
Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing:

```
Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel
Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]:
Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON.func1()
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d
Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON in goroutine 378010884
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352
```

Wait explicitly for the goroutine to exit.

Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.
2024-02-06 08:57:30 -08:00
Harshavardhana 630963fa6b
protect tracker copy properly to avoid race (#18984)
```
WARNING: DATA RACE
Write at 0x00c000aac1e0 by goroutine 1133:
  github.com/minio/minio/cmd.(*healingTracker).updateProgress()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:183 +0x117
  github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5()
      github.com/minio/minio/cmd/global-heal.go:292 +0x1d3

Previous read at 0x00c000aac1e0 by goroutine 1003:
  github.com/minio/minio/cmd.(*allHealState).updateHealStatus()
      github.com/minio/minio/cmd/admin-heal-ops.go:136 +0xcb
  github.com/minio/minio/cmd.(*healingTracker).save()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:223 +0x424
```
2024-02-06 08:56:59 -08:00
Harshavardhana f674168b8b
Add missing gob register for map[string]string{} (#18974)
```
minio[1303918]: API: SYSTEM()
minio[1303918]: Time: 02:04:28 UTC 02/05/2024
minio[1303918]: DeploymentID: 0972de33-2d17-4499-8967-aff6437dd9da
minio[1303918]: Error: gob: type not registered for interface: map[string]string (*errors.errorString)
minio[1303918]:        4: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
minio[1303918]:        3: internal/logger/logonce.go:149:logger.LogOnceIf()
minio[1303918]:        2: cmd/peer-rest-server.go:533:cmd.(*peerRESTServer).GetSysConfigHandler()
minio[1303918]:        1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
```
2024-02-06 08:23:23 -08:00
Poorna 27d02ea6f7
metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana 794a7993cb
calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana 6f16d1cb2c
do not count context canceled as timeout errors (#18975) 2024-02-05 18:16:13 -08:00
Anis Eleuch 7aa00bff89
sts: Add support of AssumeRoleWithWebIdentity and DurationSeconds (#18835)
To force limit the duration of STS accounts, the user can create a new
policy, like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"NumericLessThanEquals": {"sts:DurationSeconds": "300"}}
  }]
}

And force binding the policy to all OpenID users, whether using a claim name or role
ARN.
2024-02-05 11:44:23 -08:00
Klaus Post e046eb1d17
Disable Rename2 metrics on non-linux (#18970)
Logging a call that always fails is pointless.
2024-02-05 10:48:14 -08:00
Anis Eleuch ba975ca320
Add defensive code to ignore checking parts with transitioned objects (#18973)
Though dataErrs are nil with transitioned objects, add a more defensive
code to ignore counting missing parts in that case
2024-02-05 10:48:03 -08:00
Harshavardhana fec13b0ec1
remove unused DiskMTime (#18965) 2024-02-05 01:04:26 -08:00
Harshavardhana 100c35c281
avoid excessive logs when peer is down (#18969) 2024-02-04 23:25:42 -08:00
Harshavardhana f225ca3312
Add more advanced cases for dangling (#18968) 2024-02-04 14:36:13 -08:00
Frank Wessels 8b68e0bfdc
Fix typo in api-router.go (#18955) 2024-02-03 14:03:51 -08:00
Anis Eleuch 6ae97aedc9
xl: Disable rename2 in decommissioning/rebalance (#18964)
Always disable rename2 optimization in decom/rebalance
2024-02-03 14:03:30 -08:00
Harshavardhana 960d604013
disconnected returns, an unexpected error to List() returning 500s (#18959)
provide the error string appropriately so that the
matching of error types works.

Also add a string based fallback for the said error.
2024-02-03 01:04:33 -08:00
Harshavardhana ff80cfd83d
move Make,Delete,Head,Heal bucket calls to websockets (#18951) 2024-02-02 14:54:54 -08:00
Harshavardhana 99fde2ba85
deprecate disk tokens, instead rely on deadlines and active monitoring (#18947)
disk tokens usage is not necessary anymore with the implementation
of deadlines for storage calls and active monitoring of the drive
for I/O timeouts.

Functionality kicking off a bad drive is still supported, it's just that 
we do not have to serialize I/O in the manner tokens would do.
2024-02-02 10:10:54 -08:00
Frank Wessels 31743789dc
Fix some leftover issues from PR 18936 (#18946) 2024-02-01 19:42:56 -08:00
Anis Eleuch 6fd63e920a
log: Use error log type instead of Application/MinIO type (#18930)
* log: Use error log type instead of Application/MinIO type

Also bump github.com/shirou/gopsutil version to address cross
compilation issues.

* Apply suggestions from code review

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>

---------

Co-authored-by: Anis Eleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2024-02-01 16:13:57 -08:00
Aditya Manthramurthy 59cc3e93d6
fix: `null` inline policy handling for access keys (#18945)
Interpret `null` inline policy for access keys as inheriting parent
policy. Since MinIO Console currently sends this value, we need to honor it
for now. A larger fix in Console and in the server are required.

Fixes #18939.
2024-02-01 14:45:03 -08:00
Anis Eleuch 61a4bb38cd
batch: Fix a typo while validating smallerThan field (#18942) 2024-02-01 13:53:26 -08:00
Klaus Post b192bc348c
Improve object reuse for grid messages (#18940)
Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers.

This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool.

Add internal request reuse of internal types. Add for safe (pointerless) types explicitly.

Only log params for internal types. Doing Sprint(obj) is just a bit too messy.
2024-02-01 12:41:20 -08:00
Harshavardhana 6440d0fbf3
move a collection of peer APIs to websockets (#18936) 2024-02-01 10:47:20 -08:00
Anis Eleuch 24ecc44bac
Keep ServiceV1 admin stop/restart API and mark as deprecated (#18932) 2024-01-31 12:20:33 -08:00
Aditya Manthramurthy 0ae4915a93
fix: permission checks for editing access keys (#18928)
With this change, only a user with `UpdateServiceAccountAdminAction`
permission is able to edit access keys.

We would like to let a user edit their own access keys, however the
feature needs to be re-designed for better security and integration with
external systems like AD/LDAP and OpenID.

This change prevents privilege escalation via service accounts.
2024-01-31 10:56:45 -08:00
Harshavardhana caac9d216e
remove all the frivolous logs, that may or may not be actionable (#18922)
for actionable, inspections we have `mc support inspect`

we do not need double logging, healing will report relevant
errors if any, in terms of quorum lost etc.
2024-01-30 18:11:45 -08:00
Harshavardhana 057192913c
add total usable capacity, free and used to DataUsageInfo() (#18921) 2024-01-30 17:49:37 -08:00
Harshavardhana f25cbdf43c
use all the available nr_requests for NVMe (#18920) 2024-01-30 14:10:06 -08:00
Klaus Post 6da4a9c7bb
Improve tracing & notification scalability (#18903)
* Perform JSON encoding on remote machines and only forward byte slices.
* Migrate tracing & notification to WebSockets.
2024-01-30 12:49:02 -08:00
Harshavardhana 80ca120088
remove checkBucketExist check entirely to avoid fan-out calls (#18917)
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.

We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
2024-01-30 12:43:25 -08:00
Anis Eleuch a669946357
Add cgroup v2 support for memory limit (#18905) 2024-01-30 11:13:27 -08:00
Poorna 7ffc162ea8
exclude veeam virtual objects from replication (#18918)
Fixes: #18916
2024-01-30 10:43:58 -08:00
Poorna bcfd7fbbcf
reuse transports for callhome and remote tgt validation (#18912) 2024-01-29 23:05:39 -08:00
Harshavardhana 486e2e48ea
enable xattr capture by default (#18911)
- healing must not set the write xattr
  because that is the job of active healing
  to update. what we need to preserve is
  permanent deletes.

- remove older env for drive monitoring and
  enable it accordingly, as a global value.
2024-01-29 23:03:58 -08:00