Commit Graph

11602 Commits

Author SHA1 Message Date
Anis Eleuch
828d4df6f0 debug: Add --search to print only specific goroutines (#19158)
Easier to filter goroutines belonging to a specific subsystem
2024-02-29 08:28:18 -08:00
Harshavardhana
467714f33b ignore x-amz-storage-class when its set to STANDARD (#19154)
fixes #19135
2024-02-28 17:44:30 -08:00
Harshavardhana
f8696cc8f6 fallback to globalLocalDrives for non-distributed setups 2024-02-28 14:56:08 -08:00
Anis Eleuch
9a7c7ab2d0 fix: parsing v2 and v1 cgroup memory limit (#19153)
Trim the newline at the end of the sysfs memory limit.
2024-02-28 14:52:20 -08:00
Klaus Post
40fb3371fa Mux: Send async mux ack and fix stream error responses (#19149)
Streams can return errors if the cancelation is picked up before the response 
stream close is picked up. Under extreme load, this could lead to missing 
responses.

Send server mux ack async so a blocked send cannot block newMuxStream 
call. Stream will not progress until mux has been acked.
2024-02-28 10:05:18 -08:00
Harshavardhana
51874a5776 fix: allow DNS disconnection events to happen in k8s (#19145)
in k8s things really do come online very asynchronously,
we need to use implementation that allows this randomness.

To facilitate this move WriteAll() as part of the
websocket layer instead.

Bonus: avoid instances of dnscache usage on k8s
2024-02-28 09:54:52 -08:00
Aditya Manthramurthy
62ce52c8fd cachevalue: simplify exported interface (#19137)
- Also add cache options type
2024-02-28 09:09:09 -08:00
Anis Eleuch
2bdb9511bd heal: Add skipped objects to the heal summary (#19142)
New disk healing code skips/expires objects that ILM supposed to expire.
Add more visibility to the user about this activity by calculating those
objects and print it at the end of healing activity.
2024-02-28 09:05:40 -08:00
Harshavardhana
9a012a53ef initialize the disk healer early on (#19143)
This PR fixes a bug that perhaps has been long introduced,
with no visible workarounds. In any deployment, if an entire
erasure set is deleted, there is no way the cluster recovers.
2024-02-27 23:02:14 -08:00
jiuker
0aae0180fb feat: add userCredentials for nats (#19139) 2024-02-27 10:11:55 -08:00
Harshavardhana
1dd8ef09a6 remove unnecessary 'recreate' code (#19136) 2024-02-27 01:47:58 -08:00
Anis Eleuch
95032e4710 ilm: Select an object when all AND tags are satisfied (#19134)
Currently, if one object tag matches with one lifecycle tag filter, ILM
will select it, however, this is wrong. All the Tag filters in the
lifecycle document should be satisfied.
2024-02-26 16:01:20 -08:00
Poorna
b1351e2dee sr: use site replicator svcacct to sign STS session tokens (#19111)
This change is to decouple need for root credentials to match between
 site replication deployments.

 Also ensuring site replication config initialization is re-tried until
 it succeeds, this deoendency is critical to STS flow in site replication
 scenario.
2024-02-26 13:30:28 -08:00
Praveen raj Mani
30c2596512 Read drive IO stats from sysfs instead of procfs (#19131)
Currently, we read from `/proc/diskstats` which is found to be
un-reliable in k8s environments. We can read from `sysfs` instead.

Also, cache the latest drive io stats to find the diff and update
the metrics.
2024-02-26 11:34:50 -08:00
Klaus Post
2b5e4b853c Improve caching (#19130)
* Remove lock for cached operations.
* Rename "Relax" to `ReturnLastGood`.
* Add `CacheError` to allow caching values even on errors.
* Add NoWait that will return current value with async fetching if within 2xTTL.
* Make benchmark somewhat representative.

```
Before: BenchmarkCache-12       16408370                63.12 ns/op            0 B/op
After:  BenchmarkCache-12       428282187                2.789 ns/op           0 B/op
```

* Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning.
* Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics.
* Do async fetching of usage caches.
2024-02-26 10:49:19 -08:00
Minio Trusted
85bcb5874a Update yaml files to latest version RELEASE.2024-02-26T09-33-48Z 2024-02-26 10:28:02 +00:00
Harshavardhana
92788e4cf4 fix: re-arrange console-sys to log properly in k8s/docker (#19129)
fixes #19125
RELEASE.2024-02-26T09-33-48Z
2024-02-26 01:33:48 -08:00
Harshavardhana
8a698fef71 fix: crash in ResourceMetrics RPC handling concurrent writers (#19123)
Continuation of #19103 that had fixed the crash in peer metrics for cluster endpoint.
2024-02-25 00:51:38 -08:00
Minio Trusted
b49ce1713f Update yaml files to latest version RELEASE.2024-02-24T17-11-14Z 2024-02-25 05:23:21 +00:00
Harshavardhana
c2b54d92f6 allow all disk full errors to be handled (#19117) RELEASE.2024-02-24T17-11-14Z 2024-02-24 09:11:14 -08:00
Harshavardhana
f965434022 fix: re-use endpoint strings to avoid allocation during audit (#19116) 2024-02-23 16:19:13 -08:00
Harshavardhana
a3ac62596c move timedValue -> cachevalue package (#19114) 2024-02-23 13:28:14 -08:00
Harshavardhana
2faba02d6b fix: allow diskInfo at storageRPC to be cached (#19112)
Bonus: convert timedValue into a typed implementation
2024-02-23 09:21:38 -08:00
Krishnan Parthasarathi
ee158e1610 ilm: Update action count only on success (#19093)
It also fixes a long-standing bug in expiring transitioned objects.
The expiration action was deleting the current version in the case'
of tiered objects instead of adding a delete marker.
2024-02-22 15:00:32 -08:00
Anis Eleuch
fa68efb1e7 s3: CopyObject to disallow invalid dest object names (#19110)
By not doing so, objects can risk being in a wrong erasure set if the
destination object name contains e.g. '//'
2024-02-22 10:05:17 -08:00
Anis Eleuch
8c53a4405a Add audit for folder excess (#19109)
Also replace ilm:expiry with scanner to avoid user confusion
2024-02-22 08:18:13 -08:00
Harshavardhana
c32f699105 turn-off md5sum for SSE-KMS/SSE-C as optimization for multipart (#19106)
only enable md5sum if explicitly asked by the client, otherwise
its not necessary to compute md5sum when SSE-KMS/SSE-C is enabled.

this is continuation of #17958
2024-02-22 04:24:11 -08:00
Harshavardhana
53aa8f5650 use typos instead of codespell (#19088) 2024-02-21 22:26:06 -08:00
Shubhendu
56887f3208 Add DeleteAll with expiry days non zero value only (#19095)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-02-21 12:28:34 -08:00
Klaus Post
92180bc793 Add array recycling safety (#19103)
Nil entries when recycling arrays.
2024-02-21 12:27:35 -08:00
Klaus Post
22aa16ab12 Fix grid reconnection deadlock (#19101)
If network conditions have filled the output queue before a reconnect happens blocked sends could stop reconnects from happening. In short `respMu` would be held for a mux client while sending - if the queue is full this will never get released and closing the mux client will hang.

A) Use the mux client context instead of connection context for sends, so sends are unblocked when the mux client is canceled.

B) Use a `TryLock` on "close" and cancel the request if we cannot get the lock at once. This will unblock any attempts to send.
2024-02-21 07:49:34 -08:00
Poorna
526b829a09 site replication: Disallow removal of site-replicator account (#19092) 2024-02-21 02:09:33 -08:00
schmittey
c44f311c4f Add missing yaml syntax highlighting in prometheus README.md (#19087) 2024-02-20 16:22:37 -08:00
Anis Eleuch
9ea5d08ecd site-repl: Fix endpoint in the error with unexpected deployment-id (#19086) 2024-02-20 15:02:35 -08:00
Harshavardhana
35deb1a8e2 do not block on send channels under high load (#19090)
all send channels must compete with `ctx` if not
they will perpetually stay alive.
2024-02-20 15:00:35 -08:00
Harshavardhana
c7f7c47388 allow renames() for inlined writes without data-dir (#18801)
data-dir not being present is okay, however we can still
rely on the `rename()` atomic call instead of relying on
write xl.meta write which may truncate the io.EOF.
2024-02-20 07:05:57 -08:00
Shubhendu
cb7dab17cb Graph cluster and bucket replication proxied requests (#19078)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-02-20 01:45:00 -08:00
Harshavardhana
cd419a35fe simplify broker healthcheck by following kafka guidelines (#19082)
fixes #19081
2024-02-20 00:16:35 -08:00
Klaus Post
e06168596f Convert more peer <--> peer REST calls (#19004)
* Convert more peer <--> peer REST calls
* Clean up in general.
* Add JSON wrapper.
* Add slice wrapper.
* Add option to make handler return nil error if no connection is given, `IgnoreNilConn`.

Converts the following:

```
+	HandlerGetMetrics
+	HandlerGetResourceMetrics
+	HandlerGetMemInfo
+	HandlerGetProcInfo
+	HandlerGetOSInfo
+	HandlerGetPartitions
+	HandlerGetNetInfo
+	HandlerGetCPUs
+	HandlerServerInfo
+	HandlerGetSysConfig
+	HandlerGetSysServices
+	HandlerGetSysErrors
+	HandlerGetAllBucketStats
+	HandlerGetBucketStats
+	HandlerGetSRMetrics
+	HandlerGetPeerMetrics
+	HandlerGetMetacacheListing
+	HandlerUpdateMetacacheListing
+	HandlerGetPeerBucketMetrics
+	HandlerStorageInfo
+	HandlerGetLocks
+	HandlerBackgroundHealStatus
+	HandlerGetLastDayTierStats
+	HandlerSignalService
+	HandlerGetBandwidth
```
2024-02-19 14:54:46 -08:00
Harshavardhana
4c8197a119 reject expired STS credentials early without decoding sessionToken (#19072) 2024-02-19 07:34:10 -08:00
Minio Trusted
23c10350f3 Update yaml files to latest version RELEASE.2024-02-17T01-15-57Z 2024-02-17 01:37:10 +00:00
Harshavardhana
b6e98aed01 fix: found races in accessing globalLocalDrives (#19069)
make a copy before accessing globalLocalDrives

Bonus: update console v0.46.0

Signed-off-by: Harshavardhana <harsha@minio.io>
RELEASE.2024-02-17T01-15-57Z
2024-02-16 17:15:57 -08:00
Anis Eleuch
00dcba9ddd Fix typo in jwt skewed date/time error (#19066) 2024-02-16 10:48:30 -08:00
Harshavardhana
607cafadbc converge clusterRead health into cluster health (#19063) 2024-02-15 16:48:36 -08:00
Anis Eleuch
68dde2359f log: Add logger.Event to send to console and other logger targets (#19060)
Add a new function logger.Event() to send the log to Console and
http/kafka log webhooks. This will include some internal events such as
disk healing and rebalance/decommissioning
2024-02-15 15:13:30 -08:00
Poorna
f9dbf41e27 sr: add validation to disallow updating bandwidth limit on self (#19062) 2024-02-15 13:03:40 -08:00
Krishnan Parthasarathi
7405760f44 Refresh tier config periodically (#19049)
- Increase the parity for tier-config.bin object
- Refresh globalTierConfigMgr cached value once every 15 mins
2024-02-15 11:52:44 -08:00
Harshavardhana
7e4a6b4bcd remove rename2 entirely, avoids the risk of moving data (#19058) 2024-02-14 17:09:38 -08:00
Minio Trusted
b5791e6f28 Update yaml files to latest version RELEASE.2024-02-14T21-36-02Z 2024-02-14 21:51:12 +00:00
Harshavardhana
00cb58eaf3 add customer specific hotfixes to 'registry.min.dev' (#19057)
```
REPO="registry.min.dev/<customer>" CRED_DIR=/media/builder/minio make docker-hotfix-push
```
RELEASE.2024-02-14T21-36-02Z
2024-02-14 13:36:02 -08:00