268 Commits

Author SHA1 Message Date
Harshavardhana
9588978028
fix: HealBucket regression for empty buckets, simplify it (#18815) 2024-01-17 15:19:09 -08:00
Shubhendu
e31081d79d
Heal buckets at node level (#18612)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-01-09 20:34:04 -08:00
Anis Eleuch
414bcb0c73
prom: Add read quorum per erasure set metric (#18736) 2024-01-04 15:05:13 -08:00
Harshavardhana
a50ea92c64
feat: introduce list_quorum="auto" to prefer quorum drives (#18084)
NOTE: This feature is not retro-active; it will not cater to previous transactions
on existing setups. 

To enable this feature, please set the `_MINIO_DRIVE_QUORUM=on` environment
variable as part of systemd service or k8s configmap. 

Once this has been enabled, you need to also set `list_quorum`. 

```
~ mc admin config set alias/ api list_quorum=auto
```

A new debugging tool is available to check for any missing counters.
2023-12-29 15:52:41 -08:00
Anis Eleuch
8432fd5ac2
prom: Add online and healing drives metrics per erasure set (#18700) 2023-12-21 16:56:43 -08:00
Harshavardhana
7c948adf88
allow pre-allocating buffers to reduce frequent GCs during growth (#18686)
This PR also increases per node bpool memory from 1024 entries
to 2048 entries; along with that, it also moves the byte pool
centrally instead of being per pool.
2023-12-21 08:59:38 -08:00
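A minimal sketch of the buffer pre-allocation idea described in #18686, not the code from that commit. The `BytePool` type, its methods, and the buffer width used here are hypothetical; only the 2048-entry figure comes from the commit message.

```go
package main

import "fmt"

// BytePool is a hypothetical fixed-size pool of byte slices backed by a
// buffered channel. It is an illustration, not MinIO's implementation.
type BytePool struct {
	c     chan []byte
	width int
}

// NewBytePool creates an empty pool holding up to `entries` buffers.
func NewBytePool(entries, width int) *BytePool {
	return &BytePool{c: make(chan []byte, entries), width: width}
}

// Populate pre-allocates buffers until the pool is full, so later Get
// calls do not have to allocate while the pool is still growing.
func (p *BytePool) Populate() {
	for len(p.c) < cap(p.c) {
		p.c <- make([]byte, p.width)
	}
}

// Get returns a pooled buffer, or allocates one if the pool is empty.
func (p *BytePool) Get() []byte {
	select {
	case b := <-p.c:
		return b
	default:
		return make([]byte, p.width)
	}
}

// Put returns a buffer to the pool; undersized buffers and buffers that
// do not fit are left for the garbage collector.
func (p *BytePool) Put(b []byte) {
	if cap(b) < p.width {
		return
	}
	select {
	case p.c <- b[:p.width]:
	default:
	}
}

// Len reports how many buffers are currently pooled.
func (p *BytePool) Len() int { return len(p.c) }

func main() {
	pool := NewBytePool(2048, 1024) // 2048 entries; the width is an assumption
	pool.Populate()
	buf := pool.Get()
	fmt.Println(len(buf), pool.Len())
	pool.Put(buf)
	fmt.Println(pool.Len())
}
```

Sharing one such pool across all server pools, rather than creating one per pool, is the centralization the commit message mentions.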
Harshavardhana
b3314e97a6
re-use the same local drive used by remote-peer (#18645)
Historically, we have always kept the storage-rest-server
and the local storage API separate without much trouble,
since both can operate independently with no special
state() between them.

However, over time, we have added state() such as

- drive monitoring threads: now there will be "2" of
  them per drive instead of just 1.

- concurrent tokens available per drive: these are now
  doubled instead of being a single shared set, allowing
  an unexpectedly high amount of I/O to go through.

- applying serialization by using walkMutexes can now
  be adequately honored for both remote callers and local
  callers.
2023-12-13 19:27:55 -08:00
Harshavardhana
196e7e072b
allow bitrot files to be healed in MRF (#18618)
bitrot scanMode was ignored in MRF,
allow it to heal relevant content if
needed when seen as an error.
2023-12-08 12:26:01 -08:00
Harshavardhana
e30c0e7ca3 Revert "Heal buckets at node level (#18504)"
This reverts commit 708296ae1b.
2023-12-05 22:34:46 -08:00
Shubhendu
708296ae1b
Heal buckets at node level (#18504) 2023-12-05 02:17:35 -08:00
Krishnan Parthasarathi
a50f26b7f5
Implement batch-expiration for objects (#17946)
Based on an initial PR from -
https://github.com/minio/minio/pull/17792

But fully completes it with newer finalized YAML spec.
2023-12-02 02:51:33 -08:00
Klaus Post
5f971fea6e
Fix Mux Connect Error (#18567)
`OpMuxConnectError` was not handled correctly.

Remove local checks for single request handlers so they can 
run before being registered locally.

Bonus: Only log IAM bootstrap on startup.
2023-12-01 00:18:04 -08:00
Harshavardhana
bd0819330d
avoid Walk() API listing objects without quorum (#18535)
This allows batch replication to avoid attempting
to copy objects that do not have read quorum.

This PR also allows walk() to provide custom
values for quorum under batch replication, and
key rotation.
2023-11-27 17:20:04 -08:00
Harshavardhana
0a286153bb
remove checking for BucketInfo() peer call for every PUT() (#18464)
we already validate if the bucket doesn't exist in RenameData()
which can handle this cleanly, instead of making a network call
and returning errors.
2023-11-17 05:29:50 -08:00
Anis Eleuch
fe63664164
prom: Add drive failure tolerance per erasure set (#18424) 2023-11-13 00:59:48 -08:00
Harshavardhana
5c8339e1e8
fix: veeam SOS API to higher layers (#18287)
- support populating usage info from scanner info
- support populating quota for the bucket via quota
  settings for the bucket
2023-10-23 13:55:45 -07:00
Klaus Post
9a877734b2
Fix various poolmeta races (#18230)
There is a fundamental race condition in `newErasureServerPools`, where setObjectLayer is 
called before the poolMeta has been loaded/populated.

We add a placeholder value to this field but disable all saving of the value, so we don't risk 
overwriting the value on disk. Once the value has been loaded or created, it is replaced with 
the proper value, which will also be saved.

Also fixes various accesses of `poolMeta` that were done without locks.

We make the `poolMeta.IsSuspended` return false, even if we shouldn't risk out-of-bounds 
reads anymore.
2023-10-12 15:30:42 -07:00
Harshavardhana
dcce83b288
avoid rebalance state for getObjectTags if any (#18197)
fixes #18190
2023-10-09 23:56:26 -07:00
Poorna
9dc29d7687
Avoid ILM expiry on deleted versions that are yet to replicate (#18175)
Fixes #18167
2023-10-06 06:55:15 -06:00
Harshavardhana
a2ab21e91c
add max-keys=2 optimization for spark workloads (#18154)
comment in the code provides more detailed explanation
on what this PR entails and its assumptions.

this PR reduces the amount of listing() by an order
of magnitude; however, there are other such calls that
still need further optimization, which shall be done
in subsequent PRs.
2023-10-02 07:52:59 -06:00
Harshavardhana
c3d70e0795
cache usage, prefix-usage, and buckets for AccountInfo up to 10 secs (#18051)
AccountInfo is quite frequently called by the Console UI
login attempts; when many users are logging in, it is important
that we provide them with better responsiveness.

- ListBuckets information is cached every second
- Bucket usage info is cached for up to 10 seconds
- Prefix usage (optional) info is cached for up to 10 secs

Failure to update after cache expiration would still
allow login, which would end up providing the information
previously cached.

This allows for seamless responsiveness for the Console UI
logins, and overall responsiveness on a heavily loaded
system.
2023-09-18 22:13:03 -07:00
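The caching behavior described in #18051 (serve a cached value for up to 10 seconds, and fall back to the previously cached value if a refresh fails) can be sketched as follows. This is an illustration under assumptions, not the commit's code: `cachedValue`, `newCachedValue`, `at`, `usageTTL`, and `errBackendDown` are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// usageTTL mirrors the "up to 10 seconds" figure from the commit message.
const usageTTL = 10 * time.Second

// errBackendDown is a hypothetical error used to simulate a failed refresh.
var errBackendDown = errors.New("backend unavailable")

// cachedValue holds one value, refreshes it once it is older than ttl, and
// keeps serving the stale value when the refresh fails.
type cachedValue struct {
	mu     sync.Mutex
	ttl    time.Duration
	last   time.Time
	value  string
	filled bool
	update func() (string, error)
}

func newCachedValue(ttl time.Duration, update func() (string, error)) *cachedValue {
	return &cachedValue{ttl: ttl, update: update}
}

// Get returns the cached value if it is still fresh at `now`; otherwise it
// calls update. On update failure it returns the stale value if one exists.
func (c *cachedValue) Get(now time.Time) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.filled && now.Sub(c.last) < c.ttl {
		return c.value, nil
	}
	v, err := c.update()
	if err != nil {
		if c.filled {
			return c.value, nil // stale, but better than failing the login
		}
		return "", err
	}
	c.value, c.last, c.filled = v, now, true
	return v, nil
}

// at is a demo helper that builds a timestamp `sec` seconds after the epoch.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

func main() {
	calls := 0
	usage := newCachedValue(usageTTL, func() (string, error) {
		calls++
		if calls > 1 {
			return "", errBackendDown
		}
		return "usage-v1", nil
	})
	fmt.Println(usage.Get(at(0)))  // fresh fetch
	fmt.Println(usage.Get(at(5)))  // served from cache
	fmt.Println(usage.Get(at(11))) // refresh fails, stale value is served
}
```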
Aditya Manthramurthy
1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana
dde1a12819
fix: validate incoming uploadID to be base64 encoded (#17865)
Bonus fixes include

- no need to write the final xl.meta, since renameData does this
  already; saves some IOPs.

- make sure to purge the multipart directory properly using
  a recursive delete, otherwise this can easily pile up and
  rely on the stale uploads cleanup.

fixes #17863
2023-08-17 09:37:55 -07:00
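As a rough illustration of the validation in #17865, the sketch below rejects upload IDs that do not decode as base64. The function name `validUploadID` is hypothetical, and the choice of the unpadded URL-safe alphabet (`base64.RawURLEncoding`) is an assumption; the commit message only says the ID must be base64 encoded.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// validUploadID reports whether id is non-empty and decodes as base64.
// Assumption: the unpadded URL-safe alphabet is used; the real server may
// use a different variant or perform further checks on the decoded bytes.
func validUploadID(id string) bool {
	if id == "" {
		return false
	}
	_, err := base64.RawURLEncoding.DecodeString(id)
	return err == nil
}

func main() {
	good := base64.RawURLEncoding.EncodeToString([]byte("deployment-id.upload-uuid"))
	fmt.Println(validUploadID(good))
	fmt.Println(validUploadID("not base64!"))
}
```

Rejecting malformed IDs up front means later steps that decode the ID never see arbitrary client input.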
Harshavardhana
64aa7feabd
allow specifying lower disks for Walk() (#17829)
useful when you may want Walk() with
reduced quorum requirements.
2023-08-14 21:32:39 -07:00
Anis Eleuch
7fcfde7f07
s3: Pick a pool with >85% if all other pools are in suspended state (#17826) 2023-08-10 11:06:31 -07:00
Anis Eleuch
a436fd513b
track client disconnections properly for all ListObjects calls (#17804)
Previously, ListObjects* calls were returning 200 OK for timed-out clients,
which made debugging via `mc admin trace` very hard.
2023-08-04 15:57:27 -07:00
Harshavardhana
114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
Harshavardhana
bdddf597f6
shuffle buckets randomly before being scanned (#17644)
this randomness is needed to avoid scanning
the same buckets across different erasure sets,
in the same order.

allow random buckets to be scanned instead,
allowing a wider spread of ILM, replication
checks.

Additionally, do not loop twice to fill
the channel; fill the channel regardless of
whether the bucket is new or old.
2023-07-14 02:25:40 -07:00
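The shuffling described in #17644 can be sketched as below: each erasure set shuffles its own copy of the bucket list so that sets do not all scan buckets in the same order. The names `shuffledBuckets` and `newSetRand`, and the idea of seeding per set, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newSetRand returns a random source for one erasure set. Seeding each set
// differently is an assumption made for this example.
func newSetRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// shuffledBuckets returns a shuffled copy of buckets, leaving the input
// slice untouched.
func shuffledBuckets(buckets []string, r *rand.Rand) []string {
	out := make([]string, len(buckets))
	copy(out, buckets)
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	buckets := []string{"alpha", "beta", "gamma", "delta"}
	for set := int64(0); set < 3; set++ {
		// Each set gets its own scan order over the same buckets.
		fmt.Println(set, shuffledBuckets(buckets, newSetRand(set)))
	}
}
```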
Kaan Kabalak
f64d62b01d
Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Harshavardhana
aae6846413
feat: allow expiration of all versions via ILM Expiration action (#17521)
Following extension allows users to specify immediate purge of
all versions as soon as the latest version of this object has
expired.

```
<LifecycleConfiguration>
    <Rule>
        <ID>ClassADocRule</ID>
        <Filter>
           <Prefix>classA/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
             <Days>3650</Days>
             <ExpiredObjectAllVersions>true</ExpiredObjectAllVersions>
        </Expiration>
    </Rule>
    ...
```
2023-06-28 22:12:28 -07:00
Kaan Kabalak
21fbe88e1f
Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Harshavardhana
1f8b9b4bd5
fix: do not listAndHeal() inline with PutObject() (#17499)
there is a possibility that slow drives can actually add latency
to the overall call, leading to a large spike in latency.

this can happen if there are other parallel listObjects()
calls to the same drive, in turn causing them to
serialize behind each other.

this potentially improves performance and makes PutObject()
also non-blocking.
2023-06-24 19:31:04 -07:00
Harshavardhana
65c31fab12
fix: do not crash rebalance code instead set the object layer (#17465)
fixes #17421
2023-06-20 09:28:23 -07:00
Aditya Manthramurthy
5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Poorna
c4d0c49a5f
ensure metadata updates go to same pool where version exists (#17451)
This PR also returns the replication status in 
proxy calls and defers replication attempt if 
HEAD on object version returned an error different
from NoSuchKey
2023-06-17 07:30:53 -07:00
Klaus Post
c839b64f6a
fix: compressed+encrypted block overhead (#17289) 2023-05-26 10:57:07 -07:00
Krishnan Parthasarathi
3e128c116e
Add lifecycle event source to audit log tags (#17248) 2023-05-22 15:28:56 -07:00
Harshavardhana
fc03be7891
simplify bucket metadata lookups for versioning/object locking (#17253) 2023-05-22 12:05:14 -07:00
Harshavardhana
5569acd95c
disallow EC:0 if not set during server startup (#17141) 2023-05-04 14:44:30 -07:00
Harshavardhana
1d0211d395
allow deletes on directory objects to perform permanent deletes (#17132) 2023-05-04 14:43:52 -07:00
Harshavardhana
b53376a3a4
change directory objects to never create new versions (#17109) 2023-05-02 16:09:33 -07:00
Krishnan Parthasarathi
e7cac8acef
Add tags to auditLogLifecycle (#17081) 2023-04-26 17:49:00 -07:00
Praveen raj Mani
72802a5972
Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Harshavardhana
b1f3935c5b
allow ListObjects() when a prefix is an object (#17074) 2023-04-25 22:41:54 -07:00
Harshavardhana
6825bd7e75
fix: inlined objects don't need to honor long locks (#17039) 2023-04-17 12:16:37 -07:00
Harshavardhana
f3682b6149
allow writes to pools with inconsistent xl.meta (#17008) 2023-04-11 11:17:46 -07:00
Poorna
dc8fdcb9c9
fix: error checking in DeleteBucket (#16929) 2023-03-30 11:54:08 -07:00
Anis Eleuch
1346561b9d
return quorum error instead of insufficient storage error (#16874) 2023-03-22 16:22:37 -07:00
Minio Trusted
4bc52897b2 Update yaml files to latest version RELEASE.2023-03-22T06-36-24Z 2023-03-22 21:16:15 +00:00
Harshavardhana
12047702f5
fix: tweak the maintenance=true to satisfy baremetal first (#16864) 2023-03-21 08:48:38 -07:00
Poorna
d1e775313d
support decommissioning of tiered objects (#16751) 2023-03-16 07:48:05 -07:00
Klaus Post
628042e65e
tests: Protect globalLocalDrives against races (#16800) 2023-03-13 06:04:20 -07:00
Harshavardhana
b984bf8d1a
allow expiration of all versions during Listing() (#16757) 2023-03-09 15:15:30 -08:00
Poorna
fb6ab1cca2
fix: allow replication of 'null' delete markers (#16773) 2023-03-08 07:03:29 -08:00
ferhat elmas
714283fae2
cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
Klaus Post
d07089ceac
Fix scanner deadlock on lost global lock (#16726) 2023-02-28 21:34:45 -08:00
Klaus Post
9acf1024e4
Remove bloom filter (#16682)
Removes the bloom filter since it has such limited usability, often gets saturated anyway, and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU usage from each write operation.
2023-02-24 09:03:31 +05:30
Klaus Post
d0f4cc89a5
Re-add Veeam Listing workaround (#16593) 2023-02-10 10:48:39 -08:00
Klaus Post
03b94f907f
fix: deleted object names for directory objects (#16448) 2023-01-20 21:16:06 +05:30
Harshavardhana
b4ef5ff294
remove unnecessary code checking for supported features (#16423) 2023-01-17 19:37:47 +05:30
jiuker
c8e1154f1e
fix: reading from erasureDisks must be protected via read lock() (#16407) 2023-01-13 04:16:23 -08:00
Anis Elleuch
2146ed4033
xl: Quit early when EC config is incorrect (#16390)
Co-authored-by: Anis Elleuch <anis@min.io>
2023-01-09 23:07:45 -08:00
Harshavardhana
a15a2556c3
converge listBuckets() as a peer call (#16346) 2023-01-03 23:39:40 -08:00
Harshavardhana
f1bbb7fef5
vectorize cluster-wide calls such as bucket operations (#16313) 2023-01-03 08:16:39 -08:00
Harshavardhana
5b8fe2e89a
allow locks with object affinity to spread across pools (#16312) 2022-12-23 20:55:45 -08:00
Anis Elleuch
acc9c033ed
debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Harshavardhana
b882310e2b
avoid locks for internal and invalid buckets in MakeBucket() (#16302) 2022-12-23 07:46:00 -08:00
Anis Elleuch
89db3fdb5d
Do not return an error when version disparity is detected (#16269) 2022-12-16 08:52:12 -08:00
Aditya Manthramurthy
a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Klaus Post
a713aee3d5
Run staticcheck on CI (#16170) 2022-12-05 11:18:50 -08:00
Harshavardhana
5a8df7efb3
re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Klaus Post
cc1d8f0057
Check for abandoned data when healing (#16122) 2022-11-28 10:20:55 -08:00
Klaus Post
f96fe9773c
fix: duplicated shared prefix with custom delimiter when listing (#16111) 2022-11-22 08:51:04 -08:00
Harshavardhana
6aea950d74
avoid partID lock validating uploadID exists prematurely (#16086) 2022-11-18 03:09:35 -08:00
Harshavardhana
6d76db9d6c
improve server startup error when pools are incorrect (#16056) 2022-11-11 19:40:45 -08:00
Harshavardhana
0d49b365ff
converge SNSD deployments into single code (#15988) 2022-11-01 16:41:01 -07:00
Harshavardhana
fd6f6fc8df
cleanup stale parent multipart directories (#15980) 2022-11-01 08:00:02 -07:00
Krishnan Parthasarathi
4523da6543
feat: introduce pool-level rebalance (#15483) 2022-10-25 12:36:57 -07:00
Harshavardhana
23b329b9df
remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Anis Elleuch
ac85c2af76
lifecycle: refactor rules filtering and tagging support (#15914) 2022-10-21 10:46:53 -07:00
Harshavardhana
c68910005b
validate bucket before attempting batch replication (#15861) 2022-10-15 11:58:31 -07:00
Harshavardhana
928feb0889
remove unused debug param from evalActionFromLifecycle (#15813) 2022-10-07 10:24:12 -07:00
Harshavardhana
2a13cc28f2 feat: implement support batch replication (#15554) 2022-10-05 23:00:43 -07:00
Klaus Post
a9f1ad7924
Add extended checksum support (#15433) 2022-08-29 16:57:16 -07:00
Harshavardhana
e9055e9ef7
fix: walk() should cancel itself upon context cancellation (#15553)
This PR fixes possible leaks that may emanate from not
listening on context cancelation or timeouts.

```
goroutine 60957610 [chan send, 16 minutes]:
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1.1(...)
        github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368
github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},...
        github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1()
        github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1
        github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7
```
2022-08-18 17:49:08 -07:00
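The goroutine in the trace above is stuck in `chan send` because nothing is reading from the channel any more. A minimal sketch of the usual remedy, selecting on the context alongside the send, is shown below. The `walk` and `walkAbandoned` functions here are simplified stand-ins written for this example, not the actual `Walk` implementation.

```go
package main

import (
	"context"
	"fmt"
)

// walk sends each name on results, but gives up as soon as ctx is done
// instead of blocking forever on a channel nobody reads.
func walk(ctx context.Context, names []string, results chan<- string) error {
	for _, name := range names {
		select {
		case results <- name:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// walkAbandoned simulates a consumer that has gone away: the context is
// already cancelled and the unbuffered channel has no reader.
func walkAbandoned(names []string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return walk(ctx, names, make(chan string))
}

func main() {
	// Without the ctx.Done() case this call would block forever.
	fmt.Println(walkAbandoned([]string{"a", "b"})) // prints "context canceled"
}
```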
Harshavardhana
d350b666ff
feat: add idempotent delete marker support (#15521)
The bottom line is delete markers are a nuisance,
most applications are not version aware and this
has simply complicated the version management.

AWS S3 created unnecessary overhead
for customers: they now need to manage these
markers by applying ILM settings and cleaning
them up on a regular basis.

To make matters worse all these delete markers
get replicated as well in a replicated setup,
requiring two ILM settings on each site.

This PR is an attempt to address this inferior
implementation by moving MinIO towards an
idempotent delete marker implementation, i.e.
MinIO will never create more than a single
consecutive delete marker.

This significantly reduces operational overhead
by making versioning more useful for real data.

This is an S3 spec deviation for pragmatic reasons.
2022-08-18 16:41:59 -07:00
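The idempotent behavior described in #15521 can be modeled with a small version stack: a delete adds a marker only when the latest version is not already a delete marker. The `object` and `version` types below are hypothetical and greatly simplified; they are a sketch of the idea, not MinIO's data structures.

```go
package main

import "fmt"

// version is one entry in a hypothetical, simplified version stack.
type version struct {
	ID           int
	DeleteMarker bool
}

// object keeps its versions oldest first.
type object struct {
	versions []version
	nextID   int
}

// Put appends a new data version.
func (o *object) Put() version {
	o.nextID++
	v := version{ID: o.nextID}
	o.versions = append(o.versions, v)
	return v
}

// Delete appends a delete marker unless the latest version already is one,
// in which case it returns the existing marker and created is false.
func (o *object) Delete() (v version, created bool) {
	if n := len(o.versions); n > 0 && o.versions[n-1].DeleteMarker {
		return o.versions[n-1], false
	}
	o.nextID++
	v = version{ID: o.nextID, DeleteMarker: true}
	o.versions = append(o.versions, v)
	return v, true
}

func main() {
	var o object
	o.Put()
	o.Delete()
	o.Delete() // no second consecutive marker is created
	o.Delete()
	fmt.Println(len(o.versions)) // prints 2
}
```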
Harshavardhana
bf38c0c0d1
fix: increase concurrency of DeleteObjects() to N/10th (#15546)
instead of keeping the value 10 and static, make
the concurrency a function of incoming number of
objects being deleted.
2022-08-18 09:33:56 -07:00
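A sketch of what scaling delete concurrency with request size (#15546) might look like. The function names, the floor of 1, and the integer-division rounding are assumptions; the commit message only states that concurrency becomes N/10th of the number of objects instead of a static 10.

```go
package main

import (
	"fmt"
	"sync"
)

// deleteConcurrency returns one tenth of the number of objects, with a
// floor of 1. The floor and the rounding are assumptions.
func deleteConcurrency(n int) int {
	c := n / 10
	if c < 1 {
		c = 1
	}
	return c
}

// deleteAll runs del for every name, with at most deleteConcurrency(len(names))
// calls in flight at once, and returns one error slot per name.
func deleteAll(names []string, del func(string) error) []error {
	errs := make([]error, len(names))
	sem := make(chan struct{}, deleteConcurrency(len(names)))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(i int, name string) {
			defer wg.Done()
			defer func() { <-sem }()
			errs[i] = del(name)
		}(i, name)
	}
	wg.Wait()
	return errs
}

func main() {
	fmt.Println(deleteConcurrency(1000), deleteConcurrency(25), deleteConcurrency(3))
	errs := deleteAll([]string{"a", "b", "c"}, func(string) error { return nil })
	fmt.Println(len(errs))
}
```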
Poorna
21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
ebozduman
b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Harshavardhana
a6e0ec4e6f
Add support converting non-inlined to inlined (#15444)
This is a feature to allow for inode compaction on
large clusters that use a lot of small files spread
across a large hierarchy.
2022-08-02 23:10:22 -07:00
Harshavardhana
cbd70d26b5
optimize speedtest for smaller setups (#15414)
this has been observed in multiple environments
where the setups are small: `speedtest` naturally
fails with the default '10s', and the concurrency
of '32' is too big for such clusters.

choose a smaller value, i.e. equal to the number of
drives in such clusters, and let 'autotune'
increase the concurrency instead.
2022-07-27 14:41:59 -07:00
Poorna
426c902b87
site replication: fix healing of bucket deletes. (#15377)
This PR changes the handling of bucket deletes for site 
replicated setups to hold on to deleted bucket state until 
it syncs to all the clusters participating in site replication.
2022-07-25 17:51:32 -07:00
Harshavardhana
7da9e3a6f8
support encrypted/compressed objects properly during decommission (#15320)
fixes #15314
2022-07-16 19:35:24 -07:00
Harshavardhana
1b339ea062
allow force delete on decom pool (#15302)
Bonus:

- skip suspended pool from being
  considered for multipart uploads

- add more context for decomErrors()
2022-07-14 20:44:22 -07:00
Anis Elleuch
996cac5fed
Avoid listing buckets from a suspended pool (#15283)
Make bucket requests sent after decommissioning has started do not
create buckets in a suspended pool. Therefore listing buckets should avoid
suspended pools as well.
2022-07-13 07:44:50 -07:00
Harshavardhana
ae92521310
remove unnecessary nAgreed value in partial() func (#15242) 2022-07-07 13:45:34 -07:00
Anis Elleuch
8d98282afd
Better reporting of total/free usable capacity of the cluster (#15230)
The current code uses approximation using a ratio. The approximation 
can skew if we have multiple pools with different disk capacities.

Replace the algorithm with a simpler one which counts data
disks and ignores parity disks.
2022-07-06 13:29:49 -07:00
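The simpler calculation described in #15230 (count data disks, ignore parity disks) can be illustrated with the arithmetic below. The `erasureSet` type and the assumption that all drives within a set have the same size are hypothetical simplifications for this example.

```go
package main

import "fmt"

// erasureSet is a hypothetical, simplified description of one erasure set.
type erasureSet struct {
	Drives    int    // total drives in the set
	Parity    int    // parity drives in the set
	DriveSize uint64 // capacity of each drive, assumed uniform within the set
}

// usableCapacity sums the capacity of data drives only, so pools with
// different drive sizes are counted exactly instead of via a ratio.
func usableCapacity(sets []erasureSet) uint64 {
	var total uint64
	for _, s := range sets {
		total += uint64(s.Drives-s.Parity) * s.DriveSize
	}
	return total
}

func main() {
	const tib = uint64(1) << 40
	sets := []erasureSet{
		{Drives: 16, Parity: 4, DriveSize: 1 * tib}, // 12 data drives x 1 TiB
		{Drives: 8, Parity: 2, DriveSize: 4 * tib},  // 6 data drives x 4 TiB
	}
	fmt.Println(usableCapacity(sets) / tib) // prints 36
}
```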
Harshavardhana
2518af5f9e
fix: allow certain mutations on objects during decommissioning (#15231)
currently by mistake deletion of objects was skipped,
if the object resided on the pool being decommissioned.

deletes are okay to be allowed since decommission is
designed to run on a cluster with active I/O.
2022-07-06 09:53:16 -07:00
Harshavardhana
9d80ff5a05
fix: decommission delete markers for non-current objects (#15225)
for versioned buckets, the delete markers present in the
version stack of an object were not being created, which
would essentially prevent decommission from succeeding.

This PR fixes creating such delete markers properly during
a decommissioning process, adds tests as well.
2022-07-05 07:37:24 -07:00
Harshavardhana
9c605ad153
allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171)
allows for further granular setups

- 2 drives (1 parity, 1 data)
- 3 drives (1 parity, 2 data)

Bonus: allows '0' parity as well.
2022-06-27 20:22:18 -07:00