- Use a shared worker pool for all ILM expiry tasks
- Free version cleanup executes in a separate goroutine
- Add a free version only if removing the remote object fails
- Add ILM expiry metrics to the node namespace
- Move tier journal tasks to expiryState
- Remove unused on-disk journal for tiered objects pending deletion
- Distribute expiry tasks across workers such that the expiry of versions of
the same object is serialized (see the sketch after this list)
- Ability to resize worker pool without server restart
- Make scaling down of expiryState workers' concurrency safe; Thanks
@klauspost
- Add error logs when expiryState and transition state are not
initialized (yet)
* metrics: Add missed tier journal entry tasks
* Initialize the ILM worker pool after the object layer
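Below is a minimal sketch of how such routing could look: tasks are hashed by bucket/object so that all versions of one object land on the same worker and are therefore expired serially. The task struct, queue depth, and FNV hash are assumptions for illustration, not the actual expiryState code.
```go
package ilm

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Illustrative sketch only: route each expiry task by hashing bucket/object,
// so all versions of one object go to the same worker and expire serially.
type expiryTask struct {
	bucket, object, versionID string
}

func runExpiryWorkers(workers int, tasks <-chan expiryTask) {
	queues := make([]chan expiryTask, workers)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan expiryTask, 128)
		wg.Add(1)
		go func(q <-chan expiryTask) {
			defer wg.Done()
			for t := range q {
				// Placeholder for the real expiry work.
				fmt.Println("expiring", t.bucket, t.object, t.versionID)
			}
		}(queues[i])
	}
	for t := range tasks {
		h := fnv.New32a()
		h.Write([]byte(t.bucket + "/" + t.object))
		queues[h.Sum32()%uint32(workers)] <- t // same object always maps to the same queue
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}
```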
This PR fixes a bug that has likely existed for a long time,
with no visible workaround. In any deployment, if an entire
erasure set is deleted, there is no way for the cluster to recover.
* Remove lock for cached operations.
* Rename "Relax" to `ReturnLastGood`.
* Add `CacheError` to allow caching values even on errors.
* Add NoWait that will return the current value with async fetching if within 2xTTL (see the sketch after this list).
* Make benchmark somewhat representative.
```
Before: BenchmarkCache-12 16408370 63.12 ns/op 0 B/op
After: BenchmarkCache-12 428282187 2.789 ns/op 0 B/op
```
* Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning.
* Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics.
* Do async fetching of usage caches.
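The bullets above describe cache semantics; the following is a hedged sketch of what those options could mean in code. The `Opts`/`Cache` names and signatures are assumptions for illustration and do not reflect the actual internal package.
```go
package cache

import (
	"sync"
	"time"
)

// Sketch of the semantics listed above; option names follow the bullets.
type Opts struct {
	TTL            time.Duration
	ReturnLastGood bool // keep serving the last good value if a refresh fails
	CacheError     bool // cache an error for TTL instead of retrying immediately
	NoWait         bool // within 2x TTL, return the stale value and refresh async
}

type Cache[T any] struct {
	mu       sync.Mutex
	opts     Opts
	fetch    func() (T, error)
	val      T
	err      error
	updated  time.Time
	fetching bool
}

func New[T any](opts Opts, fetch func() (T, error)) *Cache[T] {
	return &Cache[T]{opts: opts, fetch: fetch}
}

func (c *Cache[T]) Get() (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	age := time.Since(c.updated)
	if !c.updated.IsZero() && age < c.opts.TTL && (c.err == nil || c.opts.CacheError) {
		return c.val, c.err // fresh enough
	}
	if c.opts.NoWait && !c.updated.IsZero() && age < 2*c.opts.TTL {
		if !c.fetching {
			c.fetching = true
			go c.refresh() // hand back the stale value now, update in the background
		}
		return c.val, c.err
	}
	c.store(c.fetch()) // synchronous refresh (sketch only)
	return c.val, c.err
}

func (c *Cache[T]) refresh() {
	v, err := c.fetch()
	c.mu.Lock()
	c.store(v, err)
	c.fetching = false
	c.mu.Unlock()
}

func (c *Cache[T]) store(v T, err error) {
	if err != nil && c.opts.ReturnLastGood && !c.updated.IsZero() {
		v, err = c.val, c.err // keep the previous good value instead of the error
	}
	c.val, c.err, c.updated = v, err, time.Now()
}
```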
When we expand via pools, there is no reason to stick
with the same distributionAlgo as the rest, since the
algo only makes sense within a pool, not across pools.
This allows newer pools to use newer codepaths to
avoid legacy file lookups: a pre-existing
deployment from 2019 can expand with a new pool
that uses a newer distribution format, allowing that
pool to be more performant.
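As a rough illustration (not the actual format code, and the algorithm value shown is an assumption), the per-pool choice could look like this:
```go
package format

// Illustrative only: a brand-new pool picks the latest distribution algorithm,
// while pre-existing pools keep whatever their format already records.
func distributionAlgoFor(existingAlgo string, newPool bool) string {
	if newPool || existingAlgo == "" {
		// Distribution never crosses pools, so a new pool is free to use
		// the newest algorithm and its faster lookup codepath.
		return "SIPMOD+PARITY" // assumed name of the newest algo
	}
	return existingAlgo // legacy pools stay on their original algorithm
}
```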
- bucket metadata does not need to look for legacy things
anymore if b.Created is non-zero
- stagger bucket metadata loads across lots of nodes to
avoid the current thundering herd problem (a sketch follows this list).
- Remove deadlines for RenameData and RenameFile - these
calls should never be timed out; they should wait
until completion or until the client times out. Do not
choose timeouts for applications during the WRITE phase.
- increase R/W buffer size, increase maxMergeMessages to 30
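A small sketch of the staggering mentioned above, assuming a simple random jitter before each load; the one-second window and function names are illustrative only:
```go
package meta

import (
	"math/rand"
	"sync"
	"time"
)

// Sketch: stagger bucket metadata loads with random jitter so that many nodes
// do not hit the same buckets at the same instant.
func loadBucketMetadataStaggered(buckets []string, load func(bucket string)) {
	var wg sync.WaitGroup
	for _, b := range buckets {
		wg.Add(1)
		go func(bucket string) {
			defer wg.Done()
			// Sleep for a random fraction of a second before loading,
			// spreading the work instead of creating a thundering herd.
			time.Sleep(time.Duration(rand.Int63n(int64(time.Second))))
			load(bucket)
		}(b)
	}
	wg.Wait()
}
```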
Put, List, and Multipart operations heavily rely on making a
GetBucketInfo() call on a regular basis to verify whether a
bucket exists. This has a large performance cost when there
are many servers involved.
We did optimize this part by vectorizing the bucket calls,
but that is not enough; beyond 100 nodes the cost becomes
fairly visible in terms of performance.
Local disk metrics were polluting cluster metrics;
remove them there instead of adding relevant ones.
- batch job metrics were incorrectly kept at the bucket
metrics endpoint; move them to cluster metrics.
- add tier metrics to cluster peer metrics from the node.
- fix missing set level cluster health metrics
NOTE: This feature is not retro-active; it will not cater to previous transactions
on existing setups.
To enable this feature, please set the `_MINIO_DRIVE_QUORUM=on` environment
variable as part of the systemd service or k8s configmap.
Once this has been enabled, you need to also set `list_quorum`.
```
~ mc admin config set alias/ api list_quorum=auto
```
A new debugging tool is available to check for any missing counters.
This PR also increases per-node bpool memory from 1024 entries
to 2048 entries; along with that, it also moves the byte pool
to a central location instead of being per pool.
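As a hedged sketch of what a central, capacity-bounded byte pool could look like (the buffer size and type names are assumptions; only the 2048-entry count comes from the change):
```go
package pool

// Sketch of a single node-wide byte buffer pool shared by all erasure pools,
// instead of one pool per erasure pool.
type bytePool struct {
	c       chan []byte
	bufSize int
}

func newBytePool(entries, bufSize int) *bytePool {
	return &bytePool{c: make(chan []byte, entries), bufSize: bufSize}
}

func (p *bytePool) Get() []byte {
	select {
	case b := <-p.c:
		return b
	default:
		return make([]byte, p.bufSize) // pool empty, allocate fresh
	}
}

func (p *bytePool) Put(b []byte) {
	if cap(b) < p.bufSize {
		return // undersized buffer, let it be garbage collected
	}
	select {
	case p.c <- b[:p.bufSize]:
	default: // pool already full, drop the buffer
	}
}

// One central pool for the whole node rather than one per erasure pool.
var globalBytePool = newBytePool(2048, 1<<20)
```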
Historically, we have always kept storage-rest-server
and the local storage API separate without much trouble,
since they could both operate independently with no
special state() shared between them.
However, over time we have added state()
such as:
- drive monitoring threads: there will now be "2" of
them per drive instead of just 1.
- concurrent tokens available per drive are now twice
as many instead of a single shared set, allowing an
unexpectedly high amount of I/O to go through.
- applying serialization by using walkMutexes can now
be adequately honored for both remote callers and local
callers.
`OpMuxConnectError` was not handled correctly.
Remove local checks for single request handlers so they can
run before being registered locally.
Bonus: Only log IAM bootstrap on startup.
This allows batch replication to avoid attempting
to copy objects that do not have read quorum.
This PR also allows walk() to provide custom
quorum values for batch replication and
key rotation.
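A rough sketch of the idea, with an assumed option struct and field names that are not the actual walk() API:
```go
package batch

// Sketch only: a caller-supplied quorum for walk(), so batch replication and
// key rotation can skip entries below read quorum instead of failing on them.
type walkQuorumOpts struct {
	MinReadQuorum int // minimum number of disks that must agree on a version
}

// shouldProcess reports whether a walked version has enough agreeing disks to
// be copied (replication) or rotated (key rotation); anything below quorum is
// silently skipped rather than aborting the whole batch.
func shouldProcess(agreeingDisks int, opts walkQuorumOpts) bool {
	return agreeingDisks >= opts.MinReadQuorum
}
```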
There is a fundamental race condition in `newErasureServerPools`, where setObjectLayer is
called before the poolMeta has been loaded/populated.
We add a placeholder value to this field but disable all saving of the value, so we don't risk
overwriting the value on disk. Once the value has been loaded or created, it is replaced with
the proper value, which will also be saved.
Also fixes various accesses of `poolMeta` that were done without locks.
We make `poolMeta.IsSuspended` return false in this case, even though we
shouldn't risk out-of-bounds reads anymore.
A comment in the code provides a more detailed explanation
of what this PR entails and its assumptions.
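A simplified sketch of the placeholder pattern described above; all names and the stub helpers are illustrative, not the real poolMeta code:
```go
package pools

import "sync"

// Publish the object layer with a placeholder poolMeta that can never be
// persisted, then swap in the real value under lock once it has been loaded.
type poolMeta struct {
	dontSave bool // placeholder marker: never written back to disk
}

type serverPools struct {
	mu   sync.RWMutex
	meta poolMeta
}

func newServerPools() (*serverPools, error) {
	z := &serverPools{meta: poolMeta{dontSave: true}} // placeholder value
	setObjectLayer(z) // published early; callers may already reach z

	loaded, err := loadOrCreatePoolMeta() // may be slow during startup
	if err != nil {
		return nil, err
	}
	z.mu.Lock()
	z.meta = loaded // the real value replaces the placeholder and may be saved
	z.mu.Unlock()
	return z, nil
}

func (z *serverPools) saveMeta() error {
	z.mu.RLock()
	defer z.mu.RUnlock()
	if z.meta.dontSave {
		return nil // never overwrite the on-disk value with the placeholder
	}
	return persist(z.meta)
}

// Stubs standing in for the real functions, just to keep the sketch compilable.
func setObjectLayer(*serverPools)             {}
func loadOrCreatePoolMeta() (poolMeta, error) { return poolMeta{}, nil }
func persist(poolMeta) error                  { return nil }
```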
This PR reduces the amount of listing() calls by an order
of magnitude; however, there are other such calls that
still need further optimization, which will be done
in subsequent PRs.
AccountInfo is called quite frequently by Console UI
login attempts; when many users are logging in, it is important
that we provide them with better responsiveness.
- ListBuckets information is cached every second
- Bucket usage info is cached for up to 10 seconds
- Prefix usage (optional) info is cached for up to 10 seconds
A failure to update after cache expiration would still
allow login, which would end up returning the information
previously cached (a sketch of this follows below).
This allows for seamless responsiveness for the Console UI
logins, and overall responsiveness on a heavily loaded
system.
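A minimal sketch of the stale-on-failure behavior for the cached bucket list, assuming a simple TTL cache; the type, TTL handling, and fetch function are illustrative only:
```go
package console

import (
	"sync"
	"time"
)

// Sketch: if a refresh after the TTL fails, the previously cached buckets are
// returned, so Console logins keep working and stay responsive.
type bucketListCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	updated time.Time
	buckets []string
	fetch   func() ([]string, error)
}

func (c *bucketListCache) Get() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.updated.IsZero() && time.Since(c.updated) < c.ttl {
		return c.buckets // still fresh
	}
	if fresh, err := c.fetch(); err == nil {
		c.buckets, c.updated = fresh, time.Now()
	}
	// On refresh failure the previously cached list is returned unchanged.
	return c.buckets
}
```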
Bonus fixes include:
- no need to write the final xl.meta; (renameData) does this
already, saving some IOPs.
- make sure to purge the multipart directory properly using
a recursive delete, otherwise this can easily pile up and
rely on the stale uploads cleanup.
fixes #17863
This randomness is needed to avoid scanning
the same buckets across different erasure sets
in the same order.
Allow random buckets to be scanned instead,
allowing a wider spread of ILM and replication
checks.
Additionally, do not loop over the buckets twice to fill
the channel; fill the channel regardless of whether
a bucket is new or old.
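A short sketch of this, assuming the buckets are shuffled once before filling the scan channel; names are illustrative:
```go
package scanner

import "math/rand"

// Sketch: shuffle the bucket list before filling the scan channel, so that
// different erasure sets do not all walk the same buckets in the same order.
func queueBucketsForScan(buckets []string, out chan<- string) {
	rand.Shuffle(len(buckets), func(i, j int) {
		buckets[i], buckets[j] = buckets[j], buckets[i]
	})
	for _, b := range buckets {
		out <- b // fill once, regardless of whether a bucket is new or old
	}
	close(out)
}
```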
The following extension allows users to specify an immediate purge of
all versions as soon as the latest version of the object has
expired.
```
<LifecycleConfiguration>
<Rule>
<ID>ClassADocRule</ID>
<Filter>
<Prefix>classA/</Prefix>
</Filter>
<Status>Enabled</Status>
<Expiration>
<Days>3650</Days>
<ExpiredObjectAllVersions>true</ExpiredObjectAllVersions>
</Expiration>
</Rule>
...
```