minio

Commit Graph

Author	SHA1	Message	Date
Anis Eleuch	f85c28e960	heal: large objects fix and avoid .healing.bin corner case premature exit (#20577 ) xlStorage.Healing() returns nil if there is an error reading .healing.bin or if the latter is empty. The healing.bin update() call returns early if .healing.bin is empty; hence, no further update of .healing.bin is possible. A .healing.bin can be empty if os.Open() with O_TRUNC is successful but the next Write returns an error. To avoid this weird situation, do not let healingTracker.update() return early if .healing.bin is empty, so that it is written again. This commit also fixes wrong error log printing when an object is healed in another drive in the same erasure set but not in the drive that is actively healing by fresh drive healing code. Currently, it prints <nil> instead of a factual error. * heal: Scan .minio.sys metadata only during site-wide heal (#137) mc admin heal always invokes .minio.sys heal, but sometimes the latter contains a lot of data (many service accounts, STS accounts, etc.), which makes the mc admin heal command very slow. Only invoke .minio.sys healing when no bucket was specified in the `mc admin heal` command.	2024-10-26 02:58:27 -07:00
Anis Eleuch	f7e176d4ca	heal: Avoid deadline error with very large objects (#140 ) (#20586 ) Healing a large object in normal scan mode, where no part reads are involved, can still fail after 30 seconds if the object has too many parts, mainly when hard disks are being used. The reason is that a single general deadline covers the checks for all parts; use a deadline per part instead.	2024-10-26 02:56:26 -07:00
Harshavardhana	f6f0807c86	cleanup existing part.N's before renamePart() (#20466 ) this is a safety-net to avoid any unexpected parts to show up.	2024-09-24 04:26:41 -07:00
Poorna	cdd7512a2e	use rename() safety for in-place 'xl.meta' updates (#20414 )	2024-09-11 09:08:51 -07:00
Harshavardhana	85f08d7752	verify part.N exists before reading part.N.meta (#20383 ) if part.N doesn't exist we do not have to complete the multipart transaction, it simply means that we have some partial upload situation at hand.	2024-09-05 13:37:19 -07:00
Harshavardhana	fb2360ff88	when a drive is closed cancel the cleanupTrash goroutine (#20337 ) when a hung drive is hot-unplugged, the server might go into a loop where the previous `format.json` is somehow still accessible to the process, we try to re-init() drives, but that seems to cause a previous goroutine to hang around since it is not canceled away when the drive is closed. Bonus: add deadline for immediate purge routine, to unblock it if the drive is blocking mutations.	2024-08-28 08:31:42 -07:00
jiuker	1e1bd3afd9	use io.NopCloser replace closeWrapper (#20287 )	2024-08-21 05:20:54 -07:00
Anis Eleuch	85c3db3a93	heal: Add finished flag to .healing.bin to avoid removing the latter (#20250 ) Sometimes, we need historical information in .healing.bin, such as the number of expired objects that healing skips, which can create drive usage disparity in the same erasure set. For that reason, this commit no longer removes .healing.bin; instead it adds a new field called Finished so we know healing is finished on that drive.	2024-08-20 08:42:49 -07:00
Harshavardhana	2e0fd2cba9	implement a safer completeMultipart implementation (#20227 ) - optimize writing part.N.meta by writing both part.N and its meta in sequence without network component. - remove part.N.meta, part.N which were partially successful, in quorum loss situations during renamePart() - allow for strict read quorum check arbitrated via ETag for the given part number, this makes it double safer upon final commit. - return an appropriate error when read quorum is missing, instead of returning InvalidPart{}, which is non-retryable error. This kind of situation can happen when many nodes are going offline in rotation, an example of such a restart() behavior is statefulset updates in k8s. fixes #20091	2024-08-12 01:38:15 -07:00
Harshavardhana	89c58ce87d	enhance getActualSize() to return valid values for most situations (#20228 )	2024-08-08 08:29:58 -07:00
Harshavardhana	80ff907d08	add DeleteBulk support, add sufficient deadlines per rename() (#20185 ) deadlines per moveToTrash() allows for a more granular timeout approach for syscalls, instead of an aggregate timeout. This PR also enhances multipart state cleanup to be optimal by removing 100's of multipart network rename() calls into single network call.	2024-07-29 18:56:40 -07:00
Harshavardhana	a16193bb50	remove fdatasync() discard, we write with O_SYNC (#20168 ) fdatasync() discard for page-cached READs is not needed, it would seem like this can cause latencies in situations when things are loaded.	2024-07-26 10:27:56 -07:00
Harshavardhana	6fe2b3f901	avoid sendFile() for ranges or object lengths < 4MiB (#20141 )	2024-07-24 03:22:50 -07:00
Harshavardhana	91805bcab6	add optimizations to bring performance on unversioned READS (#20128 ) allow non-inlined on disk to be inlined via an unversioned ReadVersion() call, we only need ReadXL() to resolve objects with multiple versions only. The choice of this block makes it to be dynamic and chosen by the user via `mc admin config set` Other bonus things - Start measuring internode TTFB performance. - Set TCP_NODELAY, TCP_CORK for low latency	2024-07-23 03:53:03 -07:00
Anis Eleuch	2ec1f404ac	info: Always refresh the root disk status (#20023 ) Add root drive status in the disk info cache function, so unmounting a drive without restarting a local node reflects the correct value.	2024-07-02 13:41:29 -07:00
Klaus Post	2f9018f03b	Do regular checks for healing status while scanning (#19946 )	2024-06-18 09:11:04 -07:00
Harshavardhana	bbb64eaade	skip healing properly in the scanner when a drive is hotplugged (#19939 ) Due to how the state is passed around, SkipHealing might not always be the true state() of the system, causing a situation where we might heal from the scanner on the same drive which is being healed. Due to this, competing heals get triggered that slow each other down.	2024-06-17 16:39:11 -07:00
Harshavardhana	7bd1d899bc	remove overzealous check during HEAD() (#19940 ) due to a historic bug in CopyObject() where an inlined object loses its metadata, the check causes an incorrect fallback verifying data-dir. CopyObject() bug was fixed in `ffa91f9794` however the occurrence of this problem is historic, so the aforementioned check is stretching too much. Bonus: simplify fileInfoRaw() to read xl.json as well, also recreate buckets properly.	2024-06-17 07:29:18 -07:00
Anis Eleuch	789cbc6fb2	heal: Dangling check to evaluate object parts separately (#19797 )	2024-06-10 08:51:27 -07:00
Harshavardhana	29a25a538f	fix: make sure we list freeVersions like DEL marker with --versions (#19878 ) freeVersions() was being incorrectly skipped; list it as valid objects properly. Co-authored-by: Krishnan Parthasarathi	2024-06-07 15:18:44 -07:00
Harshavardhana	3549e583a6	results must be a single channel to avoid overwriting `healing.bin` (#19702 )	2024-05-09 10:15:03 -07:00
Harshavardhana	9a267f9270	allow caller context during reloads() to cancel (#19687 ) canceled callers might linger around longer, can potentially overwhelm the system. Instead provide a caller context so canceled callers don't hold on to them. Bonus: we have no reason to cache errors, we should never cache errors otherwise we can potentially have quorum errors creeping in unexpectedly. We should let the cache when invalidating hit the actual resources instead.	2024-05-08 17:51:34 -07:00
Harshavardhana	a372c6a377	a bunch of fixes for error handling (#19627 ) - handle errFileCorrupt properly - micro-optimization of sending done() response quicker to close the goroutine. - fix logger.Event() usage in a couple of places - handle the rest of the client to return a different error other than lastErr() when the client is closed.	2024-04-28 10:53:50 -07:00
Harshavardhana	1d03bea965	support preserving renameData() on inlined content during overwrites (#19609 ) extending #19548 to inlined-data as well.	2024-04-24 18:14:08 -07:00
Harshavardhana	9693c382a8	make renameData() more defensive during overwrites (#19548 ) instead upon any error in renameData(), we still preserve the existing dataDir in some form for recoverability in strange situations such as out of disk space type errors. Bonus: avoid running list and heal() instead allow versions disparity to return the actual versions, uuid to heal. Currently limit this to 100 versions and lesser disparate objects. an undo now reverts back the xl.meta from xl.meta.bkp during overwrites on such flaky setups. Bonus: Save N depth syscalls via skipping the parents upon overwrites and versioned updates. Flaky setup examples are stretch clusters with regular packet drops etc, we need to add some defensive code around to avoid dangling objects.	2024-04-23 10:15:52 -07:00
Harshavardhana	03767d26da	fix: get rid of large buffers (#19549 ) these lead to run-away usage of memory beyond what Go's GC can handle, we have to re-visit this differently, remove this for now.	2024-04-19 04:26:59 -07:00
Harshavardhana	d1c58fc2eb	remove older deploymentID fix behavior to speed up startup (#19497 ) since mid 2018 we do not have any deployments without deployment-id, it is time to put this code to rest, this PR removes this old code as it's no longer valuable. on setups with 1000's of drives these are all quite expensive operations.	2024-04-15 01:25:46 -07:00
Harshavardhana	074febd9e1	remove SetDiskLoc() rely on the endpoint values instead (#19475 ) the disk location never changes in the lifetime of a MinIO cluster, even if it did validate this close to the disk instead at the higher layer. Return appropriate errors indicating an invalid drive, so that the drive is not recognized as part of a valid drive.	2024-04-11 10:45:28 -07:00
Harshavardhana	c957e0d426	fix: increase the tiering part size to 128MiB (#19424 ) also introduce 8MiB buffer to read from for bigger parts	2024-04-08 02:22:27 -07:00
Harshavardhana	a207bd6790	turn-off Nlink readdir() optimization for NFS/CIFS (#19420 ) fixes #19418 fixes #19416	2024-04-05 08:17:08 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaining this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Anis Eleuch	235edd88aa	xl: Purge instead of moving to trash with near filled disks (#19294 ) Immediately remove objects from the trash when the disk is 95% full	2024-03-19 13:26:24 -07:00
Harshavardhana	c201d8bda9	write anything beyond 4k to be written in 4k pages (#19269 ) we were prematurely not writing 4k pages while we could have, due to the fact that most buffers would be multiples of 4k up to some number and there shall be some remainder. We only need to write the remainder without O_DIRECT.	2024-03-15 12:27:59 -07:00
Harshavardhana	88a89213ff	make immediate purge non-blocking up to 100,000 entries per drive (#19231 ) Bonus: turn-off O_DIRECT verification when FSType is 'XFS'	2024-03-09 18:53:48 -08:00
Harshavardhana	6d08af61a0	for root disks add additional information in the error log (#19177 )	2024-03-02 23:45:39 -08:00
Krishnan Parthasarathi	a7577da768	Improve expiration of tiered objects (#18926 ) - Use a shared worker pool for all ILM expiry tasks - Free version cleanup executes in a separate goroutine - Add a free version only if removing the remote object fails - Add ILM expiry metrics to the node namespace - Move tier journal tasks to expiryState - Remove unused on-disk journal for tiered objects pending deletion - Distribute expiry tasks across workers such that the expiry of versions of the same object is serialized - Ability to resize worker pool without server restart - Make scaling down of expiryState workers' concurrency safe; Thanks @klauspost - Add error logs when expiryState and transition state are not initialized (yet) * metrics: Add missed tier journal entry tasks * Initialize the ILM worker pool after the object layer	2024-03-01 21:11:03 -08:00
Anis Eleuch	8f03c6e0db	xl: Avoid calling getdents for folders in listing (#19100 )	2024-03-01 08:01:28 -08:00
Harshavardhana	44b70eb646	allow creating missing parent folders during moveToTrash() (#19155 )	2024-02-29 08:28:33 -08:00
Aditya Manthramurthy	62ce52c8fd	cachevalue: simplify exported interface (#19137 ) - Also add cache options type	2024-02-28 09:09:09 -08:00
Harshavardhana	9a012a53ef	initialize the disk healer early on (#19143 ) This PR fixes a bug that perhaps has been long introduced, with no visible workarounds. In any deployment, if an entire erasure set is deleted, there is no way the cluster recovers.	2024-02-27 23:02:14 -08:00
Harshavardhana	c2b54d92f6	allow all disk full errors to be handled (#19117 )	2024-02-24 09:11:14 -08:00
Harshavardhana	a3ac62596c	move timedValue -> cachevalue package (#19114 )	2024-02-23 13:28:14 -08:00
Harshavardhana	2faba02d6b	fix: allow diskInfo at storageRPC to be cached (#19112 ) Bonus: convert timedValue into a typed implementation	2024-02-23 09:21:38 -08:00
Harshavardhana	53aa8f5650	use typos instead of codespell (#19088 )	2024-02-21 22:26:06 -08:00
Harshavardhana	35deb1a8e2	do not block on send channels under high load (#19090 ) all send channels must compete with `ctx` if not they will perpetually stay alive.	2024-02-20 15:00:35 -08:00
Harshavardhana	c7f7c47388	allow renames() for inlined writes without data-dir (#18801 ) data-dir not being present is okay, however we can still rely on the `rename()` atomic call instead of relying on write xl.meta write which may truncate the io.EOF.	2024-02-20 07:05:57 -08:00
Harshavardhana	7e4a6b4bcd	remove rename2 entirely, avoids the risk of moving data (#19058 )	2024-02-14 17:09:38 -08:00
Harshavardhana	f961ec4aaf	fix: revert allow offline disks on fresh start (#19052 ) the PR in #16541 was incorrect and had wrong assumptions about the overall setup, revert this since this expectation to have offline servers is wrong and we can end up with a bigger chicken and egg problem. This reverts commit `5996c8c4d5`. Bonus: - preserve disk in globalLocalDrives properly upon connectDisks() - do not return 'nil' from newXLStorage(), getting it ready for the next set of changes for 'format.json' loading.	2024-02-14 10:37:34 -08:00
Harshavardhana	6d381f7c0a	relax pre-emptive GetBucketInfo() for multi-object delete (#19035 )	2024-02-12 08:46:46 -08:00
Harshavardhana	0e177a44e0	preserve conflicting objects when parent object is being deleted (#19034 ) Given `a/prefix` and `a/prefix/1.txt`, where `a/prefix` is an object which does not have `/` at the end, we do not have to aggressively recursively delete all the sub-folders as well. Instead convert the call into a self-contained delete of 'xl.meta' and then subsequently attempt to Remove the parent.	2024-02-12 08:30:40 -08:00

378 Commits