minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	cdeab19673	fix: always check error upon w.Close() in Write() (#18111 ) not checking w.Close() can prematurely make us think that the w.Write() actually succeeded, apparently Write() may or may not return an error but sometimes only during a Close() call to the fd we may see the error from Write() propagate. Fdatasync(w) on the FD would return an error requiring Close() error handling is less of a concern, however it may happen such that fdatasync() did not return an error, where as Close() would.	2023-09-26 11:04:00 -07:00
Harshavardhana	ac3a19138a	fix: set scanning details locally to avoid cached values (#18092 ) atomic variable results such as scanning must not use cached values, instead rely on real-time information.	2023-09-25 08:26:29 -07:00
Harshavardhana	8b8be2695f	optimize mkdir calls to avoid base-dir `Mkdir` attempts (#18021 ) Currently we have IOPs of these patterns ``` [OS] os.Mkdir play.min.io:9000 /disk1 2.718µs [OS] os.Mkdir play.min.io:9000 /disk1/data 2.406µs [OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys 4.068µs [OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp 2.843µs [OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp/d89c8ceb-f8d1-4cc6-b483-280f87c4719f 20.152µs ``` It can be seen that we can save quite Nx levels such as if your drive is mounted at `/disk1/minio` you can simply skip sending an `Mkdir /disk1/` and `Mkdir /disk1/minio`. Since they are expected to exist already, this PR adds a way for us to ignore all paths upto the mount or a directory which ever has been provided to MinIO setup.	2023-09-13 08:14:36 -07:00
Harshavardhana	1df5e31706	optimize MRF replication queue to avoid memory leaks (#18007 )	2023-09-11 20:59:11 -07:00
Anis Eleuch	41de53996b	heal: calculate the number of workers based on NRRequests (#17945 )	2023-09-11 14:48:54 -07:00
Harshavardhana	3995355150	avoid repeated large allocations for large parts (#17968 ) objects with 10,000 parts and many of them can cause a large memory spike which can potentially lead to OOM due to lack of GC. with previous PR reducing the memory usage significantly in #17963, this PR reduces this further by 80% under repeated calls. Scanner sub-system has no use for the slice of Parts(), it is better left empty. ``` benchmark old ns/op new ns/op delta BenchmarkToFileInfo/ToFileInfo-8 295658 188143 -36.36% benchmark old allocs new allocs delta BenchmarkToFileInfo/ToFileInfo-8 61 60 -1.64% benchmark old bytes new bytes delta BenchmarkToFileInfo/ToFileInfo-8 1097210 227255 -79.29% ```	2023-09-02 07:49:24 -07:00
Harshavardhana	8208bcb896	remove all unnecessary logging, logOnce when absolutely needed (#17965 )	2023-09-01 16:19:18 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Harshavardhana	7cafdc0512	fix: skip access checks further for known buckets (#17934 )	2023-08-28 15:16:41 -07:00
Harshavardhana	8a57b6bced	use renameat2 Linux extension syscall (#17757 ) this is a faster and safer alternative on newer kernel versions.	2023-08-27 09:57:11 -07:00
Harshavardhana	124e28578c	remove strict persistence requirements for List() .metacache objects (#17917 ) .metacache objects are transient in nature, and are better left to use page-cache effectively to avoid using more IOPs on the disks. this allows for incoming calls to be not taxed heavily due to multiple large batch listings.	2023-08-25 07:58:11 -07:00
Harshavardhana	dde1a12819	fix: validate incoming uploadID to be base64 encoded (#17865 ) Bonus fixes include - do not have to write final xl.meta (renameData) does this already, saves some IOPs. - make sure to purge the multipart directory properly using a recursive delete, otherwise this can easily pile up and rely on the stale uploads cleanup. fixes #17863	2023-08-17 09:37:55 -07:00
Harshavardhana	6e860b6dc5	count all versions as part of DeleteAllVersionsAction (#17821 )	2023-08-09 08:55:19 -07:00
Harshavardhana	cb089dcb52	error out by default beyond 10000 versions per object (#17803 ) ``` You've exceeded the limit on the number of versions you can create on this object ```	2023-08-04 10:40:21 -07:00
Harshavardhana	0153f96a20	add deadlines for readMetadata() in listing (#17776 ) Bonus: also skip spending time looking for xl.json - Listing() - Delete()	2023-08-01 21:52:31 -07:00
Harshavardhana	81be718674	fix: optimize DiskInfo() call avoid metrics when not needed (#17763 )	2023-07-31 15:20:48 -07:00
Harshavardhana	73edd5b8fd	introduce 'mc admin config set alias/ api odirect=on' (#17753 ) change disable_odirect=off -> odirect=on to make it easier to understand, instead of making it double negative.	2023-07-31 00:12:53 -07:00
Harshavardhana	f13cfcb83e	allow disabling O_DIRECT for write ops (#17751 ) on really slow systems, O_DIRECT simply kills the drives allow for a way to disable them.	2023-07-29 15:17:56 -07:00
Harshavardhana	731e03fe5a	add ReadFileStream deadline for disk call (#17745 ) timeout the reader side if hung via disk max timeout	2023-07-28 15:37:53 -07:00
Harshavardhana	47dcfcbdd4	introduce deadlines on READ operations (#17724 )	2023-07-27 07:33:05 -07:00
Harshavardhana	b28bcad11b	avoid Access() calls on known bucket paths (#17719 )	2023-07-26 11:31:40 -07:00
Poorna	f95129894d	Use decrypted object size while computing object size summary (#17717 ) Corrects an issue with encrypted versioned objects being reported under `unversioned` bin in the object version histogram	2023-07-24 17:13:25 -07:00
Harshavardhana	14e1ace552	remove serializing WalkDir() across all buckets/prefixes on SSDs (#17707 ) slower drives get knocked off because they are too slow via active monitoring, we do not need to block calls arbitrarily. Serializing adds latencies for already slow calls, remove it for SSDs/NVMEs Also, add a selection with context when writing to `out <-` channel, to avoid any potential blocks.	2023-07-24 09:30:19 -07:00
Krishnan Parthasarathi	0120ff93bc	admin-info: add DeleteMarkers count (#17659 )	2023-07-18 10:49:40 -07:00
Harshavardhana	24e86d0c59	avoid passing around poolIdx, setIdx instead pass the relevant disks (#17660 )	2023-07-17 09:52:05 -07:00
Kaan Kabalak	f64d62b01d	Fix style of logOnceIf calls w/unique identifiers (#17631 )	2023-07-11 13:17:45 -07:00
Harshavardhana	82075e8e3a	use strconv variants to improve on performance per 'op' (#17626 ) ``` BenchmarkItoa BenchmarkItoa-8 673628088 1.946 ns/op 0 B/op 0 allocs/op BenchmarkFormatInt BenchmarkFormatInt-8 592919769 2.012 ns/op 0 B/op 0 allocs/op BenchmarkSprint BenchmarkSprint-8 26149144 49.06 ns/op 2 B/op 1 allocs/op BenchmarkSprintBool BenchmarkSprintBool-8 26440180 45.92 ns/op 4 B/op 1 allocs/op BenchmarkFormatBool BenchmarkFormatBool-8 1000000000 0.2558 ns/op 0 B/op 0 allocs/op ```	2023-07-11 07:46:58 -07:00
Kaan Kabalak	bd6842d917	Further print log messages once per error (#17618 )	2023-07-10 07:59:57 -07:00
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484 )	2023-06-24 20:29:13 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Harshavardhana	1443b5927a	allow quorum fileInfo to pick same parityBlocks (#17454 ) Bonus: allow replication to proceed for 503 errors such as with error code SlowDownRead	2023-06-18 18:20:15 -07:00
jiuker	5a21b1f353	fix: Delete dir failed when .DS_Store in it (#17352 )	2023-06-06 10:12:06 -07:00
Klaus Post	bb6f4d7633	Remove redundant checkFormatJSON logging (#17134 )	2023-05-04 07:28:37 -07:00
Klaus Post	e8c0a50862	optimization use small blocks up to 64KB (#17107 )	2023-05-01 09:47:49 -07:00
Harshavardhana	8c874884fc	fix: do not copy context in DiskInfo cache (#17085 )	2023-04-26 12:13:54 -07:00
Anis Eleuch	224d9a752f	fix: the race in healing tracker code (#17048 )	2023-04-18 14:49:56 -07:00
Krishnan Parthasarathi	6877578bbc	Update minio_node_bucket_scans_finished metrics (#17006 )	2023-04-11 19:21:34 -07:00
ferhat elmas	056ca0c68e	refactor: reuse open file in storage interface (#16970 )	2023-04-09 23:09:28 -07:00
Krishnan Parthasarathi	25f7a8e406	Indicate RenameData is called by healObject (#16997 )	2023-04-09 10:25:37 -07:00
Shubhendu	4c204707fd	Correct to remove `null` version while ILM rule application (#16971 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2023-04-06 14:10:01 -07:00
jiuker	5fa3665074	add isSysErrNotDir to openFileNoSync (#16958 )	2023-04-04 08:00:08 -07:00
Harshavardhana	b984bf8d1a	allow expiration of all versions during Listing() (#16757 )	2023-03-09 15:15:30 -08:00
ferhat elmas	714283fae2	cleanup ignored static analysis (#16767 )	2023-03-06 08:56:10 -08:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also removes a tiny bit of CPU by each write operation.	2023-02-24 09:03:31 +05:30
Klaus Post	fd6622458b	Add detailed scanner trace output and notifications (#16668 )	2023-02-21 09:33:33 -08:00
Klaus Post	6a04067514	fix: tweak read buffer size to reduce over-reading (#16338 )	2023-01-01 08:14:20 -08:00
Krishnan Parthasarathi	2fa35def2c	Fix DeleteObject when only free versions remain (#16289 )	2022-12-21 16:24:07 -08:00
Harshavardhana	20ef5e7a6a	avoid double deletes() when no more versions (#16206 )	2022-12-12 01:40:04 -08:00
Klaus Post	3eb2d086b2	Replace filepathx with fork (#16192 )	2022-12-08 10:42:44 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Anis Elleuch	1bae32dc96	xl: Delete older data-dir when replacing an existing version-id (#16176 )	2022-12-06 13:43:18 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Krishnan Parthasarathi	6eef9b4a23	lifecycle: simplify Eval and HasActiveRules (#16036 )	2022-11-10 07:17:45 -08:00
Klaus Post	ecc932d5dd	Clean entire tmp-old on restart (#15979 )	2022-10-31 07:27:50 -07:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Harshavardhana	58a8275e84	do not assume invalid buf to be non-xl.meta (#15843 )	2022-10-17 09:39:21 -07:00
Anis Elleuch	6e84283c66	fix: ignoring O_DIRECT in case of erasure single disk (#15734 ) fixes #15733 fixes #15735	2022-09-22 10:41:06 -07:00
Klaus Post	ff12080ff5	Remove deprecated io/ioutil (#15707 )	2022-09-19 11:05:16 -07:00
Harshavardhana	2d9b5a65f1	verify RenameData() versions to be consistent (#15649 ) xl.meta gets written and never rolled back, however we definitely need to validate the state that is persisted on the disk, if there are inconsistencies - more than write quorum we should return an error to the client - if write quorum was achieved however there are inconsistent xl.meta's we should simply trigger an MRF on them	2022-09-05 16:51:37 -07:00
Harshavardhana	97376f6e8f	improve performance for inlined data (#15603 ) inlined data often is bigger than the allowed O_DIRECT alignment, so potentially we can write 'xl.meta' without O_DSYNC instead we can rely on O_DIRECT + fdatasync() instead. This PR allows O_DIRECT on inlined data that would gain the benefits of performing O_DIRECT, eventually performing an fdatasync() at the end. Performance boost can be observed here for small objects < 128KiB. The performance boost is mainly seen on HDD, and marginal on NVMe setups.	2022-08-29 11:19:29 -07:00
Anis Elleuch	5682685c80	Introduce disk io stats metrics (#15512 )	2022-08-16 07:13:49 -07:00
Harshavardhana	3bd9615d0e	fix: log if there is readDir() failure with ListBuckets (#15461 ) This is actionable and must be logged. Bonus: also honor umask by using 0o666 for all Open() syscalls.	2022-08-04 07:23:05 -07:00
Harshavardhana	043aaa792d	fix: intrument os.OpenFile differently for Reads and Writes (#15449 ) allows us to trace latency for READs or WRITEs	2022-08-01 13:22:43 -07:00
Harshavardhana	7725425e05	fix: fork os.MkdirAll to optimize cases where parent exists (#15379 ) a/b/c/d/ where `a/b/c/` exists results in additional syscalls such as an Lstat() call to verify if the `a/b/c/` exists and its a directory. We do not need to do this on MinIO since the parent prefixes if exist, we can simply return success without spending additional syscalls. Also this implementation attempts to simply use Access() calls to avoid os.Stat() calls since the latter does memory allocation for things we do not need to use. Access() is simpler since we have a predictable structure on the backend and we know exactly how our path structures are.	2022-07-24 00:43:11 -07:00
Klaus Post	69bf39f42e	fix: make complete multipart uploads faster encrypted/compressed backends (#15375 ) - Only fetch the parts we need and abort as soon as one is missing. - Only fetch the number of parts requested by "ListObjectParts".	2022-07-21 16:47:58 -07:00
Klaus Post	f939d1c183	Independent Multipart Uploads (#15346 ) Do completely independent multipart uploads. In distributed mode, a lock was held to merge each multipart upload as it was added. This lock was highly contested and retries are expensive (timewise) in distributed mode. Instead, each part adds its metadata information uniquely. This eliminates the per object lock required for each to merge. The metadata is read back and merged by "CompleteMultipartUpload" without locks when constructing final object. Co-authored-by: Harshavardhana <harsha@minio.io>	2022-07-19 08:35:29 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148 )	2022-07-11 09:15:54 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Harshavardhana	bd099f5e71	fix: change timedValue to return the previously cached value (#15169 ) fix: change timedvalue to return previous cached value caller can interpret the underlying error and decide accordingly, places where we do not interpret the errors upon timedValue.Get() - we should simply use the previously cached value instead of returning "empty". Bonus: remove some unused code	2022-06-25 08:50:16 -07:00
Harshavardhana	d55efc791f	relax O_DIRECT in single drive mode if unsupported (#15045 )	2022-06-07 06:44:01 -07:00
Harshavardhana	df9eeb7f8f	fix: do not log concurrently when multiple disks return errors (#15044 ) since the values inside 'context' are mutated internally by logger, make sure to log serially upon errors not concurrently.	2022-06-06 15:15:11 -07:00
Harshavardhana	5afdc56796	allow single drive mode to run on root disk (#15037 ) for practical reasons, allow root disk based installs for single drive mode.	2022-06-03 12:53:42 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In-case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Anis Elleuch	ca69e54cb6	tests: Fix sporadic failure of TestXLStorageDeleteFile (#14911 ) The test expects from DeleteFile to return errDiskNotFound when the disk is not available. It calls os.RemoveAll() to remove one disk after XL storage initialization. However, this latter contains some goroutines which can race with os.RemoveAll() and then the test fails sporadically with returning random errors. The commit will tweak the initialization routine of the XL storage to only run deletion of temporary and metacache data in the background, so TestXLStorageDeleteFile won't fail anymore.	2022-05-12 15:24:58 -07:00
Anis Elleuch	df50eda811	Add number of versions in server info API (#14812 ) The goal is to show the number of versions in the server info API.	2022-04-25 22:04:10 -07:00
Poorna	3a64580663	Add support for site replication healing (#14572 ) heal bucket metadata and IAM entries for sites participating in site replication from the site with the most updated entry. Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Aditya Manthramurthy <aditya@minio.io>	2022-04-24 02:36:31 -07:00
Harshavardhana	507f993075	attempt to real resolve when there is a quorum failure on reads (#14613 )	2022-04-20 12:49:05 -07:00
Harshavardhana	73a6a60785	fix: replication deleteObject() regression and CopyObject() behavior (#14780 ) This PR fixes two issues - The first fix is a regression from #14555, the fix itself in #14555 is correct but the interpretation of that information by the object layer code for "replication" was not correct. This PR tries to fix this situation by making sure the "Delete" replication works as expected when "VersionPurgeStatus" is already set. Without this fix, there is a DELETE marker created incorrectly on the source where the "DELETE" was triggered. - The second fix is perhaps an older problem started since we inlined-data on the disk for small objects, CopyObject() incorrectly inline's a non-inlined data. This is due to the fact that we have code where we read the `part.1` under certain conditions where the size of the `part.1` is less than the specific "threshold". This eventually causes problems when we are "deleting" the data that is only inlined, which means dataDir is ignored leaving such dataDir on the disk, that looks like an inconsistent content on the namespace. fixes #14767	2022-04-20 10:22:05 -07:00
Anis Elleuch	16431d222c	heal: Enable periodic bitrot scan configuration (#14464 )	2022-04-07 08:10:40 -07:00
Harshavardhana	cf94d1f1f1	do not crash readXLMetaNoData - if the `xl.meta` has incorrect content (#14538 ) ``` tmp = buf[want:] ``` Would potentially crash when `buf` is truncated for some reason and does not have the expected bytes, this is of course considered not normal and is an odd situation. But we do not need to crash here instead allow for errors to be returned and let callers handle the errors.	2022-03-14 09:07:46 -07:00
Harshavardhana	91d419ee6c	warn issues about large block I/O performance for Linux older than 4.0.0 (#14524 ) This PR simply adds a warning message when it detects older kernel versions and warn's them about potential performance issues on this kernel. The issue can be seen only with parallel I/O across all drives on denser setups such as 90 drives or 45 drives per server configurations.	2022-03-10 17:36:13 -08:00
Klaus Post	b890bbfa63	Add local disk health checks (#14447 ) The main goal of this PR is to solve the situation where disks stop responding to operations. This generally causes an FD build-up and eventually will crash the server. This adds detection of hung disks, where calls on disk get stuck. We add functionality to `xlStorageDiskIDCheck` where it keeps track of the number of concurrent requests on a given disk. A total number of 100 operations are allowed. If this limit is reached we will block (but not reject) new requests, but we will monitor the state of the disk. If no requests have been completed or updated within a 15-second window, we mark the disk as offline. Requests that are blocked will be unblocked and return an error as "faulty disk". New requests will be rejected until the disk is marked OK again. Once a disk has been marked faulty, a check will run every 5 seconds that will attempt to write and read back a file. As long as this fails the disk will remain faulty. To prevent lots of long-running requests to mark the disk faulty we implement a callback feature that allows updating the status as parts of these operations are running. We add a reader and writer wrapper that will update the status of each successful read/write operation. This should allow fine enough granularity that a slow, but still operational disk will not reach 15 seconds where 50 operations have not progressed. Note that errors themselves are not enough to mark a disk faulty. A nil (or io.EOF) error will mark a disk as "good". * Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`. * de-couple IsOnline() from disk health tracker The purpose of IsOnline() is to ensure that we reconnect the drive only when the "drive" was - disconnected from network we need to validate if the drive is "correct" and is the same drive which belongs to this server. - drive was replaced we have to format it - we support hot swapping of the drives. IsOnline() is not meant for taking the drive offline when it is hung, it is not useful we can let the drive be online instead "return" errors for relevant calls. * return errFaultyDisk for DiskInfo() call Co-authored-by: Harshavardhana <harsha@minio.io> Possible future Improvements: * Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly. * Allow reads/writes to be aborted by the context. * Add usage stats, concurrent count, blocked operations, etc.	2022-03-09 11:38:54 -08:00
Harshavardhana	b0c84e3de7	fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483 ) This is a side-affect of the optimization done in PR #13544 which causes a certain type of delete operations on given object versions can cause lastVersion indication to be skipped, which leads to an `xl.meta` where Versions[] slice is empty while the entire file is intact by itself. This PR tries to ensure that such files are visible and deletable by regular means of listing as null 'delete-marker' and also avoid the situation where this potential issue might arise.	2022-03-04 20:01:26 -08:00
Harshavardhana	66afa16aed	canceled PUTs throw frivolous logs (#14475 ) remote drives might throw frivolous logs, if the caller canceled the PUT operation in such scenarios there is no reason to log.	2022-03-04 10:31:33 -08:00
Harshavardhana	9d7648f02f	reduce unnecessary logging during speedtest (#14387 ) - speedtest logs calls that were canceled spuriously, in situations where it should be ignored. - all errors of interest are always sent back to the client there is no need to log them on the server console. - PUT failures should negate the increments such that GET is not attempted on unsuccessful calls. - do not attempt MRF on speedtest objects.	2022-02-23 11:59:13 -08:00
Anis Elleuch	5dcf1d13a9	ci: Always set disks as non root disks (#14389 ) In the testing mode, reformatting disks will fail because the healing code will complain if one disk is in root mode. This commit will automatically set all disks as non-root if MINIO_CI_CD is set.	2022-02-23 10:11:33 -08:00
Krishnan Parthasarathi	5a0c0079a1	Don't add free-version on restore-object (#14340 )	2022-02-17 15:05:19 -08:00
Harshavardhana	03a6e8aee2	fix: creating steep directory structure on trash folder (#14314 ) weird directory structures get created on the '.trash' folder upon server restarts, this PR fixes this.	2022-02-15 16:34:03 -08:00
Anis Elleuch	1f92fc3fc0	Always check for root disks unless MINIO_CI_CD is set (#14232 ) The current code considers a pool with all root disks to be as part of a testing environment even if there are other pools with mounted disks. This will result to illegitimate writing in root disks. Fix this by simplifing the logic: require MINIO_CI_CD in order to skip root disk check.	2022-02-13 15:42:07 -08:00
Harshavardhana	6123377e66	speedup getFormatErasureInQuorum use driveCount (#14239 ) startup speed-up, currently getFormatErasureInQuorum() would spend up to 2-3secs when there are 3000+ drives for example in a setup, simplify this implementation to use drive counts.	2022-02-04 12:21:21 -08:00
Harshavardhana	24657859a8	when o_direct is disabled do not attempt fadvise call (#14230 )	2022-02-02 08:54:52 -08:00
Harshavardhana	57118919d2	cached diskIDs are not needed for scanner healing (#14170 ) This PR removes an unnecessary state that gets passed around for DiskIDs, which is not necessary since each disk exactly knows which pool and which set it belongs to on a running system. Currently cached DiskId's won't work properly because it always ends up skipping offline disks and never runs healing when disks are offline, as it expects all the cached diskIDs to be present always. This also sort of made things in-flexible in terms perhaps a new diskID for `format.json`. (however this is not a big issue) This is an unnecessary requirement that healing via scanner needs all drives to be online, instead healing should trigger even when partial nodes and drives are available this ensures that we keep the SLA in-tact on the objects when disks are offline for a prolonged period of time.	2022-01-26 08:34:56 -08:00
Krishnan Parthasarathi	ebc3627c73	further improvements to newXLStorage (#14166 ) - create internal erasure volumes only if the disk is unformatted - return a copy of format data in xlStorage.ReadAll - parse env vars only once, to be re-used by xl-storage	2022-01-24 17:09:12 -08:00
Harshavardhana	5a9f133491	speed up startup sequence for all operations (#14148 ) This speed-up is intended for faster startup times for almost all MinIO operations. Changes here are - Drives are not re-read for 'format.json' on a regular basis once read during init is remembered and refreshed at 5 second intervals. - Do not do O_DIRECT tests on drives with existing 'format.json' only fresh setups need this check. - Parallelize initializing erasureSets for multiple sets. - Avoid re-reading format.json when migrating 'format.json' from really old V1->V2->V3 - Keep a copy of local drives for any given server in memory for a quick lookup.	2022-01-24 11:28:45 -08:00
Harshavardhana	70e1cbda21	allow disabling O_DIRECT in certain environments for reads (#14115 ) repeated reads on single large objects in HPC like workloads, need the following option to disable O_DIRECT for a more effective usage of the kernel page-cache. However this optional should be used in very specific situations only, and shouldn't be enabled on all servers. NVMe servers benefit always from keeping O_DIRECT on.	2022-01-17 08:34:14 -08:00
Klaus Post	64d4da5a37	Add Put input readahead (#14084 ) When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs. This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers. We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from. This improves PutObject and PutObjectParts for these cases.	2022-01-14 10:01:25 -08:00
Klaus Post	a2fd8caa69	Ignore version not found in deleteVersions (#14093 ) When deleting multiple versions it "gives" up with an errFileVersionNotFound if a version cannot be found. This effectively skips deleting other versions sent in the same request. This can happen on inconsistent objects. We should ignore errFileVersionNotFound and continue with others. We already ignore these at the caller level, this PR is continuation of `54a9877`	2022-01-13 14:28:07 -08:00
Harshavardhana	f546636c52	fix: use renameAll instead of deleteObject() for purging temporary files (#14096 ) This PR simplifies few things - Multipart parts are renamed, upon failure are unrenamed() keep this multipart specific behavior it is needed and works fine. - AbortMultipart should blindly delete once lock is acquired instead of re-reading metadata and calculating quorum, abort is a delete() operation and client has no business looking for errors on this. - Skip Access() calls to folders that are operating on `.minio.sys/multipart` folder as well.	2022-01-13 11:07:41 -08:00
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094 ) Large clusters with multiple sets, or multi-pool setups at times might fail and report unexpected "file not found" errors. This can become a problem during startup sequence when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers such that they are skipped in RenameData() call. - RenameData() doesn't need to "Access()" calls for `.minio.sys` folders they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any errors, make sure it always returns "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	2022-01-12 18:49:01 -08:00
Harshavardhana	737a3f0bad	fix: decommission bugfixes found during migration of .minio.sys/config (#14078 )	2022-01-10 17:26:00 -08:00

1 2 3 4 5 ...

347 Commits