minio

Commit Graph

Author	SHA1	Message	Date
Poorna	f95129894d	Use decrypted object size while computing object size summary (#17717 ) Corrects an issue with encrypted versioned objects being reported under `unversioned` bin in the object version histogram	2023-07-24 17:13:25 -07:00
Harshavardhana	14e1ace552	remove serializing WalkDir() across all buckets/prefixes on SSDs (#17707 ) slower drives get knocked off because they are too slow via active monitoring, we do not need to block calls arbitrarily. Serializing adds latencies for already slow calls, remove it for SSDs/NVMEs Also, add a selection with context when writing to `out <-` channel, to avoid any potential blocks.	2023-07-24 09:30:19 -07:00
Krishnan Parthasarathi	0120ff93bc	admin-info: add DeleteMarkers count (#17659 )	2023-07-18 10:49:40 -07:00
Harshavardhana	24e86d0c59	avoid passing around poolIdx, setIdx instead pass the relevant disks (#17660 )	2023-07-17 09:52:05 -07:00
Kaan Kabalak	f64d62b01d	Fix style of logOnceIf calls w/unique identifiers (#17631 )	2023-07-11 13:17:45 -07:00
Harshavardhana	82075e8e3a	use strconv variants to improve on performance per 'op' (#17626 ) ``` BenchmarkItoa BenchmarkItoa-8 673628088 1.946 ns/op 0 B/op 0 allocs/op BenchmarkFormatInt BenchmarkFormatInt-8 592919769 2.012 ns/op 0 B/op 0 allocs/op BenchmarkSprint BenchmarkSprint-8 26149144 49.06 ns/op 2 B/op 1 allocs/op BenchmarkSprintBool BenchmarkSprintBool-8 26440180 45.92 ns/op 4 B/op 1 allocs/op BenchmarkFormatBool BenchmarkFormatBool-8 1000000000 0.2558 ns/op 0 B/op 0 allocs/op ```	2023-07-11 07:46:58 -07:00
Kaan Kabalak	bd6842d917	Further print log messages once per error (#17618 )	2023-07-10 07:59:57 -07:00
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484 )	2023-06-24 20:29:13 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Harshavardhana	1443b5927a	allow quorum fileInfo to pick same parityBlocks (#17454 ) Bonus: allow replication to proceed for 503 errors such as with error code SlowDownRead	2023-06-18 18:20:15 -07:00
jiuker	5a21b1f353	fix: Delete dir failed when .DS_Store in it (#17352 )	2023-06-06 10:12:06 -07:00
Klaus Post	bb6f4d7633	Remove redundant checkFormatJSON logging (#17134 )	2023-05-04 07:28:37 -07:00
Klaus Post	e8c0a50862	optimization use small blocks up to 64KB (#17107 )	2023-05-01 09:47:49 -07:00
Harshavardhana	8c874884fc	fix: do not copy context in DiskInfo cache (#17085 )	2023-04-26 12:13:54 -07:00
Anis Eleuch	224d9a752f	fix: the race in healing tracker code (#17048 )	2023-04-18 14:49:56 -07:00
Krishnan Parthasarathi	6877578bbc	Update minio_node_bucket_scans_finished metrics (#17006 )	2023-04-11 19:21:34 -07:00
ferhat elmas	056ca0c68e	refactor: reuse open file in storage interface (#16970 )	2023-04-09 23:09:28 -07:00
Krishnan Parthasarathi	25f7a8e406	Indicate RenameData is called by healObject (#16997 )	2023-04-09 10:25:37 -07:00
Shubhendu	4c204707fd	Correct to remove `null` version while ILM rule application (#16971 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2023-04-06 14:10:01 -07:00
jiuker	5fa3665074	add isSysErrNotDir to openFileNoSync (#16958 )	2023-04-04 08:00:08 -07:00
Harshavardhana	b984bf8d1a	allow expiration of all versions during Listing() (#16757 )	2023-03-09 15:15:30 -08:00
ferhat elmas	714283fae2	cleanup ignored static analysis (#16767 )	2023-03-06 08:56:10 -08:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also removes a tiny bit of CPU by each write operation.	2023-02-24 09:03:31 +05:30
Klaus Post	fd6622458b	Add detailed scanner trace output and notifications (#16668 )	2023-02-21 09:33:33 -08:00
Klaus Post	6a04067514	fix: tweak read buffer size to reduce over-reading (#16338 )	2023-01-01 08:14:20 -08:00
Krishnan Parthasarathi	2fa35def2c	Fix DeleteObject when only free versions remain (#16289 )	2022-12-21 16:24:07 -08:00
Harshavardhana	20ef5e7a6a	avoid double deletes() when no more versions (#16206 )	2022-12-12 01:40:04 -08:00
Klaus Post	3eb2d086b2	Replace filepathx with fork (#16192 )	2022-12-08 10:42:44 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Anis Elleuch	1bae32dc96	xl: Delete older data-dir when replacing an existing version-id (#16176 )	2022-12-06 13:43:18 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Krishnan Parthasarathi	6eef9b4a23	lifecycle: simplify Eval and HasActiveRules (#16036 )	2022-11-10 07:17:45 -08:00
Klaus Post	ecc932d5dd	Clean entire tmp-old on restart (#15979 )	2022-10-31 07:27:50 -07:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Harshavardhana	58a8275e84	do not assume invalid buf to be non-xl.meta (#15843 )	2022-10-17 09:39:21 -07:00
Anis Elleuch	6e84283c66	fix: ignoring O_DIRECT in case of erasure single disk (#15734 ) fixes #15733 fixes #15735	2022-09-22 10:41:06 -07:00
Klaus Post	ff12080ff5	Remove deprecated io/ioutil (#15707 )	2022-09-19 11:05:16 -07:00
Harshavardhana	2d9b5a65f1	verify RenameData() versions to be consistent (#15649 ) xl.meta gets written and never rolled back, however we definitely need to validate the state that is persisted on the disk, if there are inconsistencies - more than write quorum we should return an error to the client - if write quorum was achieved however there are inconsistent xl.meta's we should simply trigger an MRF on them	2022-09-05 16:51:37 -07:00
Harshavardhana	97376f6e8f	improve performance for inlined data (#15603 ) inlined data often is bigger than the allowed O_DIRECT alignment, so potentially we can write 'xl.meta' without O_DSYNC instead we can rely on O_DIRECT + fdatasync() instead. This PR allows O_DIRECT on inlined data that would gain the benefits of performing O_DIRECT, eventually performing an fdatasync() at the end. Performance boost can be observed here for small objects < 128KiB. The performance boost is mainly seen on HDD, and marginal on NVMe setups.	2022-08-29 11:19:29 -07:00
Anis Elleuch	5682685c80	Introduce disk io stats metrics (#15512 )	2022-08-16 07:13:49 -07:00
Harshavardhana	3bd9615d0e	fix: log if there is readDir() failure with ListBuckets (#15461 ) This is actionable and must be logged. Bonus: also honor umask by using 0o666 for all Open() syscalls.	2022-08-04 07:23:05 -07:00
Harshavardhana	043aaa792d	fix: instrument os.OpenFile differently for Reads and Writes (#15449 ) allows us to trace latency for READs or WRITEs	2022-08-01 13:22:43 -07:00
Harshavardhana	7725425e05	fix: fork os.MkdirAll to optimize cases where parent exists (#15379 ) a/b/c/d/ where `a/b/c/` exists results in additional syscalls such as an Lstat() call to verify if the `a/b/c/` exists and its a directory. We do not need to do this on MinIO since the parent prefixes if exist, we can simply return success without spending additional syscalls. Also this implementation attempts to simply use Access() calls to avoid os.Stat() calls since the latter does memory allocation for things we do not need to use. Access() is simpler since we have a predictable structure on the backend and we know exactly how our path structures are.	2022-07-24 00:43:11 -07:00
Klaus Post	69bf39f42e	fix: make complete multipart uploads faster encrypted/compressed backends (#15375 ) - Only fetch the parts we need and abort as soon as one is missing. - Only fetch the number of parts requested by "ListObjectParts".	2022-07-21 16:47:58 -07:00
Klaus Post	f939d1c183	Independent Multipart Uploads (#15346 ) Do completely independent multipart uploads. In distributed mode, a lock was held to merge each multipart upload as it was added. This lock was highly contested and retries are expensive (timewise) in distributed mode. Instead, each part adds its metadata information uniquely. This eliminates the per object lock required for each to merge. The metadata is read back and merged by "CompleteMultipartUpload" without locks when constructing final object. Co-authored-by: Harshavardhana <harsha@minio.io>	2022-07-19 08:35:29 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148 )	2022-07-11 09:15:54 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Harshavardhana	bd099f5e71	fix: change timedValue to return the previously cached value (#15169 ) fix: change timedvalue to return previous cached value caller can interpret the underlying error and decide accordingly, places where we do not interpret the errors upon timedValue.Get() - we should simply use the previously cached value instead of returning "empty". Bonus: remove some unused code	2022-06-25 08:50:16 -07:00
Harshavardhana	d55efc791f	relax O_DIRECT in single drive mode if unsupported (#15045 )	2022-06-07 06:44:01 -07:00
Harshavardhana	df9eeb7f8f	fix: do not log concurrently when multiple disks return errors (#15044 ) since the values inside 'context' are mutated internally by logger, make sure to log serially upon errors not concurrently.	2022-06-06 15:15:11 -07:00
Harshavardhana	5afdc56796	allow single drive mode to run on root disk (#15037 ) for practical reasons, allow root disk based installs for single drive mode.	2022-06-03 12:53:42 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In-case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Anis Elleuch	ca69e54cb6	tests: Fix sporadic failure of TestXLStorageDeleteFile (#14911 ) The test expects from DeleteFile to return errDiskNotFound when the disk is not available. It calls os.RemoveAll() to remove one disk after XL storage initialization. However, this latter contains some goroutines which can race with os.RemoveAll() and then the test fails sporadically with returning random errors. The commit will tweak the initialization routine of the XL storage to only run deletion of temporary and metacache data in the background, so TestXLStorageDeleteFile won't fail anymore.	2022-05-12 15:24:58 -07:00
Anis Elleuch	df50eda811	Add number of versions in server info API (#14812 ) The goal is to show the number of versions in the server info API.	2022-04-25 22:04:10 -07:00
Poorna	3a64580663	Add support for site replication healing (#14572 ) heal bucket metadata and IAM entries for sites participating in site replication from the site with the most updated entry. Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Aditya Manthramurthy <aditya@minio.io>	2022-04-24 02:36:31 -07:00
Harshavardhana	507f993075	attempt to real resolve when there is a quorum failure on reads (#14613 )	2022-04-20 12:49:05 -07:00
Harshavardhana	73a6a60785	fix: replication deleteObject() regression and CopyObject() behavior (#14780 ) This PR fixes two issues - The first fix is a regression from #14555, the fix itself in #14555 is correct but the interpretation of that information by the object layer code for "replication" was not correct. This PR tries to fix this situation by making sure the "Delete" replication works as expected when "VersionPurgeStatus" is already set. Without this fix, there is a DELETE marker created incorrectly on the source where the "DELETE" was triggered. - The second fix is perhaps an older problem started since we inlined-data on the disk for small objects, CopyObject() incorrectly inline's a non-inlined data. This is due to the fact that we have code where we read the `part.1` under certain conditions where the size of the `part.1` is less than the specific "threshold". This eventually causes problems when we are "deleting" the data that is only inlined, which means dataDir is ignored leaving such dataDir on the disk, that looks like an inconsistent content on the namespace. fixes #14767	2022-04-20 10:22:05 -07:00
Anis Elleuch	16431d222c	heal: Enable periodic bitrot scan configuration (#14464 )	2022-04-07 08:10:40 -07:00
Harshavardhana	cf94d1f1f1	do not crash readXLMetaNoData - if the `xl.meta` has incorrect content (#14538 ) ``` tmp = buf[want:] ``` Would potentially crash when `buf` is truncated for some reason and does not have the expected bytes, this is of course considered not normal and is an odd situation. But we do not need to crash here instead allow for errors to be returned and let callers handle the errors.	2022-03-14 09:07:46 -07:00
Harshavardhana	91d419ee6c	warn issues about large block I/O performance for Linux older than 4.0.0 (#14524 ) This PR simply adds a warning message when it detects older kernel versions and warns users about potential performance issues on this kernel. The issue can be seen only with parallel I/O across all drives on denser setups such as 90 drives or 45 drives per server configurations.	2022-03-10 17:36:13 -08:00
Klaus Post	b890bbfa63	Add local disk health checks (#14447 ) The main goal of this PR is to solve the situation where disks stop responding to operations. This generally causes an FD build-up and eventually will crash the server. This adds detection of hung disks, where calls on disk get stuck. We add functionality to `xlStorageDiskIDCheck` where it keeps track of the number of concurrent requests on a given disk. A total number of 100 operations are allowed. If this limit is reached we will block (but not reject) new requests, but we will monitor the state of the disk. If no requests have been completed or updated within a 15-second window, we mark the disk as offline. Requests that are blocked will be unblocked and return an error as "faulty disk". New requests will be rejected until the disk is marked OK again. Once a disk has been marked faulty, a check will run every 5 seconds that will attempt to write and read back a file. As long as this fails the disk will remain faulty. To prevent lots of long-running requests to mark the disk faulty we implement a callback feature that allows updating the status as parts of these operations are running. We add a reader and writer wrapper that will update the status of each successful read/write operation. This should allow fine enough granularity that a slow, but still operational disk will not reach 15 seconds where 50 operations have not progressed. Note that errors themselves are not enough to mark a disk faulty. A nil (or io.EOF) error will mark a disk as "good". * Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`. * de-couple IsOnline() from disk health tracker The purpose of IsOnline() is to ensure that we reconnect the drive only when the "drive" was - disconnected from network we need to validate if the drive is "correct" and is the same drive which belongs to this server. - drive was replaced we have to format it - we support hot swapping of the drives. IsOnline() is not meant for taking the drive offline when it is hung, it is not useful we can let the drive be online instead "return" errors for relevant calls. * return errFaultyDisk for DiskInfo() call Co-authored-by: Harshavardhana <harsha@minio.io> Possible future Improvements: * Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly. * Allow reads/writes to be aborted by the context. * Add usage stats, concurrent count, blocked operations, etc.	2022-03-09 11:38:54 -08:00
Harshavardhana	b0c84e3de7	fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483 ) This is a side effect of the optimization done in PR #13544, where a certain type of delete operation on given object versions can cause the lastVersion indication to be skipped, which leads to an `xl.meta` where Versions[] slice is empty while the entire file is intact by itself. This PR tries to ensure that such files are visible and deletable by regular means of listing as null 'delete-marker' and also avoid the situation where this potential issue might arise.	2022-03-04 20:01:26 -08:00
Harshavardhana	66afa16aed	canceled PUTs throw frivolous logs (#14475 ) remote drives might throw frivolous logs, if the caller canceled the PUT operation in such scenarios there is no reason to log.	2022-03-04 10:31:33 -08:00
Harshavardhana	9d7648f02f	reduce unnecessary logging during speedtest (#14387 ) - speedtest logs calls that were canceled spuriously, in situations where it should be ignored. - all errors of interest are always sent back to the client there is no need to log them on the server console. - PUT failures should negate the increments such that GET is not attempted on unsuccessful calls. - do not attempt MRF on speedtest objects.	2022-02-23 11:59:13 -08:00
Anis Elleuch	5dcf1d13a9	ci: Always set disks as non root disks (#14389 ) In the testing mode, reformatting disks will fail because the healing code will complain if one disk is in root mode. This commit will automatically set all disks as non-root if MINIO_CI_CD is set.	2022-02-23 10:11:33 -08:00
Krishnan Parthasarathi	5a0c0079a1	Don't add free-version on restore-object (#14340 )	2022-02-17 15:05:19 -08:00
Harshavardhana	03a6e8aee2	fix: creating steep directory structure on trash folder (#14314 ) weird directory structures get created on the '.trash' folder upon server restarts, this PR fixes this.	2022-02-15 16:34:03 -08:00
Anis Elleuch	1f92fc3fc0	Always check for root disks unless MINIO_CI_CD is set (#14232 ) The current code considers a pool with all root disks to be part of a testing environment even if there are other pools with mounted disks. This will result in illegitimate writing in root disks. Fix this by simplifying the logic: require MINIO_CI_CD in order to skip root disk check.	2022-02-13 15:42:07 -08:00
Harshavardhana	6123377e66	speedup getFormatErasureInQuorum use driveCount (#14239 ) startup speed-up, currently getFormatErasureInQuorum() would spend up to 2-3secs when there are 3000+ drives for example in a setup, simplify this implementation to use drive counts.	2022-02-04 12:21:21 -08:00
Harshavardhana	24657859a8	when o_direct is disabled do not attempt fadvise call (#14230 )	2022-02-02 08:54:52 -08:00
Harshavardhana	57118919d2	cached diskIDs are not needed for scanner healing (#14170 ) This PR removes an unnecessary state that gets passed around for DiskIDs, which is not necessary since each disk exactly knows which pool and which set it belongs to on a running system. Currently cached DiskId's won't work properly because it always ends up skipping offline disks and never runs healing when disks are offline, as it expects all the cached diskIDs to be present always. This also sort of made things in-flexible in terms perhaps a new diskID for `format.json`. (however this is not a big issue) This is an unnecessary requirement that healing via scanner needs all drives to be online, instead healing should trigger even when partial nodes and drives are available this ensures that we keep the SLA in-tact on the objects when disks are offline for a prolonged period of time.	2022-01-26 08:34:56 -08:00
Krishnan Parthasarathi	ebc3627c73	further improvements to newXLStorage (#14166 ) - create internal erasure volumes only if the disk is unformatted - return a copy of format data in xlStorage.ReadAll - parse env vars only once, to be re-used by xl-storage	2022-01-24 17:09:12 -08:00
Harshavardhana	5a9f133491	speed up startup sequence for all operations (#14148 ) This speed-up is intended for faster startup times for almost all MinIO operations. Changes here are - Drives are not re-read for 'format.json' on a regular basis once read during init is remembered and refreshed at 5 second intervals. - Do not do O_DIRECT tests on drives with existing 'format.json' only fresh setups need this check. - Parallelize initializing erasureSets for multiple sets. - Avoid re-reading format.json when migrating 'format.json' from really old V1->V2->V3 - Keep a copy of local drives for any given server in memory for a quick lookup.	2022-01-24 11:28:45 -08:00
Harshavardhana	70e1cbda21	allow disabling O_DIRECT in certain environments for reads (#14115 ) repeated reads on single large objects in HPC like workloads, need the following option to disable O_DIRECT for a more effective usage of the kernel page-cache. However this optional should be used in very specific situations only, and shouldn't be enabled on all servers. NVMe servers benefit always from keeping O_DIRECT on.	2022-01-17 08:34:14 -08:00
Klaus Post	64d4da5a37	Add Put input readahead (#14084 ) When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs. This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers. We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from. This improves PutObject and PutObjectParts for these cases.	2022-01-14 10:01:25 -08:00
Klaus Post	a2fd8caa69	Ignore version not found in deleteVersions (#14093 ) When deleting multiple versions it "gives" up with an errFileVersionNotFound if a version cannot be found. This effectively skips deleting other versions sent in the same request. This can happen on inconsistent objects. We should ignore errFileVersionNotFound and continue with others. We already ignore these at the caller level, this PR is continuation of `54a9877`	2022-01-13 14:28:07 -08:00
Harshavardhana	f546636c52	fix: use renameAll instead of deleteObject() for purging temporary files (#14096 ) This PR simplifies few things - Multipart parts are renamed, upon failure are unrenamed() keep this multipart specific behavior it is needed and works fine. - AbortMultipart should blindly delete once lock is acquired instead of re-reading metadata and calculating quorum, abort is a delete() operation and client has no business looking for errors on this. - Skip Access() calls to folders that are operating on `.minio.sys/multipart` folder as well.	2022-01-13 11:07:41 -08:00
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094 ) Large clusters with multiple sets, or multi-pool setups at times might fail and report unexpected "file not found" errors. This can become a problem during startup sequence when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers such that they are skipped in RenameData() call. - RenameData() doesn't need to "Access()" calls for `.minio.sys` folders they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any errors, make sure it always returns "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	2022-01-12 18:49:01 -08:00
Harshavardhana	737a3f0bad	fix: decommission bugfixes found during migration of .minio.sys/config (#14078 )	2022-01-10 17:26:00 -08:00
Klaus Post	0e31cff762	fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038 ) * Process sets concurrently. * Disconnect context from request. * Insert context cancellation checks. * errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.	2022-01-06 10:47:49 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	0e3037631f	skip inconsistent shards if possible (#13945 ) data shards were wrong due to a healing bug reported in #13803 mainly with unaligned object sizes. This PR is an attempt to automatically avoid these shards, with available information about the `xl.meta` and actually disk mtime.	2021-12-21 10:08:26 -08:00
Harshavardhana	5f7e6d03ff	copy bucket slice to avoid skipping .minio.sys/buckets (#13912 ) healing was skipping `.minio.sys/buckets` path so essentially not healing `.usage.json` - fix this by making a copy of `buckets` slice.	2021-12-15 09:18:09 -08:00
Harshavardhana	3b79f7e4ae	ignore if volume exists in MakeVolBulk, return other errors (#13866 )	2021-12-09 15:55:42 -08:00
Harshavardhana	a7c430355a	fix: throw appropriate errors when all disks fail (#13820 ) when all disks fail with same error, fail server startup anyways - we cannot proceed. fixes #13818	2021-12-03 09:25:17 -08:00
Klaus Post	3db931dc0e	Improve listing consistency with version merging (#13723 )	2021-12-02 11:29:16 -08:00
Harshavardhana	b280a37c4d	add delete-marker proactively in DeleteObject() (#13795 ) single object delete was not working properly on a bucket when versioning was suspended, current version 'null' object was never removed. added unit tests to cover the behavior fixes #13783	2021-11-30 18:30:06 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations such as disks in the following order - xl.json migration to xl.meta there may be partial xl.json's leftover if some disks are not yet connected when the disk is yet to come up, since xl.json mtime and xl.meta is same the dataDir maxima doesn't work properly leading to quorum issues. - its also possible that XLV1 might be true among the disks available, make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrade from older to a newer release if the data is readable and matches the checksum. NOTE: this is just initial work we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Krishnan Parthasarathi	3da9ee15d3	Add MaxNoncurrentVersions to NoncurrentExpiration action (#13580 ) This unit allows users to limit the maximum number of noncurrent versions of an object. To enable this rule you need the following ilm.json ``` cat >> ilm.json <<EOF { "Rules": [ { "ID": "test-max-noncurrent", "Status": "Enabled", "Filter": { "Prefix": "user-uploads/" }, "NoncurrentVersionExpiration": { "MaxNoncurrentVersions": 5 } } ] } EOF mc ilm import myminio/mybucket < ilm.json ```	2021-11-19 17:54:10 -08:00
Klaus Post	faf013ec84	Improve performance on multiple versions (#13573 ) Existing: ```go type xlMetaV2 struct { Versions []xlMetaV2Version `json:"Versions" msg:"Versions"` } ``` Serialized as regular MessagePack. ```go //msgp:tuple xlMetaV2VersionHeader type xlMetaV2VersionHeader struct { VersionID [16]byte ModTime int64 Type VersionType Flags xlFlags } ``` Serialize as streaming MessagePack, format: ``` int(headerVersion) int(xlmetaVersion) int(nVersions) for each version { binary blob, xlMetaV2VersionHeader, serialized binary blob, xlMetaV2Version, serialized. } ``` xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct can easily be reused and does not contain pointers, so efficient as a slice (single allocation) This allows quickly parsing everything as slices of bytes (no copy). Versions are always saved sorted by modTime, newest first. No more need to sort on load. * Allows checking if a version exists. * Allows reading single version without unmarshal all. * Allows reading latest version of type without unmarshal all. * Allows reading latest version without unmarshal of all. * Allows checking if the latest is deleteMarker by reading first entry. * Allows adding/updating/deleting a version with only header deserialization. * Reduces allocations on conversion to FileInfo(s).	2021-11-18 12:15:22 -08:00
Harshavardhana	886262e58a	heal legacy objects when versioning is enabled after upgrade (#13671 ) legacy objects in 'xl.json' after upgrade, should have following sequence of events - bucket should have versioning enabled and the object should have been overwritten with another version of an object. this situation was not handled, which would lead to older objects to stay perpetually with "legacy" dataDir, however these objects were readable by all means - they weren't converted to newer format. This PR fixes this situation properly.	2021-11-17 15:49:12 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665 ) - remove some duplicated code - reported a bug, separately fixed in #13664 - using strings.ReplaceAll() when needed - using filepath.ToSlash() use when needed - remove all non-Go style comments from the codebase Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2021-11-16 09:28:29 -08:00
Harshavardhana	bb639d9f29	remove double reads delete versions (#13544 ) deleting collection of versions belonging to same object, we can avoid re-reading the xl.meta from the disk instead purge all the requested versions in-memory, the tradeoff is to allocate a map to de-dup the versions, allow disks to be read only once per object. additionally reduce the data transfer between nodes by shortening msgp data values.	2021-11-01 10:50:07 -07:00
Klaus Post	c603f85488	readAllData: Reuse small file buffers (#13530 ) (Re)use small buffers for small readAllData operations.	2021-10-28 17:02:22 -07:00
Krishnan Parthasarathi	939fbb3c38	ilm: Make per-tier stats available via admin-tier-info (#13381 )	2021-10-23 18:38:33 -07:00
Klaus Post	23d6770ff9	Inspect: Preserve permission flags (#13490 ) Preserve permission from disk files. Can help identify issues. Refactor GetRawData function to be cleaner.	2021-10-21 11:20:13 -07:00
Harshavardhana	ac36a377b0	fix: remove deprecated jwks_url from config KV (#13477 )	2021-10-20 11:31:09 -07:00
Harshavardhana	d693431183	fix: ReadFileStream should return an error when size mismatches (#13435 ) offset+length should match the Size() of the individual parts return 'errFileCorrupt' otherwise, to trigger healing of the individual parts do not error out prematurely when healing such bitrot's upon successful parts being written to the client. another issue this PR fixes is to not return an error to the client if we have just triggered a heal on a specific part of the object, instead continue to read all the content and let the heal happen asynchronously later.	2021-10-13 19:49:14 -07:00
Harshavardhana	f8c5c24159	force delete should just use rename() (#13417 ) use rename() instead of forced blocking delete call, faster for large namespaces.	2021-10-12 09:24:00 -07:00
Harshavardhana	f5a55c44d4	fix: do not overwrite error on fallback. (#13415 ) older content was returning '404' upon headObject() due to swallowing of the error, make sure the error is handling independently. fixes #13397	2021-10-11 19:48:42 -07:00

326 Commits