minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	a6f1e727fb	add tests for ILM transition and healing (#166 ) (#20601 ) This PR fixes a regression introduced in https://github.com/minio/minio/pull/19797 by restoring the healing ability of transitioned objects Bonus: support for transitioned objects to carry original The object name is for future reverse lookups if necessary. Also fix parity calculation for tiered objects to n/2 for n/2 == (parity)	2024-10-31 15:10:24 -07:00
Klaus Post	ed5ed7e490	Trace ILM errors (#20576 ) Some paths would attempt transitions but in case of failures no traces would be emitted. Add traces (with errors) when transition operations fail.	2024-10-22 14:10:34 -07:00
Harshavardhana	dc86b8d9d4	fix: when readQuorum, inconsistent metadata return 404 (#20522 ) in cases where we cannot possibly know a way to read and construct the object, it is impossible to achieve any form of quorum via xl.meta while we have sufficient responses from all the drives, we should return object not found.	2024-10-04 00:13:14 -07:00
Harshavardhana	6186d11761	handle the locks properly for multi-pool callers (#20495 ) - PutObjectMetadata() - PutObjectTags() - DeleteObjectTags() - TransitionObject() - RestoreTransitionObject() Also improve the behavior of multipart code across pool locks, hold locks only once per upload ID for - CompleteMultipartUpload() - AbortMultipartUpload() - ListObjectParts() (read-lock) - GetMultipartInfo() (read-lock) - PutObjectPart() (read-lock) This avoids lock attempts across pools for no reason, this increases O(n) when there are n-pools.	2024-09-29 15:40:36 -07:00
Klaus Post	05a6c170bf	Fix PutObject Trailing checksum (#20456 ) PutObject would verify trailing checksums, but not store them. Fixes #20455	2024-09-19 05:59:07 -07:00
Harshavardhana	5bf41aff17	hold granular locking for multi-pool PutObject() (#20434 ) - PutObject() for multi-pooled was holding large region locks, which was not necessary. This affects almost all slowpoke clients and lengthy uploads. - Re-arrange locks for CompleteMultipart, PutObject to be close to rename()	2024-09-13 13:26:02 -07:00
Harshavardhana	c28a4beeb7	multipart support etag and pre-read small objects (#20423 )	2024-09-12 05:24:04 -07:00
Harshavardhana	bc527eceda	handle the actualSize() properly for PostUpload() (#20422 ) postUpload() incorrectly saves actual size as '-1' we should save correct size when its possible. Bonus: fix the PutObjectPart() write locker, instead of holding a lock before we read the client stream. We should hold it only when we need to commit the parts.	2024-09-11 11:35:37 -07:00
Anis Eleuch	51b1f41518	heal: Persist MRF queue in the disk during shutdown (#19410 )	2024-08-13 15:26:05 -07:00
Harshavardhana	e7a56f35b9	flatten out audit tags, do not send as free-form (#20256 ) move away from map[string]interface{} to map[string]string to simplify the audit, and also provide concise information. avoids large allocations under load(), reduces the amount of audit information generated, as the current implementation was a bit free-form. instead all datastructures must be flattened.	2024-08-13 15:22:04 -07:00
Harshavardhana	89c58ce87d	enhance getActualSize() to return valid values for most situations (#20228 )	2024-08-08 08:29:58 -07:00
Harshavardhana	91805bcab6	add optimizations to bring performance on unversioned READS (#20128 ) allow non-inlined on disk to be inlined via an unversioned ReadVersion() call, we only need ReadXL() to resolve objects with multiple versions only. The choice of this block makes it to be dynamic and chosen by the user via `mc admin config set` Other bonus things - Start measuring internode TTFB performance. - Set TCP_NODELAY, TCP_CORK for low latency	2024-07-23 03:53:03 -07:00
Harshavardhana	7bd1d899bc	remove overzealous check during HEAD() (#19940 ) due to a historic bug in CopyObject() where an inlined object loses its metadata, the check causes an incorrect fallback verifying data-dir. CopyObject() bug was fixed in `ffa91f9794` however the occurrence of this problem is historic, so the aforementioned check is stretching too much. Bonus: simplify fileInfoRaw() to read xl.json as well, also recreate buckets properly.	2024-06-17 07:29:18 -07:00
Anis Eleuch	789cbc6fb2	heal: Dangling check to evaluate object parts separately (#19797 )	2024-06-10 08:51:27 -07:00
Klaus Post	a2cab02554	Fix SSE-C checksums (#19896 ) Compression will be disabled by default if SSE-C is specified. So we can still honor SSE-C.	2024-06-10 08:31:51 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758 ) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Anis Eleuch	bf1769d3e0	xl: Avoid marking a drive offline after one part read failure (#19779 ) This commit will fix one rare case of a multipart object that can be read in theory but GetObject API returned an error. It turned out that a six years old code was marking a drive offline when the bitrot streaming fails to read a part in a disk with any error. This can affect reading a subsequent part, though having enough shards, but unable to construct because one drive was marked offline earlier. This commit will remove the drive marking offline code. It will also close the bitrotstreaming reader before marking it as nil.	2024-05-21 07:36:21 -07:00
Harshavardhana	d3db7d31a3	fix: add deadlines for all synchronous REST callers (#19741 ) add deadlines that can be dynamically changed via the drive max timeout values. Bonus: optimize "file not found" case and hung drives/network - circuit break the check and return right away instead of waiting.	2024-05-15 09:52:29 -07:00
Harshavardhana	9b219cd646	fix: return quorum based error, temporary failures must be ignored (#19732 )	2024-05-14 03:29:17 -07:00
Harshavardhana	3549e583a6	results must be a single channel to avoid overwriting `healing.bin` (#19702 )	2024-05-09 10:15:03 -07:00
Harshavardhana	6a15580817	fix: collect quorum errors for deletePrefix() (#19685 ) do not return error for single drive being offline.	2024-05-06 22:44:46 -07:00
Harshavardhana	523bd769f1	add support for specific error response for InvalidRange (#19668 ) fixes #19648 AWS S3 returns the actual object size as part of XML response for InvalidRange error, this is used apparently by SDKs to retry the request without the range.	2024-05-05 09:56:21 -07:00
Krishnan Parthasarathi	7926401cbd	ilm: Handle DeleteAllVersions action differently for DEL markers (#19481 ) i.e., this rule element doesn't apply to DEL markers. This is a breaking change to how ExpiredObejctDeleteAllVersions functions today. This is necessary to avoid the following highly probable footgun scenario in the future. Scenario: The user uses tags-based filtering to select an object's time to live(TTL). The application sometimes deletes objects, too, making its latest version a DEL marker. The previous implementation skipped tag-based filters if the newest version was DEL marker, voiding the tag-based TTL. The user is surprised to find objects that have expired sooner than expected. * Add DelMarkerExpiration action This ILM action removes all versions of an object if its the latest version is a DEL marker. ```xml <DelMarkerObjectExpiration> <Days> 10 </Days> </DelMarkerObjectExpiration> ``` 1. Applies only to objects whose, • The latest version is a DEL marker. • satisfies the number of days criteria 2. Deletes all versions of this object 3. Associated rule can't have tag-based filtering Includes, - New bucket event type for deletion due to DelMarkerExpiration	2024-04-30 18:11:10 -07:00
Harshavardhana	a372c6a377	a bunch of fixes for error handling (#19627 ) - handle errFileCorrupt properly - micro-optimization of sending done() response quicker to close the goroutine. - fix logger.Event() usage in a couple of places - handle the rest of the client to return a different error other than lastErr() when the client is closed.	2024-04-28 10:53:50 -07:00
Poorna	e7aa26dc29	fix: allow DeleteObject unversioned objects with insufficient read quorum (#19581 ) Since the object is being permanently deleted, the lack of read quorum should not matter as long as sufficient disks are online to complete the deletion with parity requirements. If several pools have the same object with insufficient read quorum, attempt to delete object from all the pools where it exists	2024-04-25 17:31:12 -07:00
Harshavardhana	9693c382a8	make renameData() more defensive during overwrites (#19548 ) instead upon any error in renameData(), we still preserve the existing dataDir in some form for recoverability in strange situations such as out of disk space type errors. Bonus: avoid running list and heal() instead allow versions disparity to return the actual versions, uuid to heal. Currently limit this to 100 versions and lesser disparate objects. an undo now reverts back the xl.meta from xl.meta.bkp during overwrites on such flaky setups. Bonus: Save N depth syscalls via skipping the parents upon overwrites and versioned updates. Flaky setup examples are stretch clusters with regular packet drops etc, we need to add some defensive code around to avoid dangling objects.	2024-04-23 10:15:52 -07:00
Harshavardhana	95c65f4e8f	do not panic on rebalance during server restarts (#19563 ) This PR makes a feasible approach to handle all the scenarios that we must face to avoid returning "panic." Instead, we must return "errServerNotInitialized" when a bucketMetadataSys.Get() is called, allowing the caller to retry their operation and wait. Bonus fix the way data-usage-cache stores the object. Instead of storing usage-cache.bin with the bucket as `.minio.sys/buckets`, the `buckets` must be relative to the bucket `.minio.sys` as part of the object name. Otherwise, there is no way to decommission entries at `.minio.sys/buckets` and their final erasure set positions. A bucket must never have a `/` in it. Adds code to read() from existing data-usage.bin upon upgrade.	2024-04-22 10:49:30 -07:00
Klaus Post	ec816f3840	Reduce parallelReader allocs (#19558 )	2024-04-19 09:44:59 -07:00
Harshavardhana	f65dd3e5a2	reload from drive tier-config when in-memory cache is not found (#19527 ) avoid probing tier target while reloading() tier config	2024-04-16 22:09:58 -07:00
Poorna	ffa91f9794	fix CopyObject with replace overwriting inline status (#19468 ) Fixes #19450 - internal inline-data header can get overwritten during copy with replace before this fix.	2024-04-10 23:42:51 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaing this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Klaus Post	b435806d91	Reduce big message RPC allocations (#19390 ) Use `ODirectPoolSmall` buffers for inline data in PutObject. Add a separate call for inline data that will fetch a buffer for the inline data before unmarshal.	2024-04-01 16:42:09 -07:00
Shubhendu	468a9fae83	Enable replication of SSE-C objects (#19107 ) If site replication enabled across sites, replicate the SSE-C objects as well. These objects could be read from target sites using the same client encryption keys. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 10:44:56 -07:00
Harshavardhana	289223b6de	expire ILM all versions verify quorum on action (#19359 )	2024-03-27 23:44:52 -07:00
Harshavardhana	0a56dbde2f	allow configuring inline shard size value (#19336 )	2024-03-26 15:06:19 -07:00
Harshavardhana	ce1c640ce0	feat: allow retaining parity SLA to be configurable (#19260 ) at scale customers might start with failed drives, causing skew in the overall usage ratio per EC set. make this configurable such that customers can turn this off as needed depending on how comfortable they are.	2024-03-14 03:38:33 -07:00
Harshavardhana	88a89213ff	make immediate purge non-blocking up to 100,000 entries per drive (#19231 ) make immediate purge non-blocking upto 100000 entries per drive Bonus: turn-off O_DIRECT verification when FSType is 'XFS'	2024-03-09 18:53:48 -08:00
Harshavardhana	74ccee6619	avoid too much auditing during decom/rebalance make it more robust (#19174 ) there can be a sudden spike in tiny allocations, due to too much auditing being done, also don't hang on the ``` h.logCh <- entry ``` after initializing workers if you do not have a way to dequeue for some reason.	2024-03-06 03:43:16 -08:00
Krishnan Parthasarathi	a7577da768	Improve expiration of tiered objects (#18926 ) - Use a shared worker pool for all ILM expiry tasks - Free version cleanup executes in a separate goroutine - Add a free version only if removing the remote object fails - Add ILM expiry metrics to the node namespace - Move tier journal tasks to expiryState - Remove unused on-disk journal for tiered objects pending deletion - Distribute expiry tasks across workers such that the expiry of versions of the same object serialized - Ability to resize worker pool without server restart - Make scaling down of expiryState workers' concurrency safe; Thanks @klauspost - Add error logs when expiryState and transition state are not initialized (yet) * metrics: Add missed tier journal entry tasks * Initialize the ILM worker pool after the object layer	2024-03-01 21:11:03 -08:00
Harshavardhana	467714f33b	ignore x-amz-storage-class when its set to STANDARD (#19154 ) fixes #19135	2024-02-28 17:44:30 -08:00
Harshavardhana	53aa8f5650	use typos instead of codespell (#19088 )	2024-02-21 22:26:06 -08:00
Harshavardhana	35deb1a8e2	do not block on send channels under high load (#19090 ) all send channels must compete with `ctx` if not they will perpetually stay alive.	2024-02-20 15:00:35 -08:00
Harshavardhana	794a7993cb	calculate correct quorum check for metadata updates on object (#18979 ) this fixes rare bugs we have seen but never really found a reproducer for - PutObjectRetention() returning 503s - PutObjectTags() returning 503s - PutObjectMetadata() updates during replication returning 503s These calls return errors, and this perpetuates with no apparent fix. This PR fixes with correct quorum requirement.	2024-02-05 21:44:40 -08:00
Harshavardhana	fec13b0ec1	remove unused DiskMTime (#18965 )	2024-02-05 01:04:26 -08:00
Anis Eleuch	6ae97aedc9	xl: Disable rename2 in decommissioning/rebalance (#18964 ) Always disable rename2 optimization in decom/rebalance	2024-02-03 14:03:30 -08:00
Anis Eleuch	6fd63e920a	log: Use error log type instead of Application/MinIO type (#18930 ) * log: Use error log type instead of Application/MinIO type Also bump github.com/shirou/gopsutil version to address cross compilation issues. * Apply suggestions from code review Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com> --------- Co-authored-by: Anis Eleuch <anis@min.io> Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2024-02-01 16:13:57 -08:00
Harshavardhana	caac9d216e	remove all the frivolous logs, that may or may not be actionable (#18922 ) for actionable, inspections we have `mc support inspect` we do not need double logging, healing will report relevant errors if any, in terms of quorum lost etc.	2024-01-30 18:11:45 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, Multipart operations heavily rely on making GetBucketInfo() call to verify if bucket exists or not on a regular basis. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, however its not enough, beyond 100 nodes and this becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Harshavardhana	708cebe7f0	add necessary protection err, fileInfo slice reads and writes (#18854 ) protection was in place. However, it covered only some areas, so we re-arranged the code to ensure we could hold locks properly. Along with this, remove the DataShardFix code altogether, in deployments with many drive replacements, this can affect and lead to quorum loss.	2024-01-24 01:08:23 -08:00

1 2 3 4 5 ...

328 Commits