minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	e019f21bda	fix: trigger heal if one of the parts are not found (#11358 ) Previously we added heal trigger when bit-rot checks failed, now extend that to support heal when parts are not found either. This healing gets only triggered if we can successfully decode the object i.e read quorum is still satisfied for the object.	2021-01-27 10:21:14 -08:00
Anis Elleuch	e9ac7b0fb7	heal: Remove empty directories (#11354 ) Since the introduction of __XLDIR__, an empty directory does not have a meaning anymore in erasure mode. Make healing removes it wherever it finds it.	2021-01-27 02:19:28 -08:00
Anis Elleuch	00cff1aac5	audit: per object send pool number, set number and servers per operation (#11233 )	2021-01-26 13:21:51 -08:00
Harshavardhana	4315f93421	fix: make sure parentDirIsObject is used at set level (#11280 ) parentDirIsObject is not using set level understanding to check for parent objects, without this it can lead to objects that can actually reside on a separate set as objects and would conflict.	2021-01-17 01:11:48 -08:00
Harshavardhana	f903cae6ff	Support variable server pools (#11256 ) Current implementation requires server pools to have same erasure stripe sizes, to facilitate same SLA and expectations. This PR allows server pools to be variadic, i.e they do not have to be same erasure stripe sizes - instead they should have SLA for parity ratio. If the parity ratio cannot be guaranteed by the new server pool, the deployment is rejected i.e server pool expansion is not allowed.	2021-01-16 12:08:02 -08:00
Poorna Krishnamoorthy	7824e19d20	Allow synchronous replication if enabled. (#11165 ) Synchronous replication can be enabled by setting the --sync flag while adding a remote replication target. This PR also adds proxying on GET/HEAD to another node in a active-active replication setup in the event of a 404 on the current node.	2021-01-11 22:36:51 -08:00
Harshavardhana	f21d650ed4	fix: readData in bulk call using messagepack byte wrappers (#11228 ) This PR refactors the way we use buffers for O_DIRECT and to re-use those buffers for messagepack reader writer. After some extensive benchmarking found that not all objects have this benefit, and only objects smaller than 64KiB see this benefit overall. Benefits are seen from almost all objects from 1KiB - 32KiB Beyond this no objects see benefit with bulk call approach as the latency of bytes sent over the wire v/s streaming content directly from disk negate each other with no remarkable benefits. All other optimizations include reuse of msgp.Reader, msgp.Writer using sync.Pool's for all internode calls.	2021-01-07 19:27:31 -08:00
Harshavardhana	d0027c3c41	do not use large buffers if not necessary (#11220 ) without this change, there is a performance regression for small objects GETs, this makes the overall speed to go back to pre '59d363' commit days.	2021-01-04 18:51:52 -08:00
Harshavardhana	c4131c2798	feat: Small object optimization read data in single bulk call (#11207 )	2021-01-03 11:27:57 -08:00
Anis Elleuch	677e80c0f8	xl: Remove check-dir in ReadVersion (#11200 ) The only purpose of check-dir flag in ReadVersion is to return 404 when an object has xl.meta but without data. This is causing an extract call to the disk which can be penalizing in case of busy system where disks receive many concurrent access.	2021-01-02 10:35:57 -08:00
Harshavardhana	027e17468a	fix: discarding results do not attempt in-memory metacache writer (#11163 ) Optimizations include - do not write the metacache block if the size of the block is '0' and it is the first block - where listing is attempted for a transient prefix, this helps to avoid creating lots of empty metacache entries for `minioMetaBucket` - avoid the entire initialization sequence of cacheCh , metacacheBlockWriter if we are simply going to skip them when discardResults is set to true. - No need to hold write locks while writing metacache blocks - each block is unique, per bucket, per prefix and also is written by a single node.	2020-12-24 15:02:02 -08:00
Harshavardhana	35fafb837b	fix: issues with handling delete markers in metacache (#11150 ) Additional cases handled - fix address situations where healing is not triggered on failed writes and deletes. - consider object exists during listing when metadata can be successfully decoded.	2020-12-22 09:16:43 -08:00
Poorna Krishnamoorthy	9adc33efbb	Return version-id header in DeleteObject response (#11090 ) even when the object version is non-existent To make this consistent with aws behavior. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>	2020-12-11 16:58:15 -08:00
Harshavardhana	ce93b2681b	fix: re-use er.getDisks() properly in certain calls (#11043 )	2020-12-07 10:04:07 -08:00
Harshavardhana	790833f3b2	Revert "Support variable server sets (#10314 )" This reverts commit `aabf053d2f`.	2020-12-01 12:02:29 -08:00
Harshavardhana	bdd094bc39	fix: avoid sending errors on missing objects on locked buckets (#10994 ) make sure multi-object delete returned errors that are AWS S3 compatible	2020-11-28 21:15:45 -08:00
Harshavardhana	aabf053d2f	Support variable server sets (#10314 )	2020-11-25 16:28:47 -08:00
Poorna Krishnamoorthy	2ff655a745	Refactor replication, ILM handling in DELETE API (#10945 )	2020-11-25 11:24:50 -08:00
Poorna Krishnamoorthy	39f3d5493b	Show Delete replication status header (#10946 ) X-Minio-Replication-Delete-Status header shows the status of the replication of a permanent delete of a version. All GETs are disallowed and return 405 on this object version. In the case of replicating delete markers. X-Minio-Replication-DeleteMarker-Status shows the status of replication, and would similarly return 405. Additionally, this PR adds reporting of delete marker event completion and updates documentation	2020-11-21 23:48:50 -08:00
Poorna Krishnamoorthy	08b24620c0	Display storage-class of transitioned object in HEAD	2020-11-20 09:17:31 -08:00
Harshavardhana	95675b0c9a	fix: do not crash PutObjectTags when node is down (#10940 ) fixes #10939	2020-11-20 09:10:48 -08:00
Poorna Krishnamoorthy	251c1ef6da	Add support for replication of object tags, retention metadata (#10880 )	2020-11-19 18:56:09 -08:00
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565 ) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD for that object will be streamed from the transition tier. If PostRestoreObject API is invoked, the transitioned object can be restored for duration specified to the source cluster.	2020-11-19 18:47:17 -08:00
Harshavardhana	8f7fe0405e	fix: delete marker replication should support directories (#10878 ) allow directories to be replicated as well, along with their delete markers in replication. Bonus fix to fix bloom filter updates for directories to be preserved.	2020-11-19 18:47:12 -08:00
Harshavardhana	9a34fd5c4a	Revert "Revert "Add delete marker replication support (#10396 )"" This reverts commit `267d7bf0a9`.	2020-11-19 18:43:58 -08:00
Harshavardhana	17a5ff51ff	fix: move context timeout closer to network for Delete calls (#10897 ) allowing for disconnects to be limited to the drive themselves instead of disconnecting all drives.	2020-11-13 16:56:45 -08:00
Harshavardhana	267d7bf0a9	Revert "Add delete marker replication support (#10396 )" This reverts commit `50c10a5087`. PR is moved to origin/dev branch	2020-11-12 11:43:14 -08:00
Poorna Krishnamoorthy	50c10a5087	Add delete marker replication support (#10396 ) Delete marker replication is implemented for V2 configuration specified in AWS spec (though AWS allows it only in the V1 configuration). This PR also brings in a MinIO only extension of replicating permanent deletes, i.e. deletes specifying version id are replicated to target cluster.	2020-11-10 15:24:14 -08:00
Harshavardhana	b72cac4cf3	fix: dangling objects on actual namespace (#10822 )	2020-11-05 11:48:55 -08:00
Klaus Post	2294e53a0b	Don't retain context in locker (#10515 ) Use the context for internal timeouts, but disconnect it from outgoing calls so we always receive the results and cancel it remotely.	2020-11-04 08:25:42 -08:00
Harshavardhana	8527f22df1	optimize request URL encoding for internode (#10811 ) this reduces allocations in order of magnitude Also, revert "erasure: delete dangling objects automatically (#10765)" affects list caching should be investigated.	2020-11-02 15:15:12 -08:00
Anis Elleuch	b456292295	erasure: delete dangling objects automatically (#10765 )	2020-11-02 10:49:30 -08:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00
Harshavardhana	5b30bbda92	fix: add more protection distribution to match EcIndex (#10772 ) allows for more stricter validation in picking up the right set of disks for reconstruction.	2020-10-28 00:09:15 -07:00
Krishna Srinivas	c49a80db41	fix: use meta.Erasure.Index for GetObject() to reconstruct object (#10764 )	2020-10-26 16:19:42 -07:00
Anis Elleuch	eb95353cb1	fix: Get/HeadObject return 404 on non quorum objects (#10753 )	2020-10-26 10:30:46 -07:00
Harshavardhana	253194e491	do not hold write locks - if objects don't exist (#10644 )	2020-10-08 17:47:21 -07:00
Harshavardhana	736e58dd68	fix: handle concurrent lockers with multiple optimizations (#10640 ) - select lockers which are non-local and online to have affinity towards remote servers for lock contention - optimize lock retry interval to avoid sending too many messages during lock contention, reduces average CPU usage as well - if bucket is not set, when deleteObject fails make sure setPutObjHeaders() honors lifecycle only if bucket name is set. - fix top locks to list out always the oldest lockers always, avoid getting bogged down into map's unordered nature.	2020-10-08 12:32:32 -07:00
Harshavardhana	18063bf25c	fix: cleanup old directory handling code (#10633 ) we don't need them anymore, remove legacy code.	2020-10-06 12:03:57 -07:00
Harshavardhana	6fcbdd5607	remove unused putObjectDir code (#10528 )	2020-09-21 09:41:39 -07:00
Harshavardhana	02c1a08a5b	fix: make sure to lock CopyObject for in-place updates (#10492 )	2020-09-15 20:44:48 -07:00
Harshavardhana	0104af6bcc	delayed locks until we have started reading the body (#10474 ) This is to ensure that Go contexts work properly, after some interesting experiments I found that Go net/http doesn't cancel the context when Body is non-zero and hasn't been read till EOF. The following gist explains this, this can lead to pile up of go-routines on the server which will never be canceled and will die at a really later point in time, which can simply overwhelm the server. https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150 To avoid this refactor the locking such that we take locks after we have started reading from the body and only take locks when needed. Also, remove contextReader as it's not useful, doesn't work as expected context is not canceled until the body reaches EOF so there is no point in wrapping it with context and putting a `select {` on it which can unnecessarily increase the CPU overhead. We will still use the context to cancel the lockers etc. Additional simplification in the locker code to avoid timers as re-using them is a complicated ordeal avoid them in the hot path, since locking is very common this may avoid lots of allocations.	2020-09-14 15:57:13 -07:00
Harshavardhana	48919de301	fix: for defer'ed deleteObject use internal context (#10463 )	2020-09-11 06:39:19 -07:00
Harshavardhana	c13afd56e8	Remove MaxConnsPerHost settings to avoid potential hangs (#10438 ) MaxConnsPerHost can potentially hang a call without any way to timeout, we do not need this setting for our proxy and gateway implementations instead IdleConn settings are good enough. Also ensure to use NewRequestWithContext and make sure to take the disks offline only for network errors. Fixes #10304	2020-09-08 14:22:04 -07:00
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	2020-09-04 09:45:06 -07:00
Harshavardhana	37da0c647e	fix: delete marker compatibility behavior for suspended bucket (#10395 ) - delete-marker should be created on a suspended bucket as `null` - delete-marker should delete any pre-existing `null` versioned object and create an entry `null`	2020-09-02 00:19:03 -07:00
poornas	79e21601b0	fix: web handlers to enforce replication (#10249 ) This PR also preserves source ETag for replication	2020-08-12 17:32:24 -07:00
Harshavardhana	5ce82b45da	add CopyObject optimization when source and destination are same (#10170 ) when source and destination are same and versioning is enabled on the destination bucket - we do not need to re-create the entire object once again to optimize on space utilization. Cases this PR is not supporting - any pre-existing legacy object will not be preserved in this manner, meaning a new dataDir will be created. - key-rotation and storage class changes of course will never re-use the dataDir	2020-08-03 16:21:10 -07:00
Harshavardhana	b68bc75dad	fix: quorum calculation mistake with reduced parity (#10186 ) With reduced parity our write quorum should be same as read quorum, but code was still assuming ``` readQuorum+1 ``` In all situations which is not necessary.	2020-08-03 12:15:08 -07:00
Harshavardhana	35212b673e	add unformatted disk as part of the error list (#10128 ) these errors should be ignored for quorum error calculation to ensure that we don't prematurely return unformatted disk error as part of API calls	2020-07-24 13:16:11 -07:00

1 2

61 Commits