minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	2eb52ca5f4	fix: heal bucket metadata right before healing bucket (#11097 ) optimization mainly to avoid listing the entire `.minio.sys/buckets/.minio.sys` directory, this can get really huge and comes in the way of startup routines, contents inside `.minio.sys/buckets/.minio.sys` are rather transient and not necessary to be healed.	2020-12-13 11:57:08 -08:00
Klaus Post	4bca62a0bd	crawler: Stream bucket usage cache data (#11068 ) Stream bucket caches to storage and through RPC calls.	2020-12-10 13:03:22 -08:00
Harshavardhana	f794fe79e3	fix: network shutdown was not handle properly (#10927 ) fixes a regression introduced in #10859, due to the error returned by rest.Client being typed i.e *rest.NetworkError - IsNetworkHostDown function didn't work as expected to detect network issues. This in-turn aggravated the situations when nodes are disconnected leading to performance loss.	2020-11-19 13:53:49 -08:00
Harshavardhana	80b8ce89a4	remove context deadline from Delete calls (#10901 )	2020-11-17 09:09:45 -08:00
Harshavardhana	17a5ff51ff	fix: move context timeout closer to network for Delete calls (#10897 ) allowing for disconnects to be limited to the drive themselves instead of disconnecting all drives.	2020-11-13 16:56:45 -08:00
Klaus Post	06899210a7	Reduce health check output (#10859 ) This will make the health check clients 'silent'. Use `IsNetworkOrHostDown` determine if network is ok so it mimics the functionality in the actual client.	2020-11-10 09:28:23 -08:00
Harshavardhana	1a1f00fa15	fix: use internode data for DisksInfo, VolsInfo in message pack (#10821 ) Similar to #10775 for fewer memory allocations, since we use getOnlineDisks() extensively for listing we should optimize it further. Additionally, remove all unused walkers from the storage layer	2020-11-04 10:10:54 -08:00
Klaus Post	1e11b4629f	Add remote Diskinfo caching (#10824 ) Add 1 second remote disk info cache. Should decrease need for remote calls a great deal due to how actively it is used now.	2020-11-04 08:00:18 -08:00
Klaus Post	37749f4623	Optimize FileInfo(Version) transfer (#10775 ) File Info decoding, in particular, is showing up as a major allocator and time consumer for internode data transfers Switch to message pack for cross-server transfers: ``` MSGP: Size: 945 bytes BenchmarkEncodeFileInfoMsgp-32 1558444 866 ns/op 1.16 MB/s 0 B/op 0 allocs/op BenchmarkDecodeFileInfoMsgp-32 479968 2487 ns/op 0.40 MB/s 848 B/op 18 allocs/op GOB: Size: 1409 bytes BenchmarkEncodeFileInfoGOB-32 333339 3237 ns/op 0.31 MB/s 576 B/op 19 allocs/op BenchmarkDecodeFileInfoGOB-32 20869 57837 ns/op 0.02 MB/s 16439 B/op 428 allocs/op ```	2020-11-02 17:07:52 -08:00
Klaus Post	86e0d272f3	Reduce WriteAll allocs (#10810 ) WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers used by `io.CopyBuffer` even if they are pooled. Since all writers appear to write byte buffers, just send those instead and write directly. The files are opened through the `os` package so they have no special properties anyway. This removes the alloc and copy for each operation. REST sends content length so a precise alloc can be made.	2020-11-02 16:14:31 -08:00
Harshavardhana	4c773f7068	re-use remote transports in Peer,Storage,Locker clients (#10788 ) use one transport for internode communication	2020-11-02 07:43:11 -08:00
Klaus Post	e63a44b734	rest client: Expect context timeouts for locks (#10782 ) Add option for rest clients to not mark a remote offline for context timeouts. This can be used if context timeouts are expected on the call.	2020-10-29 09:52:11 -07:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00
Anis Elleuch	eb95353cb1	fix: Get/HeadObject return 404 on non quorum objects (#10753 )	2020-10-26 10:30:46 -07:00
Harshavardhana	029758cb20	fix: retain the previous UUID for newly replaced drives (#10759 ) only newly replaced drives get the new `format.json`, this avoids disks reloading their in-memory reference format, ensures that drives are online without reloading the in-memory reference format. keeping reference format in-tact means UUIDs never change once they are formatted.	2020-10-26 10:29:29 -07:00
Harshavardhana	71b97fd3ac	fix: connect disks pre-emptively during startup (#10669 ) connect disks pre-emptively upon startup, to ensure we have enough disks are connected at startup rather than wait for them. we need to do this to avoid long wait times for server to be online when we have servers come up in rolling upgrade fashion	2020-10-13 18:28:42 -07:00
Harshavardhana	1f9abbee4d	make sure to release locks upon timeout (#10596 ) fixes #10418	2020-09-29 15:18:34 -07:00
Harshavardhana	00eb6f6bc9	cache DiskInfo at storage layer for performance (#10586 ) `mc admin info` on busy setups will not move HDD heads unnecessarily for repeated calls, provides a better responsiveness for the call overall. Bonus change allow listTolerancePerSet be N-1 for good entries, to avoid skipping entries for some reason one of the disk went offline.	2020-09-29 09:54:41 -07:00
Harshavardhana	66174692a2	add '.healing.bin' for tracking currently healing disk (#10573 ) add a hint on the disk to allow for tracking fresh disk being healed, to allow for restartable heals, and also use this as a way to track and remove disks. There are more pending changes where we should move all the disk formatting logic to backend drives, this PR doesn't deal with this refactor instead makes it easier to track healing in the future.	2020-09-28 19:39:32 -07:00
Harshavardhana	81caf35926	fix: reduce healthcheck interval for storage rest client (#10544 )	2020-09-23 10:43:42 -07:00
Anis Elleuch	4c81201f95	fix: healing delete marker on versioned buckets (#10530 ) Healing was not working correctly in the distributed mode because errFileVersionNotFound was not properly converted in storage rest client. Besides, fixing the healing delete marker is not working as expected.	2020-09-21 15:16:16 -07:00
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	2020-09-04 09:45:06 -07:00
Klaus Post	c097ce9c32	continous healing based on crawler (#10103 ) Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914	2020-08-24 13:47:01 -07:00
Harshavardhana	e7ba78beee	use GlobalContext instead of context.Background when possible (#10254 )	2020-08-13 09:16:01 -07:00
Harshavardhana	1e2ebc9945	feat: time to bring back http2.0 support (#10230 ) Bonus move our CI/CD to go1.14	2020-08-10 09:02:29 -07:00
Harshavardhana	0b8255529a	fix: proxies set keep-alive timeouts to be system dependent (#10199 ) Split the DialContext's one for internode and another for all other external communications especially proxy forwarders, gateway transport etc.	2020-08-04 14:55:53 -07:00
Harshavardhana	a880283593	Send the lower level error directly from GetDiskID() (#10095 ) this is to detect situations of corruption disk format etc errors quickly and keep the disk online in such scenarios for requests to fail appropriately.	2020-07-21 13:54:06 -07:00
Klaus Post	d84fc58cac	fix: CheckParts endpoint call to correct API (#10073 ) CheckParts is calling the wrong endpoint, so instead of checking parts, it is writing metadata.	2020-07-17 10:17:59 -07:00
Harshavardhana	e7d7d5232c	fix: admin info output and improve overall performance (#10015 ) - admin info node offline check is now quicker - admin info now doesn't duplicate the code across doing the same checks for disks - rely on StorageInfo to return appropriate errors instead of calling locally. - diskID checks now return proper errors when disk not found v/s format.json missing. - add more disk states for more clarity on the underlying disk errors.	2020-07-13 09:51:07 -07:00
Harshavardhana	3b9fbf80ad	fix: make sure to use new restClient for healthcheck (#10026 ) Without instantiating a new rest client we can have a recursive error which can lead to healthcheck returning always offline, this can prematurely take the servers offline.	2020-07-11 22:19:38 -07:00
Klaus Post	abd999f64a	fix: list object versions in distributed setup (#9958 ) Remove calls to `WalkVersions` was calling the wrong endpoint, so unless quorum could be reached with local disks no results would ever be returned.	2020-07-02 10:29:50 -07:00
Harshavardhana	f7f12b8604	fix: crash in storage rest client due to spurious query params (#9924 ) regression got introduced in `dee3cf2d7f` when the DeleteVersion API was changed, but the corresponding query params were left in-tact.	2020-06-26 16:49:49 -07:00
Harshavardhana	dee3cf2d7f	fix: preserve modTime for DeleteMarker on remote disks (#9905 )	2020-06-23 10:20:31 -07:00
Harshavardhana	7ed1077879	Add a custom healthcheck function for online status (#9858 ) - Add changes to ensure remote disks are not incorrectly taken online if their order has changed or are incorrect disks. - Bring changes to peer to detect disconnection with separate Health handler, to avoid a rather expensive call GetLocakDiskIDs() - Follow up on the same changes for Lockers as well	2020-06-17 14:49:26 -07:00
Klaus Post	3ba4804d6c	Move online status to REST client (#9808 )	2020-06-16 18:59:32 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Klaus Post	43d6e3ae06	merge object lifecycle checks into usage crawler (#9579 )	2020-06-12 10:28:21 -07:00
Harshavardhana	b2db8123ec	Preserve errors returned by diskInfo to detect disk errors (#9727 ) This PR basically reverts #9720 and re-implements it differently	2020-05-28 13:03:04 -07:00
Anis Elleuch	375b79f11b	storage: Implement GetDiskID request in REST server side (#9720 ) GetDiskID() in storage rest client does not really issue a REST request to the remote disk, but returns an in-memory value instead. However, GetDiskID() should return an error when format.json is not found or for other similar issues (unmounted disks, etc..) GetDiskID() is only called when formatting disks and getting storage informatio, hence this commit should not have a performance degradation.	2020-05-28 08:17:42 -07:00
Anis Elleuch	9baeda781a	fix storage info output with unordered endpoints arguments (#9610 ) Shuffling arguments that we pass to MinIO server are supported. However, when that happens, Prometheus returns wrong information about disks usage and online/offline status. The commit fixes the issue by avoiding relying on xl.endpoints since it is not ordered.	2020-05-19 14:27:20 -07:00
Klaus Post	c4464e36c8	fix: limit HTTP transport tuables to affordable values (#9383 ) Close connections pro-actively in transient calls	2020-04-17 11:20:56 -07:00
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	2020-04-09 09:30:02 -07:00
Bala FA	2c3e34f001	add force delete option of non-empty bucket (#9166 ) passing HTTP header `x-minio-force-delete: true` would allow standard S3 API DeleteBucket to delete a non-empty bucket forcefully.	2020-03-27 21:52:59 -07:00
Harshavardhana	6f992134a2	fix: startup load time by reusing storageDisks (#9210 )	2020-03-27 14:48:30 -07:00
Krishna Srinivas	45b1c66195	fix: implement splunk specific listObjects when delimiter=guidSplunk (#9186 )	2020-03-22 19:23:47 -07:00
Klaus Post	8d98662633	re-implement data usage crawler to be more efficient (#9075 ) Implementation overview: https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b	2020-03-18 16:19:29 -07:00
Anis Elleuch	0af62d35a0	xl: Implement posix.DeletePrefixes to enhance delete perf (#9100 ) Bulk delete API was using cleanupObjectsBulk() which calls posix listing and delete API to remove objects internal files in the backend (xl.json and parts) one by one. Add DeletePrefixes in the storage API to remove the content of a directory in a single call. Also use a remove goroutine for each disk to accelerate removal.	2020-03-11 08:56:36 -07:00
Harshavardhana	23a8411732	Add a generic Walk()'er to list a bucket, optinally prefix (#9026 ) This generic Walk() is used by likes of Lifecyle, or KMS to rotate keys or any other functionality which relies on this functionality.	2020-02-25 21:22:28 +05:30
Harshavardhana	7ce63b3078	fix: multi-delete API write quorum failures (#8926 ) multi-delete API failed with write quorum errors under following situations - list of files requested for delete doesn't exist anymore can lead to quorum errors and failure - due to usage of query param for paths, for really long paths MinIO server rejects these requests as malformed as unexpected. This was reproduced with warp	2020-02-01 18:11:29 -08:00
Harshavardhana	0879a4f743	rest/storage: Remove racy LastError usage (#8817 ) instead perform a liveness check call to verify if server is online and print relevant errors. Also introduce a StorageErr string error type instead of errors.New() deprecate usage of VerifyFileError, DeleteFileError for gob, change in datastructure also requires bump in storage REST version to v13. Fixes #8811	2020-01-14 18:45:17 -08:00
Klaus Post	37b32199e3	Validate XL sets on format (#8779 ) When formatting a set validate if a host failure will likely lead to data loss. While we don't know what config will be set in the future evaluate to our best knowledge, assuming default settings.	2020-01-13 13:09:10 -08:00
Harshavardhana	f68a7005c0	Improve disk formatting stage for large disk sets (#8690 )	2019-12-23 16:31:03 -08:00
Anis Elleuch	555969ee42	Add data usage collect with its new admin API (#8553 ) Admin data usage info API returns the following (Only FS & XL, for now) - Number of buckets - Number of objects - The total size of objects - Objects histogram - Bucket sizes	2019-12-12 06:02:37 -08:00
Harshavardhana	e9b2bf00ad	Support MinIO to be deployed on more than 32 nodes (#8492 ) This PR implements locking from a global entity into a more localized set level entity, allowing for locks to be held only on the resources which are writing to a collection of disks rather than a global level. In this process this PR also removes the top-level limit of 32 nodes to an unlimited number of nodes. This is a precursor change before bring in bucket expansion.	2019-11-13 12:17:45 -08:00
Harshavardhana	4e63e0e372	Return appropriate errors API versions changes across REST APIs (#8480 ) This PR adds code to appropriately handle versioning issues that come up quite constantly across our API changes. Currently we were also routing our requests wrong which sort of made it harder to write a consistent error handling code to appropriately reject or honor requests. This PR potentially fixes issues - old mc is used against new minio release which is incompatible returns an appropriate for client action. - any older servers talking to each other, report appropriate error - incompatible peer servers should report error and reject the calls with appropriate error	2019-11-04 09:30:59 -08:00
Krishna Srinivas	980bf78b4d	Detect underlying disk mount/unmount (#8408 )	2019-10-25 10:37:53 -07:00
Harshavardhana	6a4ef2e48e	Initialize configs correctly, move notification config (#8367 ) This PR also removes deprecated tests, adds checks to avoid races reproduced on CI/CD.	2019-10-09 11:41:15 +05:30
Harshavardhana	ff5bf51952	admin/heal: Fix deep healing to heal objects under more conditions (#8321 ) - Heal if the part.1 is truncated from its original size - Heal if the part.1 fails while being verified in between - Heal if the part.1 fails while being at a certain offset Other cleanups include make sure to flush the HTTP responses properly from storage-rest-server, avoid using 'defer' to improve call latency. 'defer' incurs latency avoid them in our hot-paths such as storage-rest handlers. Fixes #8319	2019-10-02 01:42:15 +05:30
Harshavardhana	f45977d371	Fix error handling in DeleteFileBulk storage handler (#8327 ) errors.errorString() cannot be marshalled by gob encoder, so using a slice of []error would fail to be encoded. This leads to no errors being generated instead gob.Decoder on the storage-client would see an io.EOF To avoid such bugs introduce a typed error for handling such translations and register this type for gob encoding support.	2019-09-30 19:01:28 -07:00
Klaus Post	ff726969aa	Switch to Snappy -> S2 compression (#8189 )	2019-09-25 23:08:24 -07:00
Praveen raj Mani	8700945cdf	Handle connection failures on webhook/url pings (#8204 ) Properly handle connection failures while replaying events Fixes #8194	2019-09-12 16:44:51 -07:00
Anis Elleuch	e7b3f39064	xl: Fix verifying non streaming highway algo with a dist setup (#8230 ) VerifyFile in the distributed setup does not work with the non streaming highway hash. The reason is that the internode mux router did not expect `storageRESTBitrotHash` parameter.	2019-09-12 13:08:02 -07:00
Anis Elleuch	3f258062d8	bitrot: Verify file size inside storage interface (#7932 )	2019-09-12 02:19:53 +05:30
Harshavardhana	6e7962bf35	Return if paths are empty in DeleteFileBulk (#8085 ) This avoids a network call, also fixes an issue when empty paths are passed the underlying call fails with "405 Method Not Allowed". This is reproducible when you are deleting a non-existent object. Fixes #8083	2019-08-15 13:15:49 -07:00
Harshavardhana	54eded2e6f	Do not assume all HTTP errors as Network errors (#7983 ) In situations such as when client uploading data, prematurely disconnects from server such as pressing ctrl-c before uploading all the data. Under this situation in distributed setup we prematurely disconnect disks causing a reconnect loop. This has an adverse affect we end up leaving a lot of files in temporary location which ideally should have been cleaned up when Put() prematurely fails. This is also a regression which got introduced in #7610	2019-07-29 14:48:18 -07:00
Anis Elleuch	000a60f238	xl: Heal empty parts (#7860 ) posix.VerifyFile() doesn't know how to check if a file is corrupted if that file is empty. We do have the part size in xl.json so we pass it to VerifyFile to return an error so healing empty parts can work properly.	2019-07-13 00:29:44 +01:00
Krishna Srinivas	58d90ed73c	Avoid network transfer for bitrot verification during healing (#7375 )	2019-07-08 13:51:18 -07:00
Krishna Srinivas	74e2fe0879	Return "SlowDown" to S3 clients for network related errors (#7610 ) Consider errors returned by httpClient.Do() as network errors. This is because the http clients returns different types of errors and it is hard to catch all the error types.	2019-05-29 10:21:47 -07:00
Harshavardhana	39b3e4f9b3	Avoid using io.ReadFull() for WriteAll and CreateFile (#7676 ) With these changes we are now able to peak performances for all Write() operations across disks HDD and NVMe. Also adds readahead for disk reads, which also increases performance for reads by 3x.	2019-05-22 13:47:15 -07:00
Harshavardhana	b3f22eac56	Offload listing to posix layer (#7611 ) This PR adds one API WalkCh which sorts and sends list over the network Each disk walks independently in a sorted manner.	2019-05-14 13:49:10 -07:00
Anis Elleuch	9c90a28546	Implement bulk delete (#7607 ) Bulk delete at storage level in Multiple Delete Objects API In order to accelerate bulk delete in Multiple Delete objects API, a new bulk delete is introduced in storage layer, which will accept a list of objects to delete rather than only one. Consequently, a new API is also need to be added to Object API.	2019-05-13 12:25:49 -07:00
poornas	cf2a436bc8	Show SlowDown error message if backend is busy (#7521 ) or if there are too many open file descriptors.	2019-05-02 07:09:57 -07:00
Harshavardhana	f767a2538a	Optimize listing with leaf check offloaded to posix (#7541 ) Other listing optimizations include - remove double sorting while filtering object entries - improve error message when upload-id is not in quorum - use jsoniter for full unmarshal json, instead of gjson - remove unused code	2019-04-23 14:54:28 -07:00
kannappanr	d2f42d830f	Lock: Use REST API instead of RPC (#7469 ) In distributed mode, use REST API to acquire and manage locks instead of RPC. RPC has been completely removed from MinIO source. Since we are moving from RPC to REST, we cannot use rolling upgrades as the nodes that have not yet been upgraded cannot talk to the ones that have been upgraded. We expect all minio processes on all nodes to be stopped and then the upgrade process to be completed. Also force http1.1 for inter-node communication	2019-04-17 23:16:27 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Anis Elleuch	53011606a5	Show 401 unauthorized msg when nodes are started with different creds (#7433 ) Before this commit, nodes wait indefinitely without showing any indicate error message when a node is started with different access and secret keys. This PR will show '401 Unauthorized' in this case.	2019-04-02 12:25:34 -07:00
Harshavardhana	a9032b52b8	Change storageRESTTimeout to 1minute (#7398 )	2019-03-20 13:20:09 -07:00
Harshavardhana	396d78352d	Support HTTP/2.0 (#7204 ) Fixes #6704	2019-02-14 17:53:46 -08:00
Krishna Srinivas	90213ff1b2	Detect peer reboots to invalidate current storage REST clients (#7227 )	2019-02-13 15:29:46 -08:00
Anis Elleuch	f9fecf0e76	storage: Increase the timeout of storage REST requests (#7218 ) This commit increases storage REST requests to 5 minutes, this includes the opening TCP connection, and sending/receiving data. This will reduce clients receiving errors when the server is under high load.	2019-02-12 23:27:33 -08:00
Harshavardhana	817269475f	Make sure to drain body upon an error (#7197 ) Also cleanup redundant code and use it at a common place	2019-02-06 12:07:03 -08:00
Krishna Srinivas	b18c0478e7	Only heal on disks where we are sure that healing is needed (#7148 )	2019-01-30 10:53:57 -08:00
Krishna Srinivas	98c950aacd	Streaming bitrot verification support (#7004 )	2019-01-17 18:28:18 +05:30
Anis Elleuch	69bd6df464	storage: Implement Close() in REST client (#6826 ) Calling /minio/prometheuses/metrics calls xlSets.StorageInfo() which creates a new storage REST client and closes it. However, currently, closing does nothing to the underlying opened http client. This commit introduces a closing behavior by calling CloseIdleConnections provided by http.Transport upon the initialization of this latter.	2018-11-20 11:07:19 -08:00
Harshavardhana	f1f23f6f11	Add sync mode for 'xl.json' (#6798 ) xl.json is the source of truth for all erasure coded objects, without which we won't be able to read the objects properly. This PR enables sync mode for writing `xl.json` such all writes go hit the disk and are persistent under situations such as abrupt power failures on servers running Minio.	2018-11-14 19:48:35 +05:30
Krishna Srinivas	81bee93b8d	Move remote disk StorageAPI abstraction from RPC to REST (#6464 )	2018-10-04 17:44:06 -07:00

1 2 3

136 Commits