minio

mirror of https://github.com/minio/minio.git synced 2024-12-24 22:25:54 -05:00

Author	SHA1	Message	Date
Anis Eleuch	fe63664164	prom: Add drive failure tolerance per erasure set (#18424 )	2023-11-13 00:59:48 -08:00
Sveinn	9afdb05bf4	fix: file consistency issue on SFTP upload (#18422 ) * creating a byte buffer for SFTP file segments * Adding an error condition for when there are remaining segments in the queue * Simplification of the queue using a map	2023-11-11 00:14:41 -08:00
Krishnan Parthasarathi	9569a85cee	Avoid allocs for MRF on-disk header (#18425 )	2023-11-10 19:54:46 -08:00
Harshavardhana	54721b7c7b	fix: batch replication from source allow out of band deletes (#18423 ) it is possible that ILM or Deletes got triggered on batch of objects that we are attempting to batch replicate, ignore this scenario as valid behavior.	2023-11-10 16:12:35 -08:00
Harshavardhana	91d8bddbd1	use sendfile/splice implementation to perform DMA (#18411 ) sendfile implementation to perform DMA on all platforms Go stdlib already supports sendfile/splice implementations for - Linux - Windows - *BSD - Solaris Along with this change however O_DIRECT for reads() must be removed as well since we need to use sendfile() implementation The main reason to add O_DIRECT for reads was to reduce the chances of page-cache causing OOMs for MinIO, however it would seem that avoiding buffer copies from user-space to kernel space this issue is not a problem anymore. There is no Go based memory allocation required, and neither the page-cache is referenced back to MinIO. This page- cache reference is fully owned by kernel at this point, this essentially should solve the problem of page-cache build up. With this now we also support SG - when NIC supports Scatter/Gather https://en.wikipedia.org/wiki/Gather/scatter_(vector_addressing)	2023-11-10 10:10:14 -08:00
Harshavardhana	80adc87a14	converge WARM tier object name to hash of deployment+bucket (#18410 ) this is to ensure that we can converge and save IOPs when hot-tier accesses MinIO.	2023-11-10 02:15:13 -08:00
Taran Pelkey	117ad1b65b	Loosen requirements to detach policies for LDAP (#18419 )	2023-11-09 14:44:43 -08:00
Klaus Post	2229509362	fix: leaking offline disks in MarkOffline() thread (#18414 ) `monitorAndConnectEndpoints` will continue to attempt to reconnect offline disks. Since disks were never closed, a `MarkOffline` would continue to try to check these disks forever. Close previous disks.	2023-11-09 09:33:32 -08:00
Krishnan Parthasarathi	0a25083fdb	Tiered objects require ns locks unlike inlined (#18409 )	2023-11-08 20:00:02 -08:00
Sveinn	15137d0327	refactor SFTP to use the new minio/pkg implementation (#18406 )	2023-11-08 09:47:05 -08:00
Poorna	8c9974bc0f	site replication: avoid propagating bucket b/w settings (#18399 ) replication mode and bucket bandwidth are one-way and should not be propagated to peer cluster. Regression from #18062	2023-11-08 00:40:25 -08:00
jiuker	079b6c2b50	fix: add err when all bucket resync failed (#18401 )	2023-11-08 00:40:08 -08:00
Harshavardhana	754f7a8a39	replace io.Discard usage to fix some NUMA copy() latencies (#18394 ) replace io.Discard usage to fix NUMA copy() latencies On NUMA systems copying from 8K buffer allocated via io.Discard leads to large latency build-up for every ``` copy(new8kbuf, largebuf) ``` can incur up to 1ms worth of latencies on NUMA systems due to memory sharding across NUMA nodes.	2023-11-06 14:26:08 -08:00
Harshavardhana	64bafe1dfe	skip speedtest bucket from site-replication (#18393 )	2023-11-06 11:52:33 -08:00
jiuker	c3e456e7e6	fix: no resyncid when site-replication cancel (#18392 )	2023-11-06 01:53:31 -08:00
vicmunoz	da95a2d13f	fix: object versions metric help (#18388 )	2023-11-03 11:43:52 -07:00
Shireesh Anjal	cc5e05fdeb	Do not anonymize hostnames by default (#18387 ) Anonymize them only if the parameter `anonymize` is set to `strict`	2023-11-03 10:09:33 -07:00
jiuker	8a56af439c	fix: siteReplicationSys.startResync return no buckets return if error (#18374 )	2023-11-02 16:00:03 -07:00
Shireesh Anjal	f6e581ce54	Capture network device info in health report (#18381 )	2023-11-02 09:49:49 -07:00
Klaus Post	7472818d94	Fix hanging scanner saves (#18368 ) Fix various regressions from #18029 * If context is canceled the token is never returned. This will lead to scanner being unable to save and deadlocking. * Fix backup not being able to get any data (hr empty) * Reduce backup timeout.	2023-11-01 09:09:28 -07:00
Taran Pelkey	33322e6638	Change behavior of service account empty policies (#18346 ) * Fix embedded/implied policy behavior * assume implied policy if pased to empty * fix for all * Fix failing tests --------- Co-authored-by: Prakash Senthil Vel <23444145+prakashsvmx@users.noreply.github.com>	2023-10-31 12:30:36 -07:00
Daniel López Guimaraes	a1792ca0d1	fix: relax enforcing filename on PostPolicy (#18336 ) The filename is not required to be on the form data.	2023-10-30 21:06:32 -07:00
Harshavardhana	ac8c43fe9c	fix: allow missing hot-tier accounting (#18345 )	2023-10-30 14:42:11 -07:00
Allan Roger Reid	4d40ee00e9	Add check for reverse proxy setups (#18310 ) Add check for reverse proxy setups, to skip check for paths being served by different port on same address.	2023-10-30 10:49:04 -07:00
Adrian Najera	06f59ad631	fix: expiration time for share link when using OpenID (#18297 )	2023-10-30 10:21:34 -07:00
Harshavardhana	877e0cac03	fix: tiering statistics handling a bug in clone() implementation (#18342 ) Tiering statistics have been broken for some time now, a regression was introduced in `6f2406b0b6` Bonus fixes an issue where the objects are not assumed to be of the 'STANDARD' storage-class for the objects that have not yet tiered, this should be conditional based on the object's metadata not a default assumption. This PR also does some cleanup in terms of implementation, fixes #18070	2023-10-30 09:59:51 -07:00
Klaus Post	508710f4d1	Re-add duplicate upload id sanity check. (#18339 ) https://github.com/minio/minio/pull/18307 partially removed the duplicate upload id check. While I can't really see how ListDir can return duplicate entries, let's re-add it, since it is a cheap sanity check.	2023-10-29 08:33:30 -07:00
Matthew Toohey	c2fedb4c3f	fix: log targetID instead of Name when event error occurs (#18335 )	2023-10-28 08:32:57 -07:00
Poorna	03dc65e12d	Reload replication targets lazily if missing (#18333 ) There can be rare situations where errors seen in bucket metadata load on startup or subsequent metadata updates can result in missing replication remotes. Attempt a refresh of remote targets backed by a good replication config lazily in 5 minute intervals if there ever occurs a situation where remote targets go AWOL.	2023-10-27 21:08:53 -07:00
Praveen raj Mani	54aed421b8	fix: update the user cache while adding service accounts with expiry (#18320 )	2023-10-26 08:11:29 -07:00
jiuker	d5e8dac1cf	fix: canceling the heal caused goroutine to leak. (#18322 )	2023-10-26 07:53:06 -07:00
Poorna	96ec8fcba1	Preserve replica timestamps in multipart (#18318 ) Also a backward compatibility fix to use x-amz-replica-status if present as replication status.	2023-10-25 21:24:10 -07:00
Harshavardhana	0663eb69ed	fix: do not preserve mtime during CopyObject() metadata updates (#18316 ) mtime must be preserved only if destination mtime is set. fixes #18314	2023-10-25 14:30:56 -07:00
Harshavardhana	c60f54e5be	make ListMultipart/ListParts more reliable skip healing disks (#18312 ) this PR also fixes old flaky tests, by properly marking disk offline-based tests.	2023-10-24 23:33:25 -07:00
Harshavardhana	483389f2e2	set diskMaxConcurrent to 32 if nrRequests is lower	2023-10-24 17:21:12 -07:00
Harshavardhana	069d118329	fix: listObjectParts to prefer local and single disks (#18309 )	2023-10-24 13:51:57 -07:00
Harshavardhana	a7b1834772	fix: flaky and stupid tests in root lockdown (#18308 )	2023-10-24 13:22:44 -07:00
Klaus Post	6415dec37a	Improve multipart listing speed (#18307 )	2023-10-24 12:06:06 -07:00
Harshavardhana	2dc917e87f	maxConcurrent must be set only once per node (#18303 )	2023-10-23 21:42:36 -07:00
Aditya Manthramurthy	0a284a1a10	fix: SR: Add more info when IAM config differs (#18302 ) Provide details on what IAM info mismatched when the validation fails	2023-10-23 21:16:40 -07:00
Harshavardhana	5c8339e1e8	fix: veeam SOS API to higher layers (#18287 ) - support populating usage info from scanner info - support populating quota for the bucket via quota settings for the bucket	2023-10-23 13:55:45 -07:00
Harshavardhana	fd37418da2	fix: allow server not initialized error to be retried (#18300 ) Since relaxing quorum the error across pools for ListBuckets(), GetBucketInfo() we hit a situation where loading IAM could potentially return an error for second pool that server is not initialized. We need to handle this, let the pool come online and retry transparently - this PR fixes that.	2023-10-23 12:30:20 -07:00
Harshavardhana	bbfea29c2b	use object modTime for the event sequencer ID (#18285 ) always set modTime after lock is acquired in completemultipart stage to make sure that the modTime is not racy.	2023-10-20 19:28:05 -07:00
Harshavardhana	aa703dc903	relax write quorum requirement for ListBuckets()/HeadBucket() (#18288 ) Also fix error handling for HeadBucket() to be pool specific	2023-10-20 17:50:21 -07:00
Harshavardhana	780882efcf	do not check for query params to be signed headers (#18283 ) x-amz-signed-headers is meant for HTTP headers only not for query params, using that to verify things further can lead to failure. The generated presigned URL with custom metadata is already kosher (tamper proof). fixes #18281	2023-10-19 21:32:49 -07:00
Klaus Post	ba6218b354	fix: resource metrics "concurrent map iteration and map write" (#18273 ) `resourceMetricsMap` has no protection against concurrent reads and writes. Add a mutex and don't use maps from the last iteration. Bug introduced in #18057 Fixes #18271	2023-10-18 13:28:50 -07:00
Harshavardhana	8e32de3ba9	cache DiskInfo() metrics call separately (#18270 )	2023-10-18 11:17:32 -07:00
Klaus Post	e37508fb8f	fix: linter errors in Windows specific code (#18276 )	2023-10-18 11:08:15 -07:00
Klaus Post	b46a717425	Remove unused config migration (#18277 ) None of the migration is called. Remove dead code.	2023-10-18 11:05:24 -07:00
Klaus Post	7926df0b80	Fix globalDeploymentID race (#18275 ) globalDeploymentID was being read while it was being set. Fixes race: ``` WARNING: DATA RACE Write at 0x0000079605a0 by main goroutine: github.com/minio/minio/cmd.connectLoadInitFormats() github.com/minio/minio/cmd/prepare-storage.go:269 +0x14f0 github.com/minio/minio/cmd.waitForFormatErasure() github.com/minio/minio/cmd/prepare-storage.go:294 +0x21d ... Previous read at 0x0000079605a0 by goroutine 105: github.com/minio/minio/cmd.newContext() github.com/minio/minio/cmd/utils.go:817 +0x31e github.com/minio/minio/cmd.adminMiddleware.func1() github.com/minio/minio/cmd/admin-router.go:110 +0x96 net/http.HandlerFunc.ServeHTTP() net/http/server.go:2136 +0x47 github.com/minio/minio/cmd.setBucketForwardingMiddleware.func1() github.com/minio/minio/cmd/generic-handlers.go:460 +0xb1a net/http.HandlerFunc.ServeHTTP() net/http/server.go:2136 +0x47 ... ```	2023-10-18 08:06:57 -07:00
Harshavardhana	f91b257f50	choose different max_concurrent requests per drive based on HDD/NVMe (#18254 ) currently the default for all drives is 512, which is a lot for HDDs the recent testing has revealed moving this to 32 for HDDs seems like a fair value.	2023-10-16 17:18:13 -07:00
Harshavardhana	edfb310a59	fix: always load ENVs from files first as soon as server starts (#18247 ) This is a regression from #18231, however reading from ENV files must happen well before any parsing logic is invoked.	2023-10-15 21:13:43 -07:00
Poorna	78f1f69d57	fix site replication resync status (#18245 ) To persist status changes on disk upon completion. Adds new tests to handle this functionality.	2023-10-13 22:17:22 -07:00
Harshavardhana	e1e33077e8	fix: tests and resync replication status (#18244 )	2023-10-13 17:03:34 -07:00
Aditya Manthramurthy	b3e7de010d	Remove usage of errors.Join for go1.19 compat (#18243 )	2023-10-13 15:14:16 -07:00
Shireesh Anjal	bf1c6edb76	Revert "Capture network device info in health report" (#18241 ) Introducing a new version of healthinfo struct for adding this info is not correct. It needs to be implemented differently without adding a new version. This reverts commit 8737025d940f80360ed4b3686b332db5156f6659.	2023-10-13 07:46:36 -07:00
jiuker	2ac7fee017	fix: missing fileName will upload failed when PostPolicyBucketHandler (#18240 )	2023-10-13 07:31:23 -07:00
Klaus Post	128256e3ab	Add event counters (#18232 ) Export metric for global events sent and skipped for the lifetime of the server.	2023-10-12 15:39:22 -07:00
Shireesh Anjal	a66a7f3e97	Capture network device info in health report (#18213 )	2023-10-12 15:33:31 -07:00
jiuker	20b79f8945	fix: env depend on the flag (#18231 )	2023-10-12 15:32:38 -07:00
Klaus Post	9a877734b2	Fix various poolmeta races (#18230 ) There is a fundamental race condition in `newErasureServerPools`, where setObjectLayer is called before the poolMeta has been loaded/populated. We add a placeholder value to this field but disable all saving of the value, so we don't risk overwriting the value on disk. Once the value has been loaded or created, it is replaced with the proper value, which will also be saved. Also fixes various accesses of `poolMeta` that were done without locks. We make the `poolMeta.IsSuspended` return false, even if we shouldn't risk out-of-bounds reads anymore.	2023-10-12 15:30:42 -07:00
Harshavardhana	409c391850	implement helpers to get relevant info instead of FileInfo() (#18228 )	2023-10-12 15:29:59 -07:00
jiuker	000928d34e	fix: should call func globalOSMetrics.time(s)() when updateOSMetrics (#18209 )	2023-10-12 00:08:13 -07:00
Harshavardhana	6829ae5b13	completely remove drive caching layer from gateway days (#18217 ) This has already been deprecated for close to a year now.	2023-10-11 21:18:17 -07:00
jiuker	f09756443d	fix: a dynamic config will make a panic for addOrUpdateIDP (#18208 )	2023-10-11 09:06:40 -07:00
jiuker	5512016885	fix: siteResyncMetrics init will make a deadlock when len(siteReplication) >= 3 (#18206 )	2023-10-10 23:27:27 -07:00
Harshavardhana	21ecb941fe	fix: avoid counting out of band deletes during disk heal (#18205 )	2023-10-10 14:39:48 -07:00
Harshavardhana	77e94087cf	fix: calling statfs() call moves the disk head (#18203 ) if erasure upgrade is needed rely on the in-memory values, instead of performing a "DiskInfo()" call. https://brendangregg.com/blog/2016-09-03/sudden-disk-busy.html for HDDs these are problematic, lets avoid this because there is no value in "being" absolutely strict here in terms of parity. We are okay to increase parity as we see based on the in-memory online/offline ratio.	2023-10-10 13:47:35 -07:00
Klaus Post	9ab1f25a47	fix : PutObjectExtract data races (#18199 ) Several callers to putObjectTar may be fighting to set sc. Move the write out of the loop. Use static resp, and request elements. Fixes tests with -race: ``` WARNING: DATA RACE Read at 0x00c01cd680e0 by goroutine 691354: github.com/minio/minio/cmd.objectAPIHandlers.PutObjectExtractHandler.func1() e:/gopath/src/github.com/minio/minio/cmd/object-handlers.go:2130 +0x149 github.com/minio/minio/cmd.untar.func1() e:/gopath/src/github.com/minio/minio/cmd/untar.go:250 +0x2b6 github.com/minio/minio/cmd.untar.func8() e:/gopath/src/github.com/minio/minio/cmd/untar.go:261 +0xa4 Previous write at 0x00c01cd680e0 by goroutine 691352: github.com/minio/minio/cmd.objectAPIHandlers.PutObjectExtractHandler.func1() e:/gopath/src/github.com/minio/minio/cmd/object-handlers.go:2131 +0x15d github.com/minio/minio/cmd.untar.func1() e:/gopath/src/github.com/minio/minio/cmd/untar.go:250 +0x2b6 github.com/minio/minio/cmd.untar.func8() e:/gopath/src/github.com/minio/minio/cmd/untar.go:261 +0xa4 ```	2023-10-10 08:36:44 -07:00
jiuker	aaab7aefbe	fix: avoid nil panic upon error in GetObjectNInfo via InnerGetObjectNInfoFn (#18198 )	2023-10-10 08:35:33 -07:00
Klaus Post	5b8599e52d	Do not log invalid tag errors (#18200 ) Eliminate logging on invalid tags: ``` API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325) Time: 07:40:33 UTC 10/10/2023 DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71 RequestID: 178CAF0DE0A67480 RemoteHost: 127.0.0.1 Host: 127.0.0.1:9001 UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64) Error: Tags cannot be more than 10 (tags.errTag) 5: internal\logger\logger.go:259:logger.LogIf() 4: cmd\api-errors.go:2350:cmd.toAPIErrorCode() 3: cmd\api-errors.go:2375:cmd.toAPIError() 2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler() 1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP() API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325) Time: 07:40:33 UTC 10/10/2023 DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71 RequestID: 178CAF0DE0BEA514 RemoteHost: 127.0.0.1 Host: 127.0.0.1:9001 UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64) Error: Cannot provide multiple Tags with the same key (tags.errTag) 5: internal\logger\logger.go:259:logger.LogIf() 4: cmd\api-errors.go:2350:cmd.toAPIErrorCode() 3: cmd\api-errors.go:2375:cmd.toAPIError() 2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler() 1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP() API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325) Time: 07:40:33 UTC 10/10/2023 DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71 RequestID: 178CAF0DE0E78970 RemoteHost: 127.0.0.1 Host: 127.0.0.1:9001 UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64) Error: The TagKey you have provided is invalid (tags.errTag) 5: internal\logger\logger.go:259:logger.LogIf() 4: cmd\api-errors.go:2350:cmd.toAPIErrorCode() 3: cmd\api-errors.go:2375:cmd.toAPIError() 2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler() 1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP() API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325) Time: 07:40:33 UTC 10/10/2023 DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71 RequestID: 178CAF0DE1002AE8 RemoteHost: 127.0.0.1 Host: 127.0.0.1:9001 UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64) Error: The TagValue you have provided is invalid (tags.errTag) 5: internal\logger\logger.go:259:logger.LogIf() 4: cmd\api-errors.go:2350:cmd.toAPIErrorCode() 3: cmd\api-errors.go:2375:cmd.toAPIError() 2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler() 1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP() ```	2023-10-10 08:35:03 -07:00
Harshavardhana	74e0c9ab9b	reduce unnecessary logging, simplify certain error handling (#18196 ) remove a bunch of unnecessary logs	2023-10-10 00:33:42 -07:00
Harshavardhana	dcce83b288	avoid rebalance state for getObjectTags if any (#18197 ) fixes #18190	2023-10-09 23:56:26 -07:00
Matthew Toohey	f731e7ea36	Fix current_send_in_progress metric always being zero (#18160 )	2023-10-09 17:28:17 -07:00
Maxim Tkachenko	ec30bb89a4	simplify channel send() in WalkDir() (#18186 )	2023-10-09 17:27:55 -07:00
Klaus Post	7cd08594f6	Use better host names for metric errors (#18188 ) Typically hosts would end up like this: ``` "hosts": [ ":9000", ":9000", ":9000", ... ``` Also add host name to errors.	2023-10-09 17:27:11 -07:00
Aditya Manthramurthy	2b4531f069	fix: O_DIRECT is on only for multi-disk setups (#18194 ) Disable it for single disk/unsupported platforms	2023-10-09 17:08:40 -07:00
Harshavardhana	11544a62aa	fix: upon write failure on disk journal close the file properly (#18183 ) close the file properly before dereferencing *os.File, this can silently leak fd's in rare cases. This PR fixes this properly.	2023-10-08 12:17:08 -07:00
Taran Pelkey	18550387d5	fix: DeleteServiceAccount API behavior (#18163 )	2023-10-08 12:13:18 -07:00
Klaus Post	0de2b9a1b2	Fix panic on double unfreezeServices (#18177 ) Calling unfreezeServices twice results in panic: ``` panic: "POST /minio/peer/v32/signalservice?signal=4&sub-sys=": close of nil channel goroutine 14703 [running]: runtime/debug.Stack() runtime/debug/stack.go:24 +0x65 github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1() github.com/minio/minio/cmd/generic-handlers.go:549 +0x8e panic({0x27c3020, 0x4c9b370}) runtime/panic.go:884 +0x212 github.com/minio/minio/cmd.unfreezeServices() github.com/minio/minio/cmd/service.go:112 +0xc7 github.com/minio/minio/cmd.(*peerRESTServer).SignalServiceHandler(0x0?, {0x4cb6af0, 0xc010b96420}, 0xc01affab00) github.com/minio/minio/cmd/peer-rest-server.go:837 +0x13a net/http.HandlerFunc.ServeHTTP(...) ``` If the function was called a second time `val` would not be nil, but the returned channel `ch` would be, causing the panic. Check the channel isn't nil and also use Swap for an atomic swap instead of 2 separate operations (though we are in a mutex).	2023-10-06 07:51:50 -06:00
Poorna	9dc29d7687	Avoid ILM expiry on deleted versions that are yet to replicate (#18175 ) Fixes #18167	2023-10-06 06:55:15 -06:00
Poorna	72871dbb9a	delete replication: avoid overwriting replication decision (#18174 ) from ObjectInfo unless version purge status is present. Otherwise there is potential to make incorrect replication decision if Stat returned an error	2023-10-05 21:09:45 -06:00
Aditya Manthramurthy	4bda4e4e2b	fix: check for disk-level O_DIRECT support (#18173 ) Disk level O_DIRECT support checking at xl storage initialization was conditional on a config setting being enabled. (This never took effect because config initialization happens after ObjectLayer is ready.) This is not necessary as the config setting is dynamic - O_DIRECT should be enabled via runtime config. So we need to do the disk level support check regardless of the config setting.	2023-10-05 20:54:49 -06:00
Harshavardhana	1971c54a50	update buffer channels for both trace and listen events (#18171 ) - Trace needs higher buffered channels than 4000 to ensure when we run `mc admin trace -a` it captures all information sufficiently. - Listen event notification needs the event channel to be `apiRequestsMaxPerNode` * number of nodes	2023-10-05 18:16:04 -06:00
Anis Eleuch	b336e9a79f	fix: loading usage cache to not fail early when reading the backup fails (#18158 ) Currently, the retry is not fully used when there is no backup copy of the data usage; use 5 retry attempts when we don't have any valid data, new or backup, unless we have seen an un-recognized error.	2023-10-02 19:22:35 -07:00
Harshavardhana	a2ab21e91c	add max-keys=2 optimization for spark workloads (#18154 ) comment in the code provides more detailed explanation on what this PR entails and its assumptions. this PR reduces the amount of listing() by an order of magnitude, however there are other such calls that still needs further optimization that shall be done in subsequent PRs.	2023-10-02 07:52:59 -06:00
Sveinn	603437e70f	Fix startup formatting (#18156 ) Percentages in root user names are used for formatting. Before: ``` S3-API: http://192.168.50.21:9000 http://172.31.96.1:9000 http://127.0.0.1:9000 RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb" RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb" Console: http://192.168.50.21:51915 http://172.31.96.1:51915 http://127.0.0.1:51915 RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb" RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb" Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart FORMAT: %117s MESSAGE: $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb" $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%!a(MISSING)vZb" "Q4#Q6y8G%Px#npP4dudUobU#NBcGB7RMKV4ajYb" ``` After: ``` Status: 1 Online, 0 Offline. S3-API: http://192.168.50.21:9000 http://172.31.96.1:9000 http://127.0.0.1:9000 RootUser: "U4B6Zi!b75DXSPm%avZb" RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb" Console: http://192.168.50.21:52421 http://172.31.96.1:52421 http://127.0.0.1:52421 RootUser: "U4B6Zi!b75DXSPm%avZb" RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb" Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb" ``` No need for special Windows case. `mc` works just fine.	2023-10-02 07:39:47 -06:00
Shireesh Anjal	6d20ec3bea	Add support for resource metrics (#18057 ) Add a new endpoint for "resource" metrics `/v2/metrics/resource` This should return system metrics related to drives, network, CPU and memory. Except for drives, other metrics should have corresponding "avg" and "max" values also. Reuse the real-time feature to capture the required data, introducing CPU and memory metrics in it. Collect the data every minute and keep updating the average and max values accordingly, returning the latest values when the API is called.	2023-09-30 13:40:20 -07:00
Anis Eleuch	22d2dbc4e6	decom: Fix infinite retry when the decom is canceled (#18143 ) Also, use rand.Float64() since it is thread-safe; otherwise go race will complain.	2023-09-30 00:02:29 -07:00
Harshavardhana	d6446cb096	do not return an error in AbortMultipartUpload() (#18135 ) returning an error is a bit undefined in AWS S3 as it may return an error or not depending on the time from AbortMultipartUpload().	2023-09-29 10:28:19 -07:00
Harshavardhana	c34bdc33fb	make sure to set Versioned field to ensure rename2 is not called (#18141 ) without this the rename2() can rename the previous dataDir causing issues for different versions of the object, only latest version is preserved due to this bug. Added healing code to ensure recovery of such content.	2023-09-29 09:08:24 -07:00
Anis Eleuch	aec023f537	Avoid showing buckets without quorum in each pool (#18125 )	2023-09-29 00:58:54 -07:00
Poorna	e101eeeda9	fix: tier addition validation (#18136 )	2023-09-28 22:33:24 -07:00
Harshavardhana	3c470a6b8b	fix: the inspect script to use scheme per deployment (#18118 )	2023-09-27 08:22:50 -07:00
Poorna	6bc7d711b3	delete of a missing versionId return 204 (#18117 )	2023-09-26 14:02:56 -07:00
Harshavardhana	cdeab19673	fix: always check error upon w.Close() in Write() (#18111 ) not checking w.Close() can prematurely make us think that the w.Write() actually succeeded, apparently Write() may or may not return an error but sometimes only during a Close() call to the fd we may see the error from Write() propagate. Fdatasync(w) on the FD would return an error requiring Close() error handling is less of a concern, however it may happen such that fdatasync() did not return an error, whereas Close() would.	2023-09-26 11:04:00 -07:00
Anis Eleuch	22ee678136	tier: Avoid doing versioned operations since not required anymore (#18108 ) Currently, setting a new tiering target returns an error when a bucket is versioned and the tiering credentials does not have authorization to specify a version-id when reading or removing a specific version; Since tiering does not require versioning anymore; avoid doing versioned operations when performing checklist ops while adding a new tiering configuration.	2023-09-26 00:14:56 -07:00
Poorna	50a8f13e85	site replication: allow setting bandwidth default for bucket (#18062 ) This can still be overridden at the bucket level	2023-09-25 15:50:52 -07:00
jiuker	6dec60b6e6	fix: check post policy like AWS S3 (#18074 )	2023-09-25 12:35:25 -07:00
Harshavardhana	ac3a19138a	fix: set scanning details locally to avoid cached values (#18092 ) atomic variable results such as scanning must not use cached values, instead rely on real-time information.	2023-09-25 08:26:29 -07:00
Klaus Post	21e8e071d7	Improve ListObject Compatibility (#18099 ) Do not error out when a provided marker is before or after the prefix, but instead just ignore it if before and return an empty list when after. Fixes #18093	2023-09-25 08:13:08 -07:00
Klaus Post	57f84a8b4c	Add abandoned folder scanning to metrics (#18076 ) Include object and versions heal scan times when checking non-empty abandoned folders. Furthermore don't add delay between healing versions, instead do one per object wait.	2023-09-24 22:15:31 -07:00
Aditya Manthramurthy	22041bbcc4	fix: Update policy mapping properly in notification (#18088 ) This is fixing a regression from an earlier change where STS account loading was made lazy.	2023-09-22 20:47:50 -07:00
Harshavardhana	91ebac0a00	fix: move abandoned parts check after healing not in ILM path (#18087 )	2023-09-22 12:07:52 -07:00
Harshavardhana	3a90fb108c	only look for metadata if batch replication asks for metadata filters (#18082 ) This PR changes the StatObject() to be must have for non-minio source to being a conditional API call. - Calls StatObject() when needed - Calls GetObjectTagging() when needed These calls if we do without these conditionals can cause a lot of delays, so we avoid them if not needed in more common scenario.	2023-09-22 11:31:57 -07:00
Shubhendu	74cfb207c1	Added check for mandatory MINIO_KMS_KES_KEY_NAME env var (#18077 ) If MinIO started with KMS enabled, MINIO_KMS_KES_KEY_NAME should be set for server to start. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-09-21 10:37:37 -07:00
Harshavardhana	9788d85ea3	remove logging for invalid metadata values (#18068 )	2023-09-20 15:49:55 -07:00
Anis Eleuch	69c0e18685	perf net: Add the endpoint name related to the perf net error (#18063 ) In a perf test, one node will run speed test with all nodes. If there is an error with a peer node, the peer node name is not included in the error hence confusing the user. This commit will add the peer endpoint string to the netperf error.	2023-09-19 22:41:06 -07:00
Aditya Manthramurthy	3cac927348	Load STS policy mappings periodically (#18061 ) To ensure that policy mappings are current for service accounts belonging to (non-derived) STS accounts (like an LDAP user's service account) we periodically reload such mappings. This is primarily to handle a case where a policy mapping update notification is missed by a minio node. Such a node would continue to have the stale mapping in memory because STS creds/mappings were never periodically scanned from storage.	2023-09-19 17:57:42 -07:00
Harshavardhana	9081346c40	fix: more regressions listing policy mappings (#18060 ) also relax ListServiceAccounts() returning error if no service accounts exist.	2023-09-19 15:23:18 -07:00
Harshavardhana	fcfadb0e51	fix: regression in loading LDAP users policy mappings (#18055 ) LDAP users are stored as STS users, we need to load their policy mappings appropriately. Fixes a regression caused by #17994	2023-09-19 10:31:56 -07:00
Harshavardhana	2add57cfed	apply healing per object at 1024 cycles (#18050 ) - we already have MRF for most recent failures - we trigger healing during HEAD/GET operation These are enough, also change the default max wait from 5sec to 1sec for default scanner speed.	2023-09-19 09:24:22 -07:00
Poorna	b73699fad8	replication: pass user tags while queueing (#18052 ) Continues from #18032 - otherwise replication will fail on tag based rules.	2023-09-19 03:18:28 -07:00
Harshavardhana	b8ebe54e53	Revert "skip tiered objects to GLACIER in batch replication (#18044 )" This reverts commit `fd421ddd6f`. MinIO already provides `filter` based on metadata that would work in this scenario already.	2023-09-19 00:05:40 -07:00
Harshavardhana	c3d70e0795	cache usage, prefix-usage, and buckets for AccountInfo up to 10 secs (#18051 ) AccountInfo is quite frequently called by the Console UI login attempts, when many users are logging in it is important that we provide them with better responsiveness. - ListBuckets information is cached every second - Bucket usage info is cached for up to 10 seconds - Prefix usage (optional) info is cached for up to 10 secs Failure to update after cache expiration, would still allow login which would end up providing information previously cached. This allows for seamless responsiveness for the Console UI logins, and overall responsiveness on a heavily loaded system.	2023-09-18 22:13:03 -07:00
Harshavardhana	fd421ddd6f	skip tiered objects to GLACIER in batch replication (#18044 ) tiered objects to GLACIER are not readable until they are restored, we skip these as unreadable	2023-09-18 10:25:31 -07:00
jiuker	9947c01c8e	feat: SSE-KMS use uuid instead of read all data to md5. (#17958 )	2023-09-18 10:00:54 -07:00
Eng Zer Jun	a00db4267c	data-usage-cache: remove redundant nil check (#17970 ) From the Go specification: "3. If the map is nil, the number of iterations is 0." [1] Therefore, an additional nil check before the loop is unnecessary. [1]: https://go.dev/ref/spec#For_range Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2023-09-16 19:09:29 -07:00
Harshavardhana	36385010f5	use optimized pathJoin instead of path.Join (#18042 ) this avoids allocations in scanner routine, they are tiny but they allocate a lot over many cycles of the scanner.	2023-09-16 19:08:59 -07:00
Harshavardhana	fa6d082bfd	reduce all major allocations in replication path (#18032 ) - remove targetClient for passing around via replicationObjectInfo{} - remove unnecessary cloning of object info - remove objectInfo from replicationObjectInfo{} (only require necessary fields)	2023-09-16 02:28:06 -07:00
Poorna	b733e6e83c	site replication turn off retry login for admin API calls (#18039 ) additionally also mark site offline if n/w is down	2023-09-15 18:01:47 -07:00
Anis Eleuch	37aa5934a1	scanner: Fix loading data usage cache structure (#18037 ) Return an empty data usage cache structure when the data usage cache file does not exist, otherwise, the scanner won't work.	2023-09-15 13:11:08 -07:00
Harshavardhana	1647fc7edc	fix: optimize listMultipartUploads to serve via local disks (#18034 ) and remove unused getLoadBalancedDisks()	2023-09-15 08:34:03 -07:00
Harshavardhana	7b92687397	remove generating presignedURLs with range header for lambda (#18033 )	2023-09-14 21:58:17 -07:00
Alex	dc48cd841a	Added MINIO_PROMETHEUS_AUTH_TOKEN env support (#18028 ) Signed-off-by: Benjamin Perez <benjamin@bexsoft.net>	2023-09-14 17:28:21 -07:00
Anis Eleuch	b0e1776d6d	Do not use a chain for S3 tiering to return better error messages (#18030 ) When using a chain provider, if none of the providers returns a valid access and secret key, an anonymous request is sent, which makes it hard for users to figure out what is going on. In the case of S3 tiering, when AWS IAM temporary account generation returns an error, an anonymous login will be used because of the chain provider. Avoid this and use the AWS IAM provider directly to get a good error message.	2023-09-14 15:28:20 -07:00
Aditya Manthramurthy	7a7068ee47	Move IAM periodic ops to a single go routine (#18026 ) This helps reduce disk operations as these periodic routines would not run concurrently any more. Also add expired STS purging periodic operation: Since we do not scan the on-disk STS credentials (and instead only load them on-demand) a separate routine is needed to purge expired credentials from storage. Currently this runs about a quarter as often as IAM refresh. Also fix a bug where with etcd, STS accounts could get loaded into the iamUsersMap instead of the iamSTSAccountsMap.	2023-09-14 15:25:17 -07:00
Aditya Manthramurthy	cbc0ef459b	Fix policy package import name (#18031 ) We do not need to rename the import of minio/pkg/v2/policy as iampolicy any more.	2023-09-14 14:50:16 -07:00
Harshavardhana	a2aabfabd9	add backups for usage-caches to rely on upon error (#18029 ) This allows scanner to avoid lengthy scans, skip things appropriately and also not lose metrics in any manner. reduce longer deadlines for usage-cache loads/saves to match the disk timeout which is 2minutes now per IOP.	2023-09-14 11:53:52 -07:00
Harshavardhana	32890342ce	introduce MINIO_BROWSER_REDIRECT env to enable/disable auto-redirect (#18025 )	2023-09-13 18:43:57 -07:00
Aditya Manthramurthy	ed2c2a285f	Load STS accounts into IAM cache lazily (#17994 ) In situations with a large number of STS credentials on disk, IAM load time is high. To mitigate this, STS accounts will now be loaded into memory only on demand - i.e. when the credential is used. In each IAM cache (re)load we skip loading STS credentials and STS policy mappings into memory. Since STS accounts only expire and cannot be deleted, there is no risk of invalid credentials being reused, because credential validity is checked when it is used.	2023-09-13 12:43:46 -07:00
Poorna	18e23bafd9	replication resync: report only the on-disk status (#18017 ) Avoid reporting in-memory status since results can vary if different nodes are queried, resync always runs at a single node.	2023-09-13 10:58:38 -07:00
Harshavardhana	8b8be2695f	optimize mkdir calls to avoid base-dir `Mkdir` attempts (#18021 ) Currently we have IOPs of these patterns ``` [OS] os.Mkdir play.min.io:9000 /disk1 2.718µs [OS] os.Mkdir play.min.io:9000 /disk1/data 2.406µs [OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys 4.068µs [OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp 2.843µs [OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp/d89c8ceb-f8d1-4cc6-b483-280f87c4719f 20.152µs ``` It can be seen that we can save quite Nx levels such as if your drive is mounted at `/disk1/minio` you can simply skip sending an `Mkdir /disk1/` and `Mkdir /disk1/minio`. Since they are expected to exist already, this PR adds a way for us to ignore all paths up to the mount or a directory, whichever has been provided to MinIO setup.	2023-09-13 08:14:36 -07:00
Poorna	96fbf18201	replication: queue existing objects to same workers as incoming (#18020 ) Previously existing objects were queued to single worker and MRF re-queues are also handled by same worker - this does not fully use the available bandwidth in case there is no incoming workload.	2023-09-12 21:59:15 -07:00
Harshavardhana	c8a57a8fa2	fix: send content-md5 for AWS S3 proactively (#18018 ) fixes #17977	2023-09-12 19:11:13 -07:00
Harshavardhana	b1c2dacab3	fix: allow dynamic ports for API only in non-distributed setups (#18019 ) fixes #17998	2023-09-12 19:10:49 -07:00
Harshavardhana	08b3a466e8	fix: allow concurrent SFTP connections (#18013 ) current implementation did not fully implement the concurrent SFTP connection implementation, this PR properly handles this. fixes #17914	2023-09-12 12:41:52 -07:00
Harshavardhana	1df5e31706	optimize MRF replication queue to avoid memory leaks (#18007 )	2023-09-11 20:59:11 -07:00
Harshavardhana	9f7044aed0	fix: ignore transient errors in read path (#18006 ) Errors such as ``` returned an error (context deadline exceeded) (fmt.wrapError) ``` ``` (msgp: too few bytes left to read object) (fmt.wrapError) ```	2023-09-11 15:29:59 -07:00
Anis Eleuch	41de53996b	heal: calculate the number of workers based on NRRequests (#17945 )	2023-09-11 14:48:54 -07:00
Harshavardhana	9878031cfd	fix: change DISK_ to DRIVE_ for some drive related envs (#18005 )	2023-09-11 12:19:22 -07:00
Poorna	703ed46d79	fix: replication of tags while removing (#17989 ) A tag removal was not being replicated prior to this change	2023-09-06 19:05:02 -07:00
Harshavardhana	f7ca6c63c2	fix: bucket quota clear and honor existing quota config (#17988 )	2023-09-06 19:03:58 -07:00
Harshavardhana	ad69b9907f	fix: report bucket metrics for only existing buckets (#17987 )	2023-09-06 12:50:46 -07:00
Shubhendu	bfddbb8b40	Embed file in ZIP with custom permissions (#17954 ) This change enables embedding files in ZIP with custom permissions. Also uses default creds for starting MinIO based on inspect data. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-09-06 09:24:01 -07:00
Poorna	13a2dc8485	replication resync: avoid blocking on results channel. (#17981 ) continues fix in #17775	2023-09-05 20:22:39 -07:00
Harshavardhana	1e51424e8a	use syscall.Rename() directly instead of os.Rename() (#17982 )	2023-09-05 20:22:23 -07:00
Harshavardhana	5b114b43f7	refactor bandwidth throttling for replication target (#17980 ) This refactor is to allow using the bandwidth throttling for other purposes.	2023-09-05 20:21:59 -07:00
Poorna	812f5a02d7	metrics: fix panic in replication stats reporting (#17979 )	2023-09-05 10:26:18 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Krishnan Parthasarathi	71c32e9b48	Return successorModTime in quorum when available (#17925 )	2023-09-04 08:24:17 -07:00
Harshavardhana	380a59520b	add missing testdata for benchmarking	2023-09-02 14:40:38 -07:00
Harshavardhana	3995355150	avoid repeated large allocations for large parts (#17968 ) objects with 10,000 parts and many of them can cause a large memory spike which can potentially lead to OOM due to lack of GC. with previous PR reducing the memory usage significantly in #17963, this PR reduces this further by 80% under repeated calls. Scanner sub-system has no use for the slice of Parts(), it is better left empty. ``` benchmark old ns/op new ns/op delta BenchmarkToFileInfo/ToFileInfo-8 295658 188143 -36.36% benchmark old allocs new allocs delta BenchmarkToFileInfo/ToFileInfo-8 61 60 -1.64% benchmark old bytes new bytes delta BenchmarkToFileInfo/ToFileInfo-8 1097210 227255 -79.29% ```	2023-09-02 07:49:24 -07:00
Harshavardhana	8208bcb896	remove all unnecessary logging, logOnce when absolutely needed (#17965 )	2023-09-01 16:19:18 -07:00
Poorna	d665e855de	replication: remove check for empty version id (#17964 )	2023-09-01 13:46:10 -07:00
Harshavardhana	18b3655c99	with xlv2 format we never had to fill in checksumInfo() (#17963 ) - this PR avoids sending a large ChecksumInfo slice when it's not needed - also for a file with XLV2 format there is no reason to allocate Checksum slice while reading	2023-09-01 13:45:58 -07:00
Anis Eleuch	6a8d8f34a5	kafka: Do not require key when sending a message (#17962 ) Keys are helpful to ensure the strict ordering of messages, however currently the code uses a random request id for every log, hence using the request-id as a Kafka key does not serve any purpose. This commit removes the usage of the key, which also fixes the audit issue from the internal subsystem that does not have a request ID.	2023-09-01 08:37:22 -07:00
Harshavardhana	b1c1f02132	use buffers for pathJoin, to re-use buffers. (#17960 ) ``` benchmark old ns/op new ns/op delta BenchmarkPathJoin/PathJoin-8 79.6 55.3 -30.53% benchmark old allocs new allocs delta BenchmarkPathJoin/PathJoin-8 2 1 -50.00% benchmark old bytes new bytes delta BenchmarkPathJoin/PathJoin-8 48 24 -50.00% ```	2023-08-31 17:58:48 -07:00
yangw	b13fcaf666	fix: read atomic variable in clientDevNull round trip time (#17955 )	2023-08-31 08:31:01 -07:00
Harshavardhana	9458485e43	avoid double logging from healing (#17950 )	2023-08-30 18:46:04 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Krishnan Parthasarathi	6a67c277eb	Reuse types for key-value, notification and retry (#17936 )	2023-08-29 11:27:23 -07:00
Harshavardhana	7cafdc0512	fix: skip access checks further for known buckets (#17934 )	2023-08-28 15:16:41 -07:00
Harshavardhana	8a57b6bced	use renameat2 Linux extension syscall (#17757 ) this is a faster and safer alternative on newer kernel versions.	2023-08-27 09:57:11 -07:00
Krishnan Parthasarathi	53abd25116	Don't log when object to be tiered is not found (#17924 )	2023-08-25 23:34:16 -07:00
Harshavardhana	1ea7826c0e	do not have to consider replicationTimestamp for healing and quorum (#17922 ) replicationTimestamp might differ if there were retries in replication and the retried attempt overwrote in quorum, leaving enough shards with a newer timestamp to make the existing timestamps on xl.meta invalid. We do not rely on this value for anything external; it is purely a hint for debugging purposes. There is no real value in it, and considering the object itself is intact, we do not have to spend time healing this situation. We may consider healing this situation in the future, but that needs to be decoupled to make sure that we do not overcalculate how much we have to heal.	2023-08-25 15:31:15 -07:00
Anis Eleuch	0cde37be50	Reduce the number of calls to import bucket metadata (#17899 ) For each bucket, save the bucket metadata once, call the site replication hook once	2023-08-25 07:59:16 -07:00
jiuker	6aeca54ece	fix: replace context by timeout-context from parent-context when `selfSpeedTest` (#17906 )	2023-08-25 07:58:38 -07:00
Harshavardhana	124e28578c	remove strict persistence requirements for List() .metacache objects (#17917 ) .metacache objects are transient in nature, and are better left to use page-cache effectively to avoid using more IOPs on the disks. this allows for incoming calls to be not taxed heavily due to multiple large batch listings.	2023-08-25 07:58:11 -07:00
Harshavardhana	62c9e500de	remove mTime requirement from pre-condition checks (#17916 ) given a versionId the mtime is always the same, it can never be different than its original value. versionIds also do not conflict, since they are uuid's and unique practically forever.	2023-08-24 14:33:58 -07:00
jiuker	02cc18ff29	refactor the perf client for TTFB and TotalResponseTime (#17901 )	2023-08-24 10:21:08 -07:00
Harshavardhana	ba4566e86d	add missing IAM node metrics to cluster and node endpoint (#17908 )	2023-08-24 09:26:37 -07:00
Krishnan Parthasarathi	87cb0081ec	Retain current and up to NewerNoncurrentVersions versions (#17909 ) applyNewerNoncurrentVersionLimit method should pass along versions unaffected by NewerNoncurrentVersions rule for further ILM evaluation.	2023-08-24 09:26:29 -07:00
Poorna	4a6af93c83	mark replication target offline if network timeouts seen (#17907 ) regular target liveness check every 5 secs will toggle state back as target returns online.	2023-08-24 09:24:26 -07:00
Harshavardhana	af564b8ba0	allow bootstrap to capture time-spent for each initializers (#17900 )	2023-08-23 03:07:06 -07:00
Klaus Post	7c8746732b	Return cancelled storage calls as 499 (#17895 ) Make upstream cancels more visible - right now they are just reported as "forbidden".	2023-08-22 11:10:41 -07:00
Klaus Post	f506117edb	Reduce memory profiling rate (#17894 ) Change profiling from every 4KB to every 128K, reducing the lock contention by a factor of 32.	2023-08-22 07:21:49 -07:00
Harshavardhana	1c5af7c31a	serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886 ) we expect a certain level of IOPs and latency so this is okay. fixes other miscellaneous bugs - such as hanging on mrfCh <- when the context is canceled - queuing MRF heal when the context is canceled - remove unused saveStateCh channel	2023-08-21 16:44:50 -07:00
Harshavardhana	3a0125fa1f	remove unexpected logging from peer calls (#17888 ) also make sure RequestID is set for system logs	2023-08-21 14:25:24 -07:00
Daniel Valdivia	328cb0a076	Pass environment variable to control session length to console (#17885 ) Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>	2023-08-21 11:55:43 -07:00
jiuker	e3ea97c964	fix: replace req context by locker context (#17880 )	2023-08-19 22:09:07 -07:00
Andreas Auernhammer	8f8f8854f0	update `minio/kes-go` dep to v0.2.0 (#17850 ) This commit updates the minio/kes-go dependency to v0.2.0 and updates the existing code to work with the new KES APIs. The `SetPolicy` handler got removed since it may not get implemented by KES at all and could not have been used in the past since stateless KES is read-only w.r.t. policies and identities. Signed-off-by: Andreas Auernhammer <hi@aead.dev>	2023-08-19 07:37:53 -07:00
Anis Eleuch	4c6869cd9a	ilm: Fix cleaning non current null versions (#17876 )	2023-08-18 12:55:47 -07:00
Harshavardhana	dde1a12819	fix: validate incoming uploadID to be base64 encoded (#17865 ) Bonus fixes include - do not have to write final xl.meta (renameData) does this already, saves some IOPs. - make sure to purge the multipart directory properly using a recursive delete, otherwise this can easily pile up and rely on the stale uploads cleanup. fixes #17863	2023-08-17 09:37:55 -07:00
Harshavardhana	9ebd10d3f4	Revert "Include SuccessorModTime for FileInfo quorum (#17732 )" (#17860 ) This reverts commit `bf3901342c`. This is to fix a regression caused when there are inconsistent versions, but one version is in quorum. SuccessorModTime issue must be fixed differently.	2023-08-16 07:51:33 -07:00
Harshavardhana	3ba927edae	fix: batch status reporting after complete (#17852 ) batch status can perpetually wait after completion due to a race between the MetricsHandler() returning the active metrics in intervals of 1sec and delete of metrics after job completion. this PR ensures that we keep the 'status' around for a while, i.e. up to 24hrs for all the batch jobs.	2023-08-15 12:22:30 -07:00
Harshavardhana	c4ca0a5a57	add two more drive metrics when metrics is available (#17854 )	2023-08-15 10:55:47 -07:00
Klaus Post	406ea4f281	Fix distributed listing not able to resume (#17855 ) Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`. This meant that in distributed systems listings would never be able to continue and would restart on every call. Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.	2023-08-15 07:45:25 -07:00
Harshavardhana	64aa7feabd	allow specifying lower disks for Walk() (#17829 ) useful when you may want Walk() with reduced quorum requirements.	2023-08-14 21:32:39 -07:00
Poorna	875f4076ec	site replication: avoid retries when peer is offline (#17853 )	2023-08-14 21:31:41 -07:00
Harshavardhana	4643efe6be	fix: add deadline worker pattern for local disk removers (#17845 )	2023-08-14 12:28:13 -07:00
Harshavardhana	b760137e1d	fix: add proxyByNode for batch jobs as part of their jobId (#17844 )	2023-08-11 13:12:35 -07:00
Harshavardhana	5f56f441bf	fix: apply common notification code with content-type (#17843 )	2023-08-11 11:34:43 -07:00
Klaus Post	96a22bfcbb	fix: wrapped io.EOF during ListObjects() (#17842 ) When listing getObjectFileInfo can return `io.EOF` if file is being written. When we wrap the error it will not retry upstream, since `io.EOF` is a valid return value. Allow one retry before returning errors and canceling the listing.	2023-08-11 09:47:16 -07:00
Poorna	dfaf735073	replication: fix queuing of large uploads (#17831 ) Fixes regression from #17687	2023-08-10 15:48:42 -07:00
Anis Eleuch	7fcfde7f07	s3: Pick a pool with >85% if all other pools are in suspended state (#17826 )	2023-08-10 11:06:31 -07:00
jiuker	b1391d1991	feat: support perf client to show `TX` from client to server (#17718 )	2023-08-10 07:14:46 -07:00
Harshavardhana	eb55034dfe	optimize deletePrefix, use direct set location via object name (#17827 ) * optimize deletePrefix, use direct set location via object name instead of fanning out the calls for an object force delete we can assume the set location and not do fan-out calls * Apply suggestions from code review Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com> --------- Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2023-08-09 16:30:22 -07:00
Harshavardhana	c45bc32d98	skip disks under scanning when healing disks (#17822 ) Bonus: - avoid calling DiskInfo() calls when missing blocks instead heal the object using MRF operation. - change the max_sleep to 250ms beyond that we will not stop healing.	2023-08-09 12:51:47 -07:00
Harshavardhana	6e860b6dc5	count all versions as part of DeleteAllVersionsAction (#17821 )	2023-08-09 08:55:19 -07:00
Harshavardhana	b732a673dc	reduce logging in bucket replication in retry scenarios (#17820 )	2023-08-08 13:27:40 -07:00
Yang Wu	23e4895dfc	Create metrics slice when necessary (#17809 )	2023-08-07 02:21:22 -07:00
Harshavardhana	8666c55ca6	fix: do not use PrefixEnabled() logic to ignore valid objects (#17677 ) ignoring valid objects with valid replication metadata after the Prefix was disabled must still honor the older metadata. this can lead to unexpected results, allow it during READ phase always.	2023-08-05 13:56:01 -07:00
Anis Eleuch	a3f00c5d5e	batch: Strict unmarshal yaml document to avoid user made typos (#17808 ) // UnmarshalStrict is like Unmarshal except that any fields that are found // in the data that do not have corresponding struct members, or mapping // keys that are duplicates, will result in // an error.	2023-08-05 13:51:48 -07:00
Poorna	26c23b30f4	replication: set context timeout for NewMultipartUpload calls (#17807 )	2023-08-05 12:27:07 -07:00
Anis Eleuch	a436fd513b	track client disconnections properly for all ListObjects calls (#17804 ) Currently ListObjects* calls were returning 200 OK for timed-out clients, this makes debugging via `mc admin trace` very hard.	2023-08-04 15:57:27 -07:00
Harshavardhana	533cd8d6df	fix: batch replication pull must preserve versionID (#17805 ) batch replication pull must preserve versionID regardless of destination bucket versioning configuration. This is similar to the issue with decommissioning and rebalancing	2023-08-04 12:09:10 -07:00
Harshavardhana	cb089dcb52	error out by default beyond 10000 versions per object (#17803 ) ``` You've exceeded the limit on the number of versions you can create on this object ```	2023-08-04 10:40:21 -07:00
Harshavardhana	239ccc9c40	fix: crash in globalTierJournal when TierConfig is not initialized (#17791 )	2023-08-03 14:16:15 -07:00
Poorna	b762fbaf21	sts: validate if iam subsystem initialized in handlers (#17796 )	2023-08-03 13:24:25 -07:00
Praveen raj Mani	0285df5a02	fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783 )	2023-08-03 02:47:07 -07:00
Harshavardhana	45fb375c41	allow healing to prefer local disks over remote (#17788 )	2023-08-03 02:18:18 -07:00
Harshavardhana	4a4950fe41	fix: honor requested allow origin settings properly (#17789 ) fixes #17778	2023-08-02 20:41:21 -07:00
Anis Eleuch	1664fd8bb1	Avoid logging errors twice during transitioned objects expiration (#17782 )	2023-08-02 09:06:03 -07:00
Harshavardhana	21cdd2bf5d	avoid overwriting metrics on success, save it in defer (#17780 )	2023-08-01 22:19:56 -07:00
Harshavardhana	0153f96a20	add deadlines for readMetadata() in listing (#17776 ) Bonus: also skip spending time looking for xl.json - Listing() - Delete()	2023-08-01 21:52:31 -07:00
Harshavardhana	a7a7533190	add new errors for Disks with timeouts (#17770 )	2023-08-01 12:47:50 -07:00
Poorna	311380f8cb	replication resync: fix queueing (#17775 ) Assign resync of all versions of object to the same worker to avoid locking contention. Fixes parallel resync implementation in #16707	2023-08-01 11:51:15 -07:00
Harshavardhana	b0f0e53bba	fix: make sure to correctly initialize health checks (#17765 ) health checks were missing for drives replaced since - HealFormat() would replace the drives without a health check - disconnected drives when they reconnect via connectEndpoint() the loop also loses health checks for local disks and merges these into a single code. - other than this separate cleanUp, health check variables to avoid overloading them with similar requirements. - also ensure that we compete via context selector for disk monitoring such that the canceled disks don't linger around longer waiting for the ticker to trigger. - allow disabling active monitoring.	2023-08-01 10:54:26 -07:00
Klaus Post	004f1e2f66	Fix trailing header signature mismatch (#17774 ) Seems like clients may omit a newline at the end of the trailer chunk. Each header should end with a newline. Add that if missing. Fixes #17662	2023-08-01 08:45:57 -07:00
Harshavardhana	2fa561f22e	do not crash on invalid metric values (#17764 ) ``` minio[1032735]: panic: label value "\xc0.\xc0." is not valid UTF-8 minio[1032735]: goroutine 1781101 [running]: minio[1032735]: github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...) ``` log such errors for investigation	2023-08-01 00:55:39 -07:00
Harshavardhana	81be718674	fix: optimize DiskInfo() call avoid metrics when not needed (#17763 )	2023-07-31 15:20:48 -07:00
Sho Ce	49a1e2f98e	update-notifier.go: misleading version age message (#17750 )	2023-07-31 08:36:19 -07:00
Klaus Post	684c46369c	Send events for extracted objects (#17760 ) Fixes #17759	2023-07-31 08:33:51 -07:00
Harshavardhana	73edd5b8fd	introduce 'mc admin config set alias/ api odirect=on' (#17753 ) change disable_odirect=off -> odirect=on to make it easier to understand, instead of making it double negative.	2023-07-31 00:12:53 -07:00
Harshavardhana	5e5bdf5432	capture total errors data availability and any timeout errors (#17748 )	2023-07-29 23:26:26 -07:00
Harshavardhana	f13cfcb83e	allow disabling O_DIRECT for write ops (#17751 ) on really slow systems, O_DIRECT simply kills the drives; allow for a way to disable it.	2023-07-29 15:17:56 -07:00
Harshavardhana	731e03fe5a	add ReadFileStream deadline for disk call (#17745 ) timeout the reader side if hung via disk max timeout	2023-07-28 15:37:53 -07:00
Anis Eleuch	7057d00a28	s3: Return invalid bucket name the first thing in all S3 calls (#17742 )	2023-07-28 10:49:20 -07:00
Harshavardhana	114fab4c70	export cluster health as prometheus metrics (#17741 )	2023-07-28 01:16:53 -07:00
ruspaul013	a92cb66468	Get the signed headers in the order they were signed (#17690 ) use pSignValues to get signed headers in order	2023-07-27 11:45:30 -07:00
ruspaul013	535f97ba61	check if metadata headers/url values are equal with signed headers (#17737 )	2023-07-27 11:44:56 -07:00
drivebyer	14ebd82dbd	fix: missing disk metrics when query metric api from peer (#17738 )	2023-07-27 11:44:13 -07:00
Harshavardhana	47dcfcbdd4	introduce deadlines on READ operations (#17724 )	2023-07-27 07:33:05 -07:00
Krishnan Parthasarathi	bf3901342c	Include SuccessorModTime for FileInfo quorum (#17732 )	2023-07-26 17:04:16 -07:00
Harshavardhana	b28bcad11b	avoid Access() calls on known bucket paths (#17719 )	2023-07-26 11:31:40 -07:00
Harshavardhana	a7c71e4c6b	protect disk monitoring to avoid busy loop configuration (#17723 )	2023-07-25 20:02:22 -07:00
Poorna	1a42693d68	replication: limit larger uploads to a subset of workers (#17687 ) Limit large uploads (> 128MiB) to a max of 10 workers, intent is to avoid larger uploads from using all replication bandwidth, giving room for smaller uploads to sync faster.	2023-07-25 20:02:02 -07:00
Harshavardhana	e7b60c4d65	Add slow drive timeouts to match with active disk monitoring (#17701 ) allow active disk-monitoring to be configurable, and use these add deadlines in various call layers for various syscalls.	2023-07-25 16:58:31 -07:00
Poorna	f95129894d	Use decrypted object size while computing object size summary (#17717 ) Corrects an issue with encrypted versioned objects being reported under `unversioned` bin in the object version histogram	2023-07-24 17:13:25 -07:00
Harshavardhana	c32c71c836	allow DNS cache TTL to be configurable (#17709 ) this is added for now as a hidden variable	2023-07-24 15:13:35 -07:00
Harshavardhana	14e1ace552	remove serializing WalkDir() across all buckets/prefixes on SSDs (#17707 ) slower drives get knocked off because they are too slow via active monitoring, we do not need to block calls arbitrarily. Serializing adds latencies for already slow calls, remove it for SSDs/NVMEs Also, add a selection with context when writing to `out <-` channel, to avoid any potential blocks.	2023-07-24 09:30:19 -07:00
drivebyer	a7fb3a3853	fix: Create metrics slice when necessary in getCacheMetrics() (#17711 )	2023-07-24 08:40:21 -07:00
Klaus Post	2da4bd5f1a	Revert "don't error when asked for 0-based range on empty objects (#17708 )" (#17713 ) This reverts commit `7e76d66184`. There is no valid way to specify offsets in a 0-byte file. Blame it on the [RFC](https://datatracker.ietf.org/doc/html/rfc7233#section-4.4) > The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the > request's Range header field (Section 3.1) overlap the current extent of the selected resource... A request for "bytes=0-" is a request starting at the first byte of a resource. If the resource is 0-length, the range [0,0] does not overlap the resource content and the server responds with an error.	2023-07-24 07:56:28 -07:00
flisk	7e76d66184	don't error when asked for 0-based range on empty objects (#17708 ) In a reverse proxying setup, a proxy in front of MinIO may attempt to request objects in slices for enhanced cache efficiency. Since such a proxy cannot have prior knowledge of how large a requested resource is, it usually sends a header of the form: Range: 0-$slice_size ... and, depending on the size of the resource, expects either: - an empty response, if $resource_size == 0 - a full response, if $resource_size <= $slice_size - a partial response, if $resource_size > $slice_size Prior to this change, MinIO would respond 416 Range Not Satisfiable if a client tried to request a range on an empty resource. This behavior is technically consistent with RFC9110[1] – However, it renders sliced reverse proxying, such as implemented in Nginx, broken in the case of empty files. Nginx itself seems to break this convention to enable "useful" responses in these cases, and MinIO should probably do that too. [1]: https://www.rfc-editor.org/rfc/rfc9110#byte.ranges	2023-07-23 00:10:03 -07:00
Harshavardhana	7764f4a8e3	return tags as part of Head/Get calls (#17635 ) AWS S3 only returns the tag count; along with that we must return the tags as well to avoid another metadata call to the server.	2023-07-22 07:19:43 -07:00
Kaan Kabalak	6624f970c0	Fix spelling of 'already' across repository (#17703 )	2023-07-21 08:45:08 -07:00
Harshavardhana	331bdc2245	fix: remove CompleteMultipartUpload() 200 OK response for blocking calls (#17699 ) sending whitespace character with CompleteMultipartUpload() with 200 OK was an AWS S3 compatible implementation detail, and it was expected that the client SDK must look for both successful XML as well as error XML for 200 OK. But this is not useful anymore on MinIO, since we do not have any large delayed coalescing of parts anymore.	2023-07-20 22:14:38 -07:00
Harshavardhana	e12ab486a2	avoid using os.Getenv for internal code, use env.Get() instead (#17688 )	2023-07-20 07:52:49 -07:00
Krishnan Parthasarathi	9eeee92d36	Add deletemarker_total metric (#17689 )	2023-07-20 07:52:32 -07:00

5807 Commits