minio

mirror of https://github.com/minio/minio.git synced 2025-11-25 12:06:10 -05:00

Author	SHA1	Message	Date
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Krishnan Parthasarathi	6a67c277eb	Reuse types for key-value, notification and retry (#17936 )	2023-08-29 11:27:23 -07:00
Harshavardhana	7cafdc0512	fix: skip access checks further for known buckets (#17934 )	2023-08-28 15:16:41 -07:00
Harshavardhana	8a57b6bced	use renameat2 Linux extension syscall (#17757 ) this is a faster and safer alternative on newer kernel versions.	2023-08-27 09:57:11 -07:00
Krishnan Parthasarathi	53abd25116	Don't log when object to be tiered is not found (#17924 )	2023-08-25 23:34:16 -07:00
Harshavardhana	1ea7826c0e	do not have to consider replicationTimestamp for healing and quorum (#17922 ) replicationTimestamp might differ if there were retries in replication and the retried attempt overwrote in quorum, but enough shards carry a newer timestamp, causing the existing timestamps on xl.meta to be invalid. we do not rely on this value for anything external; it is purely a hint for debugging purposes. since the object itself is intact, we do not have to spend time healing this situation. we may consider healing this situation in future, but that needs to be decoupled to make sure that we do not overcalculate how much we have to heal.	2023-08-25 15:31:15 -07:00
Anis Eleuch	0cde37be50	Reduce the number of calls to import bucket metadata (#17899 ) For each bucket, save the bucket metadata once, call the site replication hook once	2023-08-25 07:59:16 -07:00
jiuker	6aeca54ece	fix: replace context by timeout-context from parent-context when `selfSpeedTest` (#17906 )	2023-08-25 07:58:38 -07:00
Harshavardhana	124e28578c	remove strict persistence requirements for List() .metacache objects (#17917 ) .metacache objects are transient in nature, and are better left to use page-cache effectively to avoid using more IOPs on the disks. this allows for incoming calls to be not taxed heavily due to multiple large batch listings.	2023-08-25 07:58:11 -07:00
Harshavardhana	62c9e500de	remove mTime requirement from pre-condition checks (#17916 ) given a versionId the mtime is always the same, it can never differ from its original value. versionIds also do not conflict, since they are UUIDs and practically unique forever.	2023-08-24 14:33:58 -07:00
jiuker	02cc18ff29	refactor the perf client for TTFB and TotalResponseTime (#17901 )	2023-08-24 10:21:08 -07:00
Harshavardhana	ba4566e86d	add missing IAM node metrics to cluster and node endpoint (#17908 )	2023-08-24 09:26:37 -07:00
Krishnan Parthasarathi	87cb0081ec	Retain current and up to NewerNoncurrentVersions versions (#17909 ) applyNewerNoncurrentVersionLimit method should pass along versions unaffected by NewerNoncurrentVersions rule for further ILM evaluation.	2023-08-24 09:26:29 -07:00
Poorna	4a6af93c83	mark replication target offline if network timeouts seen (#17907 ) regular target liveness check every 5 secs will toggle state back as target returns online.	2023-08-24 09:24:26 -07:00
Harshavardhana	af564b8ba0	allow bootstrap to capture time-spent for each initializers (#17900 )	2023-08-23 03:07:06 -07:00
Klaus Post	7c8746732b	Return cancelled storage calls as 499 (#17895 ) Make upstream cancels more visible - right now they are just reported as "forbidden".	2023-08-22 11:10:41 -07:00
Klaus Post	f506117edb	Reduce memory profiling rate (#17894 ) Change profiling from every 4KB to every 128K, reducing the lock contention by a factor of 32.	2023-08-22 07:21:49 -07:00
Harshavardhana	1c5af7c31a	serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886 ) we expect a certain level of IOPs and latency so this is okay. fixes other miscellaneous bugs, such as: hanging on mrfCh <- when the context is canceled; queuing MRF heal when the context is canceled; remove unused saveStateCh channel.	2023-08-21 16:44:50 -07:00
Harshavardhana	3a0125fa1f	remove unexpected logging from peer calls (#17888 ) also make sure RequestID is set for system logs	2023-08-21 14:25:24 -07:00
Daniel Valdivia	328cb0a076	Pass environment variable to control session length to console (#17885 ) Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>	2023-08-21 11:55:43 -07:00
jiuker	e3ea97c964	fix: replace req context by locker context (#17880 )	2023-08-19 22:09:07 -07:00
Andreas Auernhammer	8f8f8854f0	update `minio/kes-go` dep to v0.2.0 (#17850 ) This commit updates the minio/kes-go dependency to v0.2.0 and updates the existing code to work with the new KES APIs. The `SetPolicy` handler got removed since it may not get implemented by KES at all and could not have been used in the past since stateless KES is read-only w.r.t. policies and identities. Signed-off-by: Andreas Auernhammer <hi@aead.dev>	2023-08-19 07:37:53 -07:00
Anis Eleuch	4c6869cd9a	ilm: Fix cleaning non current null versions (#17876 )	2023-08-18 12:55:47 -07:00
Harshavardhana	dde1a12819	fix: validate incoming uploadID to be base64 encoded (#17865 ) Bonus fixes include: do not write the final xl.meta, since renameData() does this already, which saves some IOPs; make sure to purge the multipart directory properly using a recursive delete, otherwise this can easily pile up and rely on the stale uploads cleanup. fixes #17863	2023-08-17 09:37:55 -07:00
Harshavardhana	9ebd10d3f4	Revert "Include SuccessorModTime for FileInfo quorum (#17732 )" (#17860 ) This reverts commit `bf3901342c`. This is to fix a regression caused when there are inconsistent versions, but one version is in quorum. SuccessorModTime issue must be fixed differently.	2023-08-16 07:51:33 -07:00
Harshavardhana	3ba927edae	fix: batch status reporting after complete (#17852 ) batch status can perpetually wait after completion due to a race between the MetricsHandler() returning the active metrics in intervals of 1sec and delete of metrics after job completion. this PR ensures that we keep the 'status' around for a while, i.e. up to 24hrs for all the batch jobs.	2023-08-15 12:22:30 -07:00
Harshavardhana	c4ca0a5a57	add two more drive metrics when metrics is available (#17854 )	2023-08-15 10:55:47 -07:00
Klaus Post	406ea4f281	Fix distributed listing not able to resume (#17855 ) Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`. This meant that in distributed systems listings would never be able to continue and would restart on every call. Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.	2023-08-15 07:45:25 -07:00
Harshavardhana	64aa7feabd	allow specifying lower disks for Walk() (#17829 ) useful when you may want Walk() with reduced quorum requirements.	2023-08-14 21:32:39 -07:00
Poorna	875f4076ec	site replication: avoid retries when peer is offline (#17853 )	2023-08-14 21:31:41 -07:00
Harshavardhana	4643efe6be	fix: add deadline worker pattern for local disk removers (#17845 )	2023-08-14 12:28:13 -07:00
Harshavardhana	b760137e1d	fix: add proxyByNode for batch jobs as part of their jobId (#17844 )	2023-08-11 13:12:35 -07:00
Harshavardhana	5f56f441bf	fix: apply common notification code with content-type (#17843 )	2023-08-11 11:34:43 -07:00
Klaus Post	96a22bfcbb	fix: wrapped io.EOF during ListObjects() (#17842 ) When listing getObjectFileInfo can return `io.EOF` if file is being written. When we wrap the error it will not retry upstream, since `io.EOF` is a valid return value. Allow one retry before returning errors and canceling the listing.	2023-08-11 09:47:16 -07:00
Poorna	dfaf735073	replication: fix queuing of large uploads (#17831 ) Fixes regression from #17687	2023-08-10 15:48:42 -07:00
Anis Eleuch	7fcfde7f07	s3: Pick a pool with >85% if all other pools are in suspended state (#17826 )	2023-08-10 11:06:31 -07:00
jiuker	b1391d1991	feat: support perf client to show `TX` from client to server (#17718 )	2023-08-10 07:14:46 -07:00
Harshavardhana	eb55034dfe	optimize deletePrefix, use direct set location via object name (#17827 ) instead of fanning out the calls, for an object force delete we can assume the set location and not do fan-out calls. Apply suggestions from code review. Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2023-08-09 16:30:22 -07:00
Harshavardhana	c45bc32d98	skip disks under scanning when healing disks (#17822 ) Bonus: avoid DiskInfo() calls when blocks are missing, instead heal the object using MRF operation; change the max_sleep to 250ms, beyond that we will not stop healing.	2023-08-09 12:51:47 -07:00
Harshavardhana	6e860b6dc5	count all versions as part of DeleteAllVersionsAction (#17821 )	2023-08-09 08:55:19 -07:00
Harshavardhana	b732a673dc	reduce logging in bucket replication in retry scenarios (#17820 )	2023-08-08 13:27:40 -07:00
Yang Wu	23e4895dfc	Create metrics slice when necessary (#17809 )	2023-08-07 02:21:22 -07:00
Harshavardhana	8666c55ca6	fix: do not use PrefixEnabled() logic to ignore valid objects (#17677 ) ignoring valid objects with valid replication metadata after the Prefix was disabled must still honor the older metadata. this can lead to unexpected results, allow it during READ phase always.	2023-08-05 13:56:01 -07:00
Anis Eleuch	a3f00c5d5e	batch: Strict unmarshal yaml document to avoid user made typos (#17808 ) UnmarshalStrict is like Unmarshal except that any fields that are found in the data that do not have corresponding struct members, or mapping keys that are duplicates, will result in an error.	2023-08-05 13:51:48 -07:00
Poorna	26c23b30f4	replication: set context timeout for NewMultipartUpload calls (#17807 )	2023-08-05 12:27:07 -07:00
Anis Eleuch	a436fd513b	track client disconnections properly for all ListObjects calls (#17804 ) Currently ListObjects* calls were returning 200 OK for timed-out clients, this makes debugging via `mc admin trace` very hard.	2023-08-04 15:57:27 -07:00
Harshavardhana	533cd8d6df	fix: batch replication pull must preserve versionID (#17805 ) batch replication pull must preserve versionID regardless of destination bucket versioning configuration. This is similar to the issue with decommissioning and rebalancing	2023-08-04 12:09:10 -07:00
Harshavardhana	cb089dcb52	error out by default beyond 10000 versions per object (#17803 ) `You've exceeded the limit on the number of versions you can create on this object`	2023-08-04 10:40:21 -07:00
Harshavardhana	239ccc9c40	fix: crash in globalTierJournal when TierConfig is not initialized (#17791 )	2023-08-03 14:16:15 -07:00
Poorna	b762fbaf21	sts: validate if iam subsystem initialized in handlers (#17796 )	2023-08-03 13:24:25 -07:00


5447 Commits