minio

mirror of https://github.com/minio/minio.git synced 2024-12-29 00:23:21 -05:00

Author	SHA1	Message	Date
Klaus Post	6da4a9c7bb	Improve tracing & notification scalability (#18903 ) * Perform JSON encoding on remote machines and only forward byte slices. * Migrate tracing & notification to WebSockets.	2024-01-30 12:49:02 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, and Multipart operation relies heavily on calling GetBucketInfo() to verify that the bucket exists. This has a large performance cost when many servers are involved. We did optimize this part by vectorizing the bucket calls; however, it's not enough: beyond 100 nodes the cost becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Anis Eleuch	a669946357	Add cgroup v2 support for memory limit (#18905 )	2024-01-30 11:13:27 -08:00
Poorna	7ffc162ea8	exclude veeam virtual objects from replication (#18918 ) Fixes: #18916	2024-01-30 10:43:58 -08:00
Poorna	bcfd7fbbcf	reuse transports for callhome and remote tgt validation (#18912 )	2024-01-29 23:05:39 -08:00
Harshavardhana	486e2e48ea	enable xattr capture by default (#18911 ) - healing must not set the write xattr because that is the job of active healing to update. what we need to preserve is permanent deletes. - remove older env for drive monitoring and enable it accordingly, as a global value.	2024-01-29 23:03:58 -08:00
Harshavardhana	2ddf2ca934	allow configuring maximum idle connections per host (#18908 )	2024-01-29 16:50:37 -08:00
Daniel Valdivia	403ec7cf21	fix: metrics URI path in prometheus docs (#18907 )	2024-01-29 14:34:21 -08:00
Poorna	29b1a29044	fix metrics panic in node metrics endpoint (#18894 )	2024-01-29 12:32:44 -08:00
jiuker	b4ab8e095a	fix: preserve bucket metric of data usage for replication info (#18895 )	2024-01-29 08:54:20 -08:00
Minio Trusted	ff4f4d4649	Update yaml files to latest version RELEASE.2024-01-29T03-56-32Z	2024-01-29 05:33:23 +00:00
Harshavardhana	9987ff570b	avoid calling close for nil inbound/outblock channels	2024-01-28 19:56:32 -08:00
Harshavardhana	cff8235068	remove getReplicationNodeMetrics() from peer metrics groups	2024-01-28 18:45:20 -08:00
Harshavardhana	9ef132c33b	remove excessive logging due to runtime.debugStack	2024-01-28 18:10:42 -08:00
Minio Trusted	ff8269575a	Update yaml files to latest version RELEASE.2024-01-28T22-35-53Z	2024-01-29 01:22:56 +00:00
Harshavardhana	7743d952dc	fix: incomingBytes() to update via handleMessages() (#18891 ) previous change #18880 was incomplete	2024-01-28 14:35:53 -08:00
Harshavardhana	944f3c1477	remove local disk metrics from cluster metrics (#18886 ) local disk metrics were polluting cluster metrics Please remove them instead of adding relevant ones. - batch job metrics were incorrectly kept at bucket metrics endpoint, move it to cluster metrics. - add tier metrics to cluster peer metrics from the node. - fix missing set level cluster health metrics	2024-01-28 12:53:59 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Klaus Post	38de8e6936	grid: Simpler reconnect logic (#18889 ) Do not rely on `connChange` to do reconnects. Instead, you can block while the connection is running and reconnect when handleMessages returns. Add fully async monitoring instead of monitoring on the main goroutine and keep this to avoid full network lockup.	2024-01-28 08:46:15 -08:00
Harshavardhana	6347fb6636	add missing proper error return in WalkDir() (#18884 ) without this the caller might end up returning incorrect errors and not ignoring the drive properly.	2024-01-27 16:13:41 -08:00
Harshavardhana	32e668eb94	update() stale rebalance stats() object during pool expansion (#18882 ) It is entirely possible that a rebalance process that was running when it was asked to "stop" failed to write its last statistics to disk. After this, a pool expansion can cause disruption, and all S3 API calls would fail at the IsPoolRebalancing() function. This PR makes sure that we update rebalance.bin under such conditions to avoid any runtime crashes.	2024-01-27 10:14:03 -08:00
Harshavardhana	c51f9ef940	fix: regression in internode bytes counting (#18880 ) wire up missing metrics since #18461 Bonus: fix trace output inconsistency	2024-01-27 00:25:49 -08:00
Cesar N	1a91edecae	Update list.md node_cpu wording (#18878 )	2024-01-26 18:57:58 -08:00
Harshavardhana	c88308cf0e	avoid 'panic' on mc admin update for single drive setup (#18876 )	2024-01-26 12:07:03 -08:00
Harshavardhana	88837fb753	add new update v2 that updates per node, allows idempotent behavior (#18859 ) The new API ensures that - the binary is correct and can be downloaded, checksummed, and verified - it is committed to the actual path - restart returns the relevant waiting drives	2024-01-26 08:40:13 -08:00
Harshavardhana	d0283ff354	remove unnecessary logs in HealBucket() (#18875 )	2024-01-26 08:39:57 -08:00
Harshavardhana	f449a7ae2c	allow bucket import to be idempotent (#18873 ) We do not need to be defensive in our approach; we should simply override anything and everything in the import process and not care about what currently exists on the disk, since the backup is the source of truth.	2024-01-25 17:20:54 -08:00
Klaus Post	a113b2c394	Fix inspect format.json exclusion (#18871 ) Right now the format.json is excluded if anything within `.minio.sys` is requested. I assume the check was meant to exclude only if it was actually requesting it.	2024-01-25 15:59:00 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll, which is primarily used for reading 'format.json', to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	e377bb949a	migrate bootstrap logic directly to websockets (#18855 ) improve performance for startup sequences by 2x for 300+ nodes.	2024-01-24 13:36:44 -08:00
Praveen raj Mani	c905d3fe21	fix: Re-use TCP connections for Kafka dials (#18860 ) Fixes #18857	2024-01-24 13:10:52 -08:00
Poorna	b6e9d235fe	fix replication error logs to include target endpoint (#18863 )	2024-01-24 13:05:43 -08:00
Klaus Post	6968f7237a	Add separate grid reconnection mutex (#18862 ) Give more safety around reconnects and make sure a state change isn't missed. Tested with several runs of `λ go test -race -v -count=500` Adds a separate mutex and doesn't mix in the testing mutex.	2024-01-24 11:49:39 -08:00
Klaus Post	4a6c97463f	Fix all racy use of NewDeadlineWorker (#18861 ) Almost all uses of NewDeadlineWorker, which relied on secondary values, were used in a racy fashion, which could lead to inconsistent errors/data being returned. It also propagates the deadline downstream. Rewrite all these to use a generic WithDeadline caller that can return an error alongside a value. Remove the stateful aspect of DeadlineWorker - it was racy if used - but it wasn't AFAICT. Fixes races like: ``` WARNING: DATA RACE Read at 0x00c130b29d10 by goroutine 470237: github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).ReadVersion() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:702 +0x611 github.com/minio/minio/cmd.readFileInfo() github.com/minio/minio/cmd/erasure-metadata-utils.go:160 +0x122 github.com/minio/minio/cmd.erasureObjects.getObjectFileInfo.func1.1() github.com/minio/minio/cmd/erasure-object.go:809 +0x27a github.com/minio/minio/cmd.erasureObjects.getObjectFileInfo.func1.2() github.com/minio/minio/cmd/erasure-object.go:828 +0x61 Previous write at 0x00c130b29d10 by goroutine 470298: github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).ReadVersion.func1() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:698 +0x244 github.com/minio/minio/internal/ioutil.(*DeadlineWorker).Run.func1() github.com/minio/minio/internal/ioutil/ioutil.go:141 +0x33 WARNING: DATA RACE Write at 0x00c0ba6e6c00 by goroutine 94507: github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).StatVol.func1() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:419 +0x104 github.com/minio/minio/internal/ioutil.(*DeadlineWorker).Run.func1() github.com/minio/minio/internal/ioutil/ioutil.go:141 +0x33 Previous read at 0x00c0ba6e6c00 by goroutine 94463: github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).StatVol() github.com/minio/minio/cmd/xl-storage-disk-id-check.go:422 +0x47e github.com/minio/minio/cmd.getBucketInfoLocal.func1() github.com/minio/minio/cmd/peer-s3-server.go:275 +0x122 github.com/minio/pkg/v2/sync/errgroup.(*Group).Go.func1() ``` Probably back from #17701	2024-01-24 10:08:31 -08:00
Frank Wessels	6c912ac960	Fix startup message when using single path (#18856 )	2024-01-24 10:02:56 -08:00
Harshavardhana	708cebe7f0	add necessary protection err, fileInfo slice reads and writes (#18854 ) protection was in place. However, it covered only some areas, so we re-arranged the code to ensure we could hold locks properly. Along with this, remove the DataShardFix code altogether, in deployments with many drive replacements, this can affect and lead to quorum loss.	2024-01-24 01:08:23 -08:00
Albert	152023e837	Correct a mistake in the value.yaml of minio helm chart (#18611 ) Only rootUser and rootPassword will be generated when not set.	2024-01-23 23:33:13 -08:00
Kevin Huber	0f16e19239	Helm: Add apiVersion and kind to the StatefulSets volumeClaimTemplates (#18770 )	2024-01-23 23:28:49 -08:00
Gonçalo Heleno	2c38e44e48	feat(chart): add support to set the display name of OpenID provider (#18781 )	2024-01-23 23:28:25 -08:00
Zirko	82739574b5	Helm: add cilium networkpolicy (#18650 ) Signed-off-by: QuantumEnigmaa <thibaud@giantswarm.io>	2024-01-23 23:27:57 -08:00
Harshavardhana	f78d677ab6	pre-allocate EC memory by default at startup (#18846 )	2024-01-23 20:41:11 -08:00
Poorna	e39e2306d6	site replication: remove extraneous log for missing group (#18785 )	2024-01-23 18:28:11 -08:00
Harshavardhana	52229a21cb	avoid reload of 'format.json' over the network under normal conditions (#18842 )	2024-01-23 14:11:46 -08:00
Harshavardhana	961f7dea82	compress binary while sending it to all the nodes (#18837 ) Also limit the amount of concurrency when sending binary updates to peers, avoid high network over TX that can cause disconnection events for the node sending updates.	2024-01-22 12:16:36 -08:00
Klaus Post	feeeef71f1	Add extra protection for grid reconnects (#18840 ) Race checks would occasionally show race on handleMsgWg WaitGroup by debug messages (used in test only). Use the `connMu` mutex to protect this against concurrent Wait/Add. Fixes #18827	2024-01-22 09:39:06 -08:00
Shubhendu	65c4d550cb	Distribution bucket metrics with site replication (#18841 ) If site replication is enabled, we should still show the size and version distribution histogram metrics at bucket level. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-01-22 08:45:36 -08:00
Harshavardhana	f9b4a8d6e8	improve server update behavior by re-using memory properly (#18831 )	2024-01-19 18:27:58 -08:00
Harshavardhana	e11d851aee	add new drive I/O waiting/tokens metric (#18836 ) Bonus: add virtual memory used as well part of the system resource metrics.	2024-01-19 14:51:36 -08:00
Harshavardhana	ac81f0248c	introduce new ServiceV2 API to handle guided restarts (#18826 ) New API now verifies any hung disks before restart/stop, provides a 'per node' break down of the restart/stop results. Provides also how many blocked syscalls are present on the drives and what users must do about them. Adds options to do pre-flight checks to provide information to the user regarding any hung disks. Provides 'force' option to forcibly attempt a restart() even with waiting syscalls on the drives.	2024-01-19 14:22:36 -08:00
Klaus Post	83bf15a703	grid: Return rejection reason (#18834 ) When rejecting incoming grid requests fill out the rejection reason and log it once. This will give more context when startup is failing. Already logged after a retry on caller.	2024-01-19 10:35:24 -08:00

11925 Commits