minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	ebc6c9b498	Fix tracing send on closed channel (#18982 ) Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing: ``` Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]: Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(PubSub[...]).SubscribeJSON.func1() Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(PubSub[...]).SubscribeJSON in goroutine 378010884 Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352 ``` Wait explicitly for the goroutine to exit. Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.	2024-02-06 08:57:30 -08:00
Poorna	27d02ea6f7	metrics: add replication metrics on proxied requests (#18957 )	2024-02-05 22:00:45 -08:00
Harshavardhana	ff80cfd83d	move Make,Delete,Head,Heal bucket calls to websockets (#18951 )	2024-02-02 14:54:54 -08:00
Frank Wessels	31743789dc	Fix some leftover issues from PR 18936 (#18946 )	2024-02-01 19:42:56 -08:00
Klaus Post	b192bc348c	Improve object reuse for grid messages (#18940 ) Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers. This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool. Add internal request reuse of internal types. Add for safe (pointerless) types explicitly. Only log params for internal types. Doing Sprint(obj) is just a bit too messy.	2024-02-01 12:41:20 -08:00
Harshavardhana	6440d0fbf3	move a collection of peer APIs to websockets (#18936 )	2024-02-01 10:47:20 -08:00
Klaus Post	6da4a9c7bb	Improve tracing & notification scalability (#18903 ) * Perform JSON encoding on remote machines and only forward byte slices. * Migrate tracing & notification to WebSockets.	2024-01-30 12:49:02 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Harshavardhana	88837fb753	add new update v2 that updates per node, allows idempotent behavior (#18859 ) add new update v2 that updates per node, allows idempotent behavior new API ensures that - binary is correct and can be downloaded checksummed verified - committed to actual path - restart returns back the relevant waiting drives	2024-01-26 08:40:13 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll that is primarily is used for reading 'format.json' to to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	961f7dea82	compress binary while sending it to all the nodes (#18837 ) Also limit the amount of concurrency when sending binary updates to peers, avoid high network over TX that can cause disconnection events for the node sending updates.	2024-01-22 12:16:36 -08:00
Harshavardhana	f9b4a8d6e8	improve server update behavior by re-using memory properly (#18831 )	2024-01-19 18:27:58 -08:00
Harshavardhana	ac81f0248c	introduce new ServiceV2 API to handle guided restarts (#18826 ) New API now verifies any hung disks before restart/stop, provides a 'per node' break down of the restart/stop results. Provides also how many blocked syscalls are present on the drives and what users must do about them. Adds options to do pre-flight checks to provide information to the user regarding any hung disks. Provides 'force' option to forcibly attempt a restart() even with waiting syscalls on the drives.	2024-01-19 14:22:36 -08:00
Zhou Ting	31d16f6cc2	allow sha256 payload to be configurable for object perf test (#18712 ) Signed-off-by: Zhou Ting <ting.z.zhou@intel.com>	2023-12-29 23:56:50 -08:00
Anis Eleuch	8432fd5ac2	prom: Add online and healing drives metrics per erasure set (#18700 )	2023-12-21 16:56:43 -08:00
Harshavardhana	754f7a8a39	replace io.Discard usage to fix some NUMA copy() latencies (#18394 ) replace io.Discard usage to fix NUMA copy() latencies On NUMA systems copying from 8K buffer allocated via io.Discard leads to large latency build-up for every ``` copy(new8kbuf, largebuf) ``` can in-cur upto 1ms worth of latencies on NUMA systems due to memory sharding across NUMA nodes.	2023-11-06 14:26:08 -08:00
Shireesh Anjal	f6e581ce54	Capture network device info in health report (#18381 )	2023-11-02 09:49:49 -07:00
Shireesh Anjal	bf1c6edb76	Revert "Capture network device info in health report" (#18241 ) Introducing a new version of healthinfo struct for adding this info is not correct. It needs to be implemented differently without adding a new version. This reverts commit 8737025d940f80360ed4b3686b332db5156f6659.	2023-10-13 07:46:36 -07:00
Shireesh Anjal	a66a7f3e97	Capture network device info in health report (#18213 )	2023-10-12 15:33:31 -07:00
Harshavardhana	1971c54a50	update buffer channels for both trace and listen events (#18171 ) - Trace needs higher buffered channels than 4000 to ensure when we run `mc admin trace -a` it captures all information sufficiently. - Listen event notification needs the event channel to be `apiRequestsMaxPerNode` * number of nodes	2023-10-05 18:16:04 -06:00
Shireesh Anjal	6d20ec3bea	Add support for resource metrics (#18057 ) Add a new endpoint for "resource" metrics `/v2/metrics/resource` This should return system metrics related to drives, network, CPU and memory. Except for drives, other metrics should have corresponding "avg" and "max" values also. Reuse the real-time feature to capture the required data, introducing CPU and memory metrics in it. Collect the data every minute and keep updating the average and max values accordingly, returning the latest values when the API is called.	2023-09-30 13:40:20 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
jiuker	a99cd825ab	fix: byHost realTime metrics API (#17681 )	2023-07-18 23:50:30 -07:00
Harshavardhana	6426b74770	move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663 ) users/customers do not have a reasonable number of buckets anymore, this is why we must avoid overpopulating cluster endpoints, instead move the bucket monitoring to a separate endpoint. some of it's a breaking change here for a couple of metrics, but it is imperative that we do it to improve the responsiveness of our Prometheus cluster endpoint. Bonus: Added new cluster metrics for usage, objects and histograms	2023-07-18 22:25:12 -07:00
Poorna	5e2f8d7a42	replication: Simplify mrf requeueing and add backlog handler (#17171 ) Simplify MRF queueing and add backlog handler - Limit re-tries to 3 to avoid repeated re-queueing. Fall offs to be re-tried when the scanner revisits this object or upon access. - Change MRF to have each node process only its MRF entries. - Collect MRF backlog by the node to allow for current backlog visibility	2023-07-12 23:51:33 -07:00
Anis Eleuch	6d0bc5ab1e	prometheus: Fix internode stats (#17594 ) Internode calculation was done inside S3 handlers, fix it by moving it to internode handlers. Remove admin stats since it is not used.	2023-07-08 07:35:11 -07:00
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484 )	2023-06-24 20:29:13 -07:00
Harshavardhana	7605d07bb2	add support for bucket level request count per API (#17468 ) New metrics added to calculate API request count per bucket, per API. Captures errors, including 4xx, 5xx HTTP status codes separately.	2023-06-21 09:41:59 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Aditya Manthramurthy	518f6e4d39	fix: missing return after error response (#16920 )	2023-03-29 16:21:13 -07:00
Krishnan Parthasarathi	31fba6f434	Save bootstrap trace events in a circular buffer (#16823 )	2023-03-17 16:01:03 -07:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also removes a tiny bit of CPU by each write operation.	2023-02-24 09:03:31 +05:30
Daniel Valdivia	fb17f97cf3	move audit and logger message structure to minio/pkg (#16655 ) Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>	2023-02-21 21:21:17 -08:00
Harshavardhana	31b0decd46	migrate to minio/mux from gorilla/mux (#16456 )	2023-01-23 16:42:47 +05:30
Anis Elleuch	3039fd4519	Optimize background heal status to use LocalStorageInfo (#16414 )	2023-01-17 05:02:00 +05:30
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Harshavardhana	5a8df7efb3	re-implement StorageInfo to be a peer call (#16155 )	2022-12-01 14:31:35 -08:00
Poorna	d6bc141bd1	feat: Add support for site level resync (#15753 )	2022-11-14 07:16:40 -08:00
Klaus Post	ddeca9f12a	fix: filter rest errors and logs returned (#16019 )	2022-11-07 10:38:08 -08:00
Klaus Post	71954faa3a	mark pubsub type safe via generics (#15961 )	2022-10-28 10:55:42 -07:00
Krishnan Parthasarathi	4523da6543	feat: introduce pool-level rebalance (#15483 )	2022-10-25 12:36:57 -07:00
Harshavardhana	2a13cc28f2	feat: implement support batch replication (#15554 )	2022-10-05 23:00:43 -07:00
Poorna	6b9fd256e1	Persist in-memory replication stats to disk (#15594 ) to avoid relying on scanner-calculated replication metrics. This will improve the accuracy of the replication stats reported. This PR also adds on to #15556 by handing replication traffic that could not be queued by available workers to the MRF queue so that entries in `PENDING` status are healed faster.	2022-09-12 12:40:02 -07:00
Anis Elleuch	97a6322de1	Fix regression in notifying peers about new policy mapping (#15583 ) Switch from mux.Vars() to r.Form to avoid the issue of missing arguments passed to LoadPolicyMappingHandler.	2022-08-24 12:34:52 -07:00
Aditya Manthramurthy	afbb63a197	Factor out external event notification funcs (#15574 ) This change moves external event notification functionality into `event-notification.go`. This simplifies notification related code.	2022-08-24 06:42:36 -07:00
Anis Elleuch	b8cdf060c8	Properly replicate policy mapping for virtual users (#15558 ) Currently, replicating policy mapping for STS users does not work. Fix it is by passing user type to PolicyDBSet.	2022-08-23 11:11:45 -07:00
Anis Elleuch	5682685c80	Introduce disk io stats metrics (#15512 )	2022-08-16 07:13:49 -07:00
Cesar Celis Hernandez	8ec888d13d	feat: update binary once and push it to other servers (#15407 )	2022-07-29 08:34:30 -07:00
Anis Elleuch	e4b51235f8	upgrade: Split in two steps to ensure a stable retry (#15396 ) Currently, if one server in a distributed setup fails to upgrade due to any reasons, it is not possible to upgrade again unless nodes are restarted. To fix this, split the upgrade process into two steps : - download the new binary on all servers - If successful, overwrite the old binary with the new one	2022-07-25 17:49:47 -07:00

1 2 3 4 5

201 Commits