Commit Graph

117 Commits

Author SHA1 Message Date
Anis Eleuch
8432fd5ac2 prom: Add online and healing drives metrics per erasure set (#18700) 2023-12-21 16:56:43 -08:00
Harshavardhana
7c948adf88 allow pre-allocating buffers to reduce frequent GCs during growth (#18686)
This PR also increases per node bpool memory from 1024 entries
to 2048 entries; along with that, it also moves the byte pool
centrally instead of being per pool.
2023-12-21 08:59:38 -08:00
Klaus Post
5f971fea6e Fix Mux Connect Error (#18567)
`OpMuxConnectError` was not handled correctly.

Remove local checks for single request handlers so they can 
run before being registered locally.

Bonus: Only log IAM bootstrap on startup.
2023-12-01 00:18:04 -08:00
jiuker
be02333529 feat: drive sub-sys to max timeout reload (#18501) 2023-11-27 09:15:06 -08:00
Harshavardhana
877e0cac03 fix: tiering statistics handling a bug in clone() implementation (#18342)
Tiering statistics have been broken for some time now, a regression
was introduced in 6f2406b0b6

Bonus fixes an issue where the objects are not assumed to be
of the 'STANDARD' storage-class for the objects that have
not yet tiered, this should be conditional based on the object's
metadata not a default assumption.

This PR also does some cleanup in terms of implementation,

fixes #18070
2023-10-30 09:59:51 -07:00
Harshavardhana
c60f54e5be make ListMultipart/ListParts more reliable skip healing disks (#18312)
this PR also fixes old flaky tests, by properly marking disk offline-based tests.
2023-10-24 23:33:25 -07:00
Harshavardhana
9f7044aed0 fix: ignore transient errors in read path (#18006)
Errors such as

```
returned an error (context deadline exceeded) (*fmt.wrapError)
```

```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
2023-09-11 15:29:59 -07:00
Aditya Manthramurthy
1c99fb106c Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana
4643efe6be fix: add deadline worker pattern for local disk removers (#17845) 2023-08-14 12:28:13 -07:00
Harshavardhana
c45bc32d98 skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Harshavardhana
a7a7533190 add new errors for Disks with timeouts (#17770) 2023-08-01 12:47:50 -07:00
Harshavardhana
81be718674 fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Anis Eleuch
756d6aa729 fix: report correct pool/set/disk indexes for offline disks (#17695) 2023-07-20 07:48:21 -07:00
Harshavardhana
bdddf597f6 shuffle buckets randomly before being scanned (#17644)
this randomness is needed to avoid scanning
the same buckets across different erasure sets,
in the same order.

allow random buckets to be scanned instead
allowing a wider spread of ILM, replication
checks.

Additionally do not loop over twice to fill
the channel, fill the channel regardless of
having bucket new or old.
2023-07-14 02:25:40 -07:00
Kaan Kabalak
f64d62b01d Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Kaan Kabalak
21fbe88e1f Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Aditya Manthramurthy
5a1612fe32 Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Praveen raj Mani
72802a5972 Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Harshavardhana
8fd07bcd51 simplify sort.Sort by using sort.Slice (#17066) 2023-04-24 13:28:18 -07:00
Anis Eleuch
1f1c267b6c Add used inodes to disk info (#16994) 2023-04-07 07:52:14 -07:00
Klaus Post
a547bf517d Remove locks on usage cache (#16786) 2023-03-09 15:15:46 -08:00
Klaus Post
d07089ceac Fix scanner deadlock on lost global lock (#16726) 2023-02-28 21:34:45 -08:00
Klaus Post
9acf1024e4 Remove bloom filter (#16682)
Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU by each write operation.
2023-02-24 09:03:31 +05:30
Harshavardhana
b4ef5ff294 remove unnecessary code checking for supported features (#16423) 2023-01-17 19:37:47 +05:30
Harshavardhana
f1bbb7fef5 vectorize cluster-wide calls such as bucket operations (#16313) 2023-01-03 08:16:39 -08:00
Aditya Manthramurthy
a30cfdd88f Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Harshavardhana
5a8df7efb3 re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Klaus Post
98ba622679 Reduce temporary file clean-up waits (#16110) 2022-11-22 07:23:36 -08:00
Harshavardhana
23b329b9df remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Anis Elleuch
18fb86b7be convert context.DeadlineExceed to offline disk in DiskInfo() (#15886) 2022-10-18 03:01:16 -07:00
Harshavardhana
c79bcc8838 Revert "convert context.DeadlineExceed to offline disk in DiskInfo() (#15869)"
This reverts commit 0fe58dbb34.
2022-10-14 20:37:50 -07:00
Anis Elleuch
0fe58dbb34 convert context.DeadlineExceed to offline disk in DiskInfo() (#15869) 2022-10-14 19:32:13 -07:00
Anis Elleuch
db7a9b2c37 heal-info: Return the endpoint of a disk with unknown state (#15854) 2022-10-13 16:41:44 -07:00
Anis Elleuch
5682685c80 Introduce disk io stats metrics (#15512) 2022-08-16 07:13:49 -07:00
Harshavardhana
a406bb0288 restrict number of disks used for scanning buckets upto GOMAXPROCS (#15492)
control scanner parallelism to avoid higher CPU
usage on nodes that have more drives but an old CPU.
2022-08-08 16:16:44 -07:00
ebozduman
b57e7321e7 Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Klaus Post
ac055b09e9 Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
Harshavardhana
f1abb92f0c feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Anis Elleuch
16431d222c heal: Enable periodic bitrot scan configuration (#14464) 2022-04-07 08:10:40 -07:00
Harshavardhana
0e3bafcc54 improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Harshavardhana
57118919d2 cached diskIDs are not needed for scanner healing (#14170)
This PR removes an unnecessary state that gets
passed around for DiskIDs, which is not necessary
since each disk exactly knows which pool and which
set it belongs to on a running system.

Currently cached DiskId's won't work properly
because it always ends up skipping offline disks
and never runs healing when disks are offline, as
it expects all the cached diskIDs to be present
always. This also sort of made things in-flexible
in terms perhaps a new diskID for `format.json`.
(however this is not a big issue)

This is an unnecessary requirement that healing
via scanner needs all drives to be online, instead
healing should trigger even when partial nodes
and drives are available this ensures that we
keep the SLA in-tact on the objects when disks
are offline for a prolonged period of time.
2022-01-26 08:34:56 -08:00
Harshavardhana
9d588319dd support site replication to replicate IAM users,groups (#14128)
- Site replication was missing replicating users,
  groups when an empty site was added.

- Add site replication for groups and users when they
  are disabled and enabled.

- Add support for replicating bucket quota config.
2022-01-19 20:02:24 -08:00
Krishnan Parthasarathi
070c31eac5 Wait for updates collector when disk.NSScanner returns error (#14127) 2022-01-19 00:46:43 -08:00
Harshavardhana
f527c708f2 run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana
866a95de38 fix: choose appropriate quorum for a given erasure set (#13998)
multiObject delete should honor expected quorum
2021-12-28 12:41:52 -08:00
Anis Elleuch
1d9e91e00f Fix wrong reporting of total disks after restart (#13326)
A restart of the cluster and a failed disk will wrongly count 
the number of total disks.
2021-09-29 11:36:19 -07:00
Klaus Post
88d719689c Synchronize bucket cycle numbers (#13058)
Synchronize bucket cycles so it is much more
likely that the same prefixes will be picked up
for scanning.

Use the global bloom filter cycle for that. 
Bump bloom filter versions to clear those.
2021-08-25 08:25:26 -07:00
Klaus Post
cc60d66909 Fix incremental usage accounting (#12871)
Remote caches were not returned correctly, so they would not get updated on save.

Furthermore make some tweaks for more reliable updates.

Invalidate bloom filter to ensure rescan.
2021-08-04 09:14:14 -07:00
Anis Elleuch
b0b4696a64 heal: Add MRF metrics to background heal API response (#12398)
This commit gathers MRF metrics from 
all nodes in a cluster and return it to the caller. This will show information about the 
number of objects in the MRF queues 
waiting to be healed.
2021-07-15 22:32:06 -07:00
Klaus Post
a7e2a1a38b fix: two different scanner update races (#12615) 2021-07-02 11:19:56 -07:00