Commit Graph

9497 Commits

Author SHA1 Message Date
Poorna
712dfa40cd
Add missing site replication hook for clearing sse config (#14512) 2022-03-10 00:04:34 -08:00
Harshavardhana
decfd6108c update dperf to calculate timing for fdatasync()/close() calls as well 2022-03-09 13:47:44 -08:00
Klaus Post
b890bbfa63
Add local disk health checks (#14447)
The main goal of this PR is to solve the situation where disks stop 
responding to operations. This generally causes an FD build-up and 
eventually will crash the server.

This adds detection of hung disks, where calls on disk get stuck.

We add functionality to `xlStorageDiskIDCheck` where it keeps 
track of the number of concurrent requests on a given disk.

A total number of 100 operations are allowed. If this limit is reached 
we will block (but not reject) new requests, but we will monitor the 
state of the disk.

If no requests have been completed or updated within a 15-second 
window, we mark the disk as offline. Requests that are blocked will be 
unblocked and return an error as "faulty disk".

New requests will be rejected until the disk is marked OK again.

Once a disk has been marked faulty, a check will run every 5 seconds that 
will attempt to write and read back a file. As long as this fails the disk will 
remain faulty.

To prevent lots of long-running requests to mark the disk faulty we 
implement a callback feature that allows updating the status as parts 
of these operations are running.

We add a reader and writer wrapper that will update the status of each 
successful read/write operation. This should allow fine enough granularity 
that a slow, but still operational disk will not reach 15 seconds where 
50 operations have not progressed.

Note that errors themselves are not enough to mark a disk faulty. 
A nil (or io.EOF) error will mark a disk as "good".

* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.

* de-couple IsOnline() from disk health tracker

The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was

- disconnected from network we need to validate
  if the drive is "correct" and is the same drive
  which belongs to this server.

- drive was replaced we have to format it - we
  support hot swapping of the drives.

IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.

* return errFaultyDisk for DiskInfo() call

Co-authored-by: Harshavardhana <harsha@minio.io>

Possible future Improvements:

* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
2022-03-09 11:38:54 -08:00
Daichi Mukai
0e3a570b85
helm: add namespace to StatefulSet (#14509)
Even if we specify the target namespace by `helm install --namespace`, 
the StatefulSet is created on the default namespace. Since this resource
references the ServiceAccount created on the target namespace, pods are
hindered to be created. To avoid this, we deploy the StatefulSet to the
target namespace of helm.
2022-03-09 11:25:36 -08:00
Klaus Post
7060c809c0
Add authorization header to HEAD requests (#14510)
Add Authorization to network check requests.

Fixes #14507
2022-03-09 10:48:56 -08:00
Andreas Auernhammer
9dbfd84c5b
CI: use MINIO_KMS_SECRET_KEY when verify healing (#14511)
This commit replaces the KMS / KES environment
variables with `MINIO_KMS_SECRET_KEY` when testing
healing on CI.

This change is necessary since KES `0.18.0` introduced
some API breaking changes and the healing tests run
a test (`verify-3604`) that requires an older MinIO
version (e.g. `2021-11-24T23-19-33Z`) which is not
able to parse a KES error as expected.

This commit allows the KES instance at `https://play.min.io:7373`
to get updated to newer versions.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-09 10:48:29 -08:00
Minio Trusted
fce380a044 Update yaml files to latest version RELEASE.2022-03-08T22-28-51Z 2022-03-09 01:36:59 +00:00
Poorna
46ba15ab03
Return MethodNotAllowed if force del on replicated bucket (#14505) 2022-03-08 14:28:51 -08:00
Poorna
1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Krishnan Parthasarathi
80ef1ae51c
Simplify assembling of tierStats from data-usage (#14504) 2022-03-08 12:08:29 -08:00
Krishna Srinivas
4d0715d226
Implement netperf for "mc support perf net" (#14397)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-03-08 09:54:38 -08:00
Klaus Post
8a274169da
heal: Fix first entry on dangling (#14495)
Instead of the first, the last entry was returned
pointerizing the range value.
2022-03-08 09:04:20 -08:00
Harshavardhana
21d8298fe1 update console UI to release v0.15.1 2022-03-07 23:40:58 -08:00
Harshavardhana
5d6f6d8d5b
create missing .minio.sys/config, .minio.sys/buckets during decommission (#14497) 2022-03-07 16:18:57 -08:00
Anis Elleuch
bacf6156c1
metrics: Avoid crash when fetching tier metrics (#14493)
Data usage does not always contain tiering info even if the data usage
information is valid. Avoid a crash in that case.

(e.g. the scanner scanned the namespace, the user enables tiering,
prometheus scrapes the server before the scanner gets a chance to
update the data usage with new tiering information)
2022-03-07 10:59:32 -08:00
Klaus Post
1d1b213f1f
scanner: Consider preselection bias when selecting for Healing (#14492)
Healing decisions would align with skipped folder counters. This can lead to files 
never being selected for heal checks on "clean" paths.

Use different hashing methods and take objectHealProbDiv into account when 
calculating the cycle.

Found by @vadmeste
2022-03-07 09:25:53 -08:00
Minio Trusted
1f11af42f1 Update yaml files to latest version RELEASE.2022-03-05T06-32-39Z 2022-03-05 09:27:28 +00:00
Jan Madera
a026c8748f
Update nginx.conf for large file uploads (#14481) 2022-03-04 22:32:39 -08:00
David Young
9f7d89b3cd
Add option to ignore checksumming config/secrets (#14396)
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
2022-03-04 22:32:15 -08:00
Harshavardhana
92a77cc78e
update pkg v1.1.20 to reload certs in k8s always (#14470) 2022-03-04 20:34:39 -08:00
Harshavardhana
b0c84e3de7
fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483)
This is a side-affect of the optimization done in PR #13544 which
causes a certain type of delete operations on given object versions
can cause lastVersion indication to be skipped, which leads to
an `xl.meta` where Versions[] slice is empty while the entire
file is intact by itself.

This PR tries to ensure that such files are visible and deletable
by regular means of listing as null 'delete-marker' and also
avoid the situation where this potential issue might arise.
2022-03-04 20:01:26 -08:00
Anis Elleuch
bbc914e174
heal: Do not override heal scan mode mode if it is set (#14476)
mc admin heal has --scan=deep flag which enforces bitrot checking 
when doing the healing.

Do not force override an existing heal scan option.
2022-03-04 18:25:06 -08:00
Anis Elleuch
3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana
66afa16aed
canceled PUTs throw frivolous logs (#14475)
remote drives might throw frivolous logs,
if the caller canceled the PUT operation
in such scenarios there is no reason to log.
2022-03-04 10:31:33 -08:00
Harshavardhana
9b0a8de7de update helm v3.5.9 2022-03-03 15:29:03 -08:00
Minio Trusted
04bbede17d Update yaml files to latest version RELEASE.2022-03-03T21-21-16Z 2022-03-03 22:16:10 +00:00
Harshavardhana
0e3bafcc54
improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Andreas Auernhammer
b48f719b8e
kes: remove unnecessary error conversion (#14459)
This commit removes some duplicate code that
converts KES API errors.

This code was added since KES `0.18.0` changed
some exported API errors. However, the KES SDK
handles this error conversion itself.
Therefore, it is not necessary to duplicate this
behavior in MinIO.

See: 21555fa624/error.go (L94)

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-03 09:42:37 -08:00
Lenin Alevski
289fcbd08c
KES dependency upgrade (#14454)
- Updating KES dependency to v.0.18.0
- Fixing incompatibility issue when checking for errors during KES key creation

Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-03-02 23:03:40 -08:00
Harshavardhana
f6875bb893
fix: regression from refactor in AMQP notification (#14455)
fixes a regression introduced in #14269 that refactored
the notification registration logic, all the amqp targets
however online will not be available for use anymore.

fixes #14451
2022-03-02 21:35:48 -08:00
Harshavardhana
7e803adf13
do not attempt force delete on bucket (#14452)
caller needs to ask explicitly for force delete
otherwise, the force delete might end up deleting
an existing bucket with data.

fixes #14445
2022-03-02 20:47:53 -08:00
Harshavardhana
5b5deee5b3 update minio/pkg to v1.1.18 2022-03-02 19:25:07 -08:00
Krishnan Parthasarathi
7dae4cb685
Update minio/pkg to v1.1.17 (#14450)
Fix for admin policy validation of KMSCreateKey
2022-03-02 17:06:06 -08:00
Emmet McPoland
27fad98179
Replace HeadBucket permission with GetBucketAcl (#14436)
Resolves https://github.com/minio/minio/issues/14379
2022-03-01 21:18:23 -08:00
Harshavardhana
58f7e3a829 update console v0.15.0, coredns v1.9.0 2022-03-01 17:17:18 -08:00
Anis Elleuch
4a15bd8ff8
Return info for DiskInfo when the disk is unformatted (#14427)
In a distributed setup, a DiskInfo REST call to an unformatted disk
returns an error with no disk information, such as the disk endpoint
URL, which is unexpected.
2022-03-01 15:06:47 -08:00
Klaus Post
b030ef1aca
tests: Clean up dsync package (#14415)
Add non-constant timeouts to dsync package.

Reduce test runtime by minutes. Hopefully not too aggressive.
2022-03-01 11:14:28 -08:00
Harshavardhana
cc46a99f97
skip object-lock headers without values (#14430)
metadata headers can have headers without values
as per AWS S3 spec however, we need to skip some
headers that do not have values that potentially
can have empty values set.
2022-03-01 11:04:47 -08:00
Xuehan Xu
becec6cb6b
correct mrf.newSetReconnected invocation's param order (#14426)
Signed-off-by: xuxuehan <xuxuehan@qianxin.com>
2022-02-28 09:13:19 -08:00
Harshavardhana
bc33db9fc0 update helm v3.5.8 2022-02-26 22:44:38 -08:00
Minio Trusted
7d4579e737 Update yaml files to latest version RELEASE.2022-02-26T02-54-46Z 2022-02-26 03:36:08 +00:00
Harshavardhana
b7c90751b0 allow drive tests to respond only drive paths 2022-02-25 18:54:46 -08:00
Klaus Post
88fd1cba71
select: add MISSING operator support (#14406)
Probably not full support, but for regular checks it should work.

Fixes #14358
2022-02-25 12:31:19 -08:00
Harshavardhana
e43cc316ff
remove errCh usage from HealObjects() simplify it (#14414)
errCh is not needed instead, rely on errs slice to
capture and return errors instead.

most probably fixes #14247
2022-02-25 12:20:41 -08:00
Klaus Post
e3f24a29fa
Upgrade simdjson & compress deps (#14411) 2022-02-25 10:48:41 -08:00
Harshavardhana
890e526bde rename 'mc admin inspect' to 'mc support inspect' 2022-02-24 17:17:53 -08:00
Harshavardhana
16ce455fca update docker release to RELEASE.2022-02-24T22-12-01Z 2022-02-24 15:35:14 -08:00
Harshavardhana
29b7164468 update console update v0.14.8 2022-02-24 14:12:01 -08:00
Harshavardhana
acdd03f609 update CREDITs file for new dependencies 2022-02-24 12:58:53 -08:00
hellivan
03b35ecdd0
collect correct parentUser for OIDC creds auto expiration (#14400) 2022-02-24 11:43:15 -08:00