Commit Graph

4373 Commits

Author SHA1 Message Date
Harshavardhana
bd6f7b6d83
fix: make decommission restart non-blocking (#14591)
currently an on-going decommission, during a server
restart might block the startup sequence for relatively
longer periods, instead start the decommission in
background lazily.
2022-03-20 14:46:43 -07:00
Andreas Auernhammer
b0a4beb66a
PutObjectPart: set SSE-KMS headers and truncate ETags. (#14578)
This commit fixes two bugs in the `PutObjectPartHandler`.
First, `PutObjectPart` should return SSE-KMS headers
when the object is encrypted using SSE-KMS.
Before, this was not the case.

Second, the ETag should always be a 16 byte hex string,
perhaps followed by a `-X` (where `X` is the number of parts).
However, `PutObjectPart` used to return the encrypted ETag
in case of SSE-KMS. This leaks MinIO internal etag details
through the S3 API.

The combination of both bugs causes clients that use SSE-KMS
to fail when trying to validate the ETag. Since `PutObjectPart`
did not send the SSE-KMS response headers, the response looked
like a plaintext `PutObjectPart` response. Hence, the client
tries to verify that the ETag is the content-md5 of the part.
This could never be the case, since MinIO used to return the
encrypted ETag.

Therefore, clients behaving as specified by the S3 protocol
tried to verify the ETag in a situation they should not.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-19 10:15:12 -07:00
Harshavardhana
01ee49045e
fix: handle race in server setup global CI/CD variable (#14579) 2022-03-18 18:21:09 -07:00
Harshavardhana
7bd9f821dd
return correct context errors for locking operations (#14569)
if a context is canceled do not need to return a timeout error
instead, return the appropriate error for context canceled.
2022-03-18 15:32:45 -07:00
Klaus Post
61eb9d4e29
Fix listing fallback re-using disks (#14576)
When more than 2 disks are unavailable for listing, the same disk will be used for fallback.

This makes quorum calculations incorrect since the same disk will have multiple entries.

This PR keeps track of which fallback disks have been handed out and only every returns a disk once.
2022-03-18 11:35:27 -07:00
Harshavardhana
43eb5a001c
re-use transport for AdminInfo() call (#14571)
avoids creating new transport for each `isServerResolvable`
request, instead re-use the available global transport and do
not try to forcibly close connections to avoid TIME_WAIT
build upon large clusters.

Never use httpClient.CloseIdleConnections() since that can have
a drastic effect on existing connections on the transport pool.

Remove it everywhere.
2022-03-17 16:20:10 -07:00
Klaus Post
c1760fb764
Move apiCalls to front for field alignment (#14568)
Fixes #14565
2022-03-17 10:57:52 -07:00
Minio Trusted
ffcadcd99e Revert "Use S3 client for uplooads/downloads during perf test (#14553)"
This reverts commit ff811f594b.

Speedtest is broken need to fix this more cleanly.
2022-03-16 23:34:49 -07:00
Krishnan Parthasarathi
7b81967a3c
Fix handling of object versions pending purge (#14555)
- GetObject() with vid should return 405
- GetObject() without vid should return 404
- ListObjects() should ignore this object if this is the "latest" version of the object
- ListObjectVersions() should list this object as "DELETE marker"
- Remove data parts before sync'ing the version pending purge
2022-03-16 16:59:43 -07:00
Krishna Srinivas
ff811f594b
Use S3 client for uplooads/downloads during perf test (#14553) 2022-03-16 16:58:46 -07:00
Harshavardhana
e3071157f0
allow MakeBucketLocation to work for metaBucket (#14548)
decommission would fail to start due to failure
in MakeBucketLocation() error on .minio.sys/ bucket
creation.

Allow these special buckets.
2022-03-14 11:25:24 -07:00
Klaus Post
c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
Harshavardhana
9c846106fa
decouple service accounts from root credentials (#14534)
changing root credentials makes service accounts
in-operable, this PR changes the way sessionToken
is generated for service accounts.

It changes service account behavior to generate
sessionToken claims from its own secret instead
of using global root credential.

Existing credentials will be supported by
falling back to verify using root credential.

fixes #14530
2022-03-14 09:09:22 -07:00
Harshavardhana
cf94d1f1f1
do not crash readXLMetaNoData - if the xl.meta has incorrect content (#14538)
```
tmp = buf[want:]
```

Would potentially crash when `buf` is truncated for some reason
and does not have the expected bytes, this is of course considered
not normal and is an odd situation. But we do not need to crash
here instead allow for errors to be returned and let callers handle
the errors.
2022-03-14 09:07:46 -07:00
Poorna
f8d6eaaa96
fix: regression from range GET proxy on replicated buckets #14345 (#14532)
Fixes: #14531
2022-03-11 15:56:49 -08:00
Poorna
75b925c326
Deprecate root disk for disk caching (#14527)
This PR modifies #14513 to issue a deprecation
warning rather than reject settings on startup.
2022-03-10 18:42:44 -08:00
Harshavardhana
91d419ee6c
warn issues about large block I/O performance for Linux older than 4.0.0 (#14524)
This PR simply adds a warning message when it detects older kernel
versions and warn's them about potential performance issues on this
kernel.

The issue can be seen only with parallel I/O across all drives
on denser setups such as 90 drives or 45 drives per server configurations.
2022-03-10 17:36:13 -08:00
Harshavardhana
41079f1015
heal: remove blocking healDiskMeta upon startup (#14514)
This type of code is not necessary, read's of all
metadata content at `.minio.sys/config` automatically
triggers healing when necessary in the GetObjectNInfo()
call-path.

Having this code is not useful and this also adds to
the overall startup time of MinIO when there are lots
of users and policies.
2022-03-10 02:45:14 -08:00
Poorna
712dfa40cd
Add missing site replication hook for clearing sse config (#14512) 2022-03-10 00:04:34 -08:00
Klaus Post
b890bbfa63
Add local disk health checks (#14447)
The main goal of this PR is to solve the situation where disks stop 
responding to operations. This generally causes an FD build-up and 
eventually will crash the server.

This adds detection of hung disks, where calls on disk get stuck.

We add functionality to `xlStorageDiskIDCheck` where it keeps 
track of the number of concurrent requests on a given disk.

A total number of 100 operations are allowed. If this limit is reached 
we will block (but not reject) new requests, but we will monitor the 
state of the disk.

If no requests have been completed or updated within a 15-second 
window, we mark the disk as offline. Requests that are blocked will be 
unblocked and return an error as "faulty disk".

New requests will be rejected until the disk is marked OK again.

Once a disk has been marked faulty, a check will run every 5 seconds that 
will attempt to write and read back a file. As long as this fails the disk will 
remain faulty.

To prevent lots of long-running requests to mark the disk faulty we 
implement a callback feature that allows updating the status as parts 
of these operations are running.

We add a reader and writer wrapper that will update the status of each 
successful read/write operation. This should allow fine enough granularity 
that a slow, but still operational disk will not reach 15 seconds where 
50 operations have not progressed.

Note that errors themselves are not enough to mark a disk faulty. 
A nil (or io.EOF) error will mark a disk as "good".

* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.

* de-couple IsOnline() from disk health tracker

The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was

- disconnected from network we need to validate
  if the drive is "correct" and is the same drive
  which belongs to this server.

- drive was replaced we have to format it - we
  support hot swapping of the drives.

IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.

* return errFaultyDisk for DiskInfo() call

Co-authored-by: Harshavardhana <harsha@minio.io>

Possible future Improvements:

* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
2022-03-09 11:38:54 -08:00
Poorna
46ba15ab03
Return MethodNotAllowed if force del on replicated bucket (#14505) 2022-03-08 14:28:51 -08:00
Poorna
1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Krishnan Parthasarathi
80ef1ae51c
Simplify assembling of tierStats from data-usage (#14504) 2022-03-08 12:08:29 -08:00
Krishna Srinivas
4d0715d226
Implement netperf for "mc support perf net" (#14397)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-03-08 09:54:38 -08:00
Klaus Post
8a274169da
heal: Fix first entry on dangling (#14495)
Instead of the first, the last entry was returned
pointerizing the range value.
2022-03-08 09:04:20 -08:00
Harshavardhana
5d6f6d8d5b
create missing .minio.sys/config, .minio.sys/buckets during decommission (#14497) 2022-03-07 16:18:57 -08:00
Anis Elleuch
bacf6156c1
metrics: Avoid crash when fetching tier metrics (#14493)
Data usage does not always contain tiering info even if the data usage
information is valid. Avoid a crash in that case.

(e.g. the scanner scanned the namespace, the user enables tiering,
prometheus scrapes the server before the scanner gets a chance to
update the data usage with new tiering information)
2022-03-07 10:59:32 -08:00
Klaus Post
1d1b213f1f
scanner: Consider preselection bias when selecting for Healing (#14492)
Healing decisions would align with skipped folder counters. This can lead to files 
never being selected for heal checks on "clean" paths.

Use different hashing methods and take objectHealProbDiv into account when 
calculating the cycle.

Found by @vadmeste
2022-03-07 09:25:53 -08:00
Harshavardhana
92a77cc78e
update pkg v1.1.20 to reload certs in k8s always (#14470) 2022-03-04 20:34:39 -08:00
Harshavardhana
b0c84e3de7
fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483)
This is a side-affect of the optimization done in PR #13544 which
causes a certain type of delete operations on given object versions
can cause lastVersion indication to be skipped, which leads to
an `xl.meta` where Versions[] slice is empty while the entire
file is intact by itself.

This PR tries to ensure that such files are visible and deletable
by regular means of listing as null 'delete-marker' and also
avoid the situation where this potential issue might arise.
2022-03-04 20:01:26 -08:00
Anis Elleuch
bbc914e174
heal: Do not override heal scan mode mode if it is set (#14476)
mc admin heal has --scan=deep flag which enforces bitrot checking 
when doing the healing.

Do not force override an existing heal scan option.
2022-03-04 18:25:06 -08:00
Anis Elleuch
3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana
66afa16aed
canceled PUTs throw frivolous logs (#14475)
remote drives might throw frivolous logs,
if the caller canceled the PUT operation
in such scenarios there is no reason to log.
2022-03-04 10:31:33 -08:00
Harshavardhana
0e3bafcc54
improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Andreas Auernhammer
b48f719b8e
kes: remove unnecessary error conversion (#14459)
This commit removes some duplicate code that
converts KES API errors.

This code was added since KES `0.18.0` changed
some exported API errors. However, the KES SDK
handles this error conversion itself.
Therefore, it is not necessary to duplicate this
behavior in MinIO.

See: 21555fa624/error.go (L94)

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-03 09:42:37 -08:00
Lenin Alevski
289fcbd08c
KES dependency upgrade (#14454)
- Updating KES dependency to v.0.18.0
- Fixing incompatibility issue when checking for errors during KES key creation

Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-03-02 23:03:40 -08:00
Harshavardhana
7e803adf13
do not attempt force delete on bucket (#14452)
caller needs to ask explicitly for force delete
otherwise, the force delete might end up deleting
an existing bucket with data.

fixes #14445
2022-03-02 20:47:53 -08:00
Anis Elleuch
4a15bd8ff8
Return info for DiskInfo when the disk is unformatted (#14427)
In a distributed setup, a DiskInfo REST call to an unformatted disk
returns an error with no disk information, such as the disk endpoint
URL, which is unexpected.
2022-03-01 15:06:47 -08:00
Klaus Post
b030ef1aca
tests: Clean up dsync package (#14415)
Add non-constant timeouts to dsync package.

Reduce test runtime by minutes. Hopefully not too aggressive.
2022-03-01 11:14:28 -08:00
Harshavardhana
cc46a99f97
skip object-lock headers without values (#14430)
metadata headers can have headers without values
as per AWS S3 spec however, we need to skip some
headers that do not have values that potentially
can have empty values set.
2022-03-01 11:04:47 -08:00
Xuehan Xu
becec6cb6b
correct mrf.newSetReconnected invocation's param order (#14426)
Signed-off-by: xuxuehan <xuxuehan@qianxin.com>
2022-02-28 09:13:19 -08:00
Harshavardhana
b7c90751b0 allow drive tests to respond only drive paths 2022-02-25 18:54:46 -08:00
Harshavardhana
e43cc316ff
remove errCh usage from HealObjects() simplify it (#14414)
errCh is not needed instead, rely on errs slice to
capture and return errors instead.

most probably fixes #14247
2022-02-25 12:20:41 -08:00
hellivan
03b35ecdd0
collect correct parentUser for OIDC creds auto expiration (#14400) 2022-02-24 11:43:15 -08:00
Harshavardhana
c08540c7b7
reject speedtest when there isn't enough disk space available (#14402)
small setups do not return appropriate errors when speedtest
cannot run on small tiny setups, allow the tests to fail
appropriately more pro-actively.

many users bring toy setups, this PR simply returns an error
in such situations.
2022-02-24 09:06:18 -08:00
Shireesh Anjal
3934700a08
Make audit webhook and kafka config dynamic (#14390) 2022-02-24 09:05:33 -08:00
Harshavardhana
2d78e20120
enable CI environment additionally for MINIO_CI_CD (#14395)
all CI/CD environments set CI=true this is enough
for MinIO to be run inside CI environments, support
it.
2022-02-23 16:01:59 -08:00
Harshavardhana
2e6f8bdf19
do not skip healing disks during deletes (#14394)
healing disks take active I/O it is possible
that deleted objects might stay in .trash
folder for a really long time until the drive
is fully healed.

this PR changes it such that we are making sure
we purge the active content written to these
disks as well.
2022-02-23 14:30:46 -08:00
Shireesh Anjal
25144fedd5
Send deployment id and minio version in http header (#14378) 2022-02-23 13:36:01 -08:00
Krishnan Parthasarathi
27f64dd9a4
Add support for tier-remove and tier-verify (#14382)
* Add tier remove support only if it's empty
* Add support for tier verify
2022-02-23 13:34:25 -08:00
Harshavardhana
9d7648f02f
reduce unnecessary logging during speedtest (#14387)
- speedtest logs calls that were canceled
  spuriously, in situations where it should
  be ignored.

- all errors of interest are always sent back
  to the client there is no need to log them
  on the server console.

- PUT failures should negate the increments
  such that GET is not attempted on unsuccessful
  calls.

- do not attempt MRF on speedtest objects.
2022-02-23 11:59:13 -08:00
Poorna
1ef8babfef
cache: improve error reported for atime check (#14384) 2022-02-23 11:57:06 -08:00
Poorna
4ea7bf0510
Use custom transport for site replication (#14391)
Also, ensure that tiering uses a different instance of custom transport
2022-02-23 11:50:40 -08:00
Anis Elleuch
5dcf1d13a9
ci: Always set disks as non root disks (#14389)
In the testing mode, reformatting disks will fail because the healing
code will complain if one disk is in root mode. This commit will
automatically set all disks as non-root if MINIO_CI_CD is set.
2022-02-23 10:11:33 -08:00
Shireesh Anjal
94d37d05e5
Apply dynamic config at sub-system level (#14369)
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.

This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
2022-02-22 10:59:28 -08:00
Harshavardhana
0cbdc458c5
fix: do not reload disk format.json on a reconnected disk (#14351)
An onlineDisk means its a valid disk but it may be a
re-connected disk, this PR verifies that based on LastConn()
to only trigger MRF. Current code would again re-load the
disk 'format.json' which is not necessary and perhaps an
unnecessary call.

A potential side affect of this is closing perfectly online
disks and getting re-replaced by reloading 'format.json'.

This PR tries to avoid this situation by making sure MRF
is triggered but not reloading 'format.json' because of MRF.
2022-02-21 15:51:54 -08:00
Harshavardhana
65b1a4282e
fix: console logger regression with dynamic logger webhook registration (#14346)
fixes a regression from #14289
2022-02-17 17:50:10 -08:00
Harshavardhana
af3dc25dfe
align 32bit integers with atomic values in structs (#14344)
fixes #14341
2022-02-17 15:22:26 -08:00
Krishnan Parthasarathi
5a0c0079a1
Don't add free-version on restore-object (#14340) 2022-02-17 15:05:19 -08:00
Harshavardhana
af8f563ed3
allow clearing FIFO config as fallback (#14338)
FIFO is already removed, for users who upgrade are allowed to clear their configs.
2022-02-17 12:49:46 -08:00
Poorna
93af4a4864
Handle non existent kms key correctly (#14329)
- in PutBucketEncryption API
- admin APIs for  `mc admin KMS key [create|info]`
- PutObject API when invalid KMS key is specified
2022-02-17 11:36:14 -08:00
Shireesh Anjal
28f188e3ef
Make logger webhook config dynamic (#14289)
It should not be required to restart the 
server after setting the logger webhook config.
2022-02-17 11:11:15 -08:00
Harshavardhana
d756da41b9 fix: print gateway banner on removal notice 2022-02-16 20:34:47 -08:00
Krishnan Parthasarathi
cdab4a3b85
Update hourly tier-stats only on succesful tiering (#14330) 2022-02-16 17:29:12 -08:00
Klaus Post
b88c57ba93
Add fgprof profiles (#14321)
https://github.com/felixge/fgprof#rocket-fgprof---the-full-go-profiler
2022-02-16 12:00:10 -08:00
Klaus Post
60cd513a33
Fix leaked healing goroutines (#14322)
Only the first `listAndHeal` would ever be able to write on errCh, blocking all others infinitely.

Instead read all errors but return the first non-nil, if any.

The intention appears to be that this should cancel on any error, 
so that part is kept. 

Regression from #13990
2022-02-16 08:40:18 -08:00
Harshavardhana
03a6e8aee2
fix: creating steep directory structure on trash folder (#14314)
weird directory structures get created on the '.trash'
folder upon server restarts, this PR fixes this.
2022-02-15 16:34:03 -08:00
Anis Elleuch
4afbb89774
nas: Clean stale background appended files (#14295)
When more than one gateway reads and writes from the same mount point
and there is a load balancer pointing to those gateways. Each gateway 
will try to create its own temporary append file but fails to clear it later 
when not needed.

This commit creates a routine that checks all upload IDs saved in
multipart directory and remove any stale entry with the same upload id
in the memory and in the temporary background append folder as well.
2022-02-15 09:25:47 -08:00
Klaus Post
5ec57a9533
Add GetObject gzip option (#14226)
Enabled with `mc admin config set alias/ api gzip_objects=on`

Standard filtering applies (1K response minimum, not compressed content 
type, not range request, gzip accepted by client).
2022-02-14 09:19:01 -08:00
Anis Elleuch
1f92fc3fc0
Always check for root disks unless MINIO_CI_CD is set (#14232)
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.

Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
2022-02-13 15:42:07 -08:00
Harshavardhana
fad3d66093
parallelize background cleanup on local disks across sets (#14290) 2022-02-11 14:22:48 -08:00
Poorna
ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Anis Elleuch
71bab74148
Fix adding bucket forwarder handler in server mode (#14288)
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.

Though the federation is deprecated, let's fix this for the time being.
2022-02-10 08:49:36 -08:00
Anis Elleuch
661ea57907
restore: Add quotes some fields in x-amz-restore header (#14281)
S3 spec returns x-amz-restore header in HEAD/GET object with the
following format:

```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```

This commit adds quotes as the current code does not support it. It will
also supports the old format saved in the disk (in xl.meta) for backward
compatibility.
2022-02-09 13:17:41 -08:00
Anis Elleuch
1f18efb0ba
gateway: Active bucket forwarding handler (#14277)
A regression removed support of federation in the gateway mode. 
Enable it again.

Federation is deprecated for a while but let's fix this for the time being.
2022-02-09 09:31:47 -08:00
Daniel
8ae46bce93
fix the error logs have been omitted because of retryCount never exceed 10 (#14268) 2022-02-09 03:14:22 -08:00
Harshavardhana
f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Krishnan Parthasarathi
0ee2933234
Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Shireesh Anjal
9890f579f8
Add subsystem level validation on config set (#14269)
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
2022-02-08 10:36:41 -08:00
Anis Elleuch
2ee337ead5
prometheus: Add incoming requests metrics since last scrape (#14261)
Some users running MinIO claim that their system became slow. One 
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow, 
due to disk issues for example.
2022-02-07 16:30:14 -08:00
Harshavardhana
3c87e1e60d
fix: rename some function names to avoid confusion (#14262) 2022-02-07 11:49:07 -08:00
Harshavardhana
0cac868a36
speed-up startup time, do not block on ListBuckets() (#14240)
Bonus fixes #13816
2022-02-07 10:39:57 -08:00
Harshavardhana
186c477f3c init console server after server config is initialized
fixes #14259
2022-02-07 00:17:33 -08:00
Harshavardhana
6123377e66
speedup getFormatErasureInQuorum use driveCount (#14239)
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
2022-02-04 12:21:21 -08:00
Harshavardhana
0256dae657
fix: quorum requirement for DeleteMarkers and parity upgraded objects (#14248)
DeleteMarkers do not have a default quorum, i.e it is possible that
DeleteMarkers were created with n/2+1 quorum as well to make sure
that we satisfy situations such as those we need to make sure delete
markers only expect n/2 read quorum.

Additionally we should also look at additional metadata on the
actual objects that might have been "erasure" upgraded with new
parity when disks are down.

In such a scenario do not default to the standard storage class
parity, instead use the parityBlocks present on the FileInfo to
ensure that we are dealing with the correct quorum for READs and
DELETEs.
2022-02-04 02:47:36 -08:00
Harshavardhana
84b121bbe1
return error with empty x-amz-copy-source-range headers (#14249)
fixes #14246
2022-02-03 16:58:27 -08:00
Harshavardhana
01e550a9be
ignore unreadable metrics on certain closed systems (#14234)
fixes #14233
2022-02-03 09:45:12 -08:00
Poorna
63a2e0bab6
Remove notification from NotificationSys on bucket deletion (#14236) 2022-02-02 17:11:56 -08:00
Harshavardhana
24657859a8
when o_direct is disabled do not attempt fadvise call (#14230) 2022-02-02 08:54:52 -08:00
Sidhartha Mani
d7df6bc738
add support for speedtest drive (#14182) 2022-02-01 22:38:05 -08:00
Poorna
a4e1de93a7
Add API for removing site(s) from site replication (#14104) 2022-02-01 17:26:09 -08:00
Klaus Post
067d21d0f2
fs: Retry listing if no marker (#14221)
Retry listings, when no next marker is returned and the result isn't truncated.

This can happen when an object is queued, but no info can be fetched.

Fixes #14190
2022-02-01 10:00:14 -08:00
Shireesh Anjal
3882da6ac5
Add subnet proxy config (#14225)
Will store the HTTP(S) proxy URL to use for connecting to SUBNET.
2022-02-01 09:52:38 -08:00
Anis Elleuch
127e8bf3b6
heal: Avoid printing repetitive error to heal a root disk (#14220)
The healing code repeatedly tries to heal a root disk when it is empty
the reason is that connectEndpoint() returns errUnformattedDisk even
if the disk is a root disk. Changing that to returning another error
will avoid queueing the disk to the healing code in each connect disks
iteration.
2022-01-31 17:28:20 -08:00
Harshavardhana
74faed166a
Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00
Harshavardhana
dbd05d6e82
remove FIFO bucket quota, use ILM expiration instead (#14206) 2022-01-31 11:07:04 -08:00
Harshavardhana
b5d35c7e09
ignore disk metrics for single drive mode (#14212)
fixes #14211
2022-01-31 00:44:26 -08:00
Poorna
0f88cdc80e
Return all stats in SiteReplicationStatus API if options unset (#14207) 2022-01-28 21:19:38 -08:00
Poorna
38e3c7a8f7
Added filters for SiteReplicationStatus API to support new UI changes (#14177) 2022-01-28 15:37:55 -08:00
Poorna
a4be47d7ad
Validate config before saving changes after config reset (#14203) 2022-01-27 18:28:16 -08:00