Commit Graph

4277 Commits

Author SHA1 Message Date
Harshavardhana
2d78e20120
enable CI environment additionally for MINIO_CI_CD (#14395)
all CI/CD environments set CI=true this is enough
for MinIO to be run inside CI environments, support
it.
2022-02-23 16:01:59 -08:00
Harshavardhana
2e6f8bdf19
do not skip healing disks during deletes (#14394)
healing disks take active I/O it is possible
that deleted objects might stay in .trash
folder for a really long time until the drive
is fully healed.

this PR changes it such that we are making sure
we purge the active content written to these
disks as well.
2022-02-23 14:30:46 -08:00
Shireesh Anjal
25144fedd5
Send deployment id and minio version in http header (#14378) 2022-02-23 13:36:01 -08:00
Krishnan Parthasarathi
27f64dd9a4
Add support for tier-remove and tier-verify (#14382)
* Add tier remove support only if it's empty
* Add support for tier verify
2022-02-23 13:34:25 -08:00
Harshavardhana
9d7648f02f
reduce unnecessary logging during speedtest (#14387)
- speedtest logs calls that were canceled
  spuriously, in situations where it should
  be ignored.

- all errors of interest are always sent back
  to the client there is no need to log them
  on the server console.

- PUT failures should negate the increments
  such that GET is not attempted on unsuccessful
  calls.

- do not attempt MRF on speedtest objects.
2022-02-23 11:59:13 -08:00
Poorna
1ef8babfef
cache: improve error reported for atime check (#14384) 2022-02-23 11:57:06 -08:00
Poorna
4ea7bf0510
Use custom transport for site replication (#14391)
Also, ensure that tiering uses a different instance of custom transport
2022-02-23 11:50:40 -08:00
Anis Elleuch
5dcf1d13a9
ci: Always set disks as non root disks (#14389)
In the testing mode, reformatting disks will fail because the healing
code will complain if one disk is in root mode. This commit will
automatically set all disks as non-root if MINIO_CI_CD is set.
2022-02-23 10:11:33 -08:00
Shireesh Anjal
94d37d05e5
Apply dynamic config at sub-system level (#14369)
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.

This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
2022-02-22 10:59:28 -08:00
Harshavardhana
0cbdc458c5
fix: do not reload disk format.json on a reconnected disk (#14351)
An onlineDisk means its a valid disk but it may be a
re-connected disk, this PR verifies that based on LastConn()
to only trigger MRF. Current code would again re-load the
disk 'format.json' which is not necessary and perhaps an
unnecessary call.

A potential side affect of this is closing perfectly online
disks and getting re-replaced by reloading 'format.json'.

This PR tries to avoid this situation by making sure MRF
is triggered but not reloading 'format.json' because of MRF.
2022-02-21 15:51:54 -08:00
Harshavardhana
65b1a4282e
fix: console logger regression with dynamic logger webhook registration (#14346)
fixes a regression from #14289
2022-02-17 17:50:10 -08:00
Harshavardhana
af3dc25dfe
align 32bit integers with atomic values in structs (#14344)
fixes #14341
2022-02-17 15:22:26 -08:00
Krishnan Parthasarathi
5a0c0079a1
Don't add free-version on restore-object (#14340) 2022-02-17 15:05:19 -08:00
Harshavardhana
af8f563ed3
allow clearing FIFO config as fallback (#14338)
FIFO is already removed, for users who upgrade are allowed to clear their configs.
2022-02-17 12:49:46 -08:00
Poorna
93af4a4864
Handle non existent kms key correctly (#14329)
- in PutBucketEncryption API
- admin APIs for  `mc admin KMS key [create|info]`
- PutObject API when invalid KMS key is specified
2022-02-17 11:36:14 -08:00
Shireesh Anjal
28f188e3ef
Make logger webhook config dynamic (#14289)
It should not be required to restart the 
server after setting the logger webhook config.
2022-02-17 11:11:15 -08:00
Harshavardhana
d756da41b9 fix: print gateway banner on removal notice 2022-02-16 20:34:47 -08:00
Krishnan Parthasarathi
cdab4a3b85
Update hourly tier-stats only on succesful tiering (#14330) 2022-02-16 17:29:12 -08:00
Klaus Post
b88c57ba93
Add fgprof profiles (#14321)
https://github.com/felixge/fgprof#rocket-fgprof---the-full-go-profiler
2022-02-16 12:00:10 -08:00
Klaus Post
60cd513a33
Fix leaked healing goroutines (#14322)
Only the first `listAndHeal` would ever be able to write on errCh, blocking all others infinitely.

Instead read all errors but return the first non-nil, if any.

The intention appears to be that this should cancel on any error, 
so that part is kept. 

Regression from #13990
2022-02-16 08:40:18 -08:00
Harshavardhana
03a6e8aee2
fix: creating steep directory structure on trash folder (#14314)
weird directory structures get created on the '.trash'
folder upon server restarts, this PR fixes this.
2022-02-15 16:34:03 -08:00
Anis Elleuch
4afbb89774
nas: Clean stale background appended files (#14295)
When more than one gateway reads and writes from the same mount point
and there is a load balancer pointing to those gateways. Each gateway 
will try to create its own temporary append file but fails to clear it later 
when not needed.

This commit creates a routine that checks all upload IDs saved in
multipart directory and remove any stale entry with the same upload id
in the memory and in the temporary background append folder as well.
2022-02-15 09:25:47 -08:00
Klaus Post
5ec57a9533
Add GetObject gzip option (#14226)
Enabled with `mc admin config set alias/ api gzip_objects=on`

Standard filtering applies (1K response minimum, not compressed content 
type, not range request, gzip accepted by client).
2022-02-14 09:19:01 -08:00
Anis Elleuch
1f92fc3fc0
Always check for root disks unless MINIO_CI_CD is set (#14232)
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.

Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
2022-02-13 15:42:07 -08:00
Harshavardhana
fad3d66093
parallelize background cleanup on local disks across sets (#14290) 2022-02-11 14:22:48 -08:00
Poorna
ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Anis Elleuch
71bab74148
Fix adding bucket forwarder handler in server mode (#14288)
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.

Though the federation is deprecated, let's fix this for the time being.
2022-02-10 08:49:36 -08:00
Anis Elleuch
661ea57907
restore: Add quotes some fields in x-amz-restore header (#14281)
S3 spec returns x-amz-restore header in HEAD/GET object with the
following format:

```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```

This commit adds quotes as the current code does not support it. It will
also supports the old format saved in the disk (in xl.meta) for backward
compatibility.
2022-02-09 13:17:41 -08:00
Anis Elleuch
1f18efb0ba
gateway: Active bucket forwarding handler (#14277)
A regression removed support of federation in the gateway mode. 
Enable it again.

Federation is deprecated for a while but let's fix this for the time being.
2022-02-09 09:31:47 -08:00
Daniel
8ae46bce93
fix the error logs have been omitted because of retryCount never exceed 10 (#14268) 2022-02-09 03:14:22 -08:00
Harshavardhana
f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Krishnan Parthasarathi
0ee2933234
Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Shireesh Anjal
9890f579f8
Add subsystem level validation on config set (#14269)
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
2022-02-08 10:36:41 -08:00
Anis Elleuch
2ee337ead5
prometheus: Add incoming requests metrics since last scrape (#14261)
Some users running MinIO claim that their system became slow. One 
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow, 
due to disk issues for example.
2022-02-07 16:30:14 -08:00
Harshavardhana
3c87e1e60d
fix: rename some function names to avoid confusion (#14262) 2022-02-07 11:49:07 -08:00
Harshavardhana
0cac868a36
speed-up startup time, do not block on ListBuckets() (#14240)
Bonus fixes #13816
2022-02-07 10:39:57 -08:00
Harshavardhana
186c477f3c init console server after server config is initialized
fixes #14259
2022-02-07 00:17:33 -08:00
Harshavardhana
6123377e66
speedup getFormatErasureInQuorum use driveCount (#14239)
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
2022-02-04 12:21:21 -08:00
Harshavardhana
0256dae657
fix: quorum requirement for DeleteMarkers and parity upgraded objects (#14248)
DeleteMarkers do not have a default quorum, i.e it is possible that
DeleteMarkers were created with n/2+1 quorum as well to make sure
that we satisfy situations such as those we need to make sure delete
markers only expect n/2 read quorum.

Additionally we should also look at additional metadata on the
actual objects that might have been "erasure" upgraded with new
parity when disks are down.

In such a scenario do not default to the standard storage class
parity, instead use the parityBlocks present on the FileInfo to
ensure that we are dealing with the correct quorum for READs and
DELETEs.
2022-02-04 02:47:36 -08:00
Harshavardhana
84b121bbe1
return error with empty x-amz-copy-source-range headers (#14249)
fixes #14246
2022-02-03 16:58:27 -08:00
Harshavardhana
01e550a9be
ignore unreadable metrics on certain closed systems (#14234)
fixes #14233
2022-02-03 09:45:12 -08:00
Poorna
63a2e0bab6
Remove notification from NotificationSys on bucket deletion (#14236) 2022-02-02 17:11:56 -08:00
Harshavardhana
24657859a8
when o_direct is disabled do not attempt fadvise call (#14230) 2022-02-02 08:54:52 -08:00
Sidhartha Mani
d7df6bc738
add support for speedtest drive (#14182) 2022-02-01 22:38:05 -08:00
Poorna
a4e1de93a7
Add API for removing site(s) from site replication (#14104) 2022-02-01 17:26:09 -08:00
Klaus Post
067d21d0f2
fs: Retry listing if no marker (#14221)
Retry listings, when no next marker is returned and the result isn't truncated.

This can happen when an object is queued, but no info can be fetched.

Fixes #14190
2022-02-01 10:00:14 -08:00
Shireesh Anjal
3882da6ac5
Add subnet proxy config (#14225)
Will store the HTTP(S) proxy URL to use for connecting to SUBNET.
2022-02-01 09:52:38 -08:00
Anis Elleuch
127e8bf3b6
heal: Avoid printing repetitive error to heal a root disk (#14220)
The healing code repeatedly tries to heal a root disk when it is empty
the reason is that connectEndpoint() returns errUnformattedDisk even
if the disk is a root disk. Changing that to returning another error
will avoid queueing the disk to the healing code in each connect disks
iteration.
2022-01-31 17:28:20 -08:00
Harshavardhana
74faed166a
Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00
Harshavardhana
dbd05d6e82
remove FIFO bucket quota, use ILM expiration instead (#14206) 2022-01-31 11:07:04 -08:00