Commit Graph

5735 Commits

Author SHA1 Message Date
Harshavardhana dcce83b288
avoid rebalance state for getObjectTags if any (#18197)
fixes #18190
2023-10-09 23:56:26 -07:00
Matthew Toohey f731e7ea36
Fix current_send_in_progress metric always being zero (#18160) 2023-10-09 17:28:17 -07:00
Maxim Tkachenko ec30bb89a4
simplify channel send() in WalkDir() (#18186) 2023-10-09 17:27:55 -07:00
Klaus Post 7cd08594f6
Use better host names for metric errors (#18188)
Typically hosts would end up like this:

```
   "hosts": [
        ":9000",
        ":9000",
        ":9000",
...
```

Also add host name to errors.
2023-10-09 17:27:11 -07:00
Aditya Manthramurthy 2b4531f069
fix: O_DIRECT is on only for multi-disk setups (#18194)
Disable it for single disk/unsupported platforms
2023-10-09 17:08:40 -07:00
Harshavardhana 11544a62aa
fix: upon write failure on disk journal close the file properly (#18183)
close the file properly before dereferencing *os.File,
this can silently leak fd's in rare cases.

This PR fixes this properly.
2023-10-08 12:17:08 -07:00
Taran Pelkey 18550387d5
fix: DeleteServiceAccount API behavior (#18163) 2023-10-08 12:13:18 -07:00
Klaus Post 0de2b9a1b2
Fix panic on double unfreezeServices (#18177)
Calling unfreezeServices twice results in panic:

```
panic: "POST /minio/peer/v32/signalservice?signal=4&sub-sys=": close of nil channel
goroutine 14703 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
	github.com/minio/minio/cmd/generic-handlers.go:549 +0x8e
panic({0x27c3020, 0x4c9b370})
	runtime/panic.go:884 +0x212
github.com/minio/minio/cmd.unfreezeServices()
	github.com/minio/minio/cmd/service.go:112 +0xc7
github.com/minio/minio/cmd.(*peerRESTServer).SignalServiceHandler(0x0?, {0x4cb6af0, 0xc010b96420}, 0xc01affab00)
	github.com/minio/minio/cmd/peer-rest-server.go:837 +0x13a
net/http.HandlerFunc.ServeHTTP(...)
```

If the function was called a second time `val` would not be nil, but the returned channel `ch` would be, causing the panic.

Check the channel isn't nil and also use Swap for an atomic swap instead of 2 separate operations (though we are in a mutex).
2023-10-06 07:51:50 -06:00
Poorna 9dc29d7687
Avoid ILM expiry on deleted versions that are yet to replicate (#18175)
Fixes #18167
2023-10-06 06:55:15 -06:00
Poorna 72871dbb9a
delete replication: avoid overwriting replication decision (#18174)
from ObjectInfo unless version purge status is present. Otherwise
there is potential to make incorrect replication decision if Stat
returned an error
2023-10-05 21:09:45 -06:00
Aditya Manthramurthy 4bda4e4e2b
fix: check for disk-level O_DIRECT support (#18173)
Disk level O_DIRECT support checking at xl storage initialization was
conditional on a config setting being enabled. (This never took effect
because config initialization happens after ObjectLayer is ready.) This
is not necessary as the config setting is dynamic - O_DIRECT should be
enabled via runtime config. So we need to do the disk level support
check regardless of the config setting.
2023-10-05 20:54:49 -06:00
Harshavardhana 1971c54a50
update buffer channels for both trace and listen events (#18171)
- Trace needs higher buffered channels than 4000 to ensure
  when we run `mc admin trace -a` it captures all information
  sufficiently.

- Listen event notification needs the event channel to be
  `apiRequestsMaxPerNode` * number of nodes
2023-10-05 18:16:04 -06:00
Anis Eleuch b336e9a79f
fix: loading usage cache to not fail early when reading the backup fails (#18158)
Currently, the retry is not fully used when there is no backup copy of
the data usage; use 5 retry attempts when we don't have any valid data, 
new or backup, unless we have seen an un-recognized error.
2023-10-02 19:22:35 -07:00
Harshavardhana a2ab21e91c
add max-keys=2 optimization for spark workloads (#18154)
comment in the code provides more detailed explanation
on what this PR entails and its assumptions.

this PR reduces the amount of listing() by an order
of magnitude, however there are other such calls that
still needs further optimization that shall be done
in subsequent PRs.
2023-10-02 07:52:59 -06:00
Sveinn 603437e70f
Fix startup formatting (#18156)
Percentages in root user names are used for formatting.

Before:
```
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:51915 http://172.31.96.1:51915 http://127.0.0.1:51915
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
FORMAT: %117s MESSAGE: $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%!a(MISSING)vZb" "Q4#Q6y8G%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

After:

```
Status:         1 Online, 0 Offline.
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:52421 http://172.31.96.1:52421 http://127.0.0.1:52421
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

No need for special Windows case. `mc` works just fine.
2023-10-02 07:39:47 -06:00
Shireesh Anjal 6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
Anis Eleuch 22d2dbc4e6
decom: Fix infinite retry when the decom is canceled (#18143)
Also, use rand.Float64() since it is thread-safe; otherwise go race
will complain.
2023-09-30 00:02:29 -07:00
Harshavardhana d6446cb096
do not return an error in AbortMultipartUpload() (#18135)
returning an error is a bit undefined in AWS S3
as it may return an error or not depending on the
time from AbortMultipartUpload().
2023-09-29 10:28:19 -07:00
Harshavardhana c34bdc33fb
make sure to set Versioned field to ensure rename2 is not called (#18141)
without this the rename2() can rename the previous dataDir
causing issues for different versions of the object, only
latest version is preserved due to this bug.

Added healing code to ensure recovery of such content.
2023-09-29 09:08:24 -07:00
Anis Eleuch aec023f537
Avoid showing buckets without quorum in each pool (#18125) 2023-09-29 00:58:54 -07:00
Poorna e101eeeda9
fix: tier addition validation (#18136) 2023-09-28 22:33:24 -07:00
Harshavardhana 3c470a6b8b
fix: the inspect script to use scheme per deployment (#18118) 2023-09-27 08:22:50 -07:00
Poorna 6bc7d711b3
delete of a missing versionId return 204 (#18117) 2023-09-26 14:02:56 -07:00
Harshavardhana cdeab19673
fix: always check error upon w.Close() in Write() (#18111)
not checking w.Close() can prematurely make us
think that the w.Write() actually succeeded, apparently
Write() may or may not return an error but sometimes
only during a Close() call to the fd we may see the
error from Write() propagate.

Fdatasync(w) on the FD would return an error requiring
Close() error handling is less of a concern, however it may
happen such that fdatasync() did not return an error, where
as Close() would.
2023-09-26 11:04:00 -07:00
Anis Eleuch 22ee678136
tier: Avoid doing versioned operations since not required anymore (#18108)
Currently, setting a new tiering target returns an error when a bucket
is versioned and the tiering credentials does not have authorization to
specify a version-id when reading or removing a specific version;

Since tiering does not require versioning anymore; avoid doing versioned
operations when performing checklist ops while adding a new tiering
configuration.
2023-09-26 00:14:56 -07:00
Poorna 50a8f13e85
site replication: allow setting bandwidth default for bucket (#18062)
This can still be overridden at the bucket level
2023-09-25 15:50:52 -07:00
jiuker 6dec60b6e6
fix: check post policy like AWS S3 (#18074) 2023-09-25 12:35:25 -07:00
Harshavardhana ac3a19138a
fix: set scanning details locally to avoid cached values (#18092)
atomic variable results such as scanning must not use
cached values, instead rely on real-time information.
2023-09-25 08:26:29 -07:00
Klaus Post 21e8e071d7
Improve ListObject Compatibility (#18099)
Do not error out when a provided marker is before or after the prefix, but instead just ignore it if before and return an empty list when after.

Fixes #18093
2023-09-25 08:13:08 -07:00
Klaus Post 57f84a8b4c
Add abandoned folder scanning to metrics (#18076)
Include object and versions heal scan times when checking non-empty abandoned folders.

Furthermore don't add delay between healing versions, instead do one per object wait.
2023-09-24 22:15:31 -07:00
Aditya Manthramurthy 22041bbcc4
fix: Update policy mapping properly in notification (#18088)
This is fixing a regression from an earlier change where STS account
loading was made lazy.
2023-09-22 20:47:50 -07:00
Harshavardhana 91ebac0a00
fix: move abandoned parts check after healing not in ILM path (#18087) 2023-09-22 12:07:52 -07:00
Harshavardhana 3a90fb108c only look for metadata if batch replication asks for metadata filters (#18082)
This PR changes the StatObject() to be must have for non-minio source
to being a conditional API call.

- Calls StatObject() when needed
- Calls GetObjectTagging() when needed

These calls if we do without these conditionals can cause a lot of
delays, so we avoid them if not needed in more common scenario.
2023-09-22 11:31:57 -07:00
Shubhendu 74cfb207c1
Added check for mandatory MINIO_KMS_KES_KEY_NAME env var (#18077)
If MinIO started with KMS enabled, MINIO_KMS_KES_KEY_NAME should
be set for server to start.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-21 10:37:37 -07:00
Harshavardhana 9788d85ea3
remove logging for invalid metadata values (#18068) 2023-09-20 15:49:55 -07:00
Anis Eleuch 69c0e18685
perf net: Add the endpoint name related to the perf net error (#18063)
In a perf test, one node will run speed test with all nodes. If there is
an error with a peer node, the peer node name is not included in the
error hence confusing the user.

This commit will add the peer endpoint string to the netperf error.
2023-09-19 22:41:06 -07:00
Aditya Manthramurthy 3cac927348
Load STS policy mappings periodically (#18061)
To ensure that policy mappings are current for service accounts
belonging to (non-derived) STS accounts (like an LDAP user's service
account) we periodically reload such mappings.

This is primarily to handle a case where a policy mapping update
notification is missed by a minio node. Such a node would continue to
have the stale mapping in memory because STS creds/mappings were never
periodically scanned from storage.
2023-09-19 17:57:42 -07:00
Harshavardhana 9081346c40 fix: more regressions listing policy mappings (#18060)
also relax ListServiceAccounts() returning error if
no service accounts exist.
2023-09-19 15:23:18 -07:00
Harshavardhana fcfadb0e51
fix: regression in loading LDAP users policy mappings (#18055)
LDAP users are stored as STS users, we need to load
their policy mappings appropriately.

Fixes a regression caused by #17994
2023-09-19 10:31:56 -07:00
Harshavardhana 2add57cfed
apply healing per object at 1024 cycles (#18050)
- we already have MRF for most recent failures
- we trigger healing during HEAD/GET operation

These are enough, also change the default max wait
from 5sec to 1sec for default scanner speed.
2023-09-19 09:24:22 -07:00
Poorna b73699fad8
replication: pass user tags while queueing (#18052)
Continues from #18032 - otherwise replication will fail on tag based rules.
2023-09-19 03:18:28 -07:00
Harshavardhana b8ebe54e53 Revert "skip tiered objects to GLACIER in batch replication (#18044)"
This reverts commit fd421ddd6f.

MinIO already provides `filter` based on metadata that would work
in this scenario already.
2023-09-19 00:05:40 -07:00
Harshavardhana c3d70e0795
cache usage, prefix-usage, and buckets for AccountInfo up to 10 secs (#18051)
AccountInfo is quite frequently called by the Console UI 
login attempts, when many users are logging in it is important
that we provide them with better responsiveness.

- ListBuckets information is cached every second
- Bucket usage info is cached for up to 10 seconds
- Prefix usage (optional) info is cached for up to 10 secs

Failure to update after cache expiration, would still
allow login which would end up providing information
previously cached.

This allows for seamless responsiveness for the Console UI
logins, and overall responsiveness on a heavily loaded
system.
2023-09-18 22:13:03 -07:00
Harshavardhana fd421ddd6f
skip tiered objects to GLACIER in batch replication (#18044)
tiered objects to GLACIER are not readable until
they are restored, we skip these as unreadable
2023-09-18 10:25:31 -07:00
jiuker 9947c01c8e
feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Eng Zer Jun a00db4267c
data-usage-cache: remove redundant nil check (#17970)
From the Go specification:

  "3. If the map is nil, the number of iterations is 0." [1]

Therefore, an additional nil check for before the loop is unnecessary.

[1]: https://go.dev/ref/spec#For_range

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-09-16 19:09:29 -07:00
Harshavardhana 36385010f5
use optimized pathJoin instead of path.Join (#18042)
this avoids allocations in scanner routine, they are tiny but 
they allocate a lot over many cycles of the scanner.
2023-09-16 19:08:59 -07:00
Harshavardhana fa6d082bfd
reduce all major allocations in replication path (#18032)
- remove targetClient for passing around via replicationObjectInfo{}
- remove cloing to object info unnecessarily
- remove objectInfo from replicationObjectInfo{} (only require necessary fields)
2023-09-16 02:28:06 -07:00
Poorna b733e6e83c
site replication turn off retry login for admin API calls (#18039)
additionally also mark site offline if n/w is down
2023-09-15 18:01:47 -07:00
Anis Eleuch 37aa5934a1
scanner: Fix loading data usage cache structure (#18037)
Return an empty data usage cache structure when the data usage cache
file does not exist, otherwise, the scanner won't work.
2023-09-15 13:11:08 -07:00
Harshavardhana 1647fc7edc
fix: optimize listMultipartUploads to serve via local disks (#18034)
and remove unused getLoadBalancedDisks()
2023-09-15 08:34:03 -07:00
Harshavardhana 7b92687397
remove generating presignedURLs with range header for lambda (#18033) 2023-09-14 21:58:17 -07:00
Alex dc48cd841a
Added MINIO_PROMETHEUS_AUTH_TOKEN env support (#18028)
Signed-off-by: Benjamin Perez <benjamin@bexsoft.net>
2023-09-14 17:28:21 -07:00
Anis Eleuch b0e1776d6d
Do not use a chain for S3 tiering to return better error messages (#18030)
When using a chain provider all providers do not return a valid
access and secret key, an anonymous request is sent, which makes it hard
for users to figure out what is going on

In the case of S3 tiering, when AWS IAM temporary account generation returns
an error, an anonymous login will be used because of the chain provider.
Avoid this and use the AWS IAM provider directly to get a good error
message.
2023-09-14 15:28:20 -07:00
Aditya Manthramurthy 7a7068ee47
Move IAM periodic ops to a single go routine (#18026)
This helps reduce disk operations as these periodic routines would not
run concurrently any more.

Also add expired STS purging periodic operation: Since we do not scan
the on-disk STS credentials (and instead only load them on-demand) a
separate routine is needed to purge expired credentials from storage.
Currently this runs about a quarter as often as IAM refresh.

Also fix a bug where with etcd, STS accounts could get loaded into the
iamUsersMap instead of the iamSTSAccountsMap.
2023-09-14 15:25:17 -07:00
Aditya Manthramurthy cbc0ef459b
Fix policy package import name (#18031)
We do not need to rename the import of minio/pkg/v2/policy as iampolicy
any more.
2023-09-14 14:50:16 -07:00
Harshavardhana a2aabfabd9
add backups for usage-caches to rely on upon error (#18029)
This allows scanner to avoid lengthy scans, skip
things appropriately and also not lose metrics in
any manner.

reduce longer deadlines for usage-cache loads/saves
to match the disk timeout which is 2minutes now per
IOP.
2023-09-14 11:53:52 -07:00
Harshavardhana 32890342ce
introduce MINIO_BROWSER_REDIRECT env to enable/disable auto-redirect (#18025) 2023-09-13 18:43:57 -07:00
Aditya Manthramurthy ed2c2a285f
Load STS accounts into IAM cache lazily (#17994)
In situations with large number of STS credentials on disk, IAM load
time is high. To mitigate this, STS accounts will now be loaded into
memory only on demand - i.e. when the credential is used.

In each IAM cache (re)load we skip loading STS credentials and STS
policy mappings into memory. Since STS accounts only expire and cannot
be deleted, there is no risk of invalid credentials being reused,
because credential validity is checked when it is used.
2023-09-13 12:43:46 -07:00
Poorna 18e23bafd9
replication resync: report only the on-disk status (#18017)
Avoid reporting in-memory status since results can vary if different
nodes are queried, resync always runs at a single node.
2023-09-13 10:58:38 -07:00
Harshavardhana 8b8be2695f
optimize mkdir calls to avoid base-dir `Mkdir` attempts (#18021)
Currently we have IOPs of these patterns

```
[OS] os.Mkdir play.min.io:9000 /disk1 2.718µs
[OS] os.Mkdir play.min.io:9000 /disk1/data 2.406µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys 4.068µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp 2.843µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp/d89c8ceb-f8d1-4cc6-b483-280f87c4719f 20.152µs
```

It can be seen that we can save quite Nx levels such as
if your drive is mounted at `/disk1/minio` you can simply
skip sending an `Mkdir /disk1/` and `Mkdir /disk1/minio`.

Since they are expected to exist already, this PR adds a way
for us to ignore all paths upto the mount or a directory which
ever has been provided to MinIO setup.
2023-09-13 08:14:36 -07:00
Poorna 96fbf18201
replication: queue existing objects to same workers as incoming (#18020)
Previously existing objects were queued to single worker and MRF re-queues
are also handled by same worker - this does not fully use the available
bandwidth in case there is no incoming workload.
2023-09-12 21:59:15 -07:00
Harshavardhana c8a57a8fa2
fix: send content-md5 for AWS S3 proactively (#18018)
fixes #17977
2023-09-12 19:11:13 -07:00
Harshavardhana b1c2dacab3
fix: allow dynamic ports for API only in non-distributed setups (#18019)
fixes #17998
2023-09-12 19:10:49 -07:00
Harshavardhana 08b3a466e8
fix: allow concurrent SFTP connections (#18013)
current implementation did not fully implement
the concurrent SFTP connection implementation,
this PR properly handles this.

fixes #17914
2023-09-12 12:41:52 -07:00
Harshavardhana 1df5e31706
optimize MRF replication queue to avoid memory leaks (#18007) 2023-09-11 20:59:11 -07:00
Harshavardhana 9f7044aed0
fix: ignore transient errors in read path (#18006)
Errors such as

```
returned an error (context deadline exceeded) (*fmt.wrapError)
```

```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
2023-09-11 15:29:59 -07:00
Anis Eleuch 41de53996b
heal: calculate the number of workers based on NRRequests (#17945) 2023-09-11 14:48:54 -07:00
Harshavardhana 9878031cfd
fix: change DISK_ to DRIVE_ for some drive related envs (#18005) 2023-09-11 12:19:22 -07:00
Poorna 703ed46d79
fix: replication of tags while removing (#17989)
A tag removal was not being replicated prior to this change
2023-09-06 19:05:02 -07:00
Harshavardhana f7ca6c63c2
fix: bucket quota clear and honor existing quota config (#17988) 2023-09-06 19:03:58 -07:00
Harshavardhana ad69b9907f
fix: report bucket metrics for only existing buckets (#17987) 2023-09-06 12:50:46 -07:00
Shubhendu bfddbb8b40
Embed file in ZIP with custom permissions (#17954)
This change enables embedding files in ZIP with custom permissions.
Also uses default creds for starting MinIO based on inspect data.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-06 09:24:01 -07:00
Poorna 13a2dc8485
replication resync: avoid blocking on results channel. (#17981)
continues fix in #17775
2023-09-05 20:22:39 -07:00
Harshavardhana 1e51424e8a
use syscall.Rename() directly instead of os.Rename() (#17982) 2023-09-05 20:22:23 -07:00
Harshavardhana 5b114b43f7
refactor bandwidth throttling for replication target (#17980)
This refactor is to allow using the bandwidth throttling
for other purposes.
2023-09-05 20:21:59 -07:00
Poorna 812f5a02d7
metrics: fix panic in replication stats reporting (#17979) 2023-09-05 10:26:18 -07:00
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Krishnan Parthasarathi 71c32e9b48
Return successorModTime in quorum when available (#17925) 2023-09-04 08:24:17 -07:00
Harshavardhana 380a59520b add missing testdata for benchmarking 2023-09-02 14:40:38 -07:00
Harshavardhana 3995355150
avoid repeated large allocations for large parts (#17968)
objects with 10,000 parts and many of them can
cause a large memory spike which can potentially
lead to OOM due to lack of GC.

with previous PR reducing the memory usage significantly
in #17963, this PR reduces this further by 80% under
repeated calls.

Scanner sub-system has no use for the slice of Parts(),
it is better left empty.

```
benchmark                            old ns/op     new ns/op     delta
BenchmarkToFileInfo/ToFileInfo-8     295658        188143        -36.36%

benchmark                            old allocs     new allocs     delta
BenchmarkToFileInfo/ToFileInfo-8     61             60             -1.64%

benchmark                            old bytes     new bytes     delta
BenchmarkToFileInfo/ToFileInfo-8     1097210       227255        -79.29%
```
2023-09-02 07:49:24 -07:00
Harshavardhana 8208bcb896
remove all unnecessary logging, logOnce when absolutely needed (#17965) 2023-09-01 16:19:18 -07:00
Poorna d665e855de
replication: remove check for empty version id (#17964) 2023-09-01 13:46:10 -07:00
Harshavardhana 18b3655c99
with xlv2 format we never had to fill in checksumInfo() (#17963)
- this PR avoids sending a large ChecksumInfo slice
  when its not needed

- also for a file with XLV2 format there is no reason
  to allocate Checksum slice while reading
2023-09-01 13:45:58 -07:00
Anis Eleuch 6a8d8f34a5
kafka: Do not require key when sending a message (#17962)
Keys are helpful to ensure the strict ordering of messages, however currently the
code uses a random request id for every log, hence using the request-id
as a Kafka key is not serve any purpose;

This commit removes the usage of the key, to also fix the audit issue from
internal subsystem that does not have a request ID.
2023-09-01 08:37:22 -07:00
Harshavardhana b1c1f02132
use buffers for pathJoin, to re-use buffers. (#17960)
```
benchmark                        old ns/op     new ns/op     delta
BenchmarkPathJoin/PathJoin-8     79.6          55.3          -30.53%

benchmark                        old allocs     new allocs     delta
BenchmarkPathJoin/PathJoin-8     2              1              -50.00%

benchmark                        old bytes     new bytes     delta
BenchmarkPathJoin/PathJoin-8     48            24            -50.00%
```
2023-08-31 17:58:48 -07:00
yangw b13fcaf666
fix: read atomic variable in clientDevNull round trip time (#17955) 2023-08-31 08:31:01 -07:00
Harshavardhana 9458485e43
avoid double logging from healing (#17950) 2023-08-30 18:46:04 -07:00
Poorna b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Krishnan Parthasarathi 6a67c277eb
Reuse types for key-value, notification and retry (#17936) 2023-08-29 11:27:23 -07:00
Harshavardhana 7cafdc0512
fix: skip access checks further for known buckets (#17934) 2023-08-28 15:16:41 -07:00
Harshavardhana 8a57b6bced
use renameat2 Linux extension syscall (#17757)
this is a faster and safer alternative
on newer kernel versions.
2023-08-27 09:57:11 -07:00
Krishnan Parthasarathi 53abd25116
Don't log when object to be tiered is not found (#17924) 2023-08-25 23:34:16 -07:00
Harshavardhana 1ea7826c0e
do not have to consider replicationTimestamp for healing and quorum (#17922)
replicationTimestamp might differ if there were retries
in replication and the retried attempt overwrote in
quorum but enough shards with newer timestamp causing
the existing timestamps on xl.meta to be invalid, we
do not rely on this value for anything external.

this is purely a hint for debugging purposes, but there
is no real value in it considering the object itself
is in-tact we do not have to spend time healing this
situation.

we may consider healing this situation in future but
that needs to be decoupled to make sure that we do not
over calculate how much we have to heal.
2023-08-25 15:31:15 -07:00
Anis Eleuch 0cde37be50
Reduce the number of calls to import bucket metadata (#17899)
For each bucket, save the bucket metadata 
once, call the site replication hook once
2023-08-25 07:59:16 -07:00
jiuker 6aeca54ece
fix: replace context by timeout-context from parent-context when `selfSpeedTest` (#17906) 2023-08-25 07:58:38 -07:00
Harshavardhana 124e28578c
remove strict persistence requirements for List() .metacache objects (#17917)
.metacache objects are transient in nature, and are better left to
use page-cache effectively to avoid using more IOPs on the disks.

this allows for incoming calls to be not taxed heavily due to
multiple large batch listings.
2023-08-25 07:58:11 -07:00
Harshavardhana 62c9e500de
remove mTime requirement from pre-condition checks (#17916)
given a versionId the mtime is always the same, it
can never be different than its original value.

versionIds also do not conflict, since they are uuid's
and unique practically forever.
2023-08-24 14:33:58 -07:00
jiuker 02cc18ff29 refactor the perf client for TTFB and TotalResponseTime (#17901) 2023-08-24 10:21:08 -07:00
Harshavardhana ba4566e86d
add missing IAM node metrics to cluster and node endpoint (#17908) 2023-08-24 09:26:37 -07:00
Krishnan Parthasarathi 87cb0081ec
Retain current and upto NewerNoncurrentVersions versions (#17909)
applyNewerNoncurrentVersionLimit method should pass along versions
unaffected by NewerNoncurrentVersions rule for further ILM evaluation.
2023-08-24 09:26:29 -07:00
Poorna 4a6af93c83
mark replication target offline if network timeouts seen (#17907)
regular target liveness check every 5 secs will toggle state back
as target returns online.
2023-08-24 09:24:26 -07:00
Harshavardhana af564b8ba0
allow bootstrap to capture time-spent for each initializers (#17900) 2023-08-23 03:07:06 -07:00
Klaus Post 7c8746732b
Return cancelled storage calls as 499 (#17895)
Make upstream cancels more visible - right now they are just reported as "forbidden".
2023-08-22 11:10:41 -07:00
Klaus Post f506117edb
Reduce memory profiling rate (#17894)
Change profiling from every 4KB to every 128K, reducing the lock contention by a factor of 32.
2023-08-22 07:21:49 -07:00
Harshavardhana 1c5af7c31a
serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886)
we expect a certain level of IOPs and latency so this is okay.

fixes other miscellaneous bugs

- such as hanging on mrfCh <- when the context is canceled
- queuing MRF heal when the context is canceled
- remove unused saveStateCh channel
2023-08-21 16:44:50 -07:00
Harshavardhana 3a0125fa1f
remove unexpected logging from peer calls (#17888)
also make sure RequestID is set for system logs
2023-08-21 14:25:24 -07:00
Daniel Valdivia 328cb0a076
Pass environment variable to control session length to console (#17885)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-08-21 11:55:43 -07:00
jiuker e3ea97c964
fix: replace req context by locker context (#17880) 2023-08-19 22:09:07 -07:00
Andreas Auernhammer 8f8f8854f0
update `minio/kes-go` dep to v0.2.0 (#17850)
This commit updates the minio/kes-go dependency
to v0.2.0 and updates the existing code to work
with the new KES APIs.

The `SetPolicy` handler got removed since it
may not get implemented by KES at all and could
not have been used in the past since stateless KES
is read-only w.r.t. policies and identities.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2023-08-19 07:37:53 -07:00
Anis Eleuch 4c6869cd9a
ilm: Fix cleaning non current null versions (#17876) 2023-08-18 12:55:47 -07:00
Harshavardhana dde1a12819
fix: validate incoming uploadID to be base64 encoded (#17865)
Bonus fixes include

- do not have to write final xl.meta (renameData) does this
  already, saves some IOPs.

- make sure to purge the multipart directory properly using
  a recursive delete, otherwise this can easily pile up and
  rely on the stale uploads cleanup.

fixes #17863
2023-08-17 09:37:55 -07:00
Harshavardhana 9ebd10d3f4
Revert "Include SuccessorModTime for FileInfo quorum (#17732)" (#17860)
This reverts commit bf3901342c.

This is to fix a regression caused when there are inconsistent
versions, but one version is in quorum. SuccessorModTime issue
must be fixed differently.
2023-08-16 07:51:33 -07:00
Harshavardhana 3ba927edae
fix: batch status reporting after complete (#17852)
batch status can perpetually wait after completion
due to a race between the MetricsHandler() returning
the active metrics in intervals of 1sec and delete
of metrics after job completion.

this PR ensures that we keep the 'status' around
for a while, i.e upto 24hrs for all the batch jobs.
2023-08-15 12:22:30 -07:00
Harshavardhana c4ca0a5a57
add two more drive metrics when metrics is available (#17854) 2023-08-15 10:55:47 -07:00
Klaus Post 406ea4f281
Fix distributed listing not able to resume (#17855)
Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`.

This meant that in distributed systems listings would never be able to continue and would restart on every call.

Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.
2023-08-15 07:45:25 -07:00
Harshavardhana 64aa7feabd
allow specifying lower disks for Walk() (#17829)
useful when you may want Walk() with
reduced quorum requirements.
2023-08-14 21:32:39 -07:00
Poorna 875f4076ec
site replication: avoid retries when peer is offline (#17853) 2023-08-14 21:31:41 -07:00
Harshavardhana 4643efe6be
fix: add deadline worker pattern for local disk removers (#17845) 2023-08-14 12:28:13 -07:00
Harshavardhana b760137e1d
fix: add proxyByNode for batch jobs as part of their jobId (#17844) 2023-08-11 13:12:35 -07:00
Harshavardhana 5f56f441bf
fix: apply common notification code with content-type (#17843) 2023-08-11 11:34:43 -07:00
Klaus Post 96a22bfcbb
fix: wrapped io.EOF during ListObjects() (#17842)
When listing getObjectFileInfo can return `io.EOF` if file is being written.

When we wrap the error it will *not* retry upstream, since `io.EOF` is a valid return value.

Allow one retry before returning errors and canceling the listing.
2023-08-11 09:47:16 -07:00
Poorna dfaf735073
replication: fix queuing of large uploads (#17831)
Fixes regression from #17687
2023-08-10 15:48:42 -07:00
Anis Eleuch 7fcfde7f07
s3: Pick a pool with >85% if all other pools are in suspended state (#17826) 2023-08-10 11:06:31 -07:00
jiuker b1391d1991
feat: support perf client to show `TX` from client to server (#17718) 2023-08-10 07:14:46 -07:00
Harshavardhana eb55034dfe
optimize deletePrefix, use direct set location via object name (#17827)
* optimize deletePrefix, use direct set location via object name

instead of fanning out the calls for an object force delete
we can assume the set location and not do fan-out calls

* Apply suggestions from code review

Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>

---------

Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>
2023-08-09 16:30:22 -07:00
Harshavardhana c45bc32d98
skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Harshavardhana 6e860b6dc5
count all versions as part of DeleteAllVersionsAction (#17821) 2023-08-09 08:55:19 -07:00
Harshavardhana b732a673dc
reduce logging in bucket replication in retry scenarios (#17820) 2023-08-08 13:27:40 -07:00
Yang Wu 23e4895dfc
Create metrics slice when necessary (#17809) 2023-08-07 02:21:22 -07:00
Harshavardhana 8666c55ca6
fix: do not use PrefixEnabled() logic to ignore valid objects (#17677)
ignoring valid objects with valid replication metadata
after the Prefix was disabled must still honor the older
metadata.

this can lead to unexpected results, allow it during
READ phase always.
2023-08-05 13:56:01 -07:00
Anis Eleuch a3f00c5d5e
batch: Strict unmarshal yaml document to avoid user made typos (#17808)
// UnmarshalStrict is like Unmarshal except that any fields that are found
// in the data that do not have corresponding struct members, or mapping
// keys that are duplicates, will result in
// an error.
2023-08-05 13:51:48 -07:00
Poorna 26c23b30f4
replication: set context timeout for NewMultipartUpload calls (#17807) 2023-08-05 12:27:07 -07:00
Anis Eleuch a436fd513b
track client disconnections properly for all ListObjects calls (#17804)
Currently ListObjects* calls were returning 200 OK for timed-out clients,
this makes debugging via `mc admin trace` very hard.
2023-08-04 15:57:27 -07:00
Harshavardhana 533cd8d6df
fix: batch replication pull must preserve versionID (#17805)
batch replication pull must preserve versionID regardless
of destination bucket versioning configuration.

This is similar to the issue with decommissioning and rebalancing
2023-08-04 12:09:10 -07:00
Harshavardhana cb089dcb52
error out by default beyond 10000 versions per object (#17803)
```
You've exceeded the limit on the number of versions you can create on this object
```
2023-08-04 10:40:21 -07:00
Harshavardhana 239ccc9c40
fix: crash in globalTierJournal when TierConfig is not initialized (#17791) 2023-08-03 14:16:15 -07:00
Poorna b762fbaf21
sts: validate if iam subsystem initialized in handlers (#17796) 2023-08-03 13:24:25 -07:00
Praveen raj Mani 0285df5a02
fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783) 2023-08-03 02:47:07 -07:00
Harshavardhana 45fb375c41
allow healing to prefer local disks over remote (#17788) 2023-08-03 02:18:18 -07:00
Harshavardhana 4a4950fe41
fix: honor requested allow origin settings properly (#17789)
fixes #17778
2023-08-02 20:41:21 -07:00
Anis Eleuch 1664fd8bb1
Avoid logging errors twice during transitioned objects expiration (#17782) 2023-08-02 09:06:03 -07:00
Harshavardhana 21cdd2bf5d
avoid overwriting metrics on success, save it in defer (#17780) 2023-08-01 22:19:56 -07:00
Harshavardhana 0153f96a20
add deadlines for readMetadata() in listing (#17776)
Bonus: also skip spending time looking for xl.json

- Listing()
- Delete()
2023-08-01 21:52:31 -07:00
Harshavardhana a7a7533190
add new errors for Disks with timeouts (#17770) 2023-08-01 12:47:50 -07:00
Poorna 311380f8cb
replication resync: fix queueing (#17775)
Assign resync of all versions of object to the same worker to avoid locking
contention. Fixes parallel resync implementation in #16707
2023-08-01 11:51:15 -07:00
Harshavardhana b0f0e53bba
fix: make sure to correctly initialize health checks (#17765)
health checks were missing for drives replaced since

- HealFormat() would replace the drives without a health check
- disconnected drives when they reconnect via connectEndpoint()
  the loop also loses health checks for local disks and merges
  these into a single code.
- other than this separate cleanUp, health check variables to avoid
  overloading them with similar requirements.
- also ensure that we compete via context selector for disk monitoring
  such that the canceled disks don't linger around longer waiting for
  the ticker to trigger.
- allow disabling active monitoring.
2023-08-01 10:54:26 -07:00
Klaus Post 004f1e2f66
Fix trailing header signature mismatch (#17774)
Seems like clients may omit a newline at the end of the trailer chunk. Each header should end with a newline. Add that if missing.

Fixes #17662
2023-08-01 08:45:57 -07:00
Harshavardhana 2fa561f22e
do not crash on invalid metric values (#17764)
```
minio[1032735]: panic: label value "\xc0.\xc0." is not valid UTF-8
minio[1032735]: goroutine 1781101 [running]:
minio[1032735]: github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
```

log such errors for investigation
2023-08-01 00:55:39 -07:00
Harshavardhana 81be718674
fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Sho Ce 49a1e2f98e
update-notifier.go: misleading version age message (#17750) 2023-07-31 08:36:19 -07:00
Klaus Post 684c46369c
Send events for extracted objects (#17760)
Fixes #17759
2023-07-31 08:33:51 -07:00
Harshavardhana 73edd5b8fd
introduce 'mc admin config set alias/ api odirect=on' (#17753)
change disable_odirect=off -> odirect=on to make it
easier to understand, instead of making it double
negative.
2023-07-31 00:12:53 -07:00
Harshavardhana 5e5bdf5432
capture total errors data availability and any timeout errors (#17748) 2023-07-29 23:26:26 -07:00
Harshavardhana f13cfcb83e
allow disabling O_DIRECT for write ops (#17751)
on really slow systems, O_DIRECT simply kills the drives
allow for a way to disable them.
2023-07-29 15:17:56 -07:00
Harshavardhana 731e03fe5a
add ReadFileStream deadline for disk call (#17745)
timeout the reader side if hung via disk max timeout
2023-07-28 15:37:53 -07:00
Anis Eleuch 7057d00a28
s3: Return invalid bucket name the first thing in all S3 calls (#17742) 2023-07-28 10:49:20 -07:00
Harshavardhana 114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
ruspaul013 a92cb66468
Get the signed headers in the order they were signed (#17690)
use pSignValues to get signed headers in order
2023-07-27 11:45:30 -07:00
ruspaul013 535f97ba61
check if metadata headers/url values are equal with signed headers (#17737) 2023-07-27 11:44:56 -07:00
drivebyer 14ebd82dbd
fix: missing disk metrics when query metric api from peer (#17738) 2023-07-27 11:44:13 -07:00
Harshavardhana 47dcfcbdd4
introduce deadlines on READ operations (#17724) 2023-07-27 07:33:05 -07:00
Krishnan Parthasarathi bf3901342c
Include SuccessorModTime for FileInfo quorum (#17732) 2023-07-26 17:04:16 -07:00
Harshavardhana b28bcad11b
avoid Access() calls on known bucket paths (#17719) 2023-07-26 11:31:40 -07:00
Harshavardhana a7c71e4c6b
protect disk monitoring to avoid busy loop configuration (#17723) 2023-07-25 20:02:22 -07:00
Poorna 1a42693d68
replication: limit larger uploads to a subset of workers (#17687)
Limit large uploads (> 128MiB)  to a max of 10 workers, intent is to avoid
larger uploads from using all replication bandwidth, giving room for smaller
uploads to sync faster.
2023-07-25 20:02:02 -07:00
Harshavardhana e7b60c4d65
Add slow drive timeouts to match with active disk monitoring (#17701)
allow active disk-monitoring to be configurable, and use
these add deadlines in various call layers for various
syscalls.
2023-07-25 16:58:31 -07:00
Poorna f95129894d
Use decrypted object size while computing object size summary (#17717)
Corrects an issue with encrypted versioned objects being reported under
`unversioned` bin in the object version histogram
2023-07-24 17:13:25 -07:00
Harshavardhana c32c71c836
allow DNS cache TTL to be configurable (#17709)
this is added for now as a hidden variable
2023-07-24 15:13:35 -07:00
Harshavardhana 14e1ace552
remove serializing WalkDir() across all buckets/prefixes on SSDs (#17707)
slower drives get knocked off because they are too slow via 
active monitoring, we do not need to block calls arbitrarily.

Serializing adds latencies for already slow calls, remove
it for SSDs/NVMEs

Also, add a selection with context when writing to `out <-`
channel, to avoid any potential blocks.
2023-07-24 09:30:19 -07:00
drivebyer a7fb3a3853
fix: Create metrics slice when necessary in getCacheMetrics() (#17711) 2023-07-24 08:40:21 -07:00
Klaus Post 2da4bd5f1a
Revert "don't error when asked for 0-based range on empty objects (#17708) (#17713)
Revert "don't error when asked for 0-based range on empty objects (#17708)"

This reverts commit 7e76d66184.

There is no valid way to specify offsets in a 0-byte file. Blame it on the [RFC](https://datatracker.ietf.org/doc/html/rfc7233#section-4.4)

> The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the 
> request's Range header field (Section 3.1) overlap the current extent of the selected resource...

A request for "bytes=0-" is a request for the first byte of a resource. If the resource is 0-length, 
the range [0,0] does not overlap the resource content and the server responds with an error.
2023-07-24 07:56:28 -07:00
flisk 7e76d66184
don't error when asked for 0-based range on empty objects (#17708)
In a reverse proxying setup, a proxy in front of MinIO may attempt to
request objects in slices for enhanced cache efficiency. Since such a
a proxy cannot have prior knowledge of how large a requested resource is,
it usually sends a header of the form:

        Range: 0-$slice_size

... and, depending on the size of the resource, expects either:

- an empty response, if $resource_size == 0
- a full response, if $resource_size <= $slice_size
- a partial response, if $resource_size > $slice_size

Prior to this change, MinIO would respond 416 Range Not Satisfiable if a
client tried to request a range on an empty resource. This behavior is
technically consistent with RFC9110[1] – However, it renders sliced
reverse proxying, such as implemented in Nginx, broken in the case of
empty files. Nginx itself seems to break this convention to enable
"useful" responses in these cases, and MinIO should probably do that
too.

[1]: https://www.rfc-editor.org/rfc/rfc9110#byte.ranges
2023-07-23 00:10:03 -07:00
Harshavardhana 7764f4a8e3
return tags as part of Head/Get calls (#17635)
AWS S3 only returns the number of tag
counts, along with that we must return
the tags as well to avoid another metadata
call to the server.
2023-07-22 07:19:43 -07:00
Kaan Kabalak 6624f970c0
Fix spelling of 'already' across repository (#17703) 2023-07-21 08:45:08 -07:00
Harshavardhana 331bdc2245
fix: remove CompleteMultipartUpload() 200 OK response for blocking calls (#17699)
sending whitespace character with CompleteMultipartUpload()
with 200 OK was an AWS S3 compatible implementation detail,
and it was expected that the client SDK must look for both
successful XML as well as error XML for 200 OK.

But this is not useful anymore on MinIO, since we do not
have any large delayed coalescing of parts anymore.
2023-07-20 22:14:38 -07:00
Harshavardhana e12ab486a2
avoid using os.Getenv for internal code, use env.Get() instead (#17688) 2023-07-20 07:52:49 -07:00
Krishnan Parthasarathi 9eeee92d36
Add deletemarker_total metric (#17689) 2023-07-20 07:52:32 -07:00
Anis Eleuch 756d6aa729
fix: report correct pool/set/disk indexes for offline disks (#17695) 2023-07-20 07:48:21 -07:00
Harshavardhana bddd53d6d2
fix: retry listing in decommissioning if it fails perpetually (#17682) 2023-07-19 13:09:37 -07:00
jiuker a99cd825ab
fix: byHost realTime metrics API (#17681) 2023-07-18 23:50:30 -07:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
Harshavardhana 4f257bf1e6
pick internode interface properly via globalLocalNodeName (#17680)
current code will not pick the right interface name
if --address or --interface is not provided.
2023-07-18 19:18:11 -07:00
Krishnan Parthasarathi 0120ff93bc
admin-info: add DeleteMarkers count (#17659) 2023-07-18 10:49:40 -07:00
Anis Eleuch 49638fa533
s3: Delete Bucket should not recreate bucket if it does not exist (#17676)
Also return Bucket Not Found error in the same use case.
2023-07-18 09:32:19 -07:00
Shubhendu 7a3a7b19e5
Added a start script to inspect command output (#17591)
Using this script, post decrypt we should be able to bring up the
MinIO instance with same configuration.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-17 14:15:28 -07:00
Harshavardhana 24e86d0c59
avoid passing around poolIdx, setIdx instead pass the relevant disks (#17660) 2023-07-17 09:52:05 -07:00
jiuker d118031ed6
fix: when Origin: null is set return back '*' for allow origins (#17651) 2023-07-15 12:15:06 -07:00
Anis Eleuch 341a89c00d
return a descriptive error when loading any IAM item fails (#17654)
Sometimes IAM fails to load certain items, which could be a user, 
a service account or a policy but with not enough information for 
us to debug.

This commit will create a more descriptive error to make it easier to
debug in such situations.
2023-07-14 20:17:14 -07:00
Anis Eleuch df29d25e6b
return different status code for internode communication (#17655)
mc admin trace -a will be able to quickly show
401 Unauthorized header to pinpoint trivial issues
between nodes, such as wrong root 
credentials and skewed time.
2023-07-14 18:34:55 -07:00
Harshavardhana 3e196fa7b3
fix: ILM newer noncurrent version limit must return correct versions (#17652)
objects/versions that are not expired via NewerNoncurrentVersions
must be properly returned to be applied under further ILM actions.

this would cause legitimately expired objects to be missed
from expiration.
2023-07-14 16:42:35 -07:00
drivebyer 04c792476f
fix: provide a possible slice cap for heal failed metrics items (#17647)
Signed-off-by: Wu <yang.wu@daocloud.io>
2023-07-14 11:02:45 -07:00
Harshavardhana 005a4a275a
add more bootstrap messages to provide latency (#17650)
- simplify refreshing bucket metadata, wait() to
  depend on how fast the bucket metadata can load.

- simplify resync to start resync in single pass.
2023-07-14 04:00:29 -07:00
Harshavardhana bdddf597f6
shuffle buckets randomly before being scanned (#17644)
this randomness is needed to avoid scanning
the same buckets across different erasure sets,
in the same order.

allow random buckets to be scanned instead
allowing a wider spread of ILM, replication
checks.

Additionally do not loop over twice to fill
the channel, fill the channel regardless of
having bucket new or old.
2023-07-14 02:25:40 -07:00
Aditya Manthramurthy bb6921bf9c
Send AuditLog via new middleware fn for admin APIs (#17632)
A new middleware function is added for admin handlers, including options
for modifying certain behaviors. This admin middleware:

- sets the handler context via reflection in the request and sends AuditLog
- checks for object API availability (skipping it if a flag is passed)
- enables gzip compression (skipping it if a flag is passed)
- enables header tracing (adding body tracing if a flag is passed)

While the new function is a middleware, due to the flags used for
conditional behavior modification, which is used in each route registration
call.

To try to ensure that no regressions are introduced, the following
changes were done mechanically mostly with `sed` and regexp:

- Remove defer logger.AuditLog in admin handlers
- Replace newContext() calls with r.Context()
- Update admin routes registration calls

Bonus: remove unused NetSpeedtestHandler

Since the new adminMiddleware function checks for object layer presence
by default, we need to pass the `noObjLayerFlag` explicitly to admin
handlers that should work even when it is not available. The following
admin handlers do not require it:

- ServerInfoHandler
- StartProfilingHandler
- DownloadProfilingHandler
- ProfileHandler
- SiteReplicationDevNull
- SiteReplicationNetPerf
- TraceHandler

For these handlers adminMiddleware does not check for the object layer
presence (disabled by passing the `noObjLayerFlag`), and for all other
handlers, the pre-check ensures that the handler is not called when the
object layer is not available - the client would get a
ErrServerNotInitialized and can retry later.

This `noObjLayerFlag` is added based on existing behavior for these
handlers only.
2023-07-13 14:52:21 -07:00
Klaus Post 4f89e5bba9
Add active disk health checks (#17539)
Add check every 2 minutes to see if a write+read operation can complete.

If disk is unresponsive for 2 minutes or returns errFaultyDisk, take it offline.
2023-07-13 11:41:55 -07:00
jiuker 183428db03
fear: Implement 'mc support top net' (#17598) 2023-07-13 11:41:19 -07:00
Shireesh Anjal fc6d873758
Use os.ReadFile instead of ioutil.ReadFile (#17649)
ioutil.ReadFile is deprecated and also doesn't work with certain kinds
of symlinks.
2023-07-13 09:07:10 -07:00
Poorna 5e2f8d7a42
replication: Simplify mrf requeueing and add backlog handler (#17171)
Simplify MRF queueing and add backlog handler

- Limit re-tries to 3 to avoid repeated re-queueing. Fall offs
to be re-tried when the scanner revisits this object or upon access.

- Change MRF to have each node process only its MRF entries.

- Collect MRF backlog by the node to allow for current backlog visibility
2023-07-12 23:51:33 -07:00
Shubhendu 9b9871cfbb
Added `endpoint` and `versions` attributes to KMS details (#17350)
Now it would list details of all KMS instances with additional
attributes `endpoint` and `version`. In the case of k8s-based
deployment the list would consist of a single entry.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-12 23:50:38 -07:00
guangwu f80b6926d3
chore: fix minor issues reported via staticcheck (#17639) 2023-07-12 20:33:11 -07:00
Shubhendu 6dc55fe5ed
Corrected the API name for audit logging purpose (#17642)
This would better to record the correct API name so that
any verification around audit logs to figure out if required
APIs are called required no of times, would be correct.
Here in this case of policy attached, API `AttachDetachPolicyBuiltin`
would be called with `requestPath` as `/minio/admin/v3/idp/builtin/policy/attach`
and in case of detach policy the value would be `/minio/admin/v3/idp/builtin/policy/detach`

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-12 15:38:49 -07:00
Harshavardhana 2d1cda2061
fix: do not os.Exit(1) while writing goroutines during shutdown (#17640)
Also shutdown poll add jitter, to verify if the shutdown
sequence can finish before 500ms, this reduces the overall
time taken during "restart" of the service.

Provides speedup for `mc admin service restart` during
active I/O, also ensures that systemd doesn't treat the
returned 'error' as a failure, certain configurations in
systemd can cause it to 'auto-restart' the process by-itself
which can interfere with `mc admin service restart`.

It can be observed how now restarting the service is
much snappier.
2023-07-12 07:18:30 -07:00
Harshavardhana a566bcf613
treat 0-byte objects to honor same quorum as delete marker (#17633)
on unversioned buckets its possible that 0-byte objects
might lose quorum on flaky systems, allow them to be same
as DELETE markers. Since practically speak they have no
content.
2023-07-11 21:53:49 -07:00
Klaus Post 9885a0a6af
Fix hasSpaceFor in SNSD setup (#17630)
If drive is offline or filled we divide by 0.

Fixes #17629

Bonus: Reject when any valid disk exceeds minimum inode threshold.
2023-07-11 14:29:34 -07:00
Kaan Kabalak f64d62b01d
Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Harshavardhana 82075e8e3a
use strconv variants to improve on performance per 'op' (#17626)
```
BenchmarkItoa
BenchmarkItoa-8         	673628088	         1.946 ns/op	       0 B/op	       0 allocs/op
BenchmarkFormatInt
BenchmarkFormatInt-8    	592919769	         2.012 ns/op	       0 B/op	       0 allocs/op
BenchmarkSprint
BenchmarkSprint-8       	26149144	        49.06 ns/op	       2 B/op	       1 allocs/op
BenchmarkSprintBool
BenchmarkSprintBool-8   	26440180	        45.92 ns/op	       4 B/op	       1 allocs/op
BenchmarkFormatBool
BenchmarkFormatBool-8   	1000000000	         0.2558 ns/op	       0 B/op	       0 allocs/op
```
2023-07-11 07:46:58 -07:00
Harshavardhana 5b7c83341b
move per bucket metrics to peer location (#17627) 2023-07-11 07:46:24 -07:00
Poorna fb49aead9b
replication: add validation API (#17520)
To check if replication is set up properly on a bucket.
2023-07-10 20:09:20 -07:00
Aditya Manthramurthy 85f5700e4e
fix: missing audit logger call for some admin APIs (#17623) 2023-07-10 16:59:44 -07:00
Aditya Manthramurthy 43b3c093ef
Fix: set request id in trace context properly (#17622) 2023-07-10 15:40:44 -07:00
Kaan Kabalak bd6842d917
Further print log messages once per error (#17618) 2023-07-10 07:59:57 -07:00
Poorna e8c98c3246
Avoid extra GetObjectInfo call in DeleteObject API (#17599)
Optimize DeleteObject API to avoid extra 
GetObjectInfo call on the replicating side.

For receiving side, it is just a regular
DeleteObject call.

Bonus: Fix a corner case where version purged is 
absent on target (either due to replication not yet
complete or target version already deleted in a
one-way replication or when replication was disabled). 

In such cases, mark version purge complete.
2023-07-10 07:57:56 -07:00
Harshavardhana dfd7cca0d2
fix: allow cancel of decom only when its in progress (#17607) 2023-07-10 07:55:38 -07:00
Harshavardhana f6186965c3
honor DeleteAllVersions in list(), head() calls (#17604) 2023-07-08 15:42:10 -07:00
Harshavardhana 28a01f0320
update missing license header in files (#17603) 2023-07-08 10:42:05 -07:00
Anis Eleuch 6d0bc5ab1e
prometheus: Fix internode stats (#17594)
Internode calculation was done inside S3 handlers, fix it by moving it
to internode handlers.

Remove admin stats since it is not used.
2023-07-08 07:35:11 -07:00
Aditya Manthramurthy 7af78af1f0
fix: set request ID in tracing context key (#17602)
Since `addCustomerHeaders` middleware was after the `httpTracer`
middleware, the request ID was not set in the http tracing context. By
reordering these middleware functions, the request ID header becomes
available. We also avoid setting the tracing context key again in
`newContext`.

Bonus: All middleware functions are renamed with a "Middleware" suffix
to avoid confusion with http Handler functions.
2023-07-08 07:31:42 -07:00
Harshavardhana abb1f22057 Revert "change ttfb_distribution metrics to histogramMetric (#17115)"
This reverts commit 9112ca4e29.
2023-07-07 13:57:37 -07:00
Harshavardhana f41edb23e2
add variadic delays in peer notification retries (#17592)
just adds more `jitter` in our retries to avoid
burst flooding for peer calls.
2023-07-07 07:47:38 -07:00
Klaus Post e20aab25ec
Check for progress before we reach the limit (#17552) 2023-07-07 00:13:57 -07:00
Klaus Post ff5988f4e0
Reduce allocations (#17584)
* Reduce allocations

* Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating.
* Reuse all msgp.Readers
* Reuse metadata buffers when not reading data.

* Make type safe. Make buffer 4K instead of 8.

* Unslice
2023-07-06 16:02:08 -07:00
jiuker c47ff44f5e
fix: disable site network test if site replication is disabled (#17579) 2023-07-06 09:19:14 -07:00
Harshavardhana 8af0773baf
remove deprecated Content-Security-Policy (#17580)
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/block-all-mixed-content
2023-07-06 09:18:38 -07:00
jiuker 2dbb1cff4a
feat: support perf site replication (#17477) 2023-07-05 22:28:26 -07:00
Klaus Post 6efcf9c982
Do lockless last minute latency metrics (#17576)
Collect metrics in one second and accumulate lockless before sending upstream.
2023-07-05 10:40:45 -07:00
Harshavardhana 0bc34952eb
fix: under FanOut API avoid repeated md5sum calculation (#17572)
md5sum calculation has a high CPU overhead, avoid calculating
it repeatedly for similar fanOut calls.

To fix following CPU profiler result
```
(pprof) top10
Showing nodes accounting for 678.68s, 84.67% of 801.54s total
Dropped 1072 nodes (cum <= 4.01s)
Showing top 10 nodes out of 156
      flat  flat%   sum%        cum   cum%
   332.54s 41.49% 41.49%    332.54s 41.49%  runtime/internal/syscall.Syscall6
   228.39s 28.49% 69.98%    228.39s 28.49%  crypto/md5.block
    48.07s  6.00% 75.98%     48.07s  6.00%  runtime.memmove
    28.91s  3.61% 79.59%     28.91s  3.61%  github.com/minio/highwayhash.updateAVX2
     8.25s  1.03% 80.61%      8.25s  1.03%  runtime.futex
     8.25s  1.03% 81.64%     10.81s  1.35%  runtime.step
     6.99s  0.87% 82.52%     22.35s  2.79%  runtime.pcvalue
     6.67s  0.83% 83.35%     38.90s  4.85%  runtime.mallocgc
     5.77s  0.72% 84.07%     32.61s  4.07%  runtime.gentraceback
     4.84s   0.6% 84.67%     10.49s  1.31%  runtime.lock2
```
2023-07-05 03:16:05 -07:00
Harshavardhana e37c4efc6e
fix: upon DNS refresh() failure use previous values (#17561)
DNS refresh() in-case of MinIO can safely re-use
the previous values on bare-metal setups, since
bare-metal arrangements do not change DNS in any 
manner commonly.

This PR simplifies that, we only ever need DNS caching
on bare-metal setups.

- On containerized setups do not enable DNS
  caching at all, as it may have adverse effects on
  the overall effectiveness of k8s DNS systems.

  k8s DNS systems are dynamic and expect applications
  to avoid managing DNS caching themselves, instead
  provide a cleaner container native caching
  implementations that must be used.

- update IsDocker() detection, including podman runtime

- move to minio/dnscache fork for a simpler package
2023-07-03 12:30:51 -07:00
Anis Eleuch 15fd5ce2fa
fix: A typo in per pool make/delete bucket errs calculation (#17553) 2023-07-03 09:47:40 -07:00
Harshavardhana 7f782983ca
fix: for FTP server driver allow implicit trust of TLS (#17541)
fixes #17535
2023-06-30 08:04:13 -07:00
Aditya Manthramurthy 9d628346eb
fix: service account list for root user (#17547)
Fixes https://github.com/minio/minio/issues/17545
2023-06-30 08:02:12 -07:00
Aditya Manthramurthy bde533a9c7
fix: OpenID config initialization (#17544)
This is due to a regression in the handling of the enable key in OpenID
configuration.
2023-06-29 23:38:26 -07:00
Harshavardhana aae6846413
feat: allow expiration of all versions via ILM Expiration action (#17521)
Following extension allows users to specify immediate purge of
all versions as soon as the latest version of this object has
expired.

```
<LifecycleConfiguration>
    <Rule>
        <ID>ClassADocRule</ID>
        <Filter>
           <Prefix>classA/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
             <Days>3650</Days>
	     <ExpiredObjectAllVersions>true</ExpiredObjectAllVersions>
        </Expiration>
    </Rule>
    ...
```
2023-06-28 22:12:28 -07:00
Harshavardhana 5317a0b755
fix: support LDAP settings properly in ftp/sftp (#17536)
Bonus this PR enhances and supports creating
buckets via ftp `mkdir`

fixes #17526
2023-06-28 13:15:21 -07:00
Harshavardhana 73de721a63
fix: handle copyObjectPart encryption properly (#17530)
- look for requested encryption while compressing
  not just via HTTP Headers, but also via multipart
  metadata

- look for SSE-S3 etag decryption not just via HTTP
  Headers, but also via multipart metadata

fixes #17519
2023-06-28 09:43:50 -07:00
Harshavardhana d2f5c3621f
fix: add additional decommission traces for ILM expired content (#17522)
current decommission traces were missing for

- Skipped ILM expired versions
- Skipped single DELETE marked version
- A success or failure in decommissioning DELETE marker
- allow additional info to be shared in DecomStatus() API
2023-06-27 11:59:40 -07:00
Harshavardhana 1818764840
fix: bug in passing Versioned field set for getHealReplicationInfo() (#17498)
Bonus: rejects prefix deletes on object-locked buckets earlier
2023-06-27 09:45:50 -07:00
Harshavardhana d3e5e607a7
allow site-replication checks to work on non-distributed setups (#17524)
fixes #17523
2023-06-27 09:23:50 -07:00
Shireesh Anjal c1943ea3af
Capture realtime metrics in health report (#17516) 2023-06-27 01:39:18 -07:00
guangwu 87b6fb37d6
chore: pkg imported more than once (#17444) 2023-06-26 09:21:29 -07:00
Kaan Kabalak 21fbe88e1f
Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Harshavardhana 1f8b9b4bd5
fix: do not listAndHeal() inline with PutObject() (#17499)
there is a possibility that slow drives can actually add latency
to the overall call, leading to a large spike in latency.

this can happen if there are other parallel listObjects()
calls to the same drive, in-turn causing each other to sort
of serialize.

this potentially improves performance and makes PutObject()
also non-blocking.
2023-06-24 19:31:04 -07:00
Klaus Post 216069d0da
Remove 'null' version ID from directory object response (#17495)
Fixes #17494

Regression from #17132
2023-06-23 13:26:00 -07:00
Harshavardhana eefa047974
fix: keep decommission in a go-routine (#17496)
This was removed by mistake in #17491
2023-06-23 12:29:32 -07:00
Anis Eleuch d8dad5c9ea
s3: Make/Delete buckets to use error quorum per pool (#17467) 2023-06-23 11:48:23 -07:00
Klaus Post bf8a68879c
fix: Time ILM Actions for scanner info (#17493)
ILM Actions were not timed fix it.
2023-06-23 07:48:36 -07:00
Aditya Manthramurthy f3248a4b37
Redact all secrets from config viewing APIs (#17380)
This change adds a `Secret` property to `HelpKV` to identify secrets
like passwords and auth tokens that should not be revealed by the server
in its configuration fetching APIs. Configuration reporting APIs now do
not return secrets.
2023-06-23 07:45:27 -07:00
Harshavardhana d315d012a4
decom: during multiple pool decom preserve current pool status (#17491)
removal of completed pools must retain pool status of other
pools in draining, to resume any remaining draining operations.
2023-06-23 07:44:18 -07:00
Harshavardhana bd9bf3693f
lambda: negative duration for presigned URL default to 1H (#17489)
fixes a bug where users created with Expiration as
timeSentinel is not rejected while generating the
presigned URL for lambda processing.
2023-06-23 00:17:24 -07:00
Aditya Manthramurthy 82ce78a17c
Fix locking in policy attach API (#17426)
For policy attach/detach API to work correctly the server should hold a
lock before reading existing policy mapping and until after writing the
updated policy mapping. This is fixed in this change.

A site replication bug, where LDAP policy attach/detach were not
correctly propagated is also fixed in this change.

Bonus: Additionally, the server responds with the actual (or net)
changes performed in the attach/detach API call. For e.g. if a user
already has policy A applied, and a call to attach policies A and B is
performed, the server will respond that B was attached successfully.
2023-06-21 22:44:50 -07:00