Commit Graph

5616 Commits

Author SHA1 Message Date
Harshavardhana fd37418da2
fix: allow server not initialized error to be retried (#18300)
Since relaxing quorum the error across pools
for ListBuckets(), GetBucketInfo() we hit a
situation where loading IAM could potentially
return an error for second pool that server
is not initialized.

We need to handle this, let the pool come online
and retry transparently - this PR fixes that.
2023-10-23 12:30:20 -07:00
Harshavardhana bbfea29c2b
use object modTime for the event sequencer ID (#18285)
always set modTime after lock is acquired in
completemultipart stage to make sure that the
modTime is not racy.
2023-10-20 19:28:05 -07:00
Harshavardhana aa703dc903
relax write quorum requirement for ListBuckets()/HeadBucket() (#18288)
Also fix error handling for HeadBucket() to be pool specific
2023-10-20 17:50:21 -07:00
Harshavardhana 780882efcf
do not check for query params to be signed headers (#18283)
x-amz-signed-headers is meant for HTTP headers only
not for query params, using that to verify things
further can lead to failure.

The generated presigned URL with custom metadata
is already kosher (tamper proof).

fixes #18281
2023-10-19 21:32:49 -07:00
Klaus Post ba6218b354
fix: resource metrics "concurrent map iteration and map write" (#18273)
`resourceMetricsMap` has no protection against concurrent reads and writes.

Add a mutex and don't use maps from the last iteration.

Bug introduced in #18057

Fixes #18271
2023-10-18 13:28:50 -07:00
Harshavardhana 8e32de3ba9
cache DiskInfo() metrics call separately (#18270) 2023-10-18 11:17:32 -07:00
Klaus Post e37508fb8f
fix: linter errors in Windows specific code (#18276) 2023-10-18 11:08:15 -07:00
Klaus Post b46a717425
Remove unused config migration (#18277)
None of the migration is called. Remove dead code.
2023-10-18 11:05:24 -07:00
Klaus Post 7926df0b80
Fix globalDeploymentID race (#18275)
globalDeploymentID was being read while it was being set.

Fixes race:

```
WARNING: DATA RACE
Write at 0x0000079605a0 by main goroutine:
  github.com/minio/minio/cmd.connectLoadInitFormats()
      github.com/minio/minio/cmd/prepare-storage.go:269 +0x14f0
  github.com/minio/minio/cmd.waitForFormatErasure()
      github.com/minio/minio/cmd/prepare-storage.go:294 +0x21d
...

Previous read at 0x0000079605a0 by goroutine 105:
  github.com/minio/minio/cmd.newContext()
      github.com/minio/minio/cmd/utils.go:817 +0x31e
  github.com/minio/minio/cmd.adminMiddleware.func1()
      github.com/minio/minio/cmd/admin-router.go:110 +0x96
  net/http.HandlerFunc.ServeHTTP()
      net/http/server.go:2136 +0x47
  github.com/minio/minio/cmd.setBucketForwardingMiddleware.func1()
      github.com/minio/minio/cmd/generic-handlers.go:460 +0xb1a
  net/http.HandlerFunc.ServeHTTP()
      net/http/server.go:2136 +0x47
...
```
2023-10-18 08:06:57 -07:00
Harshavardhana f91b257f50
choose different max_concurrent requests per drive based on HDD/NVMe (#18254)
currently the default for all drives is 512, which is a lot
for HDDs the recent testing has revealed moving this to 32
for HDDs seems like a fair value.
2023-10-16 17:18:13 -07:00
Harshavardhana edfb310a59
fix: always load ENVs from files first as soon as server starts (#18247)
This is a regression from #18231, however reading from ENV files
must happen well before any parsing logic is invoked.
2023-10-15 21:13:43 -07:00
Poorna 78f1f69d57
fix site replication resync status (#18245)
To persist status changes on disk upon completion.

Adds new tests to handle this functionality.
2023-10-13 22:17:22 -07:00
Harshavardhana e1e33077e8
fix: tests and resync replication status (#18244) 2023-10-13 17:03:34 -07:00
Aditya Manthramurthy b3e7de010d
Remove usage of errors.Join for go1.19 compat (#18243) 2023-10-13 15:14:16 -07:00
Shireesh Anjal bf1c6edb76
Revert "Capture network device info in health report" (#18241)
Introducing a new version of healthinfo struct for adding this info is
not correct. It needs to be implemented differently without adding a new
version.

This reverts commit 8737025d940f80360ed4b3686b332db5156f6659.
2023-10-13 07:46:36 -07:00
jiuker 2ac7fee017
fix: missing fileName will upload failed when PostPolicyBucketHandler (#18240) 2023-10-13 07:31:23 -07:00
Klaus Post 128256e3ab
Add event counters (#18232)
Export metric for global events sent and skipped for the lifetime of the server.
2023-10-12 15:39:22 -07:00
Shireesh Anjal a66a7f3e97
Capture network device info in health report (#18213) 2023-10-12 15:33:31 -07:00
jiuker 20b79f8945
fix: env depend on the flag (#18231) 2023-10-12 15:32:38 -07:00
Klaus Post 9a877734b2
Fix various poolmeta races (#18230)
There is a fundamental race condition in `newErasureServerPools`, where setObjectLayer is 
called before the poolMeta has been loaded/populated.

We add a placeholder value to this field but disable all saving of the value, so we don't risk 
overwriting the value on disk. Once the value has been loaded or created, it is replaced with 
the proper value, which will also be saved.

Also fixes various accesses of `poolMeta` that were done without locks.

We make the `poolMeta.IsSuspended` return false, even if we shouldn't risk out-of-bounds 
reads anymore.
2023-10-12 15:30:42 -07:00
Harshavardhana 409c391850
implement helpers to get relevant info instead of FileInfo() (#18228) 2023-10-12 15:29:59 -07:00
jiuker 000928d34e
fix: should call func globalOSMetrics.time(s)() when updateOSMetrics (#18209) 2023-10-12 00:08:13 -07:00
Harshavardhana 6829ae5b13
completely remove drive caching layer from gateway days (#18217)
This has already been deprecated for close to a year now.
2023-10-11 21:18:17 -07:00
jiuker f09756443d
fix: a dynamic config will make a panic for addOrUpdateIDP (#18208) 2023-10-11 09:06:40 -07:00
jiuker 5512016885
fix: siteResyncMetrics init will make a deadlock when len(siteReplication) >= 3 (#18206) 2023-10-10 23:27:27 -07:00
Harshavardhana 21ecb941fe
fix: avoid counting out of band deletes during disk heal (#18205) 2023-10-10 14:39:48 -07:00
Harshavardhana 77e94087cf
fix: calling statfs() call moves the disk head (#18203)
if erasure upgrade is needed rely on the in-memory
values, instead of performing a "DiskInfo()" call.

https://brendangregg.com/blog/2016-09-03/sudden-disk-busy.html

for HDDs these are problematic, lets avoid this because
there is no value in "being" absolutely strict here
in terms of parity. We are okay to increase parity
as we see based on the in-memory online/offline ratio.
2023-10-10 13:47:35 -07:00
Klaus Post 9ab1f25a47
fix : PutObjectExtract data races (#18199)
Several callers to putObjectTar may be fighting to set sc. Move the write out of the loop.

Use static resp, and request elements.

Fixes tests with -race:

```
WARNING: DATA RACE
Read at 0x00c01cd680e0 by goroutine 691354:
  github.com/minio/minio/cmd.objectAPIHandlers.PutObjectExtractHandler.func1()
      e:/gopath/src/github.com/minio/minio/cmd/object-handlers.go:2130 +0x149
  github.com/minio/minio/cmd.untar.func1()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:250 +0x2b6
  github.com/minio/minio/cmd.untar.func8()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:261 +0xa4

Previous write at 0x00c01cd680e0 by goroutine 691352:
  github.com/minio/minio/cmd.objectAPIHandlers.PutObjectExtractHandler.func1()
      e:/gopath/src/github.com/minio/minio/cmd/object-handlers.go:2131 +0x15d
  github.com/minio/minio/cmd.untar.func1()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:250 +0x2b6
  github.com/minio/minio/cmd.untar.func8()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:261 +0xa4
```
2023-10-10 08:36:44 -07:00
jiuker aaab7aefbe
fix: avoid nil panic upon error in GetObjectNInfo via InnerGetObjectNInfoFn (#18198) 2023-10-10 08:35:33 -07:00
Klaus Post 5b8599e52d
Do not log invalid tag errors (#18200)
Eliminate logging on invalid tags:

```
API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE0A67480
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: Tags cannot be more than 10 (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()

API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE0BEA514
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: Cannot provide multiple Tags with the same key (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()

API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE0E78970
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: The TagKey you have provided is invalid (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()

API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE1002AE8
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: The TagValue you have provided is invalid (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()
```
2023-10-10 08:35:03 -07:00
Harshavardhana 74e0c9ab9b
reduce unnecessary logging, simplify certain error handling (#18196)
remove a bunch of unnecessary logs
2023-10-10 00:33:42 -07:00
Harshavardhana dcce83b288
avoid rebalance state for getObjectTags if any (#18197)
fixes #18190
2023-10-09 23:56:26 -07:00
Matthew Toohey f731e7ea36
Fix current_send_in_progress metric always being zero (#18160) 2023-10-09 17:28:17 -07:00
Maxim Tkachenko ec30bb89a4
simplify channel send() in WalkDir() (#18186) 2023-10-09 17:27:55 -07:00
Klaus Post 7cd08594f6
Use better host names for metric errors (#18188)
Typically hosts would end up like this:

```
   "hosts": [
        ":9000",
        ":9000",
        ":9000",
...
```

Also add host name to errors.
2023-10-09 17:27:11 -07:00
Aditya Manthramurthy 2b4531f069
fix: O_DIRECT is on only for multi-disk setups (#18194)
Disable it for single disk/unsupported platforms
2023-10-09 17:08:40 -07:00
Harshavardhana 11544a62aa
fix: upon write failure on disk journal close the file properly (#18183)
close the file properly before dereferencing *os.File,
this can silently leak fd's in rare cases.

This PR fixes this properly.
2023-10-08 12:17:08 -07:00
Taran Pelkey 18550387d5
fix: DeleteServiceAccount API behavior (#18163) 2023-10-08 12:13:18 -07:00
Klaus Post 0de2b9a1b2
Fix panic on double unfreezeServices (#18177)
Calling unfreezeServices twice results in panic:

```
panic: "POST /minio/peer/v32/signalservice?signal=4&sub-sys=": close of nil channel
goroutine 14703 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
	github.com/minio/minio/cmd/generic-handlers.go:549 +0x8e
panic({0x27c3020, 0x4c9b370})
	runtime/panic.go:884 +0x212
github.com/minio/minio/cmd.unfreezeServices()
	github.com/minio/minio/cmd/service.go:112 +0xc7
github.com/minio/minio/cmd.(*peerRESTServer).SignalServiceHandler(0x0?, {0x4cb6af0, 0xc010b96420}, 0xc01affab00)
	github.com/minio/minio/cmd/peer-rest-server.go:837 +0x13a
net/http.HandlerFunc.ServeHTTP(...)
```

If the function was called a second time `val` would not be nil, but the returned channel `ch` would be, causing the panic.

Check the channel isn't nil and also use Swap for an atomic swap instead of 2 separate operations (though we are in a mutex).
2023-10-06 07:51:50 -06:00
Poorna 9dc29d7687
Avoid ILM expiry on deleted versions that are yet to replicate (#18175)
Fixes #18167
2023-10-06 06:55:15 -06:00
Poorna 72871dbb9a
delete replication: avoid overwriting replication decision (#18174)
from ObjectInfo unless version purge status is present. Otherwise
there is potential to make incorrect replication decision if Stat
returned an error
2023-10-05 21:09:45 -06:00
Aditya Manthramurthy 4bda4e4e2b
fix: check for disk-level O_DIRECT support (#18173)
Disk level O_DIRECT support checking at xl storage initialization was
conditional on a config setting being enabled. (This never took effect
because config initialization happens after ObjectLayer is ready.) This
is not necessary as the config setting is dynamic - O_DIRECT should be
enabled via runtime config. So we need to do the disk level support
check regardless of the config setting.
2023-10-05 20:54:49 -06:00
Harshavardhana 1971c54a50
update buffer channels for both trace and listen events (#18171)
- Trace needs higher buffered channels than 4000 to ensure
  when we run `mc admin trace -a` it captures all information
  sufficiently.

- Listen event notification needs the event channel to be
  `apiRequestsMaxPerNode` * number of nodes
2023-10-05 18:16:04 -06:00
Anis Eleuch b336e9a79f
fix: loading usage cache to not fail early when reading the backup fails (#18158)
Currently, the retry is not fully used when there is no backup copy of
the data usage; use 5 retry attempts when we don't have any valid data, 
new or backup, unless we have seen an un-recognized error.
2023-10-02 19:22:35 -07:00
Harshavardhana a2ab21e91c
add max-keys=2 optimization for spark workloads (#18154)
comment in the code provides more detailed explanation
on what this PR entails and its assumptions.

this PR reduces the amount of listing() by an order
of magnitude, however there are other such calls that
still needs further optimization that shall be done
in subsequent PRs.
2023-10-02 07:52:59 -06:00
Sveinn 603437e70f
Fix startup formatting (#18156)
Percentages in root user names are used for formatting.

Before:
```
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:51915 http://172.31.96.1:51915 http://127.0.0.1:51915
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
FORMAT: %117s MESSAGE: $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%!a(MISSING)vZb" "Q4#Q6y8G%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

After:

```
Status:         1 Online, 0 Offline.
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:52421 http://172.31.96.1:52421 http://127.0.0.1:52421
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

No need for special Windows case. `mc` works just fine.
2023-10-02 07:39:47 -06:00
Shireesh Anjal 6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
Anis Eleuch 22d2dbc4e6
decom: Fix infinite retry when the decom is canceled (#18143)
Also, use rand.Float64() since it is thread-safe; otherwise go race
will complain.
2023-09-30 00:02:29 -07:00
Harshavardhana d6446cb096
do not return an error in AbortMultipartUpload() (#18135)
returning an error is a bit undefined in AWS S3
as it may return an error or not depending on the
time from AbortMultipartUpload().
2023-09-29 10:28:19 -07:00
Harshavardhana c34bdc33fb
make sure to set Versioned field to ensure rename2 is not called (#18141)
without this the rename2() can rename the previous dataDir
causing issues for different versions of the object, only
latest version is preserved due to this bug.

Added healing code to ensure recovery of such content.
2023-09-29 09:08:24 -07:00
Anis Eleuch aec023f537
Avoid showing buckets without quorum in each pool (#18125) 2023-09-29 00:58:54 -07:00
Poorna e101eeeda9
fix: tier addition validation (#18136) 2023-09-28 22:33:24 -07:00
Harshavardhana 3c470a6b8b
fix: the inspect script to use scheme per deployment (#18118) 2023-09-27 08:22:50 -07:00
Poorna 6bc7d711b3
delete of a missing versionId return 204 (#18117) 2023-09-26 14:02:56 -07:00
Harshavardhana cdeab19673
fix: always check error upon w.Close() in Write() (#18111)
not checking w.Close() can prematurely make us
think that the w.Write() actually succeeded, apparently
Write() may or may not return an error but sometimes
only during a Close() call to the fd we may see the
error from Write() propagate.

Fdatasync(w) on the FD would return an error requiring
Close() error handling is less of a concern, however it may
happen such that fdatasync() did not return an error, where
as Close() would.
2023-09-26 11:04:00 -07:00
Anis Eleuch 22ee678136
tier: Avoid doing versioned operations since not required anymore (#18108)
Currently, setting a new tiering target returns an error when a bucket
is versioned and the tiering credentials does not have authorization to
specify a version-id when reading or removing a specific version;

Since tiering does not require versioning anymore; avoid doing versioned
operations when performing checklist ops while adding a new tiering
configuration.
2023-09-26 00:14:56 -07:00
Poorna 50a8f13e85
site replication: allow setting bandwidth default for bucket (#18062)
This can still be overridden at the bucket level
2023-09-25 15:50:52 -07:00
jiuker 6dec60b6e6
fix: check post policy like AWS S3 (#18074) 2023-09-25 12:35:25 -07:00
Harshavardhana ac3a19138a
fix: set scanning details locally to avoid cached values (#18092)
atomic variable results such as scanning must not use
cached values, instead rely on real-time information.
2023-09-25 08:26:29 -07:00
Klaus Post 21e8e071d7
Improve ListObject Compatibility (#18099)
Do not error out when a provided marker is before or after the prefix, but instead just ignore it if before and return an empty list when after.

Fixes #18093
2023-09-25 08:13:08 -07:00
Klaus Post 57f84a8b4c
Add abandoned folder scanning to metrics (#18076)
Include object and versions heal scan times when checking non-empty abandoned folders.

Furthermore don't add delay between healing versions, instead do one per object wait.
2023-09-24 22:15:31 -07:00
Aditya Manthramurthy 22041bbcc4
fix: Update policy mapping properly in notification (#18088)
This is fixing a regression from an earlier change where STS account
loading was made lazy.
2023-09-22 20:47:50 -07:00
Harshavardhana 91ebac0a00
fix: move abandoned parts check after healing not in ILM path (#18087) 2023-09-22 12:07:52 -07:00
Harshavardhana 3a90fb108c only look for metadata if batch replication asks for metadata filters (#18082)
This PR changes the StatObject() to be must have for non-minio source
to being a conditional API call.

- Calls StatObject() when needed
- Calls GetObjectTagging() when needed

These calls if we do without these conditionals can cause a lot of
delays, so we avoid them if not needed in more common scenario.
2023-09-22 11:31:57 -07:00
Shubhendu 74cfb207c1
Added check for mandatory MINIO_KMS_KES_KEY_NAME env var (#18077)
If MinIO started with KMS enabled, MINIO_KMS_KES_KEY_NAME should
be set for server to start.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-21 10:37:37 -07:00
Harshavardhana 9788d85ea3
remove logging for invalid metadata values (#18068) 2023-09-20 15:49:55 -07:00
Anis Eleuch 69c0e18685
perf net: Add the endpoint name related to the perf net error (#18063)
In a perf test, one node will run speed test with all nodes. If there is
an error with a peer node, the peer node name is not included in the
error hence confusing the user.

This commit will add the peer endpoint string to the netperf error.
2023-09-19 22:41:06 -07:00
Aditya Manthramurthy 3cac927348
Load STS policy mappings periodically (#18061)
To ensure that policy mappings are current for service accounts
belonging to (non-derived) STS accounts (like an LDAP user's service
account) we periodically reload such mappings.

This is primarily to handle a case where a policy mapping update
notification is missed by a minio node. Such a node would continue to
have the stale mapping in memory because STS creds/mappings were never
periodically scanned from storage.
2023-09-19 17:57:42 -07:00
Harshavardhana 9081346c40 fix: more regressions listing policy mappings (#18060)
also relax ListServiceAccounts() returning error if
no service accounts exist.
2023-09-19 15:23:18 -07:00
Harshavardhana fcfadb0e51
fix: regression in loading LDAP users policy mappings (#18055)
LDAP users are stored as STS users, we need to load
their policy mappings appropriately.

Fixes a regression caused by #17994
2023-09-19 10:31:56 -07:00
Harshavardhana 2add57cfed
apply healing per object at 1024 cycles (#18050)
- we already have MRF for most recent failures
- we trigger healing during HEAD/GET operation

These are enough, also change the default max wait
from 5sec to 1sec for default scanner speed.
2023-09-19 09:24:22 -07:00
Poorna b73699fad8
replication: pass user tags while queueing (#18052)
Continues from #18032 - otherwise replication will fail on tag based rules.
2023-09-19 03:18:28 -07:00
Harshavardhana b8ebe54e53 Revert "skip tiered objects to GLACIER in batch replication (#18044)"
This reverts commit fd421ddd6f.

MinIO already provides `filter` based on metadata that would work
in this scenario already.
2023-09-19 00:05:40 -07:00
Harshavardhana c3d70e0795
cache usage, prefix-usage, and buckets for AccountInfo up to 10 secs (#18051)
AccountInfo is quite frequently called by the Console UI 
login attempts, when many users are logging in it is important
that we provide them with better responsiveness.

- ListBuckets information is cached every second
- Bucket usage info is cached for up to 10 seconds
- Prefix usage (optional) info is cached for up to 10 secs

Failure to update after cache expiration, would still
allow login which would end up providing information
previously cached.

This allows for seamless responsiveness for the Console UI
logins, and overall responsiveness on a heavily loaded
system.
2023-09-18 22:13:03 -07:00
Harshavardhana fd421ddd6f
skip tiered objects to GLACIER in batch replication (#18044)
tiered objects to GLACIER are not readable until
they are restored, we skip these as unreadable
2023-09-18 10:25:31 -07:00
jiuker 9947c01c8e
feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Eng Zer Jun a00db4267c
data-usage-cache: remove redundant nil check (#17970)
From the Go specification:

  "3. If the map is nil, the number of iterations is 0." [1]

Therefore, an additional nil check for before the loop is unnecessary.

[1]: https://go.dev/ref/spec#For_range

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-09-16 19:09:29 -07:00
Harshavardhana 36385010f5
use optimized pathJoin instead of path.Join (#18042)
this avoids allocations in scanner routine, they are tiny but 
they allocate a lot over many cycles of the scanner.
2023-09-16 19:08:59 -07:00
Harshavardhana fa6d082bfd
reduce all major allocations in replication path (#18032)
- remove targetClient for passing around via replicationObjectInfo{}
- remove cloing to object info unnecessarily
- remove objectInfo from replicationObjectInfo{} (only require necessary fields)
2023-09-16 02:28:06 -07:00
Poorna b733e6e83c
site replication turn off retry login for admin API calls (#18039)
additionally also mark site offline if n/w is down
2023-09-15 18:01:47 -07:00
Anis Eleuch 37aa5934a1
scanner: Fix loading data usage cache structure (#18037)
Return an empty data usage cache structure when the data usage cache
file does not exist, otherwise, the scanner won't work.
2023-09-15 13:11:08 -07:00
Harshavardhana 1647fc7edc
fix: optimize listMultipartUploads to serve via local disks (#18034)
and remove unused getLoadBalancedDisks()
2023-09-15 08:34:03 -07:00
Harshavardhana 7b92687397
remove generating presignedURLs with range header for lambda (#18033) 2023-09-14 21:58:17 -07:00
Alex dc48cd841a
Added MINIO_PROMETHEUS_AUTH_TOKEN env support (#18028)
Signed-off-by: Benjamin Perez <benjamin@bexsoft.net>
2023-09-14 17:28:21 -07:00
Anis Eleuch b0e1776d6d
Do not use a chain for S3 tiering to return better error messages (#18030)
When using a chain provider all providers do not return a valid
access and secret key, an anonymous request is sent, which makes it hard
for users to figure out what is going on

In the case of S3 tiering, when AWS IAM temporary account generation returns
an error, an anonymous login will be used because of the chain provider.
Avoid this and use the AWS IAM provider directly to get a good error
message.
2023-09-14 15:28:20 -07:00
Aditya Manthramurthy 7a7068ee47
Move IAM periodic ops to a single go routine (#18026)
This helps reduce disk operations as these periodic routines would not
run concurrently any more.

Also add expired STS purging periodic operation: Since we do not scan
the on-disk STS credentials (and instead only load them on-demand) a
separate routine is needed to purge expired credentials from storage.
Currently this runs about a quarter as often as IAM refresh.

Also fix a bug where with etcd, STS accounts could get loaded into the
iamUsersMap instead of the iamSTSAccountsMap.
2023-09-14 15:25:17 -07:00
Aditya Manthramurthy cbc0ef459b
Fix policy package import name (#18031)
We do not need to rename the import of minio/pkg/v2/policy as iampolicy
any more.
2023-09-14 14:50:16 -07:00
Harshavardhana a2aabfabd9
add backups for usage-caches to rely on upon error (#18029)
This allows scanner to avoid lengthy scans, skip
things appropriately and also not lose metrics in
any manner.

reduce longer deadlines for usage-cache loads/saves
to match the disk timeout which is 2minutes now per
IOP.
2023-09-14 11:53:52 -07:00
Harshavardhana 32890342ce
introduce MINIO_BROWSER_REDIRECT env to enable/disable auto-redirect (#18025) 2023-09-13 18:43:57 -07:00
Aditya Manthramurthy ed2c2a285f
Load STS accounts into IAM cache lazily (#17994)
In situations with large number of STS credentials on disk, IAM load
time is high. To mitigate this, STS accounts will now be loaded into
memory only on demand - i.e. when the credential is used.

In each IAM cache (re)load we skip loading STS credentials and STS
policy mappings into memory. Since STS accounts only expire and cannot
be deleted, there is no risk of invalid credentials being reused,
because credential validity is checked when it is used.
2023-09-13 12:43:46 -07:00
Poorna 18e23bafd9
replication resync: report only the on-disk status (#18017)
Avoid reporting in-memory status since results can vary if different
nodes are queried, resync always runs at a single node.
2023-09-13 10:58:38 -07:00
Harshavardhana 8b8be2695f
optimize mkdir calls to avoid base-dir `Mkdir` attempts (#18021)
Currently we have IOPs of these patterns

```
[OS] os.Mkdir play.min.io:9000 /disk1 2.718µs
[OS] os.Mkdir play.min.io:9000 /disk1/data 2.406µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys 4.068µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp 2.843µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp/d89c8ceb-f8d1-4cc6-b483-280f87c4719f 20.152µs
```

It can be seen that we can save quite Nx levels such as
if your drive is mounted at `/disk1/minio` you can simply
skip sending an `Mkdir /disk1/` and `Mkdir /disk1/minio`.

Since they are expected to exist already, this PR adds a way
for us to ignore all paths upto the mount or a directory which
ever has been provided to MinIO setup.
2023-09-13 08:14:36 -07:00
Poorna 96fbf18201
replication: queue existing objects to same workers as incoming (#18020)
Previously existing objects were queued to single worker and MRF re-queues
are also handled by same worker - this does not fully use the available
bandwidth in case there is no incoming workload.
2023-09-12 21:59:15 -07:00
Harshavardhana c8a57a8fa2
fix: send content-md5 for AWS S3 proactively (#18018)
fixes #17977
2023-09-12 19:11:13 -07:00
Harshavardhana b1c2dacab3
fix: allow dynamic ports for API only in non-distributed setups (#18019)
fixes #17998
2023-09-12 19:10:49 -07:00
Harshavardhana 08b3a466e8
fix: allow concurrent SFTP connections (#18013)
current implementation did not fully implement
the concurrent SFTP connection implementation,
this PR properly handles this.

fixes #17914
2023-09-12 12:41:52 -07:00
Harshavardhana 1df5e31706
optimize MRF replication queue to avoid memory leaks (#18007) 2023-09-11 20:59:11 -07:00
Harshavardhana 9f7044aed0
fix: ignore transient errors in read path (#18006)
Errors such as

```
returned an error (context deadline exceeded) (*fmt.wrapError)
```

```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
2023-09-11 15:29:59 -07:00
Anis Eleuch 41de53996b
heal: calculate the number of workers based on NRRequests (#17945) 2023-09-11 14:48:54 -07:00
Harshavardhana 9878031cfd
fix: change DISK_ to DRIVE_ for some drive related envs (#18005) 2023-09-11 12:19:22 -07:00