A delete marker may be received on the old pool while a decom
move is in progress. This PR allows decom to retry so that such
delete markers are moved to the new pool and the decommission can
be completed.
Fixes #20819
Add workaround for potential profiling crash
Using admin traces could potentially crash the server (or handler more likely) due to upstream divide by 0: https://github.com/felixge/fgprof/pull/34
Ensure the profile always runs 100ms before stopping, so sample count isn't 0 (default sample rate ~10ms/sample, but allow for cpu starvation)
If an object has many parts that are all readable, but some parts
are missing from some drives, the object can sometimes be un-healable,
which is wrong.
This commit will avoid reading from drives that have missing, corrupted or
outdated xl.meta. It will also check if any part is unreadable to avoid
healing in that case.
We do not need to hold the read locks at the higher
layer before reading the body. Instead, hold the read
locks properly at the time of renamePart() to protect
against racy part overwrites competing with a
concurrent completeMultipart().
A CheckParts call can take a long time to verify 10k parts of a single object on a single drive.
To avoid hitting the internal deadline of one minute in the single handler RPC, this commit
switches to a streaming RPC instead.
This API had missing permissions checking, allowing a user to change
their policy mapping by:
1. Craft iam-info.zip file: Update own user permission in
user_mappings.json
2. Upload it via `mc admin cluster iam import nobody iam-info.zip`
Here `nobody` can be a user with pretty much any kind of permission (but
not anonymous) and this ends up working.
Some more detailed steps - start from a fresh setup:
```
./minio server /tmp/d{1...4} &
mc alias set myminio http://localhost:9000 minioadmin minioadmin
mc admin user add myminio nobody nobody123
mc admin policy attach myminio readwrite --user nobody
mc alias set nobody http://localhost:9000 nobody nobody123
mc admin cluster iam export myminio
mkdir /tmp/x && mv myminio-iam-info.zip /tmp/x
cd /tmp/x
unzip myminio-iam-info.zip
echo '{"nobody":{"version":1,"policy":"consoleAdmin","updatedAt":"2024-08-13T19:47:10.1Z"}}' > \
iam-assets/user_mappings.json
zip -r myminio-iam-info-updated.zip iam-assets/
mc admin cluster iam import nobody ./myminio-iam-info-updated.zip
mc admin service restart nobody
```
`mc admin heal ALIAS/bucket/object` does not have any flag to heal
noncurrent object versions, so this commit makes healing an object
implicitly heal its noncurrent versions as well.
This also fixes `mc admin heal ALIAS/bucket/object` not working
correctly when the bucket is versioned. This has been broken since Apr 2023.
Golang http.Server will call SetReadDeadline overwriting the previous
deadline configuration set after a new connection Accept in the custom
listener code. Therefore, --idle-timeout was not correctly respected.
Make http.Server read/write timeout similar to --idle-timeout.
The code assigns the corrupted state to a drive for any unexpected error,
which is confusing for users. This change makes sure the corrupted state
is assigned only for corrupted parts or xl.meta. Any other unexpected
error, such as canceled, deadline, or drive timeout errors, is reported
as an unknown state with an explanation.
Also make sure to return the bucket/object name when the object is not
found or marked not found by the heal dangling code.
* Add the policy name to the audit log tags when doing policy-based API calls
* Audit log the retention settings requested in the API call
* Audit log of retention on PutObjectRetention API path too
The experimental functions are now available in the standard library in
Go 1.23 [1].
[1]: https://go.dev/doc/go1.23#new-unique-package
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
The RemoveUser API only removes internal users, and it reports success
even when it does not find an internal user account to delete. When provided
with a service account, it should not report success, as that is misleading.
Update github.com/cosnicolaou/pbzip2 to latest version for
significant performance improvements. This update brings a 45%
reduction in processing time.
Previously, not setting http.Config.HTTPTimeout for logger webhook
was resulting in a timeout of 0, and causing "deadline exceeded"
errors in log webhook.
This change introduces a new env variable for configuring log webhook
timeout and more importantly sets it when the config is initialised.
Manual heal can return XMinioHealInvalidClientToken if the manual
healing is started on the first node and the next mc call to get the
heal status lands on another node. The reason is that redirection
based on the token ID is not able to redirect requests to the first node
due to a typo.
This also affects the batch cancel command: if the batch is running on
the first node, the user will never be able to cancel it due to the same
bug.
HTTP likes to slap an infinite read deadline on a connection and
do a blocking read while the response is being written.
This effectively means that a reading deadline becomes the
request-response deadline.
Instead of enforcing our timeout, we pass it through and keep the
"infinite deadline" sticky on connections.
However, we still "record" when reads are aborted, so we never overwrite that.
The HTTP server should have `ReadTimeout` and `IdleTimeout` set for the deadline to be effective.
Use --idle-timeout for incoming connections.
Since DeadlineConn would send deadline updates directly upstream,
it would race with Read/Write operations. The stdlib will perform a read,
but do an async SetReadDeadline(unix(1)) to cancel the Read in
`abortPendingRead`. In this case, the Read may override the
deadline intended to cancel the read.
Stop updating deadlines if a deadline in the past is seen and when Close is called.
A mutex now protects all upstream deadline calls to avoid races.
This should fix the short-term buildup of...
```
365 @ 0x44112e 0x4756b9 0x475699 0x483525 0x732286 0x737407 0x73816b 0x479601
# 0x475698 sync.runtime_notifyListWait+0x138 runtime/sema.go:569
# 0x483524 sync.(*Cond).Wait+0x84 sync/cond.go:70
# 0x732285 net/http.(*connReader).abortPendingRead+0xa5 net/http/server.go:729
# 0x737406 net/http.(*response).finishRequest+0x86 net/http/server.go:1676
# 0x73816a net/http.(*conn).serve+0x62a net/http/server.go:2050
```
AFAICT, this only affects internode calls that create a connection (non-grid).