Currently the status of a completed or failed batch is held in the
memory, a simple restart will lose the status and the user will not
have any visibility of the job that was long running.
In addition to the metrics, add a new API that reads the batch status
from the drives. A batch job will be cleaned up three days after
completion.
Also add the batch type in the batch id, the reason is that the batch
job request is removed immediately when the job is finished, then we
do not know the type of batch job anymore, hence a difficulty to locate
the job report
Hot load a policy document when during account authorization evaluation
to avoid returning 403 during server startup, when not all policies are
already loaded.
Add this support for group policies as well.
you can attempt a rebalance first i.e, start with 2 pools.
```
mc admin rebalance start alias/
```
and after that you can add a new pool, this would
potentially crash.
```
Jun 27 09:22:19 xxx minio[7828]: panic: runtime error: invalid memory address or nil pointer dereference
Jun 27 09:22:19 xxx minio[7828]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x22cc225]
Jun 27 09:22:19 xxx minio[7828]: goroutine 1 [running]:
Jun 27 09:22:19 xxx minio[7828]: github.com/minio/minio/cmd.(*erasureServerPools).findIndex(...)
```
without atomic load() it is possible that for
a slow receiver we would get into a hot-loop, when
logCh is full and there are many incoming callers.
to avoid this as a workaround enable BATCH_SIZE
greater than 100 to ensure that your slow receiver
receives data in bulk to avoid being throttled in
some manner.
this PR however fixes the unprotected access to
the current workers value.
specifying customer HTTP client makes the gcs SDK
ignore the passed credentials, instead let the GCS
SDK manage the transport.
this PR fixes#19922 a regression from #19565
Currently, bucket metadata is being loaded serially inside ListBuckets
Objet API. Fix that by loading the bucket metadata as the number of
erasure sets * 10, which is a good approximation.
When a reconnection happens, `handleMessages` must be able to complete and exit.
This can be prevented in a full queue.
Deadlock chain (May 10th release)
```
1 @ 0x44110e 0x453125 0x109f88c 0x109f7d5 0x10a472c 0x10a3f72 0x10a34ed 0x4795e1
# 0x109f88b github.com/minio/minio/internal/grid.(*Connection).send+0x3eb github.com/minio/minio/internal/grid/connection.go:548
# 0x109f7d4 github.com/minio/minio/internal/grid.(*Connection).queueMsg+0x334 github.com/minio/minio/internal/grid/connection.go:586
# 0x10a472b github.com/minio/minio/internal/grid.(*Connection).handleAckMux+0xab github.com/minio/minio/internal/grid/connection.go:1284
# 0x10a3f71 github.com/minio/minio/internal/grid.(*Connection).handleMsg+0x231 github.com/minio/minio/internal/grid/connection.go:1211
# 0x10a34ec github.com/minio/minio/internal/grid.(*Connection).handleMessages.func1+0x6cc github.com/minio/minio/internal/grid/connection.go:1019
---> blocks ---> via (Connection).handleMsgWg
1 @ 0x44110e 0x454165 0x454134 0x475325 0x486b08 0x10a161a 0x10a1465 0x2470e67 0x7395a9 0x20e61af 0x20e5f1f 0x7395a9 0x22f781c 0x7395a9 0x22f89a5 0x7395a9 0x22f6e82 0x7395a9 0x22f49a2 0x7395a9 0x2206e45 0x7395a9 0x22f4d9c 0x7395a9 0x210ba06 0x7395a9 0x23089c2 0x7395a9 0x22f86e9 0x7395a9 0xd42582 0x2106c04
# 0x475324 sync.runtime_Semacquire+0x24 runtime/sema.go:62
# 0x486b07 sync.(*WaitGroup).Wait+0x47 sync/waitgroup.go:116
# 0x10a1619 github.com/minio/minio/internal/grid.(*Connection).reconnected+0xb9 github.com/minio/minio/internal/grid/connection.go:857
# 0x10a1464 github.com/minio/minio/internal/grid.(*Connection).handleIncoming+0x384 github.com/minio/minio/internal/grid/connection.go:825
```
Add a queue cleaner in reconnected that will pop old messages so `handleMessages` can
send messages without blocking and exit appropriately for the connection to be re-established.
Messages are likely dropped by the remote, but we may have some that can succeed,
so we only drop when running out of space.
Add the inlined data as base64 encoded field and try to add a string version if feasible.
Example:
```
λ xl-meta -data xl.meta
{
"8e03504e-1123-4957-b272-7bc53eda0d55": {
"bitrot_valid": true,
"bytes": 58,
"data_base64": "Z29sYW5nLm9yZy94L3N5cyB2MC4xNS4wIC8=",
"data_string": "golang.org/x/sys v0.15.0 /"
}
```
The string will have quotes, newlines escaped to produce valid JSON.
If content isn't valid utf8 or the encoding otherwise fails, only the base64 data will be added.
`-export` can still be used separately to extract the data as files (including bitrot).
* Prevents blocking when losing quorum (standard on cluster restarts).
* Time out to prevent endless buildup. Timed-out remote locks will be canceled because they miss the refresh anyway.
* Reduces latency for all calls since the wall time for the roundtrip to remotes no longer adds to the requests.
since the connection is active, the
response recorder body can grow endlessly
causing leak, as this bytes buffer is
never given back to GC due to an goroutine.
skip healing properly in scanner when drive is hotplugged
due to how the state is passed around the SkipHealing
might not be the true state() of the system always, causing
a situation where we might healing from the scanner on the
same drive which is being. Due to this competing heals get
triggered that slow each other down.