configure batch size to send audit/logger events
in batches instead of sending one event per connection.
this is mainly to optimize the number of requests
we make to webhook endpoint.
simplify audit webhook worker model
fixes couple of bugs like
- ping(ctx) was creating a logger without updating
number of workers leading to incorrect nWorkers
scaling, causing an additional worker that is not
tracked properly.
- h.logCh <- entry could potentially hang for when
the queue is full on heavily loaded systems.
there can be a sudden spike in tiny allocations,
due to too much auditing being done, also don't hang
on the
```
h.logCh <- entry
```
after initializing workers if you do not have a way to
dequeue for some reason.
Currently, once the audit becomes offline, there is no code that tries
to reconnect to the audit, at the same time Send() quickly returns with
an error without really trying to send a message the audit endpoint; so
the audit endpoint will never be online again.
Fixing this behavior; the current downside is that we miss printing some
logs when the audit becomes offline; however this information is
available in prometheus
Later, we can refactor internal/logger so the http endpoint can send errors to
console target.
If target went offline while MinIO was down, error once
while trying to send message. If target goes offline during
MinIO server running, it already comes through ping() call
and errors out if target offline.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
all retries must not be counted as failed messages,
a failed message is a single counter not for all
retries, this PR fixes this.
Also we do not need to retry 10-times, instead we should
retry at max 3 times with some jitter to deliver the
messages.
Send() is synchronous and can affect the latency of S3 requests when the
logger buffer is full.
Avoid checking if the HTTP target is online or not and increase the
workers anyway since the buffer is already full.
Also, avoid logs flooding when the audit target is down.
fixes#15334
- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
Fix `panic: "POST /minio/peer/v21/signalservice?signal=2": sync: WaitGroup is reused before previous Wait has returned`
Log entries already on the channel would cause `logEntry` to increment the
waitgroup when sending messages, after Cancel has been called.
Instead of tracking every single message, just check the send goroutine. Faster
and safe, since it will not decrement until the channel is closed.
Regression from #14289
Also log all the missed events and logs instead of silently
swallowing the events.
Bonus: Extend the logger webhook to support mTLS
similar to audit webhook target.
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`