Commit Graph

49 Commits

Author SHA1 Message Date
Praveen raj Mani
c905d3fe21 fix: Re-use TCP connections for Kafka dials (#18860)
Fixes #18857
2024-01-24 13:10:52 -08:00
Harshavardhana
dd2542e96c add codespell action (#18818)
Original work here, #18474,  refixed and updated.
2024-01-17 23:03:17 -08:00
Klaus Post
51aa59a737 perf: websocket grid connectivity for all internode communication (#18461)
This PR adds a WebSocket grid feature that allows servers to communicate via 
a single two-way connection.

There are two request types:

* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
  roundtrips with small payloads.

* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
  which allows for different combinations of full two-way streams with an initial payload.

Only a single stream is created between two machines - and there is, as such, no
server/client relation since both sides can initiate and handle requests. Which server
initiates the request is decided deterministically on the server names.

Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.

If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte 
slices or use a higher-level generics abstraction.

There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.

The request path can be changed to a new one for any protocol changes.

First, all servers create a "Manager." The manager must know its address 
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it given
the remote address using.

```
func (m *Manager) Connection(host string) *Connection
```

All serverside handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight 
requests and responses must also be given for streaming requests.

The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.

* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
   performs a single request and returns the result. Any deadline provided on the request is
   forwarded to the server, and canceling the context will make the function return at once.

* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
   will initiate a remote call and send the initial payload.

```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
//The appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses *must* be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel *must* be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```

There are generic versions of the server/client handlers that allow the use of type
safe implementations for data types that support msgpack marshal/unmarshal.
2023-11-20 17:09:35 -08:00
Anis Eleuch
12f570a307 audit: Try to send audit even if the status is offline (#18458)
Currently, once the audit becomes offline, there is no code that tries
to reconnect to the audit, at the same time Send() quickly returns with
an error without really trying to send a message the audit endpoint; so
the audit endpoint will never be online again.

Fixing this behavior; the current downside is that we miss printing some
logs when the audit becomes offline; however this information is
available in prometheus

Later, we can refactor internal/logger so the http endpoint can send errors to
console target.
2023-11-17 10:40:28 -08:00
Anis Eleuch
6ef8e87492 Support case insensitive kafka SASL mechanism config values (#18398) 2023-11-08 20:04:01 -08:00
Shubhendu
5b9656374c Error if target went offline (#18221)
If target went offline while MinIO was down, error once
while trying to send message. If target goes offline during
MinIO server running, it already comes through ping() call
and errors out if target offline.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-10-12 06:13:57 -07:00
Praveen raj Mani
c27d0583d4 Send kafka notification messages in batches when queue_dir is enabled (#18164)
Fixes #18124
2023-10-07 08:07:38 -07:00
Shubhendu
10d5dd3a67 fix: a regression with audit log sending (#18112)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-26 12:23:02 -07:00
Anis Eleuch
4eeb48f8e0 Return cached online/offline status for audit/http loggers (#18083)
To avoid having delays in prometheus scrape and in 'mc admin info' command.
2023-09-21 16:58:24 -07:00
Harshavardhana
1472875670 fix: failed messages counting in audit_http metrics (#18075)
all retries must not be counted as failed messages,
a failed message is a single counter not for all
retries, this PR fixes this.

Also we do not need to retry 10-times, instead we should
retry at max 3 times with some jitter to deliver the
messages.
2023-09-21 11:24:56 -07:00
Aditya Manthramurthy
1c99fb106c Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Anis Eleuch
6a8d8f34a5 kafka: Do not require key when sending a message (#17962)
Keys are helpful to ensure the strict ordering of messages, however currently the
code uses a random request id for every log, hence using the request-id
as a Kafka key is not serve any purpose;

This commit removes the usage of the key, to also fix the audit issue from
internal subsystem that does not have a request ID.
2023-09-01 08:37:22 -07:00
Harshavardhana
adb8be069e tune-kafka targets to ensure timeout triggers on hung brokers (#17898)
hung brokers can cause slowness to the entire system
when many callers are hung, leading to large goroutine
build-up.
2023-08-22 20:26:35 -07:00
Harshavardhana
11dfc817f3 do not log client canceled events (#17838) 2023-08-17 14:53:43 -07:00
Praveen raj Mani
0285df5a02 fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783) 2023-08-03 02:47:07 -07:00
Anis Eleuch
9c0e8cd15b logger: Avoid slow calls in http logger Send() function (#17747)
Send() is synchronous and can affect the latency of S3 requests when the
logger buffer is full.

Avoid checking if the HTTP target is online or not and increase the
workers anyway since the buffer is already full.

Also, avoid logs flooding when the audit target is down.
2023-07-29 12:49:18 -07:00
Harshavardhana
dbd4c2425e fix: kafka broker pings must not be greater than 1sec (#17376) 2023-06-07 11:47:00 -07:00
Krishnan Parthasarathi
55a3310446 logger-http: Don't retry after a succesful send (#17266) 2023-05-22 14:53:18 -07:00
jiuker
41fa8fa2d2 fix: increment counter when entry be skipped (#17237) 2023-05-19 08:36:52 -07:00
Praveen raj Mani
85912985b6 Check for only network errors in audit webhook for reachability (#17228) 2023-05-17 11:10:33 -07:00
Praveen raj Mani
57acacd5a7 Support persistent queue store for loggers (#17121) 2023-05-08 21:20:31 -07:00
jiuker
b28d391a22 fix: add correct worker count before startHTTPLogger() (#17091) 2023-04-27 10:51:16 -07:00
Harshavardhana
8a9b9832fd add Dial timeout for Kafka broker pings (#17044) 2023-04-17 15:45:01 -07:00
Klaus Post
11d04279c8 Add lazy init of audit logger (#16842) 2023-03-21 10:50:40 -07:00
ferhat elmas
714283fae2 cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
Daniel Valdivia
fb17f97cf3 move audit and logger message structure to minio/pkg (#16655)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-02-21 21:21:17 -08:00
Shubhendu
6b65ba1551 Added attribute proxy for mc admin config set ALIAS logger_webhook (#16657)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-02-21 21:19:46 -08:00
Anis Elleuch
939c0100a6 log: Do not interpret verbs in object names in console output (#16233) 2022-12-13 08:27:40 -08:00
Shireesh Anjal
98a67a3776 Improvements in logger and audit webhooks (#16102) 2022-11-28 08:03:26 -08:00
Klaus Post
5b242f1d11 Add Audit target metrics (#16044) 2022-11-10 10:20:21 -08:00
Harshavardhana
5e763b71dc use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Harshavardhana
32b2f6117e fix: do not pass around sync.Map (#15250)
it is not safe to pass around sync.Map
through pointers, as it may be concurrently
updated by different callers.

this PR simplifies by avoiding sync.Map
altogether, we do not need sync.Map
to keep object->erasureMap association.

This PR fixes a crash when concurrently
using this value when audit logs are
configured.

```
fatal error: concurrent map iteration and map write

goroutine 247651580 [running]:
runtime.throw({0x277a6c1?, 0xc002381400?})
        runtime/panic.go:992 +0x71 fp=0xc004d29b20 sp=0xc004d29af0 pc=0x438671
runtime.mapiternext(0xc0d6e87f18?)
        runtime/map.go:871 +0x4eb fp=0xc004d29b90 sp=0xc004d29b20 pc=0x41002b
```
2022-07-07 17:04:25 -07:00
Klaus Post
ac055b09e9 Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
Harshavardhana
9d07cde385 use crypto/sha256 only for FIPS 140-2 compliance (#14983)
It would seem like the PR #11623 had chewed more
than it wanted to, non-fips build shouldn't really
be forced to use slower crypto/sha256 even for
presumed "non-performance" codepaths. In MinIO
there are really no "non-performance" codepaths.
This assumption seems to have had an adverse
effect in certain areas of CPU usage.

This PR ensures that we stick to sha256-simd
on all non-FIPS builds, our most common build
to ensure we get the best out of the CPU at
any given point in time.
2022-05-27 06:00:19 -07:00
Anis Elleuch
e952e2a691 audit/kafka: Fix quitting early after first logging (#14932)
A recent commit created some regressions:
- Kafka/Audit goroutines quit when the first log is sent
- Missing doneCh initialization in Kafka audit
2022-05-17 07:43:25 -07:00
Harshavardhana
040ac5cad8 fix: when logger queue is full exit quickly upon doneCh (#14928)
Additionally only reload requested sub-system not everything
2022-05-16 16:10:51 -07:00
Klaus Post
472c2d828c Fix waitgroup add after wait on config reload (#14584)
Fix `panic: "POST /minio/peer/v21/signalservice?signal=2": sync: WaitGroup is reused before previous Wait has returned`

Log entries already on the channel would cause `logEntry` to increment the
 waitgroup when sending messages, after Cancel has been called.

Instead of tracking every single message, just check the send goroutine. Faster 
and safe, since it will not decrement until the channel is closed.

Regression from #14289
2022-03-19 09:15:45 -07:00
Shireesh Anjal
3934700a08 Make audit webhook and kafka config dynamic (#14390) 2022-02-24 09:05:33 -08:00
Shireesh Anjal
25144fedd5 Send deployment id and minio version in http header (#14378) 2022-02-23 13:36:01 -08:00
Shireesh Anjal
28f188e3ef Make logger webhook config dynamic (#14289)
It should not be required to restart the 
server after setting the logger webhook config.
2022-02-17 11:11:15 -08:00
Harshavardhana
a60ac7ca17 fix: audit log to support object names in multipleObjectNames() handler (#14017) 2022-01-03 01:28:52 -08:00
Harshavardhana
f527c708f2 run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana
9ad6012782 simplify logger time and avoid possible crashes (#13986)
time.Format() is not necessary prematurely for JSON
marshalling, since JSON marshalling indeed defaults
to RFC3339Nano.

This also ensures the 'time' is remembered until its
logged and it is the same time when the 'caller'
invoked 'log' functions.
2021-12-23 15:33:54 -08:00
Harshavardhana
499872f31d Add configurable channel queue_size for audit/logger webhook targets (#13819)
Also log all the missed events and logs instead of silently
swallowing the events.

Bonus: Extend the logger webhook to support mTLS
similar to audit webhook target.
2021-12-20 13:16:53 -08:00
Harshavardhana
fee3f88cb5 use acceptedResponseStatusCode everywhere in HTTP logger (#13755) 2021-11-24 13:53:11 -08:00
Harshavardhana
661b263e77 add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
ArthurMa
2807c11410 http hook should accept more than 200 statusCode (#13180)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-09-10 14:27:37 -07:00
Harshavardhana
e316873f84 feat: Add support for kakfa audit logger target (#12678) 2021-07-13 09:39:13 -07:00
Harshavardhana
1f262daf6f rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00