Commit Graph

97 Commits

Author SHA1 Message Date
Klaus Post
51aa59a737
perf: websocket grid connectivity for all internode communication (#18461)
This PR adds a WebSocket grid feature that allows servers to communicate via 
a single two-way connection.

There are two request types:

* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
  roundtrips with small payloads.

* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
  which allows for different combinations of full two-way streams with an initial payload.

Only a single stream is created between two machines - and there is, as such, no
server/client relation since both sides can initiate and handle requests. Which server
initiates the request is decided deterministically on the server names.

Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.

If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte 
slices or use a higher-level generics abstraction.

There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.

The request path can be changed to a new one for any protocol changes.

First, all servers create a "Manager." The manager must know its address 
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it given
the remote address using.

```
func (m *Manager) Connection(host string) *Connection
```

All serverside handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight 
requests and responses must also be given for streaming requests.

The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.

* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
   performs a single request and returns the result. Any deadline provided on the request is
   forwarded to the server, and canceling the context will make the function return at once.

* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
   will initiate a remote call and send the initial payload.

```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
//The appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses *must* be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel *must* be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```

There are generic versions of the server/client handlers that allow the use of type
safe implementations for data types that support msgpack marshal/unmarshal.
2023-11-20 17:09:35 -08:00
Anis Eleuch
12f570a307
audit: Try to send audit even if the status is offline (#18458)
Currently, once the audit becomes offline, there is no code that tries
to reconnect to the audit, at the same time Send() quickly returns with
an error without really trying to send a message the audit endpoint; so
the audit endpoint will never be online again.

Fixing this behavior; the current downside is that we miss printing some
logs when the audit becomes offline; however this information is
available in prometheus

Later, we can refactor internal/logger so the http endpoint can send errors to
console target.
2023-11-17 10:40:28 -08:00
Anis Eleuch
6ef8e87492
Support case insensitive kafka SASL mechanism config values (#18398) 2023-11-08 20:04:01 -08:00
Shubhendu
5b9656374c
Error if target went offline (#18221)
If target went offline while MinIO was down, error once
while trying to send message. If target goes offline during
MinIO server running, it already comes through ping() call
and errors out if target offline.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-10-12 06:13:57 -07:00
Praveen raj Mani
c27d0583d4
Send kafka notification messages in batches when queue_dir is enabled (#18164)
Fixes #18124
2023-10-07 08:07:38 -07:00
Sveinn
603437e70f
Fix startup formatting (#18156)
Percentages in root user names are used for formatting.

Before:
```
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:51915 http://172.31.96.1:51915 http://127.0.0.1:51915
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
FORMAT: %117s MESSAGE: $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%!a(MISSING)vZb" "Q4#Q6y8G%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

After:

```
Status:         1 Online, 0 Offline.
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:52421 http://172.31.96.1:52421 http://127.0.0.1:52421
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

No need for special Windows case. `mc` works just fine.
2023-10-02 07:39:47 -06:00
Shubhendu
10d5dd3a67
fix: a regression with audit log sending (#18112)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-26 12:23:02 -07:00
Anis Eleuch
4eeb48f8e0
Return cached online/offline status for audit/http loggers (#18083)
To avoid having delays in prometheus scrape and in 'mc admin info' command.
2023-09-21 16:58:24 -07:00
Harshavardhana
1472875670
fix: failed messages counting in audit_http metrics (#18075)
all retries must not be counted as failed messages,
a failed message is a single counter not for all
retries, this PR fixes this.

Also we do not need to retry 10-times, instead we should
retry at max 3 times with some jitter to deliver the
messages.
2023-09-21 11:24:56 -07:00
Aditya Manthramurthy
1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Anis Eleuch
6a8d8f34a5
kafka: Do not require key when sending a message (#17962)
Keys are helpful to ensure the strict ordering of messages, however currently the
code uses a random request id for every log, hence using the request-id
as a Kafka key is not serve any purpose;

This commit removes the usage of the key, to also fix the audit issue from
internal subsystem that does not have a request ID.
2023-09-01 08:37:22 -07:00
Harshavardhana
07b1281046 add queue_dir to help message for logger/audit targets 2023-08-29 16:07:35 -07:00
Harshavardhana
adb8be069e
tune-kafka targets to ensure timeout triggers on hung brokers (#17898)
hung brokers can cause slowness to the entire system
when many callers are hung, leading to large goroutine
build-up.
2023-08-22 20:26:35 -07:00
Harshavardhana
3a0125fa1f
remove unexpected logging from peer calls (#17888)
also make sure RequestID is set for system logs
2023-08-21 14:25:24 -07:00
Harshavardhana
11dfc817f3
do not log client canceled events (#17838) 2023-08-17 14:53:43 -07:00
Praveen raj Mani
0285df5a02
fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783) 2023-08-03 02:47:07 -07:00
Anis Eleuch
9c0e8cd15b
logger: Avoid slow calls in http logger Send() function (#17747)
Send() is synchronous and can affect the latency of S3 requests when the
logger buffer is full.

Avoid checking if the HTTP target is online or not and increase the
workers anyway since the buffer is already full.

Also, avoid logs flooding when the audit target is down.
2023-07-29 12:49:18 -07:00
Aditya Manthramurthy
f3248a4b37
Redact all secrets from config viewing APIs (#17380)
This change adds a `Secret` property to `HelpKV` to identify secrets
like passwords and auth tokens that should not be revealed by the server
in its configuration fetching APIs. Configuration reporting APIs now do
not return secrets.
2023-06-23 07:45:27 -07:00
Aditya Manthramurthy
5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Harshavardhana
dbd4c2425e
fix: kafka broker pings must not be greater than 1sec (#17376) 2023-06-07 11:47:00 -07:00
Krishnan Parthasarathi
55a3310446
logger-http: Don't retry after a succesful send (#17266) 2023-05-22 14:53:18 -07:00
jiuker
41fa8fa2d2
fix: increment counter when entry be skipped (#17237) 2023-05-19 08:36:52 -07:00
Praveen raj Mani
85912985b6
Check for only network errors in audit webhook for reachability (#17228) 2023-05-17 11:10:33 -07:00
Klaus Post
99c4ffa34f
fix: avoid audit log race protection deadlocks (#17168) 2023-05-09 08:11:32 -07:00
Praveen raj Mani
57acacd5a7
Support persistent queue store for loggers (#17121) 2023-05-08 21:20:31 -07:00
jiuker
6e27264c6b
update cleanupRoutine comment (#17102) 2023-04-28 01:11:51 -07:00
Anis Eleuch
5c83c9724f
audit: Add request path and host to audit event (#17099) 2023-04-27 22:18:24 -07:00
jiuker
b28d391a22
fix: add correct worker count before startHTTPLogger() (#17091) 2023-04-27 10:51:16 -07:00
Harshavardhana
8a9b9832fd
add Dial timeout for Kafka broker pings (#17044) 2023-04-17 15:45:01 -07:00
Harshavardhana
a5835cecbf
fix: regression in counting total requests (#17024) 2023-04-12 14:37:19 -07:00
Anis Eleuch
d90d0c8931
Use one http response recorder per external http call (#16938) 2023-03-31 09:37:29 -07:00
Klaus Post
11d04279c8
Add lazy init of audit logger (#16842) 2023-03-21 10:50:40 -07:00
Harshavardhana
3b5dbf9046
allow bootstrapping to validate internode tokens (#16853) 2023-03-20 01:40:24 -07:00
Harshavardhana
46f9049fb4
simplify error responses for KMS (#16793) 2023-03-16 11:59:42 -07:00
Nitish Tiwari
50dbd2cacc
Update audit log flow to use new headers with unit (#16797) 2023-03-13 22:50:19 -07:00
ferhat elmas
714283fae2
cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
Daniel Valdivia
fb17f97cf3
move audit and logger message structure to minio/pkg (#16655)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-02-21 21:21:17 -08:00
Shubhendu
6b65ba1551
Added attribute proxy for mc admin config set ALIAS logger_webhook (#16657)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-02-21 21:19:46 -08:00
Anis Elleuch
fadc46b906
Add the access key and parent user in the audit log (#16572) 2023-02-08 11:05:26 -08:00
Harshavardhana
aa8b9572b9
remove double ENABLED help output (#16528) 2023-02-03 05:52:52 -08:00
Harshavardhana
b67d97b1ba
add missing fields in audit logs for non-compressed handlers (#16328) 2022-12-30 10:20:19 -08:00
Anis Elleuch
939c0100a6
log: Do not interpret verbs in object names in console output (#16233) 2022-12-13 08:27:40 -08:00
Aditya Manthramurthy
a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Anis Elleuch
1f1dcdce65
move HTTP recorder to an internal library (#16128) 2022-11-28 10:20:27 -08:00
Shireesh Anjal
98a67a3776
Improvements in logger and audit webhooks (#16102) 2022-11-28 08:03:26 -08:00
jiuker
bf89f79694
save deploymentID to avoid mutating request entry in Audit (#16053) 2022-11-11 12:42:15 -08:00
Klaus Post
5b242f1d11
Add Audit target metrics (#16044) 2022-11-10 10:20:21 -08:00
Klaus Post
ddeca9f12a
fix: filter rest errors and logs returned (#16019) 2022-11-07 10:38:08 -08:00
Harshavardhana
23b329b9df
remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Harshavardhana
f696a221af
allow tagging policy condition for GetObject (#15777) 2022-10-02 12:29:29 -07:00