847 Commits

Author SHA1 Message Date
Klaus Post
e8a476ef5a
Keep larger merge buffers for RPC (#20654)
Keep larger merge buffers

When sending large messages >1K, the merge buffer would continuously be reallocated.

This could happen on listings, where blocks are typically 4->8K.

Keep merge buffer of up to 256KB.

Benchmark with 4096b messages:

```
benchmark                                          old ns/op     new ns/op     delta
BenchmarkRequests/servers=2/bytes/par=32-32        8271          6360          -23.10%
BenchmarkRequests/servers=2/bytes/par=64-32        7840          4731          -39.66%
BenchmarkRequests/servers=2/bytes/par=128-32       7291          4740          -34.99%
BenchmarkRequests/servers=2/bytes/par=256-32       7095          4580          -35.45%
BenchmarkRequests/servers=2/bytes/par=512-32       6757          4584          -32.16%
BenchmarkRequests/servers=2/bytes/par=1024-32      6429          4453          -30.74%

benchmark                                          old bytes     new bytes     delta
BenchmarkRequests/servers=2/bytes/par=32-32        12090         821           -93.21%
BenchmarkRequests/servers=2/bytes/par=64-32        17423         820           -95.29%
BenchmarkRequests/servers=2/bytes/par=128-32       18493         822           -95.56%
BenchmarkRequests/servers=2/bytes/par=256-32       18892         821           -95.65%
BenchmarkRequests/servers=2/bytes/par=512-32       19064         826           -95.67%
BenchmarkRequests/servers=2/bytes/par=1024-32      19038         842           -95.58%
```
2024-11-16 09:18:48 -08:00
Krutika Dhananjay
267f0ecea2
Fix 0 httpTimeout for logger webhook (#20653)
Previously, not setting http.Config.HTTPTimeout for logger webhook
was resulting in a timeout of 0, and causing "deadline exceeded"
errors in log webhook.

This change introduces a new env variable for configuring log webhook
timeout and more importantly sets it when the config is initialised.
2024-11-16 02:01:36 -08:00
Klaus Post
b5177993b3
Make DeadlineConn http.Listener compatible (#20635)
HTTP likes to slap an infinite read deadline on a connection and 
do a blocking read while the response is being written.

This effectively means that a reading deadline becomes the 
request-response deadline.

Instead of enforcing our timeout, we pass it through and keep 
"infinite deadline" is sticky on connections.

However, we still "record" when reads are aborted, so we never overwrite that.

The HTTP server should have `ReadTimeout` and `IdleTimeout` set for the deadline to be effective.

Use --idle-timeout for incoming connections.
2024-11-12 12:41:41 -08:00
Klaus Post
55f5c18fd9
Harden internode DeadlineConn (#20631)
Since DeadlineConn would send deadline updates directly upstream,
it would race with Read/Write operations. The stdlib will perform a read, 
but do an async SetReadDeadLine(unix(1)) to cancel the Read in 
`abortPendingRead`. In this case, the Read may override the 
deadline intended to cancel the read.

Stop updating deadlines if a deadline in the past is seen and when Close is called. 
A mutex now protects all upstream deadline calls to avoid races. 

This should fix the short-term buildup of...

```
365 @ 0x44112e 0x4756b9 0x475699 0x483525 0x732286 0x737407 0x73816b 0x479601
#	0x475698	sync.runtime_notifyListWait+0x138		runtime/sema.go:569
#	0x483524	sync.(*Cond).Wait+0x84				sync/cond.go:70
#	0x732285	net/http.(*connReader).abortPendingRead+0xa5	net/http/server.go:729
#	0x737406	net/http.(*response).finishRequest+0x86		net/http/server.go:1676
#	0x73816a	net/http.(*conn).serve+0x62a			net/http/server.go:2050
```

AFAICT Only affects internode calls that create a connection (non-grid).
2024-11-11 09:15:17 -08:00
Klaus Post
4972735507
Fix lint issues from v1.62.0 upgrade (#20633)
* Fix lint issues from v1.62.0 upgrade

* Fix xlMetaV2TrimData version checks.
2024-11-11 06:51:43 -08:00
Harshavardhana
cefc43e4da simplify the Get()/GetMultiple() re-use GetRaw() for both (#179)
Remember GetMultiple() must be used if your target is calling
PutMultiple(), without that the multiple events will not be
replayed.
2024-11-06 16:52:20 -08:00
Ramon de Klein
25e34fda5f
decompress audit log properly before sending to remote target (#20619) 2024-11-06 13:25:24 -08:00
Klaus Post
8d42f37e4b
Fix msgUnPath crash (#20614)
These are needed checks for the functions to be un-crashable with any input 
given to `msgUnPath` (tested with fuzzing).

Both conditions would result in a crash, which prevents that. Some 
additional upstream checks are needed.

Fixes #20610
2024-11-05 04:37:59 -08:00
Harshavardhana
1615920f48 fix typos reported in CI/CD 2024-11-04 11:06:02 -08:00
Harshavardhana
7f1e1713ab
use absolute path for binary checksum verification (#20487) 2024-09-26 08:03:08 -07:00
Anis Eleuch
2b0156b1fc Add TTFB to all APIs and enable for responses without body (#20479)
Add TTFB for all requests in metrics-v3 in addition to the existing
GetObject. Also for the requests that do not return a body in the
response, calculate TTFB as the HTTP status code and the headers are
sent.
2024-09-24 10:13:00 -07:00
Klaus Post
974cbb3bb7
Limit jstream parse depth (#20474)
Add https://github.com/bcicen/jstream/pull/15 by vendoring the package.

Sets JSON depth limit to 100 entries in S3 Select.
2024-09-23 12:35:41 -07:00
Harshavardhana
03e996320e
upgrade deps pkg/v3, madmin-go/v3 and lz4/v4 (#20467) 2024-09-21 17:33:43 -07:00
Ramon de Klein
3d152015eb
Use MinIO console v1.7.1 (#20465) 2024-09-20 18:18:54 -07:00
jiuker
ade8925155
fix: add default http timeout for audit webhook (#20460) 2024-09-20 09:02:50 -07:00
Klaus Post
05a6c170bf
Fix PutObject Trailing checksum (#20456)
PutObject would verify trailing checksums, but not store them.

Fixes #20455
2024-09-19 05:59:07 -07:00
Shubhendu
5bd27346ac
Added iam import tests for openid (#20432)
Tests if imported service accounts have 
required access to buckets and objects.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>

Co-authored-by: Harshavardhana <harsha@minio.io>
2024-09-17 09:45:46 -07:00
Klaus Post
8a30967542
Limit S3 Select JSON documents to 10MB (#20439)
Closes #20430

Limit allocations from badly formed documents.
2024-09-16 09:59:03 -07:00
Harshavardhana
c28a4beeb7
multipart support etag and pre-read small objects (#20423) 2024-09-12 05:24:04 -07:00
Sveinn
3bae73fb42
Add http_timeout to audit webhook configurations (#20421) 2024-09-11 15:20:42 -07:00
Harshavardhana
8c9ab85cfa
Add multipart uploads cache for ListMultipartUploads() (#20407)
this cache will be honored only when `prefix=""` while
performing ListMultipartUploads() operation.

This is mainly to satisfy applications like alluxio
for their underfs implementation and tests.

replaces https://github.com/minio/minio/pull/20181
2024-09-09 09:58:30 -07:00
Klaus Post
b1c849bedc
Don't send a canceled context to Unlock (#20409)
AFAICT we send a canceled context to unlock (and thereby releaseAll). This will cause network calls to fail.

Instead use background and add 30s timeout.
2024-09-09 08:49:49 -07:00
Harshavardhana
fb24bcfee0
fix: set audit/logger webhook retry interval to maximum 1m (#20404) 2024-09-09 02:36:47 -07:00
Harshavardhana
8268c12cfb
Add support for audit/logger max retry and retry interval (#20402)
Current implementation retries forever until our
log buffer is full, and we start dropping events.

This PR allows you to set a value until we give
up on existing audit/logger batches to proceed to
process the new ones.

Bonus:
 - do not blow up buffers beyond batchSize value
 - do not leak the ticker if the worker returns
2024-09-08 05:15:09 -07:00
Sveinn
3f39da48ea
fix: retries and failed message counter (#20401) 2024-09-07 17:13:57 -07:00
Klaus Post
9d5cdaa2e3
Limit Response Recorder memory (#20399)
Disable body recording for...

* admin inspect
* admin metrics
* profiling download

Also, if the recorded body is > 10MB, drop it.
2024-09-07 12:16:04 -07:00
Praveen raj Mani
261111e728
Kafka notify: support batched commits for queue store (#20377)
The items will be saved per target batch and will
be committed to the queue store when the batch is full

Also, periodically commit the batched items to the queue store
based on configured commit_timeout; default is 30s;

Bonus: compress queue store multi writes
2024-09-06 16:06:30 -07:00
Harshavardhana
0f1e8db4c5
all 2xx status codes to be success for audit (#20394) 2024-09-06 15:53:34 -07:00
jiuker
241be9709c
fix: jwt error overrwriten by nil public key (#20387) 2024-09-05 19:46:36 -07:00
Anis Eleuch
9b79eec29e
site-repl: Fix ILM document replication in some cases (#20380)
S3 spec does not accept an ILM XML document containing both <Filter>
and <Prefix> XML tags, even if both are empty. That is why we added
a 'set' field in some lifecycle structures to decide when and when not to
show a tag. However, we forgot to disallow marshaling of Filter when
'set' is set to false.

This will fix ILM document replication in a site replication
configuration in some cases.
2024-09-04 10:01:26 -07:00
Harshavardhana
c2e318dd40
remove mincache EOS related feature from upstream (#20375) 2024-09-03 11:23:41 -07:00
Harshavardhana
504e52b45e
protect bpool from buffer pollution by invalid buffers (#20342) 2024-08-28 18:40:52 -07:00
Harshavardhana
c65e67c357
add more details on the payload sent to webhook audit (#20335) 2024-08-28 08:31:56 -07:00
Mark Theunissen
9511056f44
fix: simplify error logged when logger target is unreachable (#20304) 2024-08-22 02:43:48 -07:00
Aditya Manthramurthy
8a11282522
[fix] S3Select: Add some missing input validation (#20278)
Prevents server panic when some CSV parameters are empty.
2024-08-20 11:31:45 -07:00
Mark Theunissen
6378ca10a4
kms.ListKeys returns CreatedBy/CreatedAt when information is available (#20223) 2024-08-17 23:43:03 -07:00
Harshavardhana
a5702f978e
remove requests deadline, instead just reject the requests (#20272)
Additionally set

 - x-ratelimit-limit
 - x-ratelimit-remaining

To indicate the request rates.
2024-08-16 01:43:49 -07:00
Klaus Post
f1302c40fe
Fix uninitialized replication stats (#20260)
Services are unfrozen before `initBackgroundReplication` is finished. This means that 
the globalReplicationStats write is racy. Switch to an atomic pointer.

Provide the `ReplicationPool` with the stats, so it doesn't have to be grabbed 
from the atomic pointer on every use.

All other loads and checks are nil, and calls return empty values when stats 
still haven't been initialized.
2024-08-15 05:04:40 -07:00
Harshavardhana
3b1aa40372
support relative paths for KMS_SECRET_KEY_FILE (#20264)
fixes #20251
2024-08-15 04:46:39 -07:00
Sveinn
743ddb196a
Removing the audit log retry mechanism (#20259) 2024-08-14 15:25:08 -07:00
Klaus Post
3ffeabdfcb
Fix govet+staticcheck issues (#20263)
This is better: https://github.com/golang/go/issues/60529
2024-08-14 10:11:51 -07:00
Harshavardhana
e7a56f35b9
flatten out audit tags, do not send as free-form (#20256)
move away from map[string]interface{} to map[string]string
to simplify the audit, and also provide concise information.

avoids large allocations under load(), reduces the amount
of audit information generated, as the current implementation
was a bit free-form. instead all datastructures must be
flattened.
2024-08-13 15:22:04 -07:00
Harshavardhana
acdb355070
update deps and update azure WARM tier implementation (#20247) 2024-08-13 11:21:34 -07:00
Klaus Post
d8f0e0ea6e
Simplify error logging on event send (#20246)
Overly verbose, hard to read and can leak data.

Print even as JSON and simplify target&error printing.
2024-08-12 08:55:28 -07:00
Harshavardhana
2e0fd2cba9
implement a safer completeMultipart implementation (#20227)
- optimize writing part.N.meta by writing both part.N
  and its meta in sequence without network component.

- remove part.N.meta, part.N which were partially success
  ful, in quorum loss situations during renamePart()

- allow for strict read quorum check arbitrated via ETag
  for the given part number, this makes it double safer
  upon final commit.

- return an appropriate error when read quorum is missing,
  instead of returning InvalidPart{}, which is non-retryable
  error. This kind of situation can happen when many
  nodes are going offline in rotation, an example of such
  a restart() behavior is statefulset updates in k8s.

fixes #20091
2024-08-12 01:38:15 -07:00
Andreas Auernhammer
14876a4df1
ldap: use custom TLS cipher suites (#20221)
This commit replaces the LDAP client TLS config and
adds a custom list of TLS cipher suites which support
RSA key exchange (RSA kex).

Some LDAP server connections experience a significant slowdown
when these cipher suites are not available. The Go TLS stack
disables them by default. (Can be enabled via GODEBUG=tlsrsakex=1).

fixes https://github.com/minio/minio/issues/20214

With a custom list of TLS ciphers, Go can pick the TLS RSA key-exchange
cipher. Ref:
```
	if c.CipherSuites != nil {
		return c.CipherSuites
	}
	if tlsrsakex.Value() == "1" {
		return defaultCipherSuitesWithRSAKex
	}
```
Ref: https://cs.opensource.google/go/go/+/refs/tags/go1.22.5:src/crypto/tls/common.go;l=1017

Signed-off-by: Andreas Auernhammer <github@aead.dev>
2024-08-07 05:59:47 -07:00
Harshavardhana
a17f14f73a
separate lock from common grid to avoid epoll contention (#20180)
epoll contention on TCP causes latency build-up when
we have high volume ingress. This PR is an attempt to
relieve this pressure.

upstream issue https://github.com/golang/go/issues/65064
It seems to be a deeper problem; haven't yet tried the fix
provide in this issue, but however this change without
changing the compiler helps. 

Of course, this is a workaround for now, hoping for a
more comprehensive fix from Go runtime.
2024-07-29 11:10:04 -07:00
Klaus Post
59788e25c7
Update connection deadlines less frequently (#20166)
Only set write deadline on connections every second. Combine the 2 write locations into 1.
2024-07-26 10:40:11 -07:00
Harshavardhana
064f36ca5a
move to GET for internal stream READs instead of POST (#20160)
the main reason is to let Go net/http perform necessary
book keeping properly, and in essential from consistency
point of view its GETs all the way.

Deprecate sendFile() as its buggy inside Go runtime.
2024-07-26 05:55:01 -07:00
Klaus Post
15b609ecea
Expose RPC reconnections and ping time (#20157)
- Keeps track of reconnection count.
- Keeps track of connection ping roundtrip times. 
  Sends timestamp in ping message.
- Allow ping without payload.
2024-07-25 14:07:21 -07:00