141 Commits

Author SHA1 Message Date
Harshavardhana
80ca120088
remove checkBucketExist check entirely to avoid fan-out calls (#18917)
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.

We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
2024-01-30 12:43:25 -08:00
Harshavardhana
dd2542e96c
add codespell action (#18818)
Original work here, #18474,  refixed and updated.
2024-01-17 23:03:17 -08:00
Harshavardhana
7c948adf88
allow pre-allocating buffers to reduce frequent GCs during growth (#18686)
This PR also increases per node bpool memory from 1024 entries
to 2048 entries; along with that, it also moves the byte pool
centrally instead of being per pool.
2023-12-21 08:59:38 -08:00
Harshavardhana
e7c144eeac
avoid double MRF heal when there is versions disparity (#18585) 2023-12-04 11:13:50 -08:00
Anis Eleuch
b7d11141e1
rename Force to Immediate for clarity (#18540) 2023-11-28 22:35:16 -08:00
jiuker
be02333529
feat: drive sub-sys to max timeout reload (#18501) 2023-11-27 09:15:06 -08:00
Harshavardhana
a4cfb5e1ed
return errors if dataDir is missing during HeadObject() (#18477)
Bonus: allow replication to attempt Deletes/Puts when
the remote returns quorum errors of some kind, this is
to ensure that MinIO can rewrite the namespace with the
latest version that exists on the source.
2023-11-20 21:33:47 -08:00
Klaus Post
508710f4d1
Re-add duplicate upload id sanity check. (#18339)
https://github.com/minio/minio/pull/18307 partially removed the duplicate upload id check.

While I can't really see how ListDir can return duplicate entries, let's re-add it, since it is a cheap sanity check.
2023-10-29 08:33:30 -07:00
Harshavardhana
c60f54e5be
make ListMultipart/ListParts more reliable skip healing disks (#18312)
this PR also fixes old flaky tests, by properly marking disk offline-based tests.
2023-10-24 23:33:25 -07:00
Harshavardhana
069d118329
fix: listObjectParts to prefer local and single disks (#18309) 2023-10-24 13:51:57 -07:00
Klaus Post
6415dec37a
Improve multipart listing speed (#18307) 2023-10-24 12:06:06 -07:00
Harshavardhana
bbfea29c2b
use object modTime for the event sequencer ID (#18285)
always set modTime after lock is acquired in
completemultipart stage to make sure that the
modTime is not racy.
2023-10-20 19:28:05 -07:00
Klaus Post
7926df0b80
Fix globalDeploymentID race (#18275)
globalDeploymentID was being read while it was being set.

Fixes race:

```
WARNING: DATA RACE
Write at 0x0000079605a0 by main goroutine:
  github.com/minio/minio/cmd.connectLoadInitFormats()
      github.com/minio/minio/cmd/prepare-storage.go:269 +0x14f0
  github.com/minio/minio/cmd.waitForFormatErasure()
      github.com/minio/minio/cmd/prepare-storage.go:294 +0x21d
...

Previous read at 0x0000079605a0 by goroutine 105:
  github.com/minio/minio/cmd.newContext()
      github.com/minio/minio/cmd/utils.go:817 +0x31e
  github.com/minio/minio/cmd.adminMiddleware.func1()
      github.com/minio/minio/cmd/admin-router.go:110 +0x96
  net/http.HandlerFunc.ServeHTTP()
      net/http/server.go:2136 +0x47
  github.com/minio/minio/cmd.setBucketForwardingMiddleware.func1()
      github.com/minio/minio/cmd/generic-handlers.go:460 +0xb1a
  net/http.HandlerFunc.ServeHTTP()
      net/http/server.go:2136 +0x47
...
```
2023-10-18 08:06:57 -07:00
Harshavardhana
77e94087cf
fix: calling statfs() call moves the disk head (#18203)
if erasure upgrade is needed rely on the in-memory
values, instead of performing a "DiskInfo()" call.

https://brendangregg.com/blog/2016-09-03/sudden-disk-busy.html

for HDDs these are problematic, lets avoid this because
there is no value in "being" absolutely strict here
in terms of parity. We are okay to increase parity
as we see based on the in-memory online/offline ratio.
2023-10-10 13:47:35 -07:00
Harshavardhana
d6446cb096
do not return an error in AbortMultipartUpload() (#18135)
returning an error is a bit undefined in AWS S3
as it may return an error or not depending on the
time from AbortMultipartUpload().
2023-09-29 10:28:19 -07:00
Harshavardhana
1647fc7edc
fix: optimize listMultipartUploads to serve via local disks (#18034)
and remove unused getLoadBalancedDisks()
2023-09-15 08:34:03 -07:00
Aditya Manthramurthy
1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana
8a57b6bced
use renameat2 Linux extension syscall (#17757)
this is a faster and safer alternative
on newer kernel versions.
2023-08-27 09:57:11 -07:00
Harshavardhana
dde1a12819
fix: validate incoming uploadID to be base64 encoded (#17865)
Bonus fixes include

- do not have to write final xl.meta (renameData) does this
  already, saves some IOPs.

- make sure to purge the multipart directory properly using
  a recursive delete, otherwise this can easily pile up and
  rely on the stale uploads cleanup.

fixes #17863
2023-08-17 09:37:55 -07:00
Harshavardhana
4643efe6be
fix: add deadline worker pattern for local disk removers (#17845) 2023-08-14 12:28:13 -07:00
Harshavardhana
81be718674
fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Harshavardhana
1f8b9b4bd5
fix: do not listAndHeal() inline with PutObject() (#17499)
there is a possibility that slow drives can actually add latency
to the overall call, leading to a large spike in latency.

this can happen if there are other parallel listObjects()
calls to the same drive, in-turn causing each other to sort
of serialize.

this potentially improves performance and makes PutObject()
also non-blocking.
2023-06-24 19:31:04 -07:00
Harshavardhana
64de61d15d
fallback on etags if they match when mtime is not same (#17424)
on "unversioned" buckets there are situations
when successive concurrent I/O can lead to
an inconsistent state() with mtime while the
etag might be the same for the object on disk.

in such a scenario it is possible for us to
allow reading of the object since etag matches
and if etag matches we are guaranteed that we
have enough copies the object will be readable
and same.

This PR allows fallback in such scenarios.
2023-06-17 19:18:20 -07:00
Harshavardhana
75c6fc4f02
only allow decryption of etag for only sse-s3 (#17335) 2023-06-05 13:08:51 -07:00
Harshavardhana
1cd7f1e38d
fix: cleanup empty multipart folders upon stale upload cleanup (#17312) 2023-05-30 09:56:50 -07:00
Harshavardhana
394690dcfb
check for upto 50%+ data disks to be offline (#17294) 2023-05-26 22:56:19 -07:00
Klaus Post
c839b64f6a
fix: compressed+encrypted block overhead (#17289) 2023-05-26 10:57:07 -07:00
Anis Eleuch
54c5c88fe6
Add number of offline disks in quorum errors (#16822) 2023-05-25 09:39:06 -07:00
Harshavardhana
ef54200db7
offline drives more than 50% of total drives return error (#17252) 2023-05-23 07:57:57 -07:00
Harshavardhana
f7d29b4a53
cleanup of multipart per disk must cleanup itself only (#17223) 2023-05-17 01:45:58 -07:00
Klaus Post
76913a9fd5
Signed trailers for signature v4 (#16484) 2023-05-05 19:53:12 -07:00
Klaus Post
7fad0c8b41
Remove checksums from HTTP range request, add part checksums (#17105) 2023-04-28 08:26:32 -07:00
Praveen raj Mani
72802a5972
Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Krishnan Parthasarathi
fae9000304
heal: Pick maximally occuring modTime in quorum (#17071) 2023-04-25 10:13:57 -07:00
Harshavardhana
84f31ed45d
simplify MRF, converge it to regular healing (#17026) 2023-04-19 07:47:42 -07:00
Klaus Post
c133979b8e
Add part count to checksum (#17035) 2023-04-14 09:44:45 -07:00
Klaus Post
9acf1024e4
Remove bloom filter (#16682)
Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU by each write operation.
2023-02-24 09:03:31 +05:30
Anis Elleuch
acc9c033ed
debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Anis Elleuch
89db3fdb5d
Do not return an error when version disparity is detected (#16269) 2022-12-16 08:52:12 -08:00
Harshavardhana
444ff20bc5
do not rename multipart failed transactions back to tmp (#16204) 2022-12-12 01:40:29 -08:00
Klaus Post
12fd6678ee
Encrypt checksums with KMS on CompleteMultipartUpload (#16177) 2022-12-07 10:18:18 -08:00
Poorna
63fc6ba2cd
preserve replicated ETag properly on target (#16129) 2022-11-26 14:43:32 -08:00
Klaus Post
98ba622679
Reduce temporary file clean-up waits (#16110) 2022-11-22 07:23:36 -08:00
Harshavardhana
6aea950d74
avoid partID lock validating uploadID exists prematurely (#16086) 2022-11-18 03:09:35 -08:00
Aditya Manthramurthy
5f1999cc71
fix: avoid URL unsafe chars in multipart upload ID (#16034) 2022-11-09 16:41:16 -08:00
Harshavardhana
fd6f6fc8df
cleanup stale parent multipart directories (#15980) 2022-11-01 08:00:02 -07:00
Klaus Post
86420a1f46
Store multipart checksums (#15953) 2022-10-26 18:14:58 -07:00
Poorna
ce8456a1a9
proxy multipart to peers via multipart uploadID (#15926) 2022-10-25 10:52:29 -07:00
Harshavardhana
328d660106
support CRC32 Checksums on single drive setup (#15873) 2022-10-15 11:58:47 -07:00
Harshavardhana
b04c0697e1
validate correct ETag for the parts sent during CompleteMultipart (#15751) 2022-09-23 21:17:08 -07:00