Commit Graph

6385 Commits

Author SHA1 Message Date
Harshavardhana
064f36ca5a move to GET for internal stream READs instead of POST (#20160)
the main reason is to let Go net/http perform necessary
book keeping properly, and in essential from consistency
point of view its GETs all the way.

Deprecate sendFile() as its buggy inside Go runtime.
2024-07-26 05:55:01 -07:00
Krishnan Parthasarathi
4a1edfd9aa Different read quorum for tiered objects (#20115)
For a non-tiered object, MinIO requires that EcM (# of data blocks) of
xl.meta agree, corresponding to the number of data blocks needed to 
read this object.

OTOH, tiered objects have metadata in the hot tier and data in the 
warm tier. The data and its integrity are offloaded to the warm tier. This
allows us to reduce the read quorum from EcM (typically > N/2, where N -
erasure stripe width) to N/2 + 1. The simple majority of metadata
ensures consensus on what the object is and where it is
located.
2024-07-25 14:02:50 -07:00
Anis Eleuch
b7f319b62a properly reload a fresh drive when found in a failed state during startup (#20145)
When a drive is in a failed state when a single node multiple drives
deployment is started, a replacement of a fresh disk will not be
properly healed unless the user restarts the node.

Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also
remove globalLocalDrives for simplification, a map to store local node
drives can still be used since the order of local drives of a node is
not defined.
2024-07-24 16:30:33 -07:00
Anis Eleuch
33c101544d kms: Expose API when bucket federation is enabled (#20143)
kms: Expose API available when bucket federation is enabled

When bucket federation feature is enabled, KMS API will not work, such
as `mc admin kms key list`

The commit will fix the issue by disabling bucket forwarding when this
is a KMS request.
2024-07-24 15:44:29 -07:00
Harshavardhana
3b21bb5be8 use unixNanoTime instead of time.Time in lockRequestorInfo (#20140)
Bonus: Skip Source, Quorum fields in lockArgs that are never
sent during Unlock() phase.
2024-07-24 03:24:01 -07:00
Harshavardhana
6fe2b3f901 avoid sendFile() for ranges or object lengths < 4MiB (#20141) 2024-07-24 03:22:50 -07:00
Taran Pelkey
b368d4cc13 Fix updateGroupMembershipsForLDAP behavior with unicode (#20137) 2024-07-23 19:10:03 -07:00
Klaus Post
0680af7414 Set O_NONBLOCK for reads and writes on unix (#20133)
Tracing syscalls, opening and reading an `xl.meta` looks like this:

```
openat(AT_FDCWD, "/mnt/drive1/ss8-old/testbucket/ObjSize4MiBThreads72/(554O51H/peTb(0iztdbTKw59.csv/xl.meta", O_RDONLY|O_NOATIME|O_CLOEXEC) = 34 <0.000>
fcntl(34, F_GETFL)                      = 0x48000 (flags O_RDONLY|O_LARGEFILE|O_NOATIME) <0.000>
fcntl(34, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) = 0 <0.000>
epoll_ctl(4, EPOLL_CTL_ADD, 34, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3172471557, u64=8145488475984499461}}) = -1 EPERM (Operation not permitted) <0.000>
fcntl(34, F_GETFL)                      = 0x48800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) <0.000>
fcntl(34, F_SETFL, O_RDONLY|O_LARGEFILE|O_NOATIME) = 0 <0.000>
fstat(34, {st_mode=S_IFREG|0644, st_size=354, ...}) = 0 <0.000>
read(34, "XL2 \1\0\3\0\306\0\0\1P\2\2\1\304$\225\304\20\0\0\0\0\0\0\0\0\0\0\0"..., 354) = 354 <0.000>
close(34)                               = 0 <0.000>
```

Everything until `fstat` is the `os.Open` call.

Looking at the code: https://github.com/golang/go/blob/master/src/os/file_unix.go#L212-L243

It seems for every file it "tries" to see if it is pollable. This causes `syscall.SetNonblock(fd, true)` to be called. This is the first `F_SETFL`.

It then calls `f.pfd.Init("file", true)`. This will attempt to set it as pollable using `epoll_ctl`. This will always fail for files. It therefore calls `syscall.SetNonblock(fd, false)` resulting in the second `F_SETFL`.

If we set the `O_NONBLOCK` call on the initial open, we should avoid the 4 `fcntl` syscalls per file.

I don't see any way to avoid the `epoll_ctl` call, since kind is either `kindOpenFile` or `kindNonBlock`, so "pollable" will always be true. However avoiding 4 of 6 syscalls still seems worth it.

This should not have any effect, since files will end up with "nonblock" anyway.
2024-07-23 09:36:24 -07:00
Harshavardhana
91805bcab6 add optimizations to bring performance on unversioned READS (#20128)
allow non-inlined on disk to be inlined via
an unversioned ReadVersion() call, we only
need ReadXL() to resolve objects with multiple
versions only.

The choice of this block makes it to be dynamic
and chosen by the user via `mc admin config set`

Other bonus things

- Start measuring internode TTFB performance.
- Set TCP_NODELAY, TCP_CORK for low latency
2024-07-23 03:53:03 -07:00
jiuker
b3a94c4e85 fix: Use xtime duration to parse batch job (#20117) 2024-07-23 00:05:53 -07:00
Harshavardhana
8e618d45fc remove unnecessary LRU for internode auth token (#20119)
removes contentious usage of mutexes in LRU, which
were never really reused in any manner; we do not
need it.

To trust hosts, the correct way is TLS certs; this PR completely
removes this dependency, which has never been useful.

```
0  0%  100%  25.83s 26.76%  github.com/hashicorp/golang-lru/v2/expirable.(*LRU[...])
0  0%  100%  28.03s 29.04%  github.com/hashicorp/golang-lru/v2/expirable.(*LRU[...])
```

Bonus: use `x-minio-time` as a nanosecond to avoid unnecessary
parsing logic of time strings instead of using a more
straightforward mechanism.
2024-07-22 00:04:48 -07:00
Harshavardhana
3ef59d2821 do not set KMSSecretKey env from KMSSecretKeyFile (#20122)
fixes #20121
2024-07-21 14:39:15 -07:00
Anis Eleuch
d9ee668b6d s3: Fix wrong continuation token during listing with ILM enabled bucket (#20113) 2024-07-18 13:37:34 -07:00
Anis Eleuch
2e5d792f0c batch-expiry: Save progress regularly in the drives and at the end (#20098)
- Also, fix failure reporting at the end.
- Also, avoid parsing report objects when listing or resuming jobs, this
does not cause any bugs, it is only printing, not useful errors.
2024-07-17 09:42:32 -07:00
Poorna
3535197f99 replication: proxy only on missing object or read quorum err (#20101) 2024-07-16 16:46:41 -07:00
Mark Theunissen
698bb93a46 Allow a KMS Action to specify keys in the Resources of a policy (#20079) 2024-07-16 07:03:03 -07:00
Harshavardhana
e8c54c3d6c add validation test for v3 metrics for all its endpoints (#20094)
add unit test for v3 metrics for all its exposed endpoints

Bonus:
  - support OpenMetrics encoding
  - adds boot time for prometheus
  - continueOnError is better to serve as
    much metrics as possible.
2024-07-15 09:28:02 -07:00
Shubhendu
f944a42886 Removed user and group details from logs (#20072)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-07-14 11:12:07 -07:00
Harshavardhana
eff0ea43aa fix: typo in BucketUsageMetrics group registration in v3 metrics (#20090)
```
curl http://localhost:9000/minio/metrics/v3/cluster/usage/buckets
```

Did not work as documented, due to the fact that there was a typo
in the bucket usage metrics registration group. This endpoint is
a cluster endpoint and does not require any `buckets` argument.
2024-07-14 11:11:42 -07:00
Harshavardhana
7fcb428622 do not print unexpected logs (#20083) 2024-07-12 13:51:54 -07:00
Klaus Post
83adc2eebf Fix ListObjects aborting after 3 minute on async request (#20074)
When creating the async listing, if the first request does not return within 3 
minutes, it is stopped, since it isn't being kept alive.

Keep updating `lastHandout` while we are waiting for the initial request to be fulfilled.
2024-07-12 09:23:16 -07:00
Poorna
989c318a28 replication: make large workers configurable (#20077)
This PR also improves throttling by reducing tokens requested
from rate limiter based on available tokens to avoid exceeding
throttle wait deadlines
2024-07-12 07:57:31 -07:00
Taran Pelkey
f5d2fbc84c Add DecodeDN and QuickNormalizeDN functions to LDAP config (#20076) 2024-07-11 18:04:53 -07:00
Allan Roger Reid
e139673969 Audit failure in batch job key rotate (#20073) 2024-07-11 16:13:15 -07:00
Harshavardhana
a8c6465f22 hide some deprecated fields from 'get' output (#20069)
also update wording on `subnet license="" api_key=""`
2024-07-10 13:16:44 -07:00
Taran Pelkey
6c6f0987dc Add groups to policy entities (#20052)
* Add groups to policy entities

* update comment

---------

Co-authored-by: Harshavardhana <harsha@minio.io>
2024-07-10 11:41:49 -07:00
Austin Chang
5f64658faa clarify error message for root user credential (#20043)
Signed-off-by: Austin Chang <austin880625@gmail.com>
2024-07-10 09:57:01 -07:00
Anis Eleuch
ce183cb2b4 heal: List and heal again for any listing error (#19999)
When a fresh drive healing is finished, add more checks for the drive listing
errors. If any, re-list and heal again. Although this is an infrequent use
case to have listPathRaw() returning nil when minDisks is set to 1, we
still need to handle all possible use cases to avoid missing healing
any object.

Also, check for HealObject result to decide of an object is healed in the
fresh disk since HealObject returns nil if an object is healed in any
disk, and not in the new fresh drive.
2024-07-10 09:55:36 -07:00
Klaus Post
b3bac73c0f Clarify post policy error message (#20067)
It is not really clear that the listed keys are missing.

Clarify the error
2024-07-10 07:18:44 -07:00
Anis Eleuch
e726d8ff0f list: Hide objects/versions with pending/failed replicated deletion (#20047)
In regular listing, this commit will avoid showing an object when its
latest version has a pending or failed deletion. In replicated setup.
It will also prevent showing older versions in the same case.
2024-07-09 15:26:42 -07:00
Shubhendu
f4230777b3 Log replication errors once (#20063)
Also, sort the error map for multiple sites in ascending order
of deployment IDs, so that the error message generated is always
definitive order and same.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-07-09 10:10:31 -07:00
Krishnan Parthasarathi
380233d646 batch: Update job info object on success (#20053) 2024-07-08 18:45:54 -07:00
Klaus Post
0d0b0aa599 Abstract grid connections (#20038)
Add `ConnDialer` to abstract connection creation.

- `IncomingConn(ctx context.Context, conn net.Conn)` is provided as an entry point for 
   incoming custom connections.

- `ConnectWS` is provided to create web socket connections.
2024-07-08 14:44:00 -07:00
Anis Eleuch
b433bf14ba Add typos check to Makefile (#20051) 2024-07-08 14:39:49 -07:00
Klaus Post
107d951893 Log ILM failed object name (#20040)
Log so we know which object we are dealing with.

Log each object once.
2024-07-04 07:25:45 -07:00
Shireesh Anjal
22c53b1c70 Remove license update job (#20037) 2024-07-03 11:49:48 -07:00
Mark Theunissen
88926ad8e9 return appropriate error upon tier update for incorrect credentials (#20034) 2024-07-03 00:17:20 -07:00
Harshavardhana
32d04091a2 resume any batch jobs in a goroutine (#20035)
Bonus: move batch job initialization to the last item after all other initialization, 
            allowing for faster startup time for different subsystems.
2024-07-03 00:16:05 -07:00
Harshavardhana
be84a4fd68 do not proxy invalid object names (#20031) 2024-07-02 14:28:55 -07:00
Anis Eleuch
2ec1f404ac info: Always refresh the root disk status (#20023)
Add root drive status in the disk info cache function, so unmounting a
drive without restarting a local node reflects the correct value.
2024-07-02 13:41:29 -07:00
Klaus Post
2040559f71 Fix SkipReader performance with small initial read (#20030)
If `SkipReader` is called with a small initial buffer it may be doing a huge number if Reads to skip the requested number of bytes. If a small buffer is provided grab a 32K buffer and use that.

Fixes slow execution of `testAPIGetObjectWithMPHandler`.

Bonuses:

* Use `-short` with `-race` test.
* Do all suite test types with `-short`.
* Enable compressed+encrypted in `testAPIGetObjectWithMPHandler`.
* Disable big file tests in `testAPIGetObjectWithMPHandler` when using `-short`.
2024-07-02 08:13:05 -07:00
Anis Eleuch
ca0ce4c6ef tests: Fix setting max openfds as memory limit (#20029)
The code was advertenly passing max openfds to debug.SetMemoryLimit(),
fixing this accelerate go test in my machine.

This is only a testing bug, since the server context has always a valid
MaxMem, so the buggy code was never called in users environments.
2024-07-02 08:09:36 -07:00
Anis Eleuch
757cf413cb Add batch status API (#19679)
Currently the status of a completed or failed batch is held in the
memory, a simple restart will lose the status and the user will not
have any visibility of the job that was long running.

In addition to the metrics, add a new API that reads the batch status
from the drives. A batch job will be cleaned up three days after
completion.

Also add the batch type in the batch id, the reason is that the batch
job request is removed immediately when the job is finished, then we
do not know the type of batch job anymore, hence a difficulty to locate
the job report
2024-07-02 01:17:52 -07:00
Anis Eleuch
b35acb3dbc heal: Add support of healing particular pool/set (#20024) 2024-07-01 15:02:25 -07:00
Sveinn
e404abf103 Letting password enable auth bypass caPublicKey (only if passauth is … (#20022) 2024-07-01 15:02:01 -07:00
jiuker
f7ff19cb18 fix: warning for decommissioned pool while start (#20019) 2024-07-01 07:38:46 -07:00
Poorna
91faaa1387 fix panic in batch replicate (#20014)
Fixes:

```
panic: send on closed channel
	panic: close of closed channel

goroutine 878 [running]:
github.com/minio/minio/internal/ioutil.SafeClose[...](...)
	/Users/kp/code/src/github.com/minio/minio/internal/ioutil/ioutil.go:407
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func2.2()
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2229 +0xc0
panic({0x108c25e60?, 0x1090b28d0?})
	/usr/local/go/src/runtime/panic.go:770 +0x124
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func2.3({{0x1400e397316, 0x5}, {0x1400d88b8a8, 0x8}, {0x1f99d80, 0xede101c42, 0x0}, 0x3bc, 0x0, 0x0, ...})
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2235 +0xb4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func2()
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2277 +0xabc
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk in goroutine 575
	/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2210 +0x33c
```
2024-06-28 18:20:47 -07:00
Harshavardhana
f365a98029 fix: hot-reloading STS credential policy documents (#20012)
* fix: hot-reloading STS credential policy documents
* Support Role ARNs hot load policies (#28)

---------

Co-authored-by: Anis Eleuch <vadmeste@users.noreply.github.com>
2024-06-28 16:17:22 -07:00
Taran Pelkey
7ca4ba77c4 Update tests to use AttachPolicy(LDAP) instead of deprecated SetPolicy (#19972) 2024-06-28 02:06:25 -07:00
Poorna
13512170b5 list: Do not decrypt SSE-S3 Etags in a non encrypted format (#20008) 2024-06-27 19:44:56 -07:00