The `Token` parameter is a sensitive value that should not be output in the Audit log for STS AssumeRoleWithCustomToken API.
Bonus: Add a simple tool that echoes audit logs to the console.
When listing, with drives returning `errFileNotFound,` `errVolumeNotFound`, or `errUnformattedDisk,`,
we could get below `minDisks` drives being left.
This would result in a quorum never being reachable for any object. Therefore, the listing
would continue, but no results would ever be produced.
Include `fnf` in the mindisk check since it is incremented on these errors. This will stop
listing when minDisks are left.
Allow `opts.minDisks` to not return errVolumeNotFound or errFileNotFound and return that.
That will allow for good results even if disks return something else.
We switch `errUnformattedDisk` to a regular error. If we have enough of those, we should just fail.
Typically not all drives are connected, so we delay 3 minutes before resuming.
This greatly reduces risk of starting to list unconnected drives, or drives we risk being disconnected soon.
This delay is not applied when starting with an admin call.
ConsoleUI like applications rely on combination of
ListServiceAccounts() and InfoServiceAccount() to populate
UI elements, however individually these calls can be slow
causing the entire UI to load sluggishly.
i.e., this rule element doesn't apply to DEL markers.
This is a breaking change to how ExpiredObejctDeleteAllVersions
functions today. This is necessary to avoid the following highly probable
footgun scenario in the future.
Scenario:
The user uses tags-based filtering to select an object's time to live(TTL).
The application sometimes deletes objects, too, making its latest
version a DEL marker. The previous implementation skipped tag-based filters
if the newest version was DEL marker, voiding the tag-based TTL. The user is
surprised to find objects that have expired sooner than expected.
* Add DelMarkerExpiration action
This ILM action removes all versions of an object if its
the latest version is a DEL marker.
```xml
<DelMarkerObjectExpiration>
<Days> 10 </Days>
</DelMarkerObjectExpiration>
```
1. Applies only to objects whose,
• The latest version is a DEL marker.
• satisfies the number of days criteria
2. Deletes all versions of this object
3. Associated rule can't have tag-based filtering
Includes,
- New bucket event type for deletion due to DelMarkerExpiration
calling a remote target remove with a perfectly
well constructed ARN can lead to a crash for a bucket
with no replication configured.
This PR fixes, and adds a crash check for ImportMetadata
as well.
Algorithms are comma separated.
Note that valid values does not in all cases represent default values.
`--sftp=pub-key-algos=...` specifies the supported client public key
authentication algorithms. Note that this doesn't include certificate types
since those use the underlying algorithm. This list is sent to the client if
it supports the server-sig-algs extension. Order is irrelevant.
Valid values
```
ssh-ed25519
sk-ssh-ed25519@openssh.comsk-ecdsa-sha2-nistp256@openssh.com
ecdsa-sha2-nistp256
ecdsa-sha2-nistp384
ecdsa-sha2-nistp521
rsa-sha2-256
rsa-sha2-512
ssh-rsa
ssh-dss
```
`--sftp=kex-algos=...` specifies the supported key-exchange algorithms in preference order.
Valid values:
```
curve25519-sha256
curve25519-sha256@libssh.org
ecdh-sha2-nistp256
ecdh-sha2-nistp384
ecdh-sha2-nistp521
diffie-hellman-group14-sha256
diffie-hellman-group16-sha512
diffie-hellman-group14-sha1
diffie-hellman-group1-sha1
```
`--sftp=cipher-algos=...` specifies the allowed cipher algorithms.
If unspecified then a sensible default is used.
Valid values:
```
aes128-ctr
aes192-ctr
aes256-ctr
aes128-gcm@openssh.comaes256-gcm@openssh.comchacha20-poly1305@openssh.com
arcfour256
arcfour128
arcfour
aes128-cbc
3des-cbc
```
`--sftp=mac-algos=...` specifies a default set of MAC algorithms in preference order.
This is based on RFC 4253, section 6.4, but with hmac-md5 variants removed because they have
reached the end of their useful life.
Valid values:
```
hmac-sha2-256-etm@openssh.comhmac-sha2-512-etm@openssh.com
hmac-sha2-256
hmac-sha2-512
hmac-sha1
hmac-sha1-96
```
This would reduce the size of data in response of metrics
listing. While graphing we can default these metrics with
a zero value if not found.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Unfreeze as soon as the incoming connection is terminated and don't wait for everything to complete.
We don't want to keep the services frozen if something becomes stuck.
- handle errFileCorrupt properly
- micro-optimization of sending done() response quicker
to close the goroutine.
- fix logger.Event() usage in a couple of places
- handle the rest of the client to return a different error other than
lastErr() when the client is closed.
listPathRaw() counts errDiskNotFound as a valid error to indicate a
listing stream end. However, storage.WalkDir() is allowed to return
errDiskNotFound anytime since grid.ErrDisconnected is converted to
errDiskNotFound.
This affects fresh disk healing and should affect S3 listing as well.
endpoint: /minio/metrics/v3/system/process
metrics:
- locks_read_total
- locks_write_total
- cpu_total_seconds
- go_routine_total
- io_rchar_bytes
- io_read_bytes
- io_wchar_bytes
- io_write_bytes
- start_time_seconds
- uptime_seconds
- file_descriptor_limit_total
- file_descriptor_open_total
- syscall_read_total
- syscall_write_total
- resident_memory_bytes
- virtual_memory_bytes
- virtual_memory_max_bytes
Since the standard process collector implements only a subset of these
metrics, remove it and implement our own custom process collector that
captures all the process metrics we need.
Since the object is being permanently deleted, the lack of read quorum should not
matter as long as sufficient disks are online to complete the deletion with parity
requirements.
If several pools have the same object with insufficient read quorum, attempt to
delete object from all the pools where it exists
At server startup, LDAP configuration is validated against the LDAP
server. If the LDAP server is down at that point, we need to cleanly
disable LDAP configuration. Previously, LDAP would remain configured but
error out in strange ways because initialization did not complete
without errors.
When importing access keys (i.e. service accounts) for LDAP accounts,
we are requiring groups to exist under one of the configured group base
DNs. This is not correct. This change fixes this by only checking for
existence and storing the normalized form of the group DN - we do not
return an error if the group is not under a base DN.
Test is updated to illustrate an import failure that would happen
without this change.
Existing IAM import logic for LDAP creates new mappings when the
normalized form of the mapping key differs from the existing mapping key
in storage. This change effectively replaces the existing mapping key by
first deleting it and then recreating with the normalized form of the
mapping key.
For e.g. if an older deployment had a policy mapped to a user DN -
`UID=alice1,OU=people,OU=hwengg,DC=min,DC=io`
instead of adding a mapping for the normalized form -
`uid=alice1,ou=people,ou=hwengg,dc=min,dc=io`
we should replace the existing mapping.
This ensures that duplicates mappings won't remain after the import.
Some additional cleanup cases are also covered. If there are multiple
mappings for the name normalized key such as:
`UID=alice1,OU=people,OU=hwengg,DC=min,DC=io`
`uid=alice1,ou=people,ou=hwengg,DC=min,DC=io`
`uid=alice1,ou=people,ou=hwengg,dc=min,dc=io`
we check if the list of policies mapped to all these keys are exactly
the same, and if so remove all of them and create a single mapping with
the normalized key. However, if the policies mapped to such keys differ,
the import operation returns an error as the server cannot automatically
pick the "right" list of policies to map.
`minio_cluster_webhook_queue_length` was wrongly defined as `counter`
where-as it should be `gauge`
Following were wrongly defined as `gauge` when they should actually be
`counter`:
- minio_bucket_replication_sent_bytes
- minio_bucket_replication_received_bytes
- minio_bucket_replication_total_failed_bytes
- minio_bucket_replication_total_failed_count
When LDAP is enabled, previously we were:
- rejecting creation of users and groups via the IAM import functionality
- throwing a `not a valid DN` error when non-LDAP group mappings are present
This change allows for these cases as we need to support situations
where the MinIO server contains users, groups and policy mappings
created before LDAP was enabled.
instead upon any error in renameData(), we still
preserve the existing dataDir in some form for
recoverability in strange situations such as out
of disk space type errors.
Bonus: avoid running list and heal() instead allow
versions disparity to return the actual versions,
uuid to heal. Currently limit this to 100 versions
and lesser disparate objects.
an undo now reverts back the xl.meta from xl.meta.bkp
during overwrites on such flaky setups.
Bonus: Save N depth syscalls via skipping the parents
upon overwrites and versioned updates.
Flaky setup examples are stretch clusters with regular
packet drops etc, we need to add some defensive code
around to avoid dangling objects.
RenameData could start operating on inline data after timing out
and the call returned due to WithDeadline.
This could cause a buffer to write to the inline data being written.
Since no writes are in `RenameData` and the call is canceled,
this doesn't present a corruption issue. But a race is a race and
should be fixed.
Copy inline data to a fresh buffer.
This PR makes a feasible approach to handle all the scenarios
that we must face to avoid returning "panic."
Instead, we must return "errServerNotInitialized" when a
bucketMetadataSys.Get() is called, allowing the caller to
retry their operation and wait.
Bonus fix the way data-usage-cache stores the object.
Instead of storing usage-cache.bin with the bucket as
`.minio.sys/buckets`, the `buckets` must be relative
to the bucket `.minio.sys` as part of the object name.
Otherwise, there is no way to decommission entries at
`.minio.sys/buckets` and their final erasure set positions.
A bucket must never have a `/` in it. Adds code to read()
from existing data-usage.bin upon upgrade.
This PR fixes a few things
- FIPS support for missing for remote transports, causing
MinIO could end up using non-FIPS Ciphers in FIPS mode
- Avoids too many transports, they all do the same thing
to make connection pooling work properly re-use them.
- globalTCPOptions must be set before setting transport
to make sure the client conn deadlines are honored properly.
- GCS warm tier must re-use our transport
- Re-enable trailing headers support.
This reverts commit 928c0181bf.
This change was not correct, reverting.
We track 3 states with the ProxyRequest header - if replication process wants
to know if object is already replicated with a HEAD, it shouldn't proxy back
- Poorna
AWS S3 trailing header support was recently enabled on the warm tier
client connection to MinIO type remote tiers. With this enabled, we are
seeing the following error message at http transport layer.
> Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 400 Bad Request\r\nContent-Type: text/plain; charset=utf-8\r\nConnection: close\r\n\r\n400 Bad Request"; err=<nil>
This is an interim fix until we identify the root cause for this behaviour in the
minio-go client package.
Keep the EC in header, so it can be retrieved easily for dynamic quorum calculations.
To not force a full metadata decode on every read the value will be 0/0 for data written in previous versions.
Size is expected to increase by 2 bytes per version, since all valid values can be represented with 1 byte each.
Example:
```
λ xl-meta xl.meta
{
"Versions": [
{
"Header": {
"EcM": 4,
"EcN": 8,
"Flags": 6,
"ModTime": "2024-04-17T11:46:25.325613+02:00",
"Signature": "0a409875",
"Type": 1,
"VersionID": "8e03504e11234957b2727bc53eda0d55"
},
...
```
Not used for operations yet.
Follow up for #19528
If there are multiple existing DN mappings for the same normalized DN,
if they all have the same policy mapping value, we pick one of them of
them instead of returning an import error.
This is a change to IAM export/import functionality. For LDAP enabled
setups, it performs additional validations:
- for policy mappings on LDAP users and groups, it ensures that the
corresponding user or group DN exists and if so uses a normalized form
of these DNs for storage
- for access keys (service accounts), it updates (i.e. validates
existence and normalizes) the internally stored parent user DN and group
DNs.
This allows for a migration path for setups in which LDAP mappings have
been stored in previous versions of the server, where the name of the
mapping file stored on drives is not in a normalized form.
An administrator needs to execute:
`mc admin iam export ALIAS`
followed by
`mc admin iam import ALIAS /path/to/export/file`
The validations are more strict and returns errors when multiple
mappings are found for the same user/group DN. This is to ensure the
mappings stored by the server are unambiguous and to reduce the
potential for confusion.
Bonus **bug fix**: IAM export of access keys (service accounts) did not
export key name, description and expiration. This is fixed in this
change too.
Reading the list metacache is not protected by a lock; the code retries when it fails
to read the metacache object, however, it forgot to re-read the metacache object
from the drives, which is necessary, especially if the metacache object is inlined.
This commit will ensure that we always re-read the metacache object from the drives
when it is retrying.
When resuming a versioned listing where `version-id-marker=null`, the `null` object would
always be returned, causing duplicate entries to be returned.
Add check against empty version
unlinking() at two different locations on a disk when there
are lots to purge, this can lead to huge IOwaits, instead
rely on rename() to .trash to avoid running multiple unlinks()
in parallel.
since mid 2018 we do not have any deployments
without deployment-id, it is time to put this
code to rest, this PR removes this old code as
its no longer valuable.
on setups with 1000's of drives these are all
quite expensive operations.
The rest of the peer clients were not consistent across nodes. So, meta cache requests
would not go to the same server if a continuation happens on a different node.
When no results match or another error occurs, add an error to the stream. Keep the "inspect-input.txt" as the only thing in the zip for reference.
Example:
```
λ mc support inspect --airgap myminio/testbucket/fjghfjh/**
mc: Using public key from C:\Users\klaus\mc\support_public.pem
File data successfully downloaded as inspect-data.enc
λ inspect inspect-data.enc
Using private key from support_private.pem
output written to inspect-data.zip
2024/04/11 14:10:51 next stream: GetRawData: No files matched the given pattern
λ unzip -l inspect-data.zip
Archive: inspect-data.zip
Length Date Time Name
--------- ---------- ----- ----
222 2024-04-11 14:10 inspect-input.txt
--------- -------
222 1 file
λ
```
Modifies inspect to read until end of stream to report the error.
Bonus: Add legacy commandline params
Add following metrics:
- used_inodes
- total_inodes
- healing
- online
- reads_per_sec
- reads_kb_per_sec
- reads_await
- writes_per_sec
- writes_kb_per_sec
- writes_await
- perc_util
To be able to calculate the `per_sec` values, we capture the IOStats-related
data in the beginning (along with the time at which they were captured),
and compare them against the current values subsequently. This is because
dividing by "time since server uptime." doesn't work in k8s environments.
the disk location never changes in the lifetime of a
MinIO cluster, even if it did validate this close to the
disk instead at the higher layer.
Return appropriate errors indicating an invalid drive, so
that the drive is not recognized as part of a valid
drive.
we have had numerous reports on some config
values not having default values, causing
features misbehaving and not having default
values set properly.
This PR tries to address all these concerns
once and for all.
Each new sub-system that gets added
- must check for invalid keys
- must have default values set
- must not "return err" when being saved into
a global state() instead collate as part of
other subsystem errors allow other sub-systems
to independently initialize.
* Allow specifying the local server, with env variable _MINIO_SERVER_LOCAL, in systems where the hostname cannot be resolved to local IP
* Limit scope of the _MINIO_SERVER_LOCAL solution to only containerized implementations
Return an error when the user specifies endpoints for both source
and target. This can generate many type of errors as the code considers
a deployment remote if its endpoint is specified.
HealObject() does not return an error in some cases, for example, when
an object is successfully reconstructed in one disk but fails with other
disks, another case is when a disk does not have the object is temporarily
disconnected
Add the After heal drives result in the audit output for better
analysis.
Set object's modTime when being restored
restored here refers to making a temporary local copy in the hot tier
for a tiered object using the RestoreObject API
we have been using an LRU caching for internode
auth tokens, migrate to using a typed implementation
and also do not cache auth tokens when its an error.
This fixes a regression from #19358 which prevents policy mappings
created in the latest release from being displayed in policy entity
listing APIs.
This is due to the possibility that the base DNs in the LDAP config are
not in a normalized form and #19358 introduced normalized of mapping
keys (user DNs and group DNs). When listing, we check if the policy
mappings are on entities that parse as valid DNs that are descendants of
the base DNs in the config.
Test added that demonstrates a failure without this fix.
Create new code paths for multiple subsystems in the code. This will
make maintaing this easier later.
Also introduce bugLogIf() for errors that should not happen in the first
place.
This commit replaces the `KMS.Stat` API call with a
`KMS.GenerateKey` call. This approach is more reliable
since data key generation also works when the KMS backend
is unavailable (temp. offline), but KES has cached the
key. Ref: KES offline caching.
With this change, it is less likely that MinIO readiness
checks fail in cases where the KMS backend is offline.
Signed-off-by: Andreas Auernhammer <github@aead.dev>
Make sure to pass a nil pointer as a Transport to minio-go when the API config
is not initialized, this will make sure that we do not pass an interface
with a known type but a nil value.
This will also fix the update of the API remote_transport_deadline
configuration without requiring the cluster restart.
Use `ODirectPoolSmall` buffers for inline data in PutObject.
Add a separate call for inline data that will fetch a buffer for the inline data before unmarshal.
This fixes a bug where STS Accounts map accumulates accounts in memory
and never removes expired accounts and the STS Policy mappings were not
being refreshed.
The STS purge routine now runs with every IAM credentials load instead
of every 4th time.
The listing of IAM files is now cached on every IAM load operation to
prevent re-listing for STS accounts purging/reload.
Additionally this change makes each server pick a time for IAM loading
that is randomly distributed from a 10 minute interval - this is to
prevent server from thundering while performing the IAM load.
On average, IAM loading will happen between every 5-15min after the
previous IAM load operation completes.
If site replication enabled across sites, replicate the SSE-C
objects as well. These objects could be read from target sites
using the same client encryption keys.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Instead of relying on user input values, we use the DN value returned by
the LDAP server.
This handles cases like when a mapping is set on a DN value
`uid=svc.algorithm,OU=swengg,DC=min,DC=io` with a user input value (with
unicode variation) of `uid=svc﹒algorithm,OU=swengg,DC=min,DC=io`. The
LDAP server on lookup of this DN returns the normalized value where the
unicode dot character `SMALL FULL STOP` (in the user input), gets
replaced with regular full stop.
Bonus: remove persistent md5sum calculation, turn-off
sha256 as well. Instead we always enable crc32c which
is enough for payload verification also support for
trailing headers checksum.
Fix races in IAM cache
Fixes#19344
On the top level we only grab a read lock, but we write to the cache if we manage to fetch it.
a03dac41eb/cmd/iam-store.go (L446) is also flipped to what it should be AFAICT.
Change the internal cache structure to a concurrency safe implementation.
Bonus: Also switch grid implementation.
we must attempt to convert all errors at storage-rest-client
into StorageErr() regardless of what functionality is being
called in, this PR fixes this for multiple callers including
some internally used functions.
- old version was unable to retain messages during config reload
- old version could not go from memory to disk during reload
- new version can batch disk queue entries to single for to reduce I/O load
- error logging has been improved, previous version would miss certain errors.
- logic for spawning/despawning additional workers has been adjusted to trigger when half capacity is reached, instead of when the log queue becomes full.
- old version would json marshall x2 and unmarshal 1x for every log item. Now we only do marshal x1 and then we GetRaw from the store and send it without having to re-marshal.
panic seen due to premature closing of slow channel while listing is still sending or
list has already closed on the sender's side:
```
panic: close of closed channel
goroutine 13666 [running]:
github.com/minio/minio/internal/ioutil.SafeClose[...](0x101ff51e4?)
/Users/kp/code/src/github.com/minio/minio/internal/ioutil/ioutil.go:425 +0x24
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1()
/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2142 +0x170
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk in goroutine 1189
/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:1985 +0x228
```
Object names of directory objects qualified for ExpiredObjectAllVersions
must be encoded appropriately before calling on deletePrefix on their
erasure set.
e.g., a directory object and regular objects with overlapping prefixes
could lead to the expiration of regular objects, which is not the
intention of ILM.
```
bucket/dir/ ---> directory object
bucket/dir/obj-1
```
When `bucket/dir/` qualifies for expiration, the current implementation would
remove regular objects under the prefix `bucket/dir/`, in this case,
`bucket/dir/obj-1`.
In handlers related to health diagnostics e.g. CPU, Network, Partitions,
etc, globalMinioHost was being passed as the addr, resulting in empty
value for the same in the health report.
Using globalLocalNodeName instead fixes the issue.
IAM loading is a lazy operation, allow these
fallbacks to be in place when we cannot find
in-memory state().
this allows us to honor the request even if pay
a small price for lookup and populating the data.
When objects have more versions than their ILM policy expects to retain
via NewerNoncurrentVersions, but they don't qualify for expiry due to
NoncurrentDays are configured in that rule.
In this case, applyNewerNoncurrentVersionsLimit method was enqueuing empty
tasks, which lead to a panic (panic: runtime error: index out of range [0] with
length 0) in newerNoncurrentTask.OpHash method, which assumes the task
to contain at least one version to expire.
When returning the status of a decommissioned pool, a pool with zero
time StartedTime will be considered an active pool, which is unexpected.
This commit will always ensure that a pool's canceled/failed/completed
status is returned.
we were prematurely not writing 4k pages while we
could have due to the fact that most buffers would
be multiples of 4k upto some number and there shall
be some remainder.
We only need to write the remainder without O_DIRECT.
at scale customers might start with failed drives,
causing skew in the overall usage ratio per EC set.
make this configurable such that customers can turn
this off as needed depending on how comfortable they
are.
Currently, the code relies on object parity to decide whether it is a
delete marker or a regular object. In the case of a delete marker, the
return quorum is half of the disks in the erasure set. However, this
calculation must be corrected with objects with EC = 0, mainly
because EC is not a one-time fixed configuration.
Though all data are correct, the manifested symptom is a 503 with an
EC=0 object. This bug was manifested after we introduced the
fast Get Object feature that does not read all data from all disks in
case of inlined objects
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.
This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`
The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:
when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no
resource found error
All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.