Commit Graph

6275 Commits

Author SHA1 Message Date
Harshavardhana 134db72bb7
fix: reject service account access key same as root credentials (#19055) 2024-02-14 10:37:12 -08:00
Harshavardhana effe21f3eb
send correct objectname in audit events for DeleteAll ILM (#19053) 2024-02-14 08:07:58 -08:00
Praveen raj Mani 1118b285d3
fix: race in deleting objects during batch expiry (#19054) 2024-02-14 08:07:44 -08:00
Aditya Manthramurthy a14e192376
fix: remove unnecessary panic in iam-store (#19050) 2024-02-13 19:29:36 -08:00
Minio Trusted f8e15e7d09 Update yaml files to latest version RELEASE.2024-02-13T15-35-11Z 2024-02-13 16:01:38 +00:00
Shireesh Anjal 7b9f9e0628
fix incorrect disk io stats in k8s environment (#19016)
The previous logic of calculating per second values for disk io stats
divides the stats by the host uptime. This doesn't work in k8s
environment as the uptime is of the pod, but the stats (from
/proc/diskstats) are from the host.

Fix this by storing the initial values of uptime and the stats at the
timme of server startup, and using the difference between current and
initial values when calculating the per second values.
2024-02-13 07:35:11 -08:00
Praveen raj Mani ac8e9ce04f
Send a bucket notification event on DeleteObject() for non-existing object (#19037)
Send a bucket notification event on DeleteObject for non-existing objects
2024-02-13 07:34:17 -08:00
Praveen raj Mani cfd8645843
fix: update batch replication stats for snowball uploads (#19045) 2024-02-13 07:33:27 -08:00
Harshavardhana 0c068b15c7
add missing handler for reloading site replication config on peers (#19042) 2024-02-13 06:55:54 -08:00
Anis Eleuch 30a466aa71
sts: Add test for DurationSeconds condition (#19044) 2024-02-13 06:55:37 -08:00
Taran Pelkey 4d94609c44
FIx unexpected behavior when creating service account (#19036) 2024-02-13 02:31:43 -08:00
Poorna 0cc9fb73e1
metrics: fix typo in namespace for proxy tagging metric (#19039)
Relevant PR introducing this metric: #18957
2024-02-12 13:02:27 -08:00
Harshavardhana eac4e4b279
honor replaced disk properly by updating globalLocalDrives (#19038)
globalLocalDrives seem to be not updated during the
HealFormat() leads to a requirement where the server
needs to be restarted for the healing to continue.
2024-02-12 13:00:20 -08:00
Harshavardhana 6d381f7c0a
relax pre-emptive GetBucketInfo() for multi-object delete (#19035) 2024-02-12 08:46:46 -08:00
Anis Eleuch 4fa06aefc6
Convert service account add/update expiration to cond values (#19024)
In order to force some users allowed to create or update a service
account to provide an expiration satifying the user policy conditions.
2024-02-12 08:36:16 -08:00
Harshavardhana 0e177a44e0
preserve conflicting objects when parent object is being deleted (#19034)
a/prefix
a/prefix/1.txt

where `a/prefix` is an object which does not have `/` at the end,
we do not have to aggressively recursively delete all the sub-folders
as well. Instead convert the call into self contained to deleting
'xl.meta' and then subsequently attempting to Remove the parent.
2024-02-12 08:30:40 -08:00
Harshavardhana afd19de5a9
fix: allow configuring excess versions alerting (#19028)
Bonus: enable audit alerts for object versions
beyond the configured value, default is '100'
versions per object beyond which scanner will
alert for each such objects.
2024-02-11 23:41:53 -08:00
Harshavardhana e3fbac9e24
do not have to use the same distributionAlgo as first pool (#19031)
when we expand via pools, there is no reason to stick
with the same distributionAlgo as the rest. Since the
algo only makes sense with-in a pool not across pools.

This allows for newer pools to use newer codepaths to
avoid legacy file lookups when they have a pre-existing
deployment from 2019, they can expand their new pool
to be of a newer distribution format, allowing the
pool to be more performant.
2024-02-11 23:21:56 -08:00
Poorna a9cf32811c
Fix panic in tagging request proxying (#19032) 2024-02-11 18:18:43 -08:00
Harshavardhana 53997ecc79
avoid excessive logging for objects that do not exist (#19030)
in replicated setups, that have proxying enabled for
replicated buckets.
2024-02-11 14:21:08 -08:00
Harshavardhana 997ba3a574
introduce reader deadlines for net.Conn (#19023)
Bonus: set "retry-after" header for AWS SDKs if possible to honor them.
2024-02-09 13:25:16 -08:00
Harshavardhana 62761a23e6
remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Harshavardhana 404d8b3084
fix: dangling objects honor parityBlocks instead of dataBlocks (#19019)
Bonus: do not recreate buckets if NoRecreate is asked.
2024-02-08 15:22:16 -08:00
Klaus Post 6005ad3d48
Fix shared top locks client (#19018)
`client` is shared across goroutines.

Seen with `mc support top locks` on minio built with `-race`.
2024-02-08 12:28:05 -08:00
Harshavardhana 035a3ea4ae
optimize startup sequence performance (#19009)
- bucket metadata does not need to look for legacy things
  anymore if b.Created is non-zero

- stagger bucket metadata loads across lots of nodes to
  avoid the current thundering herd problem.

- Remove deadlines for RenameData, RenameFile - these
  calls should not ever be timed out and should wait
  until completion or wait for client timeout. Do not
  choose timeouts for applications during the WRITE phase.

- increase R/W buffer size, increase maxMergeMessages to 30
2024-02-08 11:21:21 -08:00
Aditya Manthramurthy e104b183d8
fix: skip policy usage validation for cache update (#19008)
When updating the policy cache, we do not need to validate policy usage
as the policy has already been deleted by the node sending the
notification.
2024-02-07 20:39:53 -08:00
Klaus Post 7e082f232e
Add GetBucketInfo toStorageErr conversion (#19005)
Convert error to storageError since it is used for quorum calculations here: ff80cfd83d/cmd/peer-s3-client.go (L339)
2024-02-07 14:24:24 -08:00
Harshavardhana d28bf71f25
listing must return WalkDir() errors first (#19006) 2024-02-07 13:20:07 -08:00
Harshavardhana 5b1a74b6b2
do not block iam.store registration (#18999)
current implementation would quite simply
block the sys.store registration, making
sys.Initialized() call to be blocked.
2024-02-07 12:41:58 -08:00
Klaus Post ebc6c9b498
Fix tracing send on closed channel (#18982)
Depending on when the context cancelation is picked up the handler may return and close the channel before `SubscribeJSON` returns, causing:

```
Feb 05 17:12:00 s3-us-node11 minio[3973657]: panic: send on closed channel
Feb 05 17:12:00 s3-us-node11 minio[3973657]: goroutine 378007076 [running]:
Feb 05 17:12:00 s3-us-node11 minio[3973657]: github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON.func1()
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:139 +0x12d
Feb 05 17:12:00 s3-us-node11 minio[3973657]: created by github.com/minio/minio/internal/pubsub.(*PubSub[...]).SubscribeJSON in goroutine 378010884
Feb 05 17:12:00 s3-us-node11 minio[3973657]:         github.com/minio/minio/internal/pubsub/pubsub.go:124 +0x352
```

Wait explicitly for the goroutine to exit.

Bonus: Listen for doneCh when sending to not risk getting blocked there is channel isn't being emptied.
2024-02-06 08:57:30 -08:00
Harshavardhana 630963fa6b
protect tracker copy properly to avoid race (#18984)
```
WARNING: DATA RACE
Write at 0x00c000aac1e0 by goroutine 1133:
  github.com/minio/minio/cmd.(*healingTracker).updateProgress()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:183 +0x117
  github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5()
      github.com/minio/minio/cmd/global-heal.go:292 +0x1d3

Previous read at 0x00c000aac1e0 by goroutine 1003:
  github.com/minio/minio/cmd.(*allHealState).updateHealStatus()
      github.com/minio/minio/cmd/admin-heal-ops.go:136 +0xcb
  github.com/minio/minio/cmd.(*healingTracker).save()
      github.com/minio/minio/cmd/background-newdisks-heal-ops.go:223 +0x424
```
2024-02-06 08:56:59 -08:00
Harshavardhana f674168b8b
Add missing gob register for map[string]string{} (#18974)
```
minio[1303918]: API: SYSTEM()
minio[1303918]: Time: 02:04:28 UTC 02/05/2024
minio[1303918]: DeploymentID: 0972de33-2d17-4499-8967-aff6437dd9da
minio[1303918]: Error: gob: type not registered for interface: map[string]string (*errors.errorString)
minio[1303918]:        4: internal/logger/logonce.go:118:logger.(*logOnceType).logOnceIf()
minio[1303918]:        3: internal/logger/logonce.go:149:logger.LogOnceIf()
minio[1303918]:        2: cmd/peer-rest-server.go:533:cmd.(*peerRESTServer).GetSysConfigHandler()
minio[1303918]:        1: net/http/server.go:2136:http.HandlerFunc.ServeHTTP()
```
2024-02-06 08:23:23 -08:00
Poorna 27d02ea6f7
metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana 794a7993cb
calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana 6f16d1cb2c
do not count context canceled as timeout errors (#18975) 2024-02-05 18:16:13 -08:00
Anis Eleuch 7aa00bff89
sts: Add support of AssumeRoleWithWebIdentity and DurationSeconds (#18835)
To force limit the duration of STS accounts, the user can create a new
policy, like the following:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sts:AssumeRoleWithWebIdentity"],
    "Condition": {"NumericLessThanEquals": {"sts:DurationSeconds": "300"}}
  }]
}

And force binding the policy to all OpenID users, whether using a claim name or role
ARN.
2024-02-05 11:44:23 -08:00
Klaus Post e046eb1d17
Disable Rename2 metrics on non-linux (#18970)
Logging a call that always fails is pointless.
2024-02-05 10:48:14 -08:00
Anis Eleuch ba975ca320
Add defensive code to ignore checking parts with transitioned objects (#18973)
Though dataErrs are nil with transitioned objects, add a more defensive
code to ignore counting missing parts in that case
2024-02-05 10:48:03 -08:00
Harshavardhana fec13b0ec1
remove unused DiskMTime (#18965) 2024-02-05 01:04:26 -08:00
Harshavardhana 100c35c281
avoid excessive logs when peer is down (#18969) 2024-02-04 23:25:42 -08:00
Harshavardhana f225ca3312
Add more advanced cases for dangling (#18968) 2024-02-04 14:36:13 -08:00
Frank Wessels 8b68e0bfdc
Fix typo in api-router.go (#18955) 2024-02-03 14:03:51 -08:00
Anis Eleuch 6ae97aedc9
xl: Disable rename2 in decommissioning/rebalance (#18964)
Always disable rename2 optimization in decom/rebalance
2024-02-03 14:03:30 -08:00
Harshavardhana 960d604013
disconnected returns, an unexpected error to List() returning 500s (#18959)
provide the error string appropriately so that the
matching of error types works.

Also add a string based fallback for the said error.
2024-02-03 01:04:33 -08:00
Harshavardhana ff80cfd83d
move Make,Delete,Head,Heal bucket calls to websockets (#18951) 2024-02-02 14:54:54 -08:00
Harshavardhana 99fde2ba85
deprecate disk tokens, instead rely on deadlines and active monitoring (#18947)
disk tokens usage is not necessary anymore with the implementation
of deadlines for storage calls and active monitoring of the drive
for I/O timeouts.

Functionality kicking off a bad drive is still supported, it's just that 
we do not have to serialize I/O in the manner tokens would do.
2024-02-02 10:10:54 -08:00
Frank Wessels 31743789dc
Fix some leftover issues from PR 18936 (#18946) 2024-02-01 19:42:56 -08:00
Anis Eleuch 6fd63e920a
log: Use error log type instead of Application/MinIO type (#18930)
* log: Use error log type instead of Application/MinIO type

Also bump github.com/shirou/gopsutil version to address cross
compilation issues.

* Apply suggestions from code review

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>

---------

Co-authored-by: Anis Eleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2024-02-01 16:13:57 -08:00
Aditya Manthramurthy 59cc3e93d6
fix: `null` inline policy handling for access keys (#18945)
Interpret `null` inline policy for access keys as inheriting parent
policy. Since MinIO Console currently sends this value, we need to honor it
for now. A larger fix in Console and in the server are required.

Fixes #18939.
2024-02-01 14:45:03 -08:00
Anis Eleuch 61a4bb38cd
batch: Fix a typo while validating smallerThan field (#18942) 2024-02-01 13:53:26 -08:00
Klaus Post b192bc348c
Improve object reuse for grid messages (#18940)
Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers.

This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool.

Add internal request reuse of internal types. Add for safe (pointerless) types explicitly.

Only log params for internal types. Doing Sprint(obj) is just a bit too messy.
2024-02-01 12:41:20 -08:00
Harshavardhana 6440d0fbf3
move a collection of peer APIs to websockets (#18936) 2024-02-01 10:47:20 -08:00
Anis Eleuch 24ecc44bac
Keep ServiceV1 admin stop/restart API and mark as deprecated (#18932) 2024-01-31 12:20:33 -08:00
Aditya Manthramurthy 0ae4915a93
fix: permission checks for editing access keys (#18928)
With this change, only a user with `UpdateServiceAccountAdminAction`
permission is able to edit access keys.

We would like to let a user edit their own access keys, however the
feature needs to be re-designed for better security and integration with
external systems like AD/LDAP and OpenID.

This change prevents privilege escalation via service accounts.
2024-01-31 10:56:45 -08:00
Harshavardhana caac9d216e
remove all the frivolous logs, that may or may not be actionable (#18922)
for actionable, inspections we have `mc support inspect`

we do not need double logging, healing will report relevant
errors if any, in terms of quorum lost etc.
2024-01-30 18:11:45 -08:00
Harshavardhana 057192913c
add total usable capacity, free and used to DataUsageInfo() (#18921) 2024-01-30 17:49:37 -08:00
Harshavardhana f25cbdf43c
use all the available nr_requests for NVMe (#18920) 2024-01-30 14:10:06 -08:00
Klaus Post 6da4a9c7bb
Improve tracing & notification scalability (#18903)
* Perform JSON encoding on remote machines and only forward byte slices.
* Migrate tracing & notification to WebSockets.
2024-01-30 12:49:02 -08:00
Harshavardhana 80ca120088
remove checkBucketExist check entirely to avoid fan-out calls (#18917)
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.

We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
2024-01-30 12:43:25 -08:00
Anis Eleuch a669946357
Add cgroup v2 support for memory limit (#18905) 2024-01-30 11:13:27 -08:00
Poorna 7ffc162ea8
exclude veeam virtual objects from replication (#18918)
Fixes: #18916
2024-01-30 10:43:58 -08:00
Poorna bcfd7fbbcf
reuse transports for callhome and remote tgt validation (#18912) 2024-01-29 23:05:39 -08:00
Harshavardhana 486e2e48ea
enable xattr capture by default (#18911)
- healing must not set the write xattr
  because that is the job of active healing
  to update. what we need to preserve is
  permanent deletes.

- remove older env for drive monitoring and
  enable it accordingly, as a global value.
2024-01-29 23:03:58 -08:00
Harshavardhana 2ddf2ca934
allow configuring maximum idle connections per host (#18908) 2024-01-29 16:50:37 -08:00
Poorna 29b1a29044
fix metrics panic in node metrics endpoint (#18894) 2024-01-29 12:32:44 -08:00
jiuker b4ab8e095a
fix: preserve bucket metric of data usage for replication info (#18895) 2024-01-29 08:54:20 -08:00
Harshavardhana cff8235068 remove getReplicationNodeMetrics() from peer metrics groups 2024-01-28 18:45:20 -08:00
Harshavardhana 944f3c1477
remove local disk metrics from cluster metrics (#18886)
local disk metrics were polluting cluster metrics
Please remove them instead of adding relevant ones.

- batch job metrics were incorrectly kept at bucket
  metrics endpoint, move it to cluster metrics.

- add tier metrics to cluster peer metrics from the node.

- fix missing set level cluster health metrics
2024-01-28 12:53:59 -08:00
Harshavardhana 1d3bd02089
avoid close 'nil' panics if any (#18890)
brings a generic implementation that
prints a stack trace for 'nil' channel
closes(), if not safely closes it.
2024-01-28 10:04:17 -08:00
Harshavardhana 6347fb6636
add missing proper error return in WalkDir() (#18884)
without this the caller might end up returning
incorrect errors and not ignoring the drive
properly.
2024-01-27 16:13:41 -08:00
Harshavardhana 32e668eb94
update() stale rebalance stats() object during pool expansion (#18882)
it is entirely possible that a rebalance process which was running
when it was asked to "stop" it failed to write its last statistics
to the disk.

After this a pool expansion can cause disruption and all S3 API
calls would fail at IsPoolRebalancing() function.

This PRs makes sure that we update rebalance.bin under such
conditions to avoid any runtime crashes.
2024-01-27 10:14:03 -08:00
Harshavardhana c88308cf0e
avoid 'panic' on mc admin update for single drive setup (#18876) 2024-01-26 12:07:03 -08:00
Harshavardhana 88837fb753
add new update v2 that updates per node, allows idempotent behavior (#18859)
add new update v2 that updates per node, allows idempotent behavior

new API ensures that

- binary is correct and can be downloaded checksummed verified
- committed to actual path
- restart returns back the relevant waiting drives
2024-01-26 08:40:13 -08:00
Harshavardhana d0283ff354
remove unnecessary logs in HealBucket() (#18875) 2024-01-26 08:39:57 -08:00
Harshavardhana f449a7ae2c
allow bucket import to be idempotent (#18873)
do not need to be defensive in our approach,
we should simply override anything everything
in import process, do not care about what
currently exists on the disk - backup is the
source of truth.
2024-01-25 17:20:54 -08:00
Klaus Post a113b2c394
Fix inspect format.json exclusion (#18871)
Right now the format.json is excluded if anything within `.minio.sys` is requested.

I assume the check was meant to exclude only if it was actually requesting it.
2024-01-25 15:59:00 -08:00
Harshavardhana 74851834c0
further bootstrap/startup optimization for reading 'format.json' (#18868)
- Move RenameFile to websockets
- Move ReadAll that is primarily is used
  for reading 'format.json' to to websockets
- Optimize DiskInfo calls, and provide a way
  to make a NoOp DiskInfo call.
2024-01-25 12:45:46 -08:00
Harshavardhana e377bb949a
migrate bootstrap logic directly to websockets (#18855)
improve performance for startup sequences by 2x for 300+ nodes.
2024-01-24 13:36:44 -08:00
Poorna b6e9d235fe
fix replication error logs to include target endpoint (#18863) 2024-01-24 13:05:43 -08:00
Klaus Post 4a6c97463f
Fix all racy use of NewDeadlineWorker (#18861)
AlmosAll uses of NewDeadlineWorker, which relied on secondary values, were used in a racy fashion,
which could lead to inconsistent errors/data being returned. It also propagates the deadline downstream.

Rewrite all these to use a generic WithDeadline caller that can return an error alongside a value.

Remove the stateful aspect of DeadlineWorker - it was racy if used - but it wasn't AFAICT.

Fixes races like:

```
WARNING: DATA RACE
Read at 0x00c130b29d10 by goroutine 470237:
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).ReadVersion()
      github.com/minio/minio/cmd/xl-storage-disk-id-check.go:702 +0x611
  github.com/minio/minio/cmd.readFileInfo()
      github.com/minio/minio/cmd/erasure-metadata-utils.go:160 +0x122
  github.com/minio/minio/cmd.erasureObjects.getObjectFileInfo.func1.1()
      github.com/minio/minio/cmd/erasure-object.go:809 +0x27a
  github.com/minio/minio/cmd.erasureObjects.getObjectFileInfo.func1.2()
      github.com/minio/minio/cmd/erasure-object.go:828 +0x61

Previous write at 0x00c130b29d10 by goroutine 470298:
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).ReadVersion.func1()
      github.com/minio/minio/cmd/xl-storage-disk-id-check.go:698 +0x244
  github.com/minio/minio/internal/ioutil.(*DeadlineWorker).Run.func1()
      github.com/minio/minio/internal/ioutil/ioutil.go:141 +0x33

WARNING: DATA RACE
Write at 0x00c0ba6e6c00 by goroutine 94507:
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).StatVol.func1()
      github.com/minio/minio/cmd/xl-storage-disk-id-check.go:419 +0x104
  github.com/minio/minio/internal/ioutil.(*DeadlineWorker).Run.func1()
      github.com/minio/minio/internal/ioutil/ioutil.go:141 +0x33

Previous read at 0x00c0ba6e6c00 by goroutine 94463:
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).StatVol()
      github.com/minio/minio/cmd/xl-storage-disk-id-check.go:422 +0x47e
  github.com/minio/minio/cmd.getBucketInfoLocal.func1()
      github.com/minio/minio/cmd/peer-s3-server.go:275 +0x122
  github.com/minio/pkg/v2/sync/errgroup.(*Group).Go.func1()
```

Probably back from #17701
2024-01-24 10:08:31 -08:00
Frank Wessels 6c912ac960
Fix startup message when using single path (#18856) 2024-01-24 10:02:56 -08:00
Harshavardhana 708cebe7f0
add necessary protection err, fileInfo slice reads and writes (#18854)
protection was in place. However, it covered only some
areas, so we re-arranged the code to ensure we could hold
locks properly.

Along with this, remove the DataShardFix code altogether,
in deployments with many drive replacements, this can affect
and lead to quorum loss.
2024-01-24 01:08:23 -08:00
Harshavardhana f78d677ab6
pre-allocate EC memory by default at startup (#18846) 2024-01-23 20:41:11 -08:00
Poorna e39e2306d6
site replication: remove extraneous log for missing group (#18785) 2024-01-23 18:28:11 -08:00
Harshavardhana 52229a21cb
avoid reload of 'format.json' over the network under normal conditions (#18842) 2024-01-23 14:11:46 -08:00
Harshavardhana 961f7dea82
compress binary while sending it to all the nodes (#18837)
Also limit the amount of concurrency when sending
binary updates to peers, avoid high network over
TX that can cause disconnection events for the
node sending updates.
2024-01-22 12:16:36 -08:00
Shubhendu 65c4d550cb
Distribution bucket metrics with site replication (#18841)
If site replication is enabled, we should still show the size and
version distribution histogram metrics at bucket level.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-01-22 08:45:36 -08:00
Harshavardhana f9b4a8d6e8
improve server update behavior by re-using memory properly (#18831) 2024-01-19 18:27:58 -08:00
Harshavardhana e11d851aee
add new drive I/O waiting/tokens metric (#18836)
Bonus: add virtual memory used as well part of the system resource metrics.
2024-01-19 14:51:36 -08:00
Harshavardhana ac81f0248c
introduce new ServiceV2 API to handle guided restarts (#18826)
New API now verifies any hung disks before restart/stop,
provides a 'per node' break down of the restart/stop results.

Provides also how many blocked syscalls are present on the
drives and what users must do about them.

Adds options to do pre-flight checks to provide information
to the user regarding any hung disks. Provides 'force' option
to forcibly attempt a restart() even with waiting syscalls
on the drives.
2024-01-19 14:22:36 -08:00
Aditya Manthramurthy cc960adbee
fix: remove policy mapping file when empty (#18828)
On a policy detach operation, if there are no policies remaining
attached to the user/group, remove the policy mapping file, instead of
leaving a file containing an empty list of policies.
2024-01-19 10:31:40 -08:00
Shubhendu 19387cafab
Use +Inf label additionally for Histogram metrics (#18807) 2024-01-18 14:51:28 -08:00
Harshavardhana 7c0673279b
capture I/O in waiting and total tokens in diskMetrics (#18819)
This is needed for the subsequent changes
in ServerUpdate(), ServerRestart() etc.
2024-01-18 11:17:43 -08:00
Anis Eleuch 7ce0d71a96
Do not log volume not empty when healing dangling buckets (#18822)
Healing dangling buckets is conservative, and it is a typical use case to
fail to remove a dangling bucket because it contains some data because
healing danging bucket code is not allowed to remove data: only healing
the dangling object is allowed to do so.
2024-01-18 10:39:27 -08:00
Harshavardhana dd2542e96c
add codespell action (#18818)
Original work here, #18474,  refixed and updated.
2024-01-17 23:03:17 -08:00
Harshavardhana 21d60eab7c
remove all older unused APIs (#18769) 2024-01-17 20:41:23 -08:00
Harshavardhana a4a74e9844 re-init the worker group to ensure errs[] slice is fresh 2024-01-17 20:33:25 -08:00
Harshavardhana 9588978028
fix: HealBucket regression for empty buckets, simplify it (#18815) 2024-01-17 15:19:09 -08:00
chienguo 8cd967803c
fix: a typo in storeDataUsageInBackend() comment (#18778) 2024-01-16 15:48:54 -08:00
Harshavardhana a0e1163fb6
reject reference format from a different deployment (#18800)
reference format is constant for any lifetime of
a minio cluster, we do not have to ever replace
it during HealFormat() as it will never change.

additionally we should simply reject reference
formats that we do not understand early on.
2024-01-16 15:13:14 -08:00
Sveinn 30bd5e2669
adding a missing return case to fix GetObjectTagging (#18793) 2024-01-15 16:11:06 -08:00
Harshavardhana 38637897ba
fix: listing SSE encrypted multipart objects (#18786)
GetActualSize() was heavily relying on o.Parts()
to be non-empty to figure out if the object is multipart or not, 
However, we have many indicators of whether an object is multipart 
or not.

Blindly assuming that o.Parts == nil is not a multipart, is an 
incorrect expectation instead, multipart must be obtained via

- Stored metadata value indicating this is a multipart encrypted object.

- Rely on <meta>-actual-size metadata to get the object's actual size.
  This value is preserved for additional reasons such as these.

- ETag != 32 length
2024-01-15 00:57:49 -08:00
Harshavardhana 993d96feef
treat all localhost endpoints as local setup with same port (#18784)
fixes #18783 and avoids user mistakes
2024-01-12 23:53:03 -08:00
Poorna b2b26d9c95
support proxying of tagging requests in replication (#18649)
support proxying of tagging requests in active-active replication

Note: even if proxying is successful, PutObjectTagging/DeleteObjectTagging
will continue to report a 404 since the object is not present locally.
2024-01-12 23:51:33 -08:00
Krishnan Parthasarathi cba3dd276b
Add more size intervals to obj size histogram (#18772)
New intervals:
[1024B, 64KiB)
[64KiB, 256KiB)
[256KiB, 512KiB)
[512KiB, 1MiB)

The new intervals helps us see object size distribution with higher
resolution for the interval [1024B, 1MiB).
2024-01-12 23:51:08 -08:00
Anis Eleuch a47fc75c26
xl: Remove wrong wording for errCorruptedFormat (#18775)
Also add errCorruptedBackend to make it easier to differentiate between
corrupted content or something else wrong in the backend drive
2024-01-12 14:48:44 -08:00
Harshavardhana e5c8794b8b
avoid disk monitoring leaks under various conditions (#18777)
- HealFormat() was leaking healthcheck goroutines for
  disks, we are only interested in enabling healthcheck
  for the newly formatted disk, not for existing disks.

- When disk is a root-disk a random disk monitor was
  leaking while we ignored the drive.

- When loading the disk for each erasure set, we were
  leaking goroutines for the prepare-storage.go disks
  which were replaced via the globalLocalDrives slice

- avoid disk monitoring utilizing health tokens that
  would cause exhaustion in the tokens, prematurely
  which were meant for incoming I/O. This is ensured
  by avoiding writing O_DIRECT aligned buffer instead
  write 2048 worth of content only as O_DSYNC, which is
  sufficient.
2024-01-12 01:48:36 -08:00
Taran Pelkey ac90a873eb
Verify that remote target bucket is on MinIO server for bucket replication (#18656) 2024-01-11 14:56:16 -08:00
jiuker c1a78224cf
fix: prevent queries from starting before initialization (#18766) 2024-01-10 15:21:52 -08:00
Harshavardhana 39f9350697
optimize readdir() open calls to be dealt with directly via 'fd' (#18762) 2024-01-10 08:48:50 -08:00
Shubhendu e31081d79d
Heal buckets at node level (#18612)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-01-09 20:34:04 -08:00
Harshavardhana f02d282754
avoid frivolous logs for expired credentials (#18767) 2024-01-09 12:25:18 -08:00
Krishnan Parthasarathi 3a90af0bcd
Add line, col to types used in batch-expire (#18747) 2024-01-08 15:22:28 -08:00
jiuker 53ceb0791f
fix: prevent queries from starting before initialization (#18756)
Prevent queries from starting before initialization
2024-01-08 12:40:27 -08:00
jiuker 2cd98a0d21
remove outdated notes (#18755) 2024-01-08 08:04:19 -08:00
Anis Eleuch 04135fa6cd
audit: Add the drives where the dangling object is removed (#18737) 2024-01-05 14:17:24 -08:00
Harshavardhana 42dc6329e6
simplify success response for GetObjectAttributes() (#18746) 2024-01-05 12:50:07 -08:00
Sveinn 9b8ba97f9f
feat: add support for GetObjectAttributes API (#18732) 2024-01-05 10:43:06 -08:00
Anis Eleuch 7705605b5a
scanner: Add a config to disable short sleep between objects scan (#18734)
Add a hidden configuration under the scanner sub section to configure if
the scanner should sleep between two objects scan. The configuration has
only effect when there is no drive activity related to s3 requests or
healing.

By default, the code will keep the current behavior which is doing
sleep between objects.

To forcefully enable the full scan speed in idle mode, you can do this:

   `mc admin config set myminio scanner idle_speed=full`
2024-01-04 15:07:17 -08:00
Anis Eleuch 414bcb0c73
prom: Add read quorum per erasure set metric (#18736) 2024-01-04 15:05:13 -08:00
Harshavardhana f4710948c4
fix: an odd crash when deleting `null` DEL markers (#18727)
fixes #18724

A regression was introduced in #18547, that attempted
to file adding a missing `null` marker however we
should not skip returning based on versionID instead
it must be based on if we are being asked to create
a DEL marker or not.

The PR also has a side-affect for replicating `null`
marker permanent delete, as it may end up adding a
`null` marker while removing one.

This PR should address both scenarios.
2024-01-02 15:08:18 -08:00
Anis Eleuch 3f4488c589
scanner: Allow full throttle if there is no parallel disk ops (#18109) 2024-01-02 13:51:24 -08:00
Pedro Juarez 8f13c8c3bf
Support to store browser config settings (#18631)
* csp_policy
* hsts_seconds
* hsts_include_subdomains
* hsts_preload
* referrer_policy
2024-01-01 08:36:33 -08:00
Zhou Ting 31d16f6cc2
allow sha256 payload to be configurable for object perf test (#18712)
Signed-off-by: Zhou Ting <ting.z.zhou@intel.com>
2023-12-29 23:56:50 -08:00
Harshavardhana a50ea92c64
feat: introduce list_quorum="auto" to prefer quorum drives (#18084)
NOTE: This feature is not retro-active; it will not cater to previous transactions
on existing setups. 

To enable this feature, please set ` _MINIO_DRIVE_QUORUM=on` environment
variable as part of systemd service or k8s configmap. 

Once this has been enabled, you need to also set `list_quorum`. 

```
~ mc admin config set alias/ api list_quorum=auto` 
```

A new debugging tool is available to check for any missing counters.
2023-12-29 15:52:41 -08:00
Harshavardhana 5b2ced0119
re-use globalLocalDrives properly (#18721) 2023-12-29 09:30:10 -08:00
Anis Eleuch 8a0ba093dd
audit: Fix merrs and derrs object dangling message (#18714)
merrs and derrs are empty when a dangling object is deleted. Fix the bug
and adds invalid-meta data for data blocks
2023-12-27 22:27:04 -08:00
Daniel Valdivia 5fc7da345d
Upgrade Console to v0.44.0 (#18717)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-12-27 11:19:13 -08:00
Anis Eleuch 8bd4f6568b
server-info: Avoid initializing audit/log http/kafka targets (#18703)
This can cause unnecessary ServerInfo() call delay.
2023-12-22 10:25:08 -08:00
Harshavardhana da55499db0
fix: reject clients that do not send proper payload (#18701) 2023-12-22 01:26:17 -08:00
Anis Eleuch 22f8e39b58
tier: Allow edit of the new Azure and AWS auth params (#18690)
Allow editing for the service principal credentials from Azure
and the web identity token for AWS;

Also, more validation of input parameters.
2023-12-21 16:58:10 -08:00
Harshavardhana eba23bbac4
rename object_size -> block_size for cache subsystem (#18694) 2023-12-21 16:57:13 -08:00
Harshavardhana 4550535cbb
send proper IPv6 names avoid bracketing notation (#18699)
Following policies if present

```
       "Condition": {
         "IpAddress": {
            "aws:SourceIp": [
              "54.240.143.0/24",
               "2001:DB8:1234:5678::/64"
             ]
          }
        }
```

And client is making a request to MinIO via IPv6 can
potentially crash the server.

Workarounds are turn-off IPv6 and use only IPv4
2023-12-21 16:56:55 -08:00
Anis Eleuch 8432fd5ac2
prom: Add online and healing drives metrics per erasure set (#18700) 2023-12-21 16:56:43 -08:00
Harshavardhana 7c948adf88
allow pre-allocating buffers to reduce frequent GCs during growth (#18686)
This PR also increases per node bpool memory from 1024 entries
to 2048 entries; along with that, it also moves the byte pool
centrally instead of being per pool.
2023-12-21 08:59:38 -08:00
Krishnan Parthasarathi 56b7045c20
Export tier metrics (#18678)
minio_node_tier_ttlb_seconds - Distribution of time to last byte for streaming objects from warm tier
minio_node_tier_requests_success - Number of requests to download object from warm tier that were successful
minio_node_tier_requests_failure - Number of requests to download object from warm tier that failed
2023-12-20 20:13:40 -08:00
Poorna d55b6b9909
Fix quota config replication for SR (#18684)
Fixing regression introduced by PR #17988
2023-12-19 13:22:47 -08:00
Shireesh Anjal 7680e5f81d
Read new key license_v2 from SUBNET response (#18669)
SUBNET now has a v2 of license that is returned in the new key
`license_v2`. mc will start reading and storing the same. (The old key
`license` is deprecated but is still available in SUBNET response to
ensure that the current released version of minio doesn't break)
2023-12-18 08:21:44 -08:00
Taran Pelkey ad8a34858f
Add APIs to create and list access keys for LDAP (#18402) 2023-12-15 13:00:43 -08:00
Krishnan Parthasarathi 162eced7d2
Fix incorrect metric desc for bucketRequestsDuration (#18657) 2023-12-14 19:02:11 -08:00
Krishnan Parthasarathi bec1f7c26a
metrics: Refactor handling of histogram vectors (#18632) 2023-12-14 14:02:52 -08:00
Anis Eleuch 8771617199
tier: Add support of AWS S3 tiering with web identity token file (#18648) 2023-12-14 14:01:49 -08:00
Klaus Post 6c89a81af4
Fix CreateFile shared buffer corruption. (#18652)
`(*xlStorageDiskIDCheck).CreateFile` wraps the incoming reader in `xioutil.NewDeadlineReader`.

The wrapped reader is handed to `(*xlStorage).CreateFile`. This performs a Read call via `writeAllDirect`, 
which reads into an `ODirectPool` buffer.

`(*DeadlineReader).Read` spawns an async read into the buffer. If a timeout is hit while reading, 
the read operation returns to `writeAllDirect`. The operation returns an error and the buffer is reused.

However, if the async `Read` call unblocks, it will write to the now recycled buffer.

Fix: Remove the `DeadlineReader` - it is inherently unsafe. Instead, rely on the network timeouts. 
This is not a disk timeout, anyway.

Regression in https://github.com/minio/minio/pull/17745
2023-12-14 10:51:57 -08:00
Praveen raj Mani 10ca0a6936
Label the notification target metrics by their target IDs (#18633)
This patch adds the targetID to the existing notification target metrics
and deprecates the current target metrics which points to the overall
event notification subsystem
2023-12-14 09:09:26 -08:00
Harshavardhana b3314e97a6
re-use the same local drive used by remote-peer (#18645)
historically, we have always kept storage-rest-server
and a local storage API separate without much trouble,
since they both can independently operate due to no
special state() between them.

however, over some time, we have added state()
such as

- drive monitoring threads now there will be "2" of
  them per drive instead of just 1.

- concurrent tokens available per drive are now twice
  instead of just single shared, allowing unexpectedly
  high amount of I/O to go through.

- applying serialization by using walkMutexes can now
  be adequately honored for both remote callers and local
  callers.
2023-12-13 19:27:55 -08:00
Poorna 3781a0f9ad
replication: Pass metadata timestamps in CopyObject call (#18647)
Regression from #18285. CopyObject options were inheriting source MTime
for metadata timestamps if unspecified, removing this prevented metadata
updates from being applied on target.
2023-12-13 15:28:55 -08:00
Poorna e79b289325
fix datadir missing check on HeadObject (#18646)
versions pending purge in replication were seeing a errFileCorrupt
that prevents permanent deletion after replication.

Regression from PR#18477
2023-12-13 14:54:01 -08:00
Harshavardhana 3f72c7fcc7
healthcheck requests with user-agent mozilla do not need redirects (#18642)
apparently, windows powershell curl has this abhorrent behavior
2023-12-12 16:16:26 -08:00
Harshavardhana d521c84d55
reduce logging during permission denied errors (#18641)
log them if any only once
2023-12-12 16:11:17 -08:00
Anis Eleuch 4a21dce2b5
tier: Add support of SP credentials with Azure (#18630)
Co-authored-by: Anis Elleuch <anis@min.io>
2023-12-11 21:51:53 -08:00
Harshavardhana 65f34cd823
fix: remove ODirectReader entirely since we do not need it anymore (#18619) 2023-12-09 10:17:51 -08:00
Harshavardhana 196e7e072b
allow bitrot files to be healed in MRF (#18618)
bitrot scanMode was ignored in MRF,
allow it to heal relevant content if
needed when seen as an error.
2023-12-08 12:26:01 -08:00
Anis Eleuch 6f97663174
yml-config: Add support of rootUser and rootPassword (#18615)
Users can define the root user and password in the yaml configuration
file; Root credentials defined in the environment variable still take
precedence
2023-12-08 12:04:54 -08:00
Anis Eleuch aed7a1818a
info: Populate pool/set/disk indexes for offline disks (#18613)
This can be calculated from the disk layout and some external
applications would like to know the location of the offline
disks.
2023-12-08 08:13:04 -08:00
Poorna 6b06da76cb
add configuration to limit replication workers (#18601) 2023-12-07 16:22:00 -08:00
jiuker 6ca6788bb7
feat: add events_errors_total metric (#18610) 2023-12-07 16:21:17 -08:00
Anis Eleuch 2e23e61a45
Add support of conf file to pass arguments and options (#18592) 2023-12-07 01:33:56 -08:00
Harshavardhana 53ce92b9ca
fix: use the right channel to feed the data in (#18605)
this PR fixes a regression in batch replication
where we weren't sending any data from the Walk()
results due to incorrect channels being used.
2023-12-06 18:17:03 -08:00
Shireesh Anjal 7350a29fec
Capture percentage of cpu load and memory used (#18596)
By default the cpu load is the cumulative of all cores. Capture the
percentage load (load * 100 / cpu-count)

Also capture the percentage memory used (used * 100 / total)
2023-12-06 13:19:59 -08:00
jiuker 5cc2c62c66
fix: GetFreePort() will get the same port (#18604) 2023-12-06 10:36:42 -08:00
Harshavardhana 4bc5ed6c76
support LDAP service accounts via SFTP, FTP logins (#18599) 2023-12-06 04:31:35 -08:00
Harshavardhana 73dde66dbe
stick to go1.19 go.mod (#18600) 2023-12-06 01:09:22 -08:00
Harshavardhana e30c0e7ca3 Revert "Heal buckets at node level (#18504)"
This reverts commit 708296ae1b.
2023-12-05 22:34:46 -08:00
Shubhendu 708296ae1b
Heal buckets at node level (#18504) 2023-12-05 02:17:35 -08:00
Harshavardhana fbb5e75e01
avoid run-away goroutine build-up in notification send, use channels (#18533)
use memory for async events when necessary and dequeue them as
needed, for all synchronous events customers must enable

```
MINIO_API_SYNC_EVENTS=on
```

Async events can be lost but is upto to the admin to
decide what they want, we will not create run-away number
of goroutines per event instead we will queue them properly.

Currently the max async workers is set to runtime.GOMAXPROCS(0)
which is more than sufficient in general, but it can be made
configurable in future but may not be needed.
2023-12-05 02:16:33 -08:00
Harshavardhana f327b21557
handle crashes with ILM expiry changes (#18590) 2023-12-05 01:14:36 -08:00
Harshavardhana 45b7253f39
parallelize renameData() cleanup upon error (#18591) 2023-12-04 14:54:34 -08:00
Harshavardhana 05bb655efc
avoid caching metrics for timeout errors per drive (#18584)
Bonus: combine the loop for drive/REST registration.
2023-12-04 11:54:13 -08:00
Harshavardhana 8fdfcfb562
upon RenameData() quorum error delete any partial success (#18586)
there is potential for danglingWrites when quorum failed, where
only some drives took a successful write, generally this is left
to the healing routine to pick it up. However it is better that
we delete it right away to avoid potential for quorum issues on
version signature when there are many versions of an object.
2023-12-04 11:33:39 -08:00
Harshavardhana e7c144eeac
avoid double MRF heal when there is versions disparity (#18585) 2023-12-04 11:13:50 -08:00
Harshavardhana e98172d72d
avoid hot-tier SLA to be tied to warm-tier SLA (#18581)
it is okay if the warm-tier cannot keep up, we should continue
to take I/O at hot-tier, only fail hot-tier or block it when
we are disk full.

Bonus: add metrics counter for these missed tasks, we will
know for sure if one of the node is lagging behind or is
losing too many tasks during transitioning.
2023-12-02 13:02:12 -08:00
Krishnan Parthasarathi a50f26b7f5
Implement batch-expiration for objects (#17946)
Based on an initial PR from -
https://github.com/minio/minio/pull/17792

But fully completes it with newer finalized YAML spec.
2023-12-02 02:51:33 -08:00
Klaus Post 69294cf98a
Disable DMA optimization on windows (#18575)
It appears that Windows can lock up when errors occur. Use regular copy here.
2023-12-01 16:13:19 -08:00
Krishnan Parthasarathi c397fb6c7a
Minor fixes to bucket replication (#18578) 2023-12-01 16:13:08 -08:00
Klaus Post 961b0b524e
Do not require restart when a disk is unreachable during node boot (#18576)
A disk that is not able to initialize when an instance is started
will never have a handler registered, which means a user will
need to restart the node after fixing the disk;

This will also prevent showing the wrong 'upgrade is needed.'
error message in that case.

When the disk is still failing, print an error every 30 minutes;
Disk reconnection will be retried every 30 seconds.

Co-authored-by: Anis Elleuch <anis@min.io>
2023-12-01 12:01:14 -08:00
Harshavardhana 109a9e3f35
skip ILM expired objects from healing (#18569) 2023-12-01 07:56:24 -08:00
Klaus Post 5f971fea6e
Fix Mux Connect Error (#18567)
`OpMuxConnectError` was not handled correctly.

Remove local checks for single request handlers so they can 
run before being registered locally.

Bonus: Only log IAM bootstrap on startup.
2023-12-01 00:18:04 -08:00
Klaus Post 94fbcd8ebe
Add TLS cert checksum (#18557)
It allows validation of whether all certs match across clusters.
2023-11-30 12:13:50 -08:00
Harshavardhana 879d5dd236
site replication must heal policy mappings with correct userType (#18563) 2023-11-30 10:34:18 -08:00
Harshavardhana 0ee722f8c3
cleanup handling of STS isAllowed and simplifies the PolicyDBGet() (#18554) 2023-11-29 16:07:35 -08:00
Anis Eleuch b7d11141e1
rename Force to Immediate for clarity (#18540) 2023-11-28 22:35:16 -08:00
Klaus Post bea0b050cd
Improve env var config error reporting (#18549)
Improve env var config error

Env vars that were set on current server but not on remotes were not reported in errors.

Add these.
2023-11-28 10:39:02 -08:00
Shubhendu ce62980d4e
Fixed transition rules getting overwritten while healing (#18542)
While healing the latest changes of expiry rules across sites
if target had pre existing transition rules, they were getting
overwritten as cloned latest expiry rules from remote site were
getting written as is. Fixed the same and added test cases as
well.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-11-28 10:38:35 -08:00
Klaus Post dc88865908
fix: shadowed error in getObjectFileInfo() (#18548)
This will result in `done <- err == nil` always returning true
for this path, which seems unintentional.
2023-11-28 09:47:41 -08:00
Krishnan Parthasarathi 9fbd931058
Skip versions expired by DeleteAllVersionsAction (#18537)
Object versions expired by DeleteAllVersionsAction must not be included
toward data-usage accounting.
2023-11-28 08:39:21 -08:00
jiuker b0264bdb90
preserve null version delete marker on suspended bucket version (#18547) 2023-11-28 08:31:33 -08:00
bestgopher 95d6f43cc8
fix(cmd/notification.go): no error when retry successful (#18530) 2023-11-27 22:41:03 -08:00
Anis Eleuch 9cb94eb4a9
cleaning up will delete instead of rename to trash with full disk err (#18534)
moveToTrash() function moves a folder to .trash, for example, when 
doing some object deletions: a data dir that has many parts will be 
renamed to the trash folder; However, ENOSPC is a valid error from 
rename(), and it can cripple a user trying to free some space in an 
entire disk situation.

Therefore, this commit will try to do a recursive delete in that case.
2023-11-27 17:36:02 -08:00
Harshavardhana bd0819330d
avoid Walk() API listing objects without quorum (#18535)
This allows batch replication to basically do not
attempt to copy objects that do not have read quorum.

This PR also allows walk() to provide custom
values for quorum under batch replication, and
key rotation.
2023-11-27 17:20:04 -08:00
Harshavardhana 8d9e83fd99
support passing signatureAge conditional (#18529)
this PR allows following policy

```
{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "Deny a presigned URL request if the signature is more than 10 min old",
         "Effect": "Deny",
         "Action": "s3:*",
         "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET1/*",
         "Condition": {
            "NumericGreaterThan": {
               "s3:signatureAge": 600000
            }
         }
      }
   ]
}
```

This is to basically disable all pre-signed URLs that are older than 10 minutes.
2023-11-27 11:30:19 -08:00
jiuker be02333529
feat: drive sub-sys to max timeout reload (#18501) 2023-11-27 09:15:06 -08:00
Harshavardhana 506f121576
remove frivolous logging in transition object (#18526)
AWS S3 closes keep-alive connections frequently
leading to frivolous logs filling up the MinIO
logs when the transition tier is an AWS S3 bucket.

Ignore such transient errors, let MinIO retry
it when it can.
2023-11-26 22:18:09 -08:00
Klaus Post ca488cce87
Add detailed parameter tracing + custom prefix (#18518)
* Allow per handler custom prefix.
* Add automatic parameter extraction
2023-11-26 01:32:59 -08:00
Shireesh Anjal 11dc723324
Pass SUBNET URL to console (#18503)
When minio runs with MINIO_CI_CD=on, it is expected to communicate
with the locally running SUBNET. This is happening in the case of MinIO
via call home functionality. However, the subnet-related functionality inside the
console continues to talk to the SUBNET production URL. Because of this,
the console cannot be tested with a locally running SUBNET.

Set the env variable CONSOLE_SUBNET_URL correctly in such cases. 
(The console already has code to use the value of this variable
as the subnet URL)
2023-11-24 09:59:35 -08:00
Shubhendu dd6ea18901
fix: No shallow copy needed when looking at r.Form (#18499)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-11-24 09:46:55 -08:00
Harshavardhana 9032f49f25
DiskInfo() must return errDiskNotFound not internal errors (#18514) 2023-11-24 09:07:14 -08:00
Anis Eleuch fbc6f3f6e8
snowball-repl: Add support of immediate tiering (#18508)
Also, fix a possible crash when some fields are not added to the batch
snowball yaml
2023-11-22 16:33:11 -08:00
Harshavardhana fba883839d
feat: bring new HDD related performance enhancements (#18239)
Optionally allows customers to enable 

- Enable an external cache to catch GET/HEAD responses 
- Enable skipping disks that are slow to respond in GET/HEAD 
  when we have already achieved a quorum
2023-11-22 13:46:17 -08:00
Krishnan Parthasarathi a93214ea63
ilm: ObjectSizeLessThan and ObjectSizeGreaterThan (#18500) 2023-11-22 13:42:39 -08:00
Klaus Post e6b0fc465b
tweak healing to include version-id in healing result (#18225) 2023-11-22 12:30:31 -08:00
Anis Eleuch 70fbcfee4a
Implement batch snowball (#18485) 2023-11-22 10:51:46 -08:00
Sveinn d67e4d5b17
fix: check for bucket existence before FTP upload (#18496) 2023-11-21 21:36:32 -08:00
Harshavardhana fe3e49c4eb
use Access(F_OK) do not need to check for permissions (#18492) 2023-11-21 15:08:41 -08:00
Shubhendu 58306a9d34
Replicate Expiry ILM configs while site replication (#18130)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-11-21 09:48:06 -08:00
Harshavardhana a4cfb5e1ed
return errors if dataDir is missing during HeadObject() (#18477)
Bonus: allow replication to attempt Deletes/Puts when
the remote returns quorum errors of some kind, this is
to ensure that MinIO can rewrite the namespace with the
latest version that exists on the source.
2023-11-20 21:33:47 -08:00
Klaus Post 51aa59a737
perf: websocket grid connectivity for all internode communication (#18461)
This PR adds a WebSocket grid feature that allows servers to communicate via 
a single two-way connection.

There are two request types:

* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
  roundtrips with small payloads.

* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
  which allows for different combinations of full two-way streams with an initial payload.

Only a single stream is created between two machines - and there is, as such, no
server/client relation since both sides can initiate and handle requests. Which server
initiates the request is decided deterministically on the server names.

Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.

If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte 
slices or use a higher-level generics abstraction.

There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.

The request path can be changed to a new one for any protocol changes.

First, all servers create a "Manager." The manager must know its address 
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it given
the remote address using.

```
func (m *Manager) Connection(host string) *Connection
```

All serverside handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight 
requests and responses must also be given for streaming requests.

The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.

* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
   performs a single request and returns the result. Any deadline provided on the request is
   forwarded to the server, and canceling the context will make the function return at once.

* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
   will initiate a remote call and send the initial payload.

```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
//The appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses *must* be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel *must* be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```

There are generic versions of the server/client handlers that allow the use of type
safe implementations for data types that support msgpack marshal/unmarshal.
2023-11-20 17:09:35 -08:00
Anis Eleuch 02331a612c
batch-repl: Replicate missing metadata and standard headers (#18484)
- Replicate Expires when the source is local or remote
- Replicate metadata when the source is remote
2023-11-18 19:12:44 -08:00
Anis Eleuch 8317557f70
decom: Fix listing quorum to be equal to deletion quorum (#18476)
With an odd number of drives per erasure set setup, the write/quorum is
the half + 1; however the decommissioning listing will still list those
objects and does not consider those as stale.

Fix it by using (N+1)/2 formula.

Co-authored-by: Anis Elleuch <anis@min.io>
2023-11-17 21:09:09 -08:00
Anis Eleuch 1bb7a2a295
Immediate transition ILM to avoid quick deferring to the scanner (#18475)
Immediate transition use case and is mostly used to fill warm
backend with a lot of data when a new deployment is created

Currently, if the transition queue is complete, the transition will be
deferred to the scanner; change this behavior by blocking the PUT request
until the transition queue has a new place for a transition task.
2023-11-17 16:16:46 -08:00
Harshavardhana 0a286153bb
remove checking for BucketInfo() peer call for every PUT() (#18464)
we already validate if the bucket doesn't exist in RenameData()
which can handle this cleanly, instead of making a network call
and returning errors.
2023-11-17 05:29:50 -08:00
Anis Eleuch 22d59e757d
Remove stale data in HEAD/GET object (#18460)
Currently if the object does not exist in quorum disks of an erasure
set, the dangling code is never called because the returned error will
be errFileNotFound or errFileVersionNotFound;

With this commit, when errFileNotFound or errFileVersionNotFound is
returning when trying to calculate the quorum of a given object, the
code checks if a disk returned nil, which means a stale object exists in
that disk, that will trigger deleteIfDangling() function
2023-11-16 08:39:53 -08:00
Andreas Auernhammer 0daa2dbf59
health: split liveness and readiness handler (#18457)
This commit splits the liveness and readiness
handler into two separate handlers. In K8S, a
liveness probe is used to determine whether the
pod is in "live" state and functioning at all.
In contrast, the readiness probe is used to
determine whether the pod is ready to serve
requests.

A failing liveness probe causes pod restarts while
a failing readiness probe causes k8s to stop routing
traffic to the pod. Hence, a liveness probe should
be as robust as possible while a readiness probe
should be used to load balancing.

Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

Signed-off-by: Andreas Auernhammer <github@aead.dev>
2023-11-16 01:51:27 -08:00
Praveen raj Mani 38f35463b7
Load bucket configs during the metadata refresh (#18449)
This patch takes care of loading the bucket configs of failed buckets
during the periodic refresh. This makes sure the event notifiers and
remote bucket targets are properly initialized.
2023-11-15 12:43:25 -08:00
Harshavardhana 5573986e8e
fix: relax free inode check for single drive deployments (#18437)
users might use MinIO on NFS, GPFS that provide dynamic
inodes and may not even have a concept of free inodes.

to allow users to use MinIO on top of GPFS relax the
free inode check.
2023-11-14 09:31:16 -08:00
Sveinn f3367a1b20
Adding error handling for network errors in the SFTP layer (#18442) 2023-11-14 09:31:00 -08:00
Sveinn 8fbec30998
Adding a missing return to fix SFTP Rmdir message (#18438) 2023-11-14 09:26:46 -08:00
Harshavardhana a7466eeb0e
fix: ignore dperf on unformatted/unavailable/unmounted drives (#18435) 2023-11-13 22:32:08 -08:00
Harshavardhana 8b1e819bf3
fix: make sure to purge all the completed in resume() (#18429)
currently previously completed jobs would re-run
even if they are completed, causing incorrect behavior.
2023-11-13 08:15:00 -08:00
Anis Eleuch fe63664164
prom: Add drive failure tolerance per erasure set (#18424) 2023-11-13 00:59:48 -08:00
Sveinn 9afdb05bf4
fix: file consistency issue on SFTP upload (#18422)
* creating a byte buffer for SFTP file segments
* Adding an error condition for when there are 
  remaining segments in the queue
* Simplification of the queue using a map
2023-11-11 00:14:41 -08:00
Krishnan Parthasarathi 9569a85cee
Avoid allocs for MRF on-disk header (#18425) 2023-11-10 19:54:46 -08:00
Harshavardhana 54721b7c7b
fix: batch replication from source allow out of band deletes (#18423)
it is possible that ILM or Deletes got triggered on batch
of objects that we are attempting to batch replicate, ignore
this scenario as valid behavior.
2023-11-10 16:12:35 -08:00
Harshavardhana 91d8bddbd1
use sendfile/splice implementation to perform DMA (#18411)
sendfile implementation to perform DMA on all platforms

Go stdlib already supports sendfile/splice implementations
for

- Linux
- Windows
- *BSD
- Solaris

Along with this change however O_DIRECT for reads() must be
removed as well since we need to use sendfile() implementation

The main reason to add O_DIRECT for reads was to reduce the
chances of page-cache causing OOMs for MinIO, however it would
seem that avoiding buffer copies from user-space to kernel space
this issue is not a problem anymore.

There is no Go based memory allocation required, and neither
the page-cache is referenced back to MinIO. This page-
cache reference is fully owned by kernel at this point, this
essentially should solve the problem of page-cache build up.

With this now we also support SG - when NIC supports Scatter/Gather
https://en.wikipedia.org/wiki/Gather/scatter_(vector_addressing)
2023-11-10 10:10:14 -08:00
Harshavardhana 80adc87a14
converge WARM tier object name to hash of deployment+bucket (#18410)
this is to ensure that we can converge and save IOPs
when hot-tier accesses MinIO.
2023-11-10 02:15:13 -08:00
Taran Pelkey 117ad1b65b
Loosen requirements to detach policies for LDAP (#18419) 2023-11-09 14:44:43 -08:00
Klaus Post 2229509362
fix: leaking offline disks in MarkOffline() thread (#18414)
`monitorAndConnectEndpoints` will continue to attempt to reconnect offline disks.

Since disks were never closed, a `MarkOffline` would continue to try to check these disks forever.

Close previous disks.
2023-11-09 09:33:32 -08:00
Krishnan Parthasarathi 0a25083fdb
Tiered objects require ns locks unlike inlined (#18409) 2023-11-08 20:00:02 -08:00
Sveinn 15137d0327
refactor SFTP to use the new minio/pkg implementation (#18406) 2023-11-08 09:47:05 -08:00
Poorna 8c9974bc0f
site replication: avoid propagating bucket b/w settings (#18399)
replication mode and bucket bandwidth are one-way and should not be
propagated to peer cluster.

Regression from #18062
2023-11-08 00:40:25 -08:00
jiuker 079b6c2b50
fix: add err when all bucket resync failed (#18401) 2023-11-08 00:40:08 -08:00
Harshavardhana 754f7a8a39
replace io.Discard usage to fix some NUMA copy() latencies (#18394)
replace io.Discard usage to fix NUMA copy() latencies

On NUMA systems copying from 8K buffer allocated via
io.Discard leads to large latency build-up for every

```
copy(new8kbuf, largebuf)
```

can in-cur upto 1ms worth of latencies on NUMA systems
due to memory sharding across NUMA nodes.
2023-11-06 14:26:08 -08:00
Harshavardhana 64bafe1dfe
skip speedtest bucket from site-replication (#18393) 2023-11-06 11:52:33 -08:00
jiuker c3e456e7e6
fix: no resyncid when site-replication cancel (#18392) 2023-11-06 01:53:31 -08:00
vicmunoz da95a2d13f
fix: object versions metric help (#18388) 2023-11-03 11:43:52 -07:00
Shireesh Anjal cc5e05fdeb
Do not anonymize hostnames by default (#18387)
Anonymize them only if the parameter `anonymize` is set to `strict
2023-11-03 10:09:33 -07:00
jiuker 8a56af439c
fix: siteReplicationSys.startResync return no buckets return if error (#18374) 2023-11-02 16:00:03 -07:00
Shireesh Anjal f6e581ce54
Capture network device info in health report (#18381) 2023-11-02 09:49:49 -07:00
Klaus Post 7472818d94
Fix hanging scanner saves (#18368)
Fix various regressions from #18029

* If context is canceled the token is never returned. This will lead to scanner being unable to save and deadlocking.
* Fix backup not being able to get any data (hr empty)
* Reduce backup timeout.
2023-11-01 09:09:28 -07:00
Taran Pelkey 33322e6638
Change behavior of service account empty policies (#18346)
* Fix embedded/implied policy behavior

* assume implied policy if pased to empty

* fix for all

* Fix failing tests

---------

Co-authored-by: Prakash Senthil Vel <23444145+prakashsvmx@users.noreply.github.com>
2023-10-31 12:30:36 -07:00
Daniel López Guimaraes a1792ca0d1
fix: relax enforcing filename on PostPolicy (#18336)
The filename is not required to be on the form data.
2023-10-30 21:06:32 -07:00
Harshavardhana ac8c43fe9c
fix: allow missing hot-tier accounting (#18345) 2023-10-30 14:42:11 -07:00
Allan Roger Reid 4d40ee00e9
Add check for reverse proxy setups (#18310)
Add check for reverse proxy setups, to skip check for paths being served by different port on same address.
2023-10-30 10:49:04 -07:00
Adrian Najera 06f59ad631
fix: expiration time for share link when using OpenID (#18297) 2023-10-30 10:21:34 -07:00
Harshavardhana 877e0cac03
fix: tiering statistics handling a bug in clone() implementation (#18342)
Tiering statistics have been broken for some time now, a regression
was introduced in 6f2406b0b6

Bonus fixes an issue where the objects are not assumed to be
of the 'STANDARD' storage-class for the objects that have
not yet tiered, this should be conditional based on the object's
metadata not a default assumption.

This PR also does some cleanup in terms of implementation,

fixes #18070
2023-10-30 09:59:51 -07:00
Klaus Post 508710f4d1
Re-add duplicate upload id sanity check. (#18339)
https://github.com/minio/minio/pull/18307 partially removed the duplicate upload id check.

While I can't really see how ListDir can return duplicate entries, let's re-add it, since it is a cheap sanity check.
2023-10-29 08:33:30 -07:00
Matthew Toohey c2fedb4c3f
fix: log targetID instead of Name when event error occurs (#18335) 2023-10-28 08:32:57 -07:00
Poorna 03dc65e12d
Reload replication targets lazily if missing (#18333)
There can be rare situations where errors seen in bucket metadata
load on startup or subsequent metadata updates can result in missing
replication remotes.

Attempt a refresh of remote targets backed by a good replication config
lazily in 5 minute intervals if there ever occurs a situation where
remote targets go AWOL.
2023-10-27 21:08:53 -07:00
Praveen raj Mani 54aed421b8
fix: update the user cache while adding service accounts with expiry (#18320) 2023-10-26 08:11:29 -07:00
jiuker d5e8dac1cf
fix: canceling the heal caused goroutine to leak. (#18322) 2023-10-26 07:53:06 -07:00
Poorna 96ec8fcba1
Preserve replica timestamps in multipart (#18318)
Also a backward compatibility fix to use x-amz-replica-status
if present as replication status.
2023-10-25 21:24:10 -07:00
Harshavardhana 0663eb69ed
fix: do not preserve mtime during CopyObject() metadata updates (#18316)
mtime must be preserved only if destination mtime is set.

fixes #18314
2023-10-25 14:30:56 -07:00
Harshavardhana c60f54e5be
make ListMultipart/ListParts more reliable skip healing disks (#18312)
this PR also fixes old flaky tests, by properly marking disk offline-based tests.
2023-10-24 23:33:25 -07:00
Harshavardhana 483389f2e2 set diskMaxConcurrent to 32 if nrRequests is lower 2023-10-24 17:21:12 -07:00
Harshavardhana 069d118329
fix: listObjectParts to prefer local and single disks (#18309) 2023-10-24 13:51:57 -07:00
Harshavardhana a7b1834772
fix: flaky and stupid tests in root lockdown (#18308) 2023-10-24 13:22:44 -07:00
Klaus Post 6415dec37a
Improve multipart listing speed (#18307) 2023-10-24 12:06:06 -07:00
Harshavardhana 2dc917e87f
maxConcurrent must be set only once per node (#18303) 2023-10-23 21:42:36 -07:00
Aditya Manthramurthy 0a284a1a10
fix: SR: Add more info when IAM config differs (#18302)
Provide details on what IAM info mismatched when the validation fails
2023-10-23 21:16:40 -07:00
Harshavardhana 5c8339e1e8
fix: veeam SOS API to higher layers (#18287)
- support populating usage info from scanner info
- support populating quota for the bucket via quota
  settings for the bucket
2023-10-23 13:55:45 -07:00
Harshavardhana fd37418da2
fix: allow server not initialized error to be retried (#18300)
Since relaxing quorum the error across pools
for ListBuckets(), GetBucketInfo() we hit a
situation where loading IAM could potentially
return an error for second pool that server
is not initialized.

We need to handle this, let the pool come online
and retry transparently - this PR fixes that.
2023-10-23 12:30:20 -07:00
Harshavardhana bbfea29c2b
use object modTime for the event sequencer ID (#18285)
always set modTime after lock is acquired in
completemultipart stage to make sure that the
modTime is not racy.
2023-10-20 19:28:05 -07:00
Harshavardhana aa703dc903
relax write quorum requirement for ListBuckets()/HeadBucket() (#18288)
Also fix error handling for HeadBucket() to be pool specific
2023-10-20 17:50:21 -07:00
Harshavardhana 780882efcf
do not check for query params to be signed headers (#18283)
x-amz-signed-headers is meant for HTTP headers only
not for query params, using that to verify things
further can lead to failure.

The generated presigned URL with custom metadata
is already kosher (tamper proof).

fixes #18281
2023-10-19 21:32:49 -07:00
Klaus Post ba6218b354
fix: resource metrics "concurrent map iteration and map write" (#18273)
`resourceMetricsMap` has no protection against concurrent reads and writes.

Add a mutex and don't use maps from the last iteration.

Bug introduced in #18057

Fixes #18271
2023-10-18 13:28:50 -07:00
Harshavardhana 8e32de3ba9
cache DiskInfo() metrics call separately (#18270) 2023-10-18 11:17:32 -07:00
Klaus Post e37508fb8f
fix: linter errors in Windows specific code (#18276) 2023-10-18 11:08:15 -07:00
Klaus Post b46a717425
Remove unused config migration (#18277)
None of the migration is called. Remove dead code.
2023-10-18 11:05:24 -07:00
Klaus Post 7926df0b80
Fix globalDeploymentID race (#18275)
globalDeploymentID was being read while it was being set.

Fixes race:

```
WARNING: DATA RACE
Write at 0x0000079605a0 by main goroutine:
  github.com/minio/minio/cmd.connectLoadInitFormats()
      github.com/minio/minio/cmd/prepare-storage.go:269 +0x14f0
  github.com/minio/minio/cmd.waitForFormatErasure()
      github.com/minio/minio/cmd/prepare-storage.go:294 +0x21d
...

Previous read at 0x0000079605a0 by goroutine 105:
  github.com/minio/minio/cmd.newContext()
      github.com/minio/minio/cmd/utils.go:817 +0x31e
  github.com/minio/minio/cmd.adminMiddleware.func1()
      github.com/minio/minio/cmd/admin-router.go:110 +0x96
  net/http.HandlerFunc.ServeHTTP()
      net/http/server.go:2136 +0x47
  github.com/minio/minio/cmd.setBucketForwardingMiddleware.func1()
      github.com/minio/minio/cmd/generic-handlers.go:460 +0xb1a
  net/http.HandlerFunc.ServeHTTP()
      net/http/server.go:2136 +0x47
...
```
2023-10-18 08:06:57 -07:00
Harshavardhana f91b257f50
choose different max_concurrent requests per drive based on HDD/NVMe (#18254)
currently the default for all drives is 512, which is a lot
for HDDs the recent testing has revealed moving this to 32
for HDDs seems like a fair value.
2023-10-16 17:18:13 -07:00
Harshavardhana edfb310a59
fix: always load ENVs from files first as soon as server starts (#18247)
This is a regression from #18231, however reading from ENV files
must happen well before any parsing logic is invoked.
2023-10-15 21:13:43 -07:00
Poorna 78f1f69d57
fix site replication resync status (#18245)
To persist status changes on disk upon completion.

Adds new tests to handle this functionality.
2023-10-13 22:17:22 -07:00
Harshavardhana e1e33077e8
fix: tests and resync replication status (#18244) 2023-10-13 17:03:34 -07:00
Aditya Manthramurthy b3e7de010d
Remove usage of errors.Join for go1.19 compat (#18243) 2023-10-13 15:14:16 -07:00
Shireesh Anjal bf1c6edb76
Revert "Capture network device info in health report" (#18241)
Introducing a new version of healthinfo struct for adding this info is
not correct. It needs to be implemented differently without adding a new
version.

This reverts commit 8737025d940f80360ed4b3686b332db5156f6659.
2023-10-13 07:46:36 -07:00
jiuker 2ac7fee017
fix: missing fileName will upload failed when PostPolicyBucketHandler (#18240) 2023-10-13 07:31:23 -07:00
Klaus Post 128256e3ab
Add event counters (#18232)
Export metric for global events sent and skipped for the lifetime of the server.
2023-10-12 15:39:22 -07:00
Shireesh Anjal a66a7f3e97
Capture network device info in health report (#18213) 2023-10-12 15:33:31 -07:00
jiuker 20b79f8945
fix: env depend on the flag (#18231) 2023-10-12 15:32:38 -07:00
Klaus Post 9a877734b2
Fix various poolmeta races (#18230)
There is a fundamental race condition in `newErasureServerPools`, where setObjectLayer is 
called before the poolMeta has been loaded/populated.

We add a placeholder value to this field but disable all saving of the value, so we don't risk 
overwriting the value on disk. Once the value has been loaded or created, it is replaced with 
the proper value, which will also be saved.

Also fixes various accesses of `poolMeta` that were done without locks.

We make the `poolMeta.IsSuspended` return false, even if we shouldn't risk out-of-bounds 
reads anymore.
2023-10-12 15:30:42 -07:00
Harshavardhana 409c391850
implement helpers to get relevant info instead of FileInfo() (#18228) 2023-10-12 15:29:59 -07:00
jiuker 000928d34e
fix: should call func globalOSMetrics.time(s)() when updateOSMetrics (#18209) 2023-10-12 00:08:13 -07:00
Harshavardhana 6829ae5b13
completely remove drive caching layer from gateway days (#18217)
This has already been deprecated for close to a year now.
2023-10-11 21:18:17 -07:00
jiuker f09756443d
fix: a dynamic config will make a panic for addOrUpdateIDP (#18208) 2023-10-11 09:06:40 -07:00
jiuker 5512016885
fix: siteResyncMetrics init will make a deadlock when len(siteReplication) >= 3 (#18206) 2023-10-10 23:27:27 -07:00
Harshavardhana 21ecb941fe
fix: avoid counting out of band deletes during disk heal (#18205) 2023-10-10 14:39:48 -07:00
Harshavardhana 77e94087cf
fix: calling statfs() call moves the disk head (#18203)
if erasure upgrade is needed rely on the in-memory
values, instead of performing a "DiskInfo()" call.

https://brendangregg.com/blog/2016-09-03/sudden-disk-busy.html

for HDDs these are problematic, lets avoid this because
there is no value in "being" absolutely strict here
in terms of parity. We are okay to increase parity
as we see based on the in-memory online/offline ratio.
2023-10-10 13:47:35 -07:00
Klaus Post 9ab1f25a47
fix : PutObjectExtract data races (#18199)
Several callers to putObjectTar may be fighting to set sc. Move the write out of the loop.

Use static resp, and request elements.

Fixes tests with -race:

```
WARNING: DATA RACE
Read at 0x00c01cd680e0 by goroutine 691354:
  github.com/minio/minio/cmd.objectAPIHandlers.PutObjectExtractHandler.func1()
      e:/gopath/src/github.com/minio/minio/cmd/object-handlers.go:2130 +0x149
  github.com/minio/minio/cmd.untar.func1()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:250 +0x2b6
  github.com/minio/minio/cmd.untar.func8()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:261 +0xa4

Previous write at 0x00c01cd680e0 by goroutine 691352:
  github.com/minio/minio/cmd.objectAPIHandlers.PutObjectExtractHandler.func1()
      e:/gopath/src/github.com/minio/minio/cmd/object-handlers.go:2131 +0x15d
  github.com/minio/minio/cmd.untar.func1()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:250 +0x2b6
  github.com/minio/minio/cmd.untar.func8()
      e:/gopath/src/github.com/minio/minio/cmd/untar.go:261 +0xa4
```
2023-10-10 08:36:44 -07:00
jiuker aaab7aefbe
fix: avoid nil panic upon error in GetObjectNInfo via InnerGetObjectNInfoFn (#18198) 2023-10-10 08:35:33 -07:00
Klaus Post 5b8599e52d
Do not log invalid tag errors (#18200)
Eliminate logging on invalid tags:

```
API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE0A67480
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: Tags cannot be more than 10 (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()

API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE0BEA514
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: Cannot provide multiple Tags with the same key (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()

API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE0E78970
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: The TagKey you have provided is invalid (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()

API: PutObjectTagging(bucket=aws-sdk-go-test-aupmzek4341ee2, object=sgehiqp24fwt4hafffmtwzkrqnq325)
Time: 07:40:33 UTC 10/10/2023
DeploymentID: f122cbfa-42b1-428f-9002-39c644cace71
RequestID: 178CAF0DE1002AE8
RemoteHost: 127.0.0.1
Host: 127.0.0.1:9001
UserAgent: aws-sdk-go/1.44.257 (go1.21.0; linux; amd64)
Error: The TagValue you have provided is invalid (*tags.errTag)
       5: internal\logger\logger.go:259:logger.LogIf()
       4: cmd\api-errors.go:2350:cmd.toAPIErrorCode()
       3: cmd\api-errors.go:2375:cmd.toAPIError()
       2: cmd\object-handlers.go:2912:cmd.objectAPIHandlers.PutObjectTaggingHandler()
       1: net\http\server.go:2136:http.HandlerFunc.ServeHTTP()
```
2023-10-10 08:35:03 -07:00
Harshavardhana 74e0c9ab9b
reduce unnecessary logging, simplify certain error handling (#18196)
remove a bunch of unnecessary logs
2023-10-10 00:33:42 -07:00
Harshavardhana dcce83b288
avoid rebalance state for getObjectTags if any (#18197)
fixes #18190
2023-10-09 23:56:26 -07:00
Matthew Toohey f731e7ea36
Fix current_send_in_progress metric always being zero (#18160) 2023-10-09 17:28:17 -07:00
Maxim Tkachenko ec30bb89a4
simplify channel send() in WalkDir() (#18186) 2023-10-09 17:27:55 -07:00
Klaus Post 7cd08594f6
Use better host names for metric errors (#18188)
Typically hosts would end up like this:

```
   "hosts": [
        ":9000",
        ":9000",
        ":9000",
...
```

Also add host name to errors.
2023-10-09 17:27:11 -07:00
Aditya Manthramurthy 2b4531f069
fix: O_DIRECT is on only for multi-disk setups (#18194)
Disable it for single disk/unsupported platforms
2023-10-09 17:08:40 -07:00
Harshavardhana 11544a62aa
fix: upon write failure on disk journal close the file properly (#18183)
close the file properly before dereferencing *os.File,
this can silently leak fd's in rare cases.

This PR fixes this properly.
2023-10-08 12:17:08 -07:00
Taran Pelkey 18550387d5
fix: DeleteServiceAccount API behavior (#18163) 2023-10-08 12:13:18 -07:00
Klaus Post 0de2b9a1b2
Fix panic on double unfreezeServices (#18177)
Calling unfreezeServices twice results in panic:

```
panic: "POST /minio/peer/v32/signalservice?signal=4&sub-sys=": close of nil channel
goroutine 14703 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
	github.com/minio/minio/cmd/generic-handlers.go:549 +0x8e
panic({0x27c3020, 0x4c9b370})
	runtime/panic.go:884 +0x212
github.com/minio/minio/cmd.unfreezeServices()
	github.com/minio/minio/cmd/service.go:112 +0xc7
github.com/minio/minio/cmd.(*peerRESTServer).SignalServiceHandler(0x0?, {0x4cb6af0, 0xc010b96420}, 0xc01affab00)
	github.com/minio/minio/cmd/peer-rest-server.go:837 +0x13a
net/http.HandlerFunc.ServeHTTP(...)
```

If the function was called a second time `val` would not be nil, but the returned channel `ch` would be, causing the panic.

Check the channel isn't nil and also use Swap for an atomic swap instead of 2 separate operations (though we are in a mutex).
2023-10-06 07:51:50 -06:00
Poorna 9dc29d7687
Avoid ILM expiry on deleted versions that are yet to replicate (#18175)
Fixes #18167
2023-10-06 06:55:15 -06:00
Poorna 72871dbb9a
delete replication: avoid overwriting replication decision (#18174)
from ObjectInfo unless version purge status is present. Otherwise
there is potential to make incorrect replication decision if Stat
returned an error
2023-10-05 21:09:45 -06:00
Aditya Manthramurthy 4bda4e4e2b
fix: check for disk-level O_DIRECT support (#18173)
Disk level O_DIRECT support checking at xl storage initialization was
conditional on a config setting being enabled. (This never took effect
because config initialization happens after ObjectLayer is ready.) This
is not necessary as the config setting is dynamic - O_DIRECT should be
enabled via runtime config. So we need to do the disk level support
check regardless of the config setting.
2023-10-05 20:54:49 -06:00
Harshavardhana 1971c54a50
update buffer channels for both trace and listen events (#18171)
- Trace needs higher buffered channels than 4000 to ensure
  when we run `mc admin trace -a` it captures all information
  sufficiently.

- Listen event notification needs the event channel to be
  `apiRequestsMaxPerNode` * number of nodes
2023-10-05 18:16:04 -06:00
Anis Eleuch b336e9a79f
fix: loading usage cache to not fail early when reading the backup fails (#18158)
Currently, the retry is not fully used when there is no backup copy of
the data usage; use 5 retry attempts when we don't have any valid data, 
new or backup, unless we have seen an un-recognized error.
2023-10-02 19:22:35 -07:00
Harshavardhana a2ab21e91c
add max-keys=2 optimization for spark workloads (#18154)
comment in the code provides more detailed explanation
on what this PR entails and its assumptions.

this PR reduces the amount of listing() by an order
of magnitude, however there are other such calls that
still needs further optimization that shall be done
in subsequent PRs.
2023-10-02 07:52:59 -06:00
Sveinn 603437e70f
Fix startup formatting (#18156)
Percentages in root user names are used for formatting.

Before:
```
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:51915 http://172.31.96.1:51915 http://127.0.0.1:51915
RootUser: "U4B6Zi!b75DXSPm%!!(MISSING)a(MISSING)vZb"
RootPass: "Q4#Q6y8G%!P(MISSING)x#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
FORMAT: %117s MESSAGE: $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%!a(MISSING)vZb" "Q4#Q6y8G%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

After:

```
Status:         1 Online, 0 Offline.
S3-API: http://192.168.50.21:9000  http://172.31.96.1:9000  http://127.0.0.1:9000
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Console: http://192.168.50.21:52421 http://172.31.96.1:52421 http://127.0.0.1:52421
RootUser: "U4B6Zi!b75DXSPm%avZb"
RootPass: "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"

Command-line: https://min.io/docs/minio/linux/reference/minio-mc.html#quickstart
   $ mc alias set myminio http://192.168.50.21:9000 "U4B6Zi!b75DXSPm%avZb" "Q4#Q6y8G%%Px#npP4dudUobU#NBcGB7RMKV4ajYb"
```

No need for special Windows case. `mc` works just fine.
2023-10-02 07:39:47 -06:00
Shireesh Anjal 6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
Anis Eleuch 22d2dbc4e6
decom: Fix infinite retry when the decom is canceled (#18143)
Also, use rand.Float64() since it is thread-safe; otherwise go race
will complain.
2023-09-30 00:02:29 -07:00
Harshavardhana d6446cb096
do not return an error in AbortMultipartUpload() (#18135)
returning an error is a bit undefined in AWS S3
as it may return an error or not depending on the
time from AbortMultipartUpload().
2023-09-29 10:28:19 -07:00
Harshavardhana c34bdc33fb
make sure to set Versioned field to ensure rename2 is not called (#18141)
without this the rename2() can rename the previous dataDir
causing issues for different versions of the object, only
latest version is preserved due to this bug.

Added healing code to ensure recovery of such content.
2023-09-29 09:08:24 -07:00
Anis Eleuch aec023f537
Avoid showing buckets without quorum in each pool (#18125) 2023-09-29 00:58:54 -07:00
Poorna e101eeeda9
fix: tier addition validation (#18136) 2023-09-28 22:33:24 -07:00
Harshavardhana 3c470a6b8b
fix: the inspect script to use scheme per deployment (#18118) 2023-09-27 08:22:50 -07:00
Poorna 6bc7d711b3
delete of a missing versionId return 204 (#18117) 2023-09-26 14:02:56 -07:00
Harshavardhana cdeab19673
fix: always check error upon w.Close() in Write() (#18111)
not checking w.Close() can prematurely make us
think that the w.Write() actually succeeded, apparently
Write() may or may not return an error but sometimes
only during a Close() call to the fd we may see the
error from Write() propagate.

Fdatasync(w) on the FD would return an error requiring
Close() error handling is less of a concern, however it may
happen such that fdatasync() did not return an error, where
as Close() would.
2023-09-26 11:04:00 -07:00
Anis Eleuch 22ee678136
tier: Avoid doing versioned operations since not required anymore (#18108)
Currently, setting a new tiering target returns an error when a bucket
is versioned and the tiering credentials does not have authorization to
specify a version-id when reading or removing a specific version;

Since tiering does not require versioning anymore; avoid doing versioned
operations when performing checklist ops while adding a new tiering
configuration.
2023-09-26 00:14:56 -07:00
Poorna 50a8f13e85
site replication: allow setting bandwidth default for bucket (#18062)
This can still be overridden at the bucket level
2023-09-25 15:50:52 -07:00
jiuker 6dec60b6e6
fix: check post policy like AWS S3 (#18074) 2023-09-25 12:35:25 -07:00
Harshavardhana ac3a19138a
fix: set scanning details locally to avoid cached values (#18092)
atomic variable results such as scanning must not use
cached values, instead rely on real-time information.
2023-09-25 08:26:29 -07:00
Klaus Post 21e8e071d7
Improve ListObject Compatibility (#18099)
Do not error out when a provided marker is before or after the prefix, but instead just ignore it if before and return an empty list when after.

Fixes #18093
2023-09-25 08:13:08 -07:00
Klaus Post 57f84a8b4c
Add abandoned folder scanning to metrics (#18076)
Include object and versions heal scan times when checking non-empty abandoned folders.

Furthermore don't add delay between healing versions, instead do one per object wait.
2023-09-24 22:15:31 -07:00
Aditya Manthramurthy 22041bbcc4
fix: Update policy mapping properly in notification (#18088)
This is fixing a regression from an earlier change where STS account
loading was made lazy.
2023-09-22 20:47:50 -07:00
Harshavardhana 91ebac0a00
fix: move abandoned parts check after healing not in ILM path (#18087) 2023-09-22 12:07:52 -07:00
Harshavardhana 3a90fb108c only look for metadata if batch replication asks for metadata filters (#18082)
This PR changes the StatObject() to be must have for non-minio source
to being a conditional API call.

- Calls StatObject() when needed
- Calls GetObjectTagging() when needed

These calls if we do without these conditionals can cause a lot of
delays, so we avoid them if not needed in more common scenario.
2023-09-22 11:31:57 -07:00
Shubhendu 74cfb207c1
Added check for mandatory MINIO_KMS_KES_KEY_NAME env var (#18077)
If MinIO started with KMS enabled, MINIO_KMS_KES_KEY_NAME should
be set for server to start.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-21 10:37:37 -07:00
Harshavardhana 9788d85ea3
remove logging for invalid metadata values (#18068) 2023-09-20 15:49:55 -07:00
Anis Eleuch 69c0e18685
perf net: Add the endpoint name related to the perf net error (#18063)
In a perf test, one node will run speed test with all nodes. If there is
an error with a peer node, the peer node name is not included in the
error hence confusing the user.

This commit will add the peer endpoint string to the netperf error.
2023-09-19 22:41:06 -07:00
Aditya Manthramurthy 3cac927348
Load STS policy mappings periodically (#18061)
To ensure that policy mappings are current for service accounts
belonging to (non-derived) STS accounts (like an LDAP user's service
account) we periodically reload such mappings.

This is primarily to handle a case where a policy mapping update
notification is missed by a minio node. Such a node would continue to
have the stale mapping in memory because STS creds/mappings were never
periodically scanned from storage.
2023-09-19 17:57:42 -07:00
Harshavardhana 9081346c40 fix: more regressions listing policy mappings (#18060)
also relax ListServiceAccounts() returning error if
no service accounts exist.
2023-09-19 15:23:18 -07:00
Harshavardhana fcfadb0e51
fix: regression in loading LDAP users policy mappings (#18055)
LDAP users are stored as STS users, we need to load
their policy mappings appropriately.

Fixes a regression caused by #17994
2023-09-19 10:31:56 -07:00
Harshavardhana 2add57cfed
apply healing per object at 1024 cycles (#18050)
- we already have MRF for most recent failures
- we trigger healing during HEAD/GET operation

These are enough, also change the default max wait
from 5sec to 1sec for default scanner speed.
2023-09-19 09:24:22 -07:00
Poorna b73699fad8
replication: pass user tags while queueing (#18052)
Continues from #18032 - otherwise replication will fail on tag based rules.
2023-09-19 03:18:28 -07:00
Harshavardhana b8ebe54e53 Revert "skip tiered objects to GLACIER in batch replication (#18044)"
This reverts commit fd421ddd6f.

MinIO already provides `filter` based on metadata that would work
in this scenario already.
2023-09-19 00:05:40 -07:00
Harshavardhana c3d70e0795
cache usage, prefix-usage, and buckets for AccountInfo up to 10 secs (#18051)
AccountInfo is quite frequently called by the Console UI 
login attempts, when many users are logging in it is important
that we provide them with better responsiveness.

- ListBuckets information is cached every second
- Bucket usage info is cached for up to 10 seconds
- Prefix usage (optional) info is cached for up to 10 secs

Failure to update after cache expiration, would still
allow login which would end up providing information
previously cached.

This allows for seamless responsiveness for the Console UI
logins, and overall responsiveness on a heavily loaded
system.
2023-09-18 22:13:03 -07:00
Harshavardhana fd421ddd6f
skip tiered objects to GLACIER in batch replication (#18044)
tiered objects to GLACIER are not readable until
they are restored, we skip these as unreadable
2023-09-18 10:25:31 -07:00
jiuker 9947c01c8e
feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Eng Zer Jun a00db4267c
data-usage-cache: remove redundant nil check (#17970)
From the Go specification:

  "3. If the map is nil, the number of iterations is 0." [1]

Therefore, an additional nil check for before the loop is unnecessary.

[1]: https://go.dev/ref/spec#For_range

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-09-16 19:09:29 -07:00
Harshavardhana 36385010f5
use optimized pathJoin instead of path.Join (#18042)
this avoids allocations in scanner routine, they are tiny but 
they allocate a lot over many cycles of the scanner.
2023-09-16 19:08:59 -07:00
Harshavardhana fa6d082bfd
reduce all major allocations in replication path (#18032)
- remove targetClient for passing around via replicationObjectInfo{}
- remove cloing to object info unnecessarily
- remove objectInfo from replicationObjectInfo{} (only require necessary fields)
2023-09-16 02:28:06 -07:00
Poorna b733e6e83c
site replication turn off retry login for admin API calls (#18039)
additionally also mark site offline if n/w is down
2023-09-15 18:01:47 -07:00
Anis Eleuch 37aa5934a1
scanner: Fix loading data usage cache structure (#18037)
Return an empty data usage cache structure when the data usage cache
file does not exist, otherwise, the scanner won't work.
2023-09-15 13:11:08 -07:00
Harshavardhana 1647fc7edc
fix: optimize listMultipartUploads to serve via local disks (#18034)
and remove unused getLoadBalancedDisks()
2023-09-15 08:34:03 -07:00
Harshavardhana 7b92687397
remove generating presignedURLs with range header for lambda (#18033) 2023-09-14 21:58:17 -07:00
Alex dc48cd841a
Added MINIO_PROMETHEUS_AUTH_TOKEN env support (#18028)
Signed-off-by: Benjamin Perez <benjamin@bexsoft.net>
2023-09-14 17:28:21 -07:00
Anis Eleuch b0e1776d6d
Do not use a chain for S3 tiering to return better error messages (#18030)
When using a chain provider all providers do not return a valid
access and secret key, an anonymous request is sent, which makes it hard
for users to figure out what is going on

In the case of S3 tiering, when AWS IAM temporary account generation returns
an error, an anonymous login will be used because of the chain provider.
Avoid this and use the AWS IAM provider directly to get a good error
message.
2023-09-14 15:28:20 -07:00
Aditya Manthramurthy 7a7068ee47
Move IAM periodic ops to a single go routine (#18026)
This helps reduce disk operations as these periodic routines would not
run concurrently any more.

Also add expired STS purging periodic operation: Since we do not scan
the on-disk STS credentials (and instead only load them on-demand) a
separate routine is needed to purge expired credentials from storage.
Currently this runs about a quarter as often as IAM refresh.

Also fix a bug where with etcd, STS accounts could get loaded into the
iamUsersMap instead of the iamSTSAccountsMap.
2023-09-14 15:25:17 -07:00
Aditya Manthramurthy cbc0ef459b
Fix policy package import name (#18031)
We do not need to rename the import of minio/pkg/v2/policy as iampolicy
any more.
2023-09-14 14:50:16 -07:00
Harshavardhana a2aabfabd9
add backups for usage-caches to rely on upon error (#18029)
This allows scanner to avoid lengthy scans, skip
things appropriately and also not lose metrics in
any manner.

reduce longer deadlines for usage-cache loads/saves
to match the disk timeout which is 2minutes now per
IOP.
2023-09-14 11:53:52 -07:00
Harshavardhana 32890342ce
introduce MINIO_BROWSER_REDIRECT env to enable/disable auto-redirect (#18025) 2023-09-13 18:43:57 -07:00
Aditya Manthramurthy ed2c2a285f
Load STS accounts into IAM cache lazily (#17994)
In situations with large number of STS credentials on disk, IAM load
time is high. To mitigate this, STS accounts will now be loaded into
memory only on demand - i.e. when the credential is used.

In each IAM cache (re)load we skip loading STS credentials and STS
policy mappings into memory. Since STS accounts only expire and cannot
be deleted, there is no risk of invalid credentials being reused,
because credential validity is checked when it is used.
2023-09-13 12:43:46 -07:00
Poorna 18e23bafd9
replication resync: report only the on-disk status (#18017)
Avoid reporting in-memory status since results can vary if different
nodes are queried, resync always runs at a single node.
2023-09-13 10:58:38 -07:00
Harshavardhana 8b8be2695f
optimize mkdir calls to avoid base-dir `Mkdir` attempts (#18021)
Currently we have IOPs of these patterns

```
[OS] os.Mkdir play.min.io:9000 /disk1 2.718µs
[OS] os.Mkdir play.min.io:9000 /disk1/data 2.406µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys 4.068µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp 2.843µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp/d89c8ceb-f8d1-4cc6-b483-280f87c4719f 20.152µs
```

It can be seen that we can save quite Nx levels such as
if your drive is mounted at `/disk1/minio` you can simply
skip sending an `Mkdir /disk1/` and `Mkdir /disk1/minio`.

Since they are expected to exist already, this PR adds a way
for us to ignore all paths upto the mount or a directory which
ever has been provided to MinIO setup.
2023-09-13 08:14:36 -07:00
Poorna 96fbf18201
replication: queue existing objects to same workers as incoming (#18020)
Previously existing objects were queued to single worker and MRF re-queues
are also handled by same worker - this does not fully use the available
bandwidth in case there is no incoming workload.
2023-09-12 21:59:15 -07:00
Harshavardhana c8a57a8fa2
fix: send content-md5 for AWS S3 proactively (#18018)
fixes #17977
2023-09-12 19:11:13 -07:00
Harshavardhana b1c2dacab3
fix: allow dynamic ports for API only in non-distributed setups (#18019)
fixes #17998
2023-09-12 19:10:49 -07:00
Harshavardhana 08b3a466e8
fix: allow concurrent SFTP connections (#18013)
current implementation did not fully implement
the concurrent SFTP connection implementation,
this PR properly handles this.

fixes #17914
2023-09-12 12:41:52 -07:00
Harshavardhana 1df5e31706
optimize MRF replication queue to avoid memory leaks (#18007) 2023-09-11 20:59:11 -07:00
Harshavardhana 9f7044aed0
fix: ignore transient errors in read path (#18006)
Errors such as

```
returned an error (context deadline exceeded) (*fmt.wrapError)
```

```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
2023-09-11 15:29:59 -07:00
Anis Eleuch 41de53996b
heal: calculate the number of workers based on NRRequests (#17945) 2023-09-11 14:48:54 -07:00
Harshavardhana 9878031cfd
fix: change DISK_ to DRIVE_ for some drive related envs (#18005) 2023-09-11 12:19:22 -07:00
Poorna 703ed46d79
fix: replication of tags while removing (#17989)
A tag removal was not being replicated prior to this change
2023-09-06 19:05:02 -07:00
Harshavardhana f7ca6c63c2
fix: bucket quota clear and honor existing quota config (#17988) 2023-09-06 19:03:58 -07:00
Harshavardhana ad69b9907f
fix: report bucket metrics for only existing buckets (#17987) 2023-09-06 12:50:46 -07:00
Shubhendu bfddbb8b40
Embed file in ZIP with custom permissions (#17954)
This change enables embedding files in ZIP with custom permissions.
Also uses default creds for starting MinIO based on inspect data.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-06 09:24:01 -07:00
Poorna 13a2dc8485
replication resync: avoid blocking on results channel. (#17981)
continues fix in #17775
2023-09-05 20:22:39 -07:00
Harshavardhana 1e51424e8a
use syscall.Rename() directly instead of os.Rename() (#17982) 2023-09-05 20:22:23 -07:00
Harshavardhana 5b114b43f7
refactor bandwidth throttling for replication target (#17980)
This refactor is to allow using the bandwidth throttling
for other purposes.
2023-09-05 20:21:59 -07:00
Poorna 812f5a02d7
metrics: fix panic in replication stats reporting (#17979) 2023-09-05 10:26:18 -07:00
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Krishnan Parthasarathi 71c32e9b48
Return successorModTime in quorum when available (#17925) 2023-09-04 08:24:17 -07:00
Harshavardhana 380a59520b add missing testdata for benchmarking 2023-09-02 14:40:38 -07:00
Harshavardhana 3995355150
avoid repeated large allocations for large parts (#17968)
objects with 10,000 parts and many of them can
cause a large memory spike which can potentially
lead to OOM due to lack of GC.

with previous PR reducing the memory usage significantly
in #17963, this PR reduces this further by 80% under
repeated calls.

Scanner sub-system has no use for the slice of Parts(),
it is better left empty.

```
benchmark                            old ns/op     new ns/op     delta
BenchmarkToFileInfo/ToFileInfo-8     295658        188143        -36.36%

benchmark                            old allocs     new allocs     delta
BenchmarkToFileInfo/ToFileInfo-8     61             60             -1.64%

benchmark                            old bytes     new bytes     delta
BenchmarkToFileInfo/ToFileInfo-8     1097210       227255        -79.29%
```
2023-09-02 07:49:24 -07:00
Harshavardhana 8208bcb896
remove all unnecessary logging, logOnce when absolutely needed (#17965) 2023-09-01 16:19:18 -07:00
Poorna d665e855de
replication: remove check for empty version id (#17964) 2023-09-01 13:46:10 -07:00
Harshavardhana 18b3655c99
with xlv2 format we never had to fill in checksumInfo() (#17963)
- this PR avoids sending a large ChecksumInfo slice
  when its not needed

- also for a file with XLV2 format there is no reason
  to allocate Checksum slice while reading
2023-09-01 13:45:58 -07:00
Anis Eleuch 6a8d8f34a5
kafka: Do not require key when sending a message (#17962)
Keys are helpful to ensure the strict ordering of messages, however currently the
code uses a random request id for every log, hence using the request-id
as a Kafka key is not serve any purpose;

This commit removes the usage of the key, to also fix the audit issue from
internal subsystem that does not have a request ID.
2023-09-01 08:37:22 -07:00
Harshavardhana b1c1f02132
use buffers for pathJoin, to re-use buffers. (#17960)
```
benchmark                        old ns/op     new ns/op     delta
BenchmarkPathJoin/PathJoin-8     79.6          55.3          -30.53%

benchmark                        old allocs     new allocs     delta
BenchmarkPathJoin/PathJoin-8     2              1              -50.00%

benchmark                        old bytes     new bytes     delta
BenchmarkPathJoin/PathJoin-8     48            24            -50.00%
```
2023-08-31 17:58:48 -07:00
yangw b13fcaf666
fix: read atomic variable in clientDevNull round trip time (#17955) 2023-08-31 08:31:01 -07:00
Harshavardhana 9458485e43
avoid double logging from healing (#17950) 2023-08-30 18:46:04 -07:00
Poorna b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Krishnan Parthasarathi 6a67c277eb
Reuse types for key-value, notification and retry (#17936) 2023-08-29 11:27:23 -07:00
Harshavardhana 7cafdc0512
fix: skip access checks further for known buckets (#17934) 2023-08-28 15:16:41 -07:00
Harshavardhana 8a57b6bced
use renameat2 Linux extension syscall (#17757)
this is a faster and safer alternative
on newer kernel versions.
2023-08-27 09:57:11 -07:00
Krishnan Parthasarathi 53abd25116
Don't log when object to be tiered is not found (#17924) 2023-08-25 23:34:16 -07:00
Harshavardhana 1ea7826c0e
do not have to consider replicationTimestamp for healing and quorum (#17922)
replicationTimestamp might differ if there were retries
in replication and the retried attempt overwrote in
quorum but enough shards with newer timestamp causing
the existing timestamps on xl.meta to be invalid, we
do not rely on this value for anything external.

this is purely a hint for debugging purposes, but there
is no real value in it considering the object itself
is in-tact we do not have to spend time healing this
situation.

we may consider healing this situation in future but
that needs to be decoupled to make sure that we do not
over calculate how much we have to heal.
2023-08-25 15:31:15 -07:00
Anis Eleuch 0cde37be50
Reduce the number of calls to import bucket metadata (#17899)
For each bucket, save the bucket metadata 
once, call the site replication hook once
2023-08-25 07:59:16 -07:00
jiuker 6aeca54ece
fix: replace context by timeout-context from parent-context when `selfSpeedTest` (#17906) 2023-08-25 07:58:38 -07:00
Harshavardhana 124e28578c
remove strict persistence requirements for List() .metacache objects (#17917)
.metacache objects are transient in nature, and are better left to
use page-cache effectively to avoid using more IOPs on the disks.

this allows for incoming calls to be not taxed heavily due to
multiple large batch listings.
2023-08-25 07:58:11 -07:00
Harshavardhana 62c9e500de
remove mTime requirement from pre-condition checks (#17916)
given a versionId the mtime is always the same, it
can never be different than its original value.

versionIds also do not conflict, since they are uuid's
and unique practically forever.
2023-08-24 14:33:58 -07:00
jiuker 02cc18ff29 refactor the perf client for TTFB and TotalResponseTime (#17901) 2023-08-24 10:21:08 -07:00
Harshavardhana ba4566e86d
add missing IAM node metrics to cluster and node endpoint (#17908) 2023-08-24 09:26:37 -07:00
Krishnan Parthasarathi 87cb0081ec
Retain current and upto NewerNoncurrentVersions versions (#17909)
applyNewerNoncurrentVersionLimit method should pass along versions
unaffected by NewerNoncurrentVersions rule for further ILM evaluation.
2023-08-24 09:26:29 -07:00
Poorna 4a6af93c83
mark replication target offline if network timeouts seen (#17907)
regular target liveness check every 5 secs will toggle state back
as target returns online.
2023-08-24 09:24:26 -07:00
Harshavardhana af564b8ba0
allow bootstrap to capture time-spent for each initializers (#17900) 2023-08-23 03:07:06 -07:00
Klaus Post 7c8746732b
Return cancelled storage calls as 499 (#17895)
Make upstream cancels more visible - right now they are just reported as "forbidden".
2023-08-22 11:10:41 -07:00
Klaus Post f506117edb
Reduce memory profiling rate (#17894)
Change profiling from every 4KB to every 128K, reducing the lock contention by a factor of 32.
2023-08-22 07:21:49 -07:00
Harshavardhana 1c5af7c31a
serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886)
we expect a certain level of IOPs and latency so this is okay.

fixes other miscellaneous bugs

- such as hanging on mrfCh <- when the context is canceled
- queuing MRF heal when the context is canceled
- remove unused saveStateCh channel
2023-08-21 16:44:50 -07:00
Harshavardhana 3a0125fa1f
remove unexpected logging from peer calls (#17888)
also make sure RequestID is set for system logs
2023-08-21 14:25:24 -07:00
Daniel Valdivia 328cb0a076
Pass environment variable to control session length to console (#17885)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-08-21 11:55:43 -07:00
jiuker e3ea97c964
fix: replace req context by locker context (#17880) 2023-08-19 22:09:07 -07:00
Andreas Auernhammer 8f8f8854f0
update `minio/kes-go` dep to v0.2.0 (#17850)
This commit updates the minio/kes-go dependency
to v0.2.0 and updates the existing code to work
with the new KES APIs.

The `SetPolicy` handler got removed since it
may not get implemented by KES at all and could
not have been used in the past since stateless KES
is read-only w.r.t. policies and identities.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2023-08-19 07:37:53 -07:00
Anis Eleuch 4c6869cd9a
ilm: Fix cleaning non current null versions (#17876) 2023-08-18 12:55:47 -07:00
Harshavardhana dde1a12819
fix: validate incoming uploadID to be base64 encoded (#17865)
Bonus fixes include

- do not have to write final xl.meta (renameData) does this
  already, saves some IOPs.

- make sure to purge the multipart directory properly using
  a recursive delete, otherwise this can easily pile up and
  rely on the stale uploads cleanup.

fixes #17863
2023-08-17 09:37:55 -07:00
Harshavardhana 9ebd10d3f4
Revert "Include SuccessorModTime for FileInfo quorum (#17732)" (#17860)
This reverts commit bf3901342c.

This is to fix a regression caused when there are inconsistent
versions, but one version is in quorum. SuccessorModTime issue
must be fixed differently.
2023-08-16 07:51:33 -07:00
Harshavardhana 3ba927edae
fix: batch status reporting after complete (#17852)
batch status can perpetually wait after completion
due to a race between the MetricsHandler() returning
the active metrics in intervals of 1sec and delete
of metrics after job completion.

this PR ensures that we keep the 'status' around
for a while, i.e upto 24hrs for all the batch jobs.
2023-08-15 12:22:30 -07:00
Harshavardhana c4ca0a5a57
add two more drive metrics when metrics is available (#17854) 2023-08-15 10:55:47 -07:00
Klaus Post 406ea4f281
Fix distributed listing not able to resume (#17855)
Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`.

This meant that in distributed systems listings would never be able to continue and would restart on every call.

Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.
2023-08-15 07:45:25 -07:00
Harshavardhana 64aa7feabd
allow specifying lower disks for Walk() (#17829)
useful when you may want Walk() with
reduced quorum requirements.
2023-08-14 21:32:39 -07:00
Poorna 875f4076ec
site replication: avoid retries when peer is offline (#17853) 2023-08-14 21:31:41 -07:00
Harshavardhana 4643efe6be
fix: add deadline worker pattern for local disk removers (#17845) 2023-08-14 12:28:13 -07:00
Harshavardhana b760137e1d
fix: add proxyByNode for batch jobs as part of their jobId (#17844) 2023-08-11 13:12:35 -07:00
Harshavardhana 5f56f441bf
fix: apply common notification code with content-type (#17843) 2023-08-11 11:34:43 -07:00
Klaus Post 96a22bfcbb
fix: wrapped io.EOF during ListObjects() (#17842)
When listing getObjectFileInfo can return `io.EOF` if file is being written.

When we wrap the error it will *not* retry upstream, since `io.EOF` is a valid return value.

Allow one retry before returning errors and canceling the listing.
2023-08-11 09:47:16 -07:00
Poorna dfaf735073
replication: fix queuing of large uploads (#17831)
Fixes regression from #17687
2023-08-10 15:48:42 -07:00
Anis Eleuch 7fcfde7f07
s3: Pick a pool with >85% if all other pools are in suspended state (#17826) 2023-08-10 11:06:31 -07:00
jiuker b1391d1991
feat: support perf client to show `TX` from client to server (#17718) 2023-08-10 07:14:46 -07:00
Harshavardhana eb55034dfe
optimize deletePrefix, use direct set location via object name (#17827)
* optimize deletePrefix, use direct set location via object name

instead of fanning out the calls for an object force delete
we can assume the set location and not do fan-out calls

* Apply suggestions from code review

Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>

---------

Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>
2023-08-09 16:30:22 -07:00
Harshavardhana c45bc32d98
skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Harshavardhana 6e860b6dc5
count all versions as part of DeleteAllVersionsAction (#17821) 2023-08-09 08:55:19 -07:00
Harshavardhana b732a673dc
reduce logging in bucket replication in retry scenarios (#17820) 2023-08-08 13:27:40 -07:00
Yang Wu 23e4895dfc
Create metrics slice when necessary (#17809) 2023-08-07 02:21:22 -07:00
Harshavardhana 8666c55ca6
fix: do not use PrefixEnabled() logic to ignore valid objects (#17677)
ignoring valid objects with valid replication metadata
after the Prefix was disabled must still honor the older
metadata.

this can lead to unexpected results, allow it during
READ phase always.
2023-08-05 13:56:01 -07:00
Anis Eleuch a3f00c5d5e
batch: Strict unmarshal yaml document to avoid user made typos (#17808)
// UnmarshalStrict is like Unmarshal except that any fields that are found
// in the data that do not have corresponding struct members, or mapping
// keys that are duplicates, will result in
// an error.
2023-08-05 13:51:48 -07:00
Poorna 26c23b30f4
replication: set context timeout for NewMultipartUpload calls (#17807) 2023-08-05 12:27:07 -07:00
Anis Eleuch a436fd513b
track client disconnections properly for all ListObjects calls (#17804)
Currently ListObjects* calls were returning 200 OK for timed-out clients,
this makes debugging via `mc admin trace` very hard.
2023-08-04 15:57:27 -07:00
Harshavardhana 533cd8d6df
fix: batch replication pull must preserve versionID (#17805)
batch replication pull must preserve versionID regardless
of destination bucket versioning configuration.

This is similar to the issue with decommissioning and rebalancing
2023-08-04 12:09:10 -07:00
Harshavardhana cb089dcb52
error out by default beyond 10000 versions per object (#17803)
```
You've exceeded the limit on the number of versions you can create on this object
```
2023-08-04 10:40:21 -07:00
Harshavardhana 239ccc9c40
fix: crash in globalTierJournal when TierConfig is not initialized (#17791) 2023-08-03 14:16:15 -07:00
Poorna b762fbaf21
sts: validate if iam subsystem initialized in handlers (#17796) 2023-08-03 13:24:25 -07:00
Praveen raj Mani 0285df5a02
fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783) 2023-08-03 02:47:07 -07:00
Harshavardhana 45fb375c41
allow healing to prefer local disks over remote (#17788) 2023-08-03 02:18:18 -07:00
Harshavardhana 4a4950fe41
fix: honor requested allow origin settings properly (#17789)
fixes #17778
2023-08-02 20:41:21 -07:00
Anis Eleuch 1664fd8bb1
Avoid logging errors twice during transitioned objects expiration (#17782) 2023-08-02 09:06:03 -07:00
Harshavardhana 21cdd2bf5d
avoid overwriting metrics on success, save it in defer (#17780) 2023-08-01 22:19:56 -07:00
Harshavardhana 0153f96a20
add deadlines for readMetadata() in listing (#17776)
Bonus: also skip spending time looking for xl.json

- Listing()
- Delete()
2023-08-01 21:52:31 -07:00
Harshavardhana a7a7533190
add new errors for Disks with timeouts (#17770) 2023-08-01 12:47:50 -07:00
Poorna 311380f8cb
replication resync: fix queueing (#17775)
Assign resync of all versions of object to the same worker to avoid locking
contention. Fixes parallel resync implementation in #16707
2023-08-01 11:51:15 -07:00
Harshavardhana b0f0e53bba
fix: make sure to correctly initialize health checks (#17765)
health checks were missing for drives replaced since

- HealFormat() would replace the drives without a health check
- disconnected drives when they reconnect via connectEndpoint()
  the loop also loses health checks for local disks and merges
  these into a single code.
- other than this separate cleanUp, health check variables to avoid
  overloading them with similar requirements.
- also ensure that we compete via context selector for disk monitoring
  such that the canceled disks don't linger around longer waiting for
  the ticker to trigger.
- allow disabling active monitoring.
2023-08-01 10:54:26 -07:00
Klaus Post 004f1e2f66
Fix trailing header signature mismatch (#17774)
Seems like clients may omit a newline at the end of the trailer chunk. Each header should end with a newline. Add that if missing.

Fixes #17662
2023-08-01 08:45:57 -07:00
Harshavardhana 2fa561f22e
do not crash on invalid metric values (#17764)
```
minio[1032735]: panic: label value "\xc0.\xc0." is not valid UTF-8
minio[1032735]: goroutine 1781101 [running]:
minio[1032735]: github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
```

log such errors for investigation
2023-08-01 00:55:39 -07:00
Harshavardhana 81be718674
fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Sho Ce 49a1e2f98e
update-notifier.go: misleading version age message (#17750) 2023-07-31 08:36:19 -07:00
Klaus Post 684c46369c
Send events for extracted objects (#17760)
Fixes #17759
2023-07-31 08:33:51 -07:00
Harshavardhana 73edd5b8fd
introduce 'mc admin config set alias/ api odirect=on' (#17753)
change disable_odirect=off -> odirect=on to make it
easier to understand, instead of making it double
negative.
2023-07-31 00:12:53 -07:00
Harshavardhana 5e5bdf5432
capture total errors data availability and any timeout errors (#17748) 2023-07-29 23:26:26 -07:00
Harshavardhana f13cfcb83e
allow disabling O_DIRECT for write ops (#17751)
on really slow systems, O_DIRECT simply kills the drives
allow for a way to disable them.
2023-07-29 15:17:56 -07:00
Harshavardhana 731e03fe5a
add ReadFileStream deadline for disk call (#17745)
timeout the reader side if hung via disk max timeout
2023-07-28 15:37:53 -07:00
Anis Eleuch 7057d00a28
s3: Return invalid bucket name the first thing in all S3 calls (#17742) 2023-07-28 10:49:20 -07:00
Harshavardhana 114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
ruspaul013 a92cb66468
Get the signed headers in the order they were signed (#17690)
use pSignValues to get signed headers in order
2023-07-27 11:45:30 -07:00
ruspaul013 535f97ba61
check if metadata headers/url values are equal with signed headers (#17737) 2023-07-27 11:44:56 -07:00
drivebyer 14ebd82dbd
fix: missing disk metrics when query metric api from peer (#17738) 2023-07-27 11:44:13 -07:00
Harshavardhana 47dcfcbdd4
introduce deadlines on READ operations (#17724) 2023-07-27 07:33:05 -07:00
Krishnan Parthasarathi bf3901342c
Include SuccessorModTime for FileInfo quorum (#17732) 2023-07-26 17:04:16 -07:00
Harshavardhana b28bcad11b
avoid Access() calls on known bucket paths (#17719) 2023-07-26 11:31:40 -07:00
Harshavardhana a7c71e4c6b
protect disk monitoring to avoid busy loop configuration (#17723) 2023-07-25 20:02:22 -07:00
Poorna 1a42693d68
replication: limit larger uploads to a subset of workers (#17687)
Limit large uploads (> 128MiB)  to a max of 10 workers, intent is to avoid
larger uploads from using all replication bandwidth, giving room for smaller
uploads to sync faster.
2023-07-25 20:02:02 -07:00
Harshavardhana e7b60c4d65
Add slow drive timeouts to match with active disk monitoring (#17701)
allow active disk-monitoring to be configurable, and use
these add deadlines in various call layers for various
syscalls.
2023-07-25 16:58:31 -07:00
Poorna f95129894d
Use decrypted object size while computing object size summary (#17717)
Corrects an issue with encrypted versioned objects being reported under
`unversioned` bin in the object version histogram
2023-07-24 17:13:25 -07:00
Harshavardhana c32c71c836
allow DNS cache TTL to be configurable (#17709)
this is added for now as a hidden variable
2023-07-24 15:13:35 -07:00
Harshavardhana 14e1ace552
remove serializing WalkDir() across all buckets/prefixes on SSDs (#17707)
slower drives get knocked off because they are too slow via 
active monitoring, we do not need to block calls arbitrarily.

Serializing adds latencies for already slow calls, remove
it for SSDs/NVMEs

Also, add a selection with context when writing to `out <-`
channel, to avoid any potential blocks.
2023-07-24 09:30:19 -07:00
drivebyer a7fb3a3853
fix: Create metrics slice when necessary in getCacheMetrics() (#17711) 2023-07-24 08:40:21 -07:00
Klaus Post 2da4bd5f1a
Revert "don't error when asked for 0-based range on empty objects (#17708) (#17713)
Revert "don't error when asked for 0-based range on empty objects (#17708)"

This reverts commit 7e76d66184.

There is no valid way to specify offsets in a 0-byte file. Blame it on the [RFC](https://datatracker.ietf.org/doc/html/rfc7233#section-4.4)

> The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the 
> request's Range header field (Section 3.1) overlap the current extent of the selected resource...

A request for "bytes=0-" is a request for the first byte of a resource. If the resource is 0-length, 
the range [0,0] does not overlap the resource content and the server responds with an error.
2023-07-24 07:56:28 -07:00
flisk 7e76d66184
don't error when asked for 0-based range on empty objects (#17708)
In a reverse proxying setup, a proxy in front of MinIO may attempt to
request objects in slices for enhanced cache efficiency. Since such a
a proxy cannot have prior knowledge of how large a requested resource is,
it usually sends a header of the form:

        Range: 0-$slice_size

... and, depending on the size of the resource, expects either:

- an empty response, if $resource_size == 0
- a full response, if $resource_size <= $slice_size
- a partial response, if $resource_size > $slice_size

Prior to this change, MinIO would respond 416 Range Not Satisfiable if a
client tried to request a range on an empty resource. This behavior is
technically consistent with RFC9110[1] – However, it renders sliced
reverse proxying, such as implemented in Nginx, broken in the case of
empty files. Nginx itself seems to break this convention to enable
"useful" responses in these cases, and MinIO should probably do that
too.

[1]: https://www.rfc-editor.org/rfc/rfc9110#byte.ranges
2023-07-23 00:10:03 -07:00
Harshavardhana 7764f4a8e3
return tags as part of Head/Get calls (#17635)
AWS S3 only returns the number of tag
counts, along with that we must return
the tags as well to avoid another metadata
call to the server.
2023-07-22 07:19:43 -07:00
Kaan Kabalak 6624f970c0
Fix spelling of 'already' across repository (#17703) 2023-07-21 08:45:08 -07:00
Harshavardhana 331bdc2245
fix: remove CompleteMultipartUpload() 200 OK response for blocking calls (#17699)
sending whitespace character with CompleteMultipartUpload()
with 200 OK was an AWS S3 compatible implementation detail,
and it was expected that the client SDK must look for both
successful XML as well as error XML for 200 OK.

But this is not useful anymore on MinIO, since we do not
have any large delayed coalescing of parts anymore.
2023-07-20 22:14:38 -07:00
Harshavardhana e12ab486a2
avoid using os.Getenv for internal code, use env.Get() instead (#17688) 2023-07-20 07:52:49 -07:00
Krishnan Parthasarathi 9eeee92d36
Add deletemarker_total metric (#17689) 2023-07-20 07:52:32 -07:00
Anis Eleuch 756d6aa729
fix: report correct pool/set/disk indexes for offline disks (#17695) 2023-07-20 07:48:21 -07:00
Harshavardhana bddd53d6d2
fix: retry listing in decommissioning if it fails perpetually (#17682) 2023-07-19 13:09:37 -07:00
jiuker a99cd825ab
fix: byHost realTime metrics API (#17681) 2023-07-18 23:50:30 -07:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
Harshavardhana 4f257bf1e6
pick internode interface properly via globalLocalNodeName (#17680)
current code will not pick the right interface name
if --address or --interface is not provided.
2023-07-18 19:18:11 -07:00
Krishnan Parthasarathi 0120ff93bc
admin-info: add DeleteMarkers count (#17659) 2023-07-18 10:49:40 -07:00
Anis Eleuch 49638fa533
s3: Delete Bucket should not recreate bucket if it does not exist (#17676)
Also return Bucket Not Found error in the same use case.
2023-07-18 09:32:19 -07:00
Shubhendu 7a3a7b19e5
Added a start script to inspect command output (#17591)
Using this script, post decrypt we should be able to bring up the
MinIO instance with same configuration.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-17 14:15:28 -07:00
Harshavardhana 24e86d0c59
avoid passing around poolIdx, setIdx instead pass the relevant disks (#17660) 2023-07-17 09:52:05 -07:00
jiuker d118031ed6
fix: when Origin: null is set return back '*' for allow origins (#17651) 2023-07-15 12:15:06 -07:00
Anis Eleuch 341a89c00d
return a descriptive error when loading any IAM item fails (#17654)
Sometimes IAM fails to load certain items, which could be a user, 
a service account or a policy but with not enough information for 
us to debug.

This commit will create a more descriptive error to make it easier to
debug in such situations.
2023-07-14 20:17:14 -07:00
Anis Eleuch df29d25e6b
return different status code for internode communication (#17655)
mc admin trace -a will be able to quickly show
401 Unauthorized header to pinpoint trivial issues
between nodes, such as wrong root 
credentials and skewed time.
2023-07-14 18:34:55 -07:00
Harshavardhana 3e196fa7b3
fix: ILM newer noncurrent version limit must return correct versions (#17652)
objects/versions that are not expired via NewerNoncurrentVersions
must be properly returned to be applied under further ILM actions.

this would cause legitimately expired objects to be missed
from expiration.
2023-07-14 16:42:35 -07:00
drivebyer 04c792476f
fix: provide a possible slice cap for heal failed metrics items (#17647)
Signed-off-by: Wu <yang.wu@daocloud.io>
2023-07-14 11:02:45 -07:00
Harshavardhana 005a4a275a
add more bootstrap messages to provide latency (#17650)
- simplify refreshing bucket metadata, wait() to
  depend on how fast the bucket metadata can load.

- simplify resync to start resync in single pass.
2023-07-14 04:00:29 -07:00
Harshavardhana bdddf597f6
shuffle buckets randomly before being scanned (#17644)
this randomness is needed to avoid scanning
the same buckets across different erasure sets,
in the same order.

allow random buckets to be scanned instead
allowing a wider spread of ILM, replication
checks.

Additionally do not loop over twice to fill
the channel, fill the channel regardless of
having bucket new or old.
2023-07-14 02:25:40 -07:00
Aditya Manthramurthy bb6921bf9c
Send AuditLog via new middleware fn for admin APIs (#17632)
A new middleware function is added for admin handlers, including options
for modifying certain behaviors. This admin middleware:

- sets the handler context via reflection in the request and sends AuditLog
- checks for object API availability (skipping it if a flag is passed)
- enables gzip compression (skipping it if a flag is passed)
- enables header tracing (adding body tracing if a flag is passed)

While the new function is a middleware, due to the flags used for
conditional behavior modification, which is used in each route registration
call.

To try to ensure that no regressions are introduced, the following
changes were done mechanically mostly with `sed` and regexp:

- Remove defer logger.AuditLog in admin handlers
- Replace newContext() calls with r.Context()
- Update admin routes registration calls

Bonus: remove unused NetSpeedtestHandler

Since the new adminMiddleware function checks for object layer presence
by default, we need to pass the `noObjLayerFlag` explicitly to admin
handlers that should work even when it is not available. The following
admin handlers do not require it:

- ServerInfoHandler
- StartProfilingHandler
- DownloadProfilingHandler
- ProfileHandler
- SiteReplicationDevNull
- SiteReplicationNetPerf
- TraceHandler

For these handlers adminMiddleware does not check for the object layer
presence (disabled by passing the `noObjLayerFlag`), and for all other
handlers, the pre-check ensures that the handler is not called when the
object layer is not available - the client would get a
ErrServerNotInitialized and can retry later.

This `noObjLayerFlag` is added based on existing behavior for these
handlers only.
2023-07-13 14:52:21 -07:00
Klaus Post 4f89e5bba9
Add active disk health checks (#17539)
Add check every 2 minutes to see if a write+read operation can complete.

If disk is unresponsive for 2 minutes or returns errFaultyDisk, take it offline.
2023-07-13 11:41:55 -07:00
jiuker 183428db03
fear: Implement 'mc support top net' (#17598) 2023-07-13 11:41:19 -07:00
Shireesh Anjal fc6d873758
Use os.ReadFile instead of ioutil.ReadFile (#17649)
ioutil.ReadFile is deprecated and also doesn't work with certain kinds
of symlinks.
2023-07-13 09:07:10 -07:00
Poorna 5e2f8d7a42
replication: Simplify mrf requeueing and add backlog handler (#17171)
Simplify MRF queueing and add backlog handler

- Limit re-tries to 3 to avoid repeated re-queueing. Fall offs
to be re-tried when the scanner revisits this object or upon access.

- Change MRF to have each node process only its MRF entries.

- Collect MRF backlog by the node to allow for current backlog visibility
2023-07-12 23:51:33 -07:00
Shubhendu 9b9871cfbb
Added `endpoint` and `versions` attributes to KMS details (#17350)
Now it would list details of all KMS instances with additional
attributes `endpoint` and `version`. In the case of k8s-based
deployment the list would consist of a single entry.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-12 23:50:38 -07:00
guangwu f80b6926d3
chore: fix minor issues reported via staticcheck (#17639) 2023-07-12 20:33:11 -07:00
Shubhendu 6dc55fe5ed
Corrected the API name for audit logging purpose (#17642)
This would better to record the correct API name so that
any verification around audit logs to figure out if required
APIs are called required no of times, would be correct.
Here in this case of policy attached, API `AttachDetachPolicyBuiltin`
would be called with `requestPath` as `/minio/admin/v3/idp/builtin/policy/attach`
and in case of detach policy the value would be `/minio/admin/v3/idp/builtin/policy/detach`

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-12 15:38:49 -07:00
Harshavardhana 2d1cda2061
fix: do not os.Exit(1) while writing goroutines during shutdown (#17640)
Also shutdown poll add jitter, to verify if the shutdown
sequence can finish before 500ms, this reduces the overall
time taken during "restart" of the service.

Provides speedup for `mc admin service restart` during
active I/O, also ensures that systemd doesn't treat the
returned 'error' as a failure, certain configurations in
systemd can cause it to 'auto-restart' the process by-itself
which can interfere with `mc admin service restart`.

It can be observed how now restarting the service is
much snappier.
2023-07-12 07:18:30 -07:00
Harshavardhana a566bcf613
treat 0-byte objects to honor same quorum as delete marker (#17633)
on unversioned buckets its possible that 0-byte objects
might lose quorum on flaky systems, allow them to be same
as DELETE markers. Since practically speak they have no
content.
2023-07-11 21:53:49 -07:00
Klaus Post 9885a0a6af
Fix hasSpaceFor in SNSD setup (#17630)
If drive is offline or filled we divide by 0.

Fixes #17629

Bonus: Reject when any valid disk exceeds minimum inode threshold.
2023-07-11 14:29:34 -07:00
Kaan Kabalak f64d62b01d
Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Harshavardhana 82075e8e3a
use strconv variants to improve on performance per 'op' (#17626)
```
BenchmarkItoa
BenchmarkItoa-8         	673628088	         1.946 ns/op	       0 B/op	       0 allocs/op
BenchmarkFormatInt
BenchmarkFormatInt-8    	592919769	         2.012 ns/op	       0 B/op	       0 allocs/op
BenchmarkSprint
BenchmarkSprint-8       	26149144	        49.06 ns/op	       2 B/op	       1 allocs/op
BenchmarkSprintBool
BenchmarkSprintBool-8   	26440180	        45.92 ns/op	       4 B/op	       1 allocs/op
BenchmarkFormatBool
BenchmarkFormatBool-8   	1000000000	         0.2558 ns/op	       0 B/op	       0 allocs/op
```
2023-07-11 07:46:58 -07:00
Harshavardhana 5b7c83341b
move per bucket metrics to peer location (#17627) 2023-07-11 07:46:24 -07:00
Poorna fb49aead9b
replication: add validation API (#17520)
To check if replication is set up properly on a bucket.
2023-07-10 20:09:20 -07:00
Aditya Manthramurthy 85f5700e4e
fix: missing audit logger call for some admin APIs (#17623) 2023-07-10 16:59:44 -07:00