Commit Graph

340 Commits

Author SHA1 Message Date
Krishnan Parthasarathi
a7577da768 Improve expiration of tiered objects (#18926)
- Use a shared worker pool for all ILM expiry tasks
- Free version cleanup executes in a separate goroutine
- Add a free version only if removing the remote object fails
- Add ILM expiry metrics to the node namespace
- Move tier journal tasks to expiryState
- Remove unused on-disk journal for tiered objects pending deletion
- Distribute expiry tasks across workers such that the expiry of versions of
  the same object serialized
- Ability to resize worker pool without server restart
- Make scaling down of expiryState workers' concurrency safe; Thanks
  @klauspost
- Add error logs when expiryState and transition state are not
  initialized (yet)
* metrics: Add missed tier journal entry tasks
* Initialize the ILM worker pool after the object layer
2024-03-01 21:11:03 -08:00
Harshavardhana
467714f33b ignore x-amz-storage-class when its set to STANDARD (#19154)
fixes #19135
2024-02-28 17:44:30 -08:00
Harshavardhana
53aa8f5650 use typos instead of codespell (#19088) 2024-02-21 22:26:06 -08:00
Harshavardhana
35deb1a8e2 do not block on send channels under high load (#19090)
all send channels must compete with `ctx` if not
they will perpetually stay alive.
2024-02-20 15:00:35 -08:00
Harshavardhana
794a7993cb calculate correct quorum check for metadata updates on object (#18979)
this fixes rare bugs we have seen but never really found a
reproducer for

- PutObjectRetention() returning 503s
- PutObjectTags() returning 503s
- PutObjectMetadata() updates during replication returning 503s

These calls return errors, and this perpetuates with
no apparent fix.

This PR fixes with correct quorum requirement.
2024-02-05 21:44:40 -08:00
Harshavardhana
fec13b0ec1 remove unused DiskMTime (#18965) 2024-02-05 01:04:26 -08:00
Anis Eleuch
6ae97aedc9 xl: Disable rename2 in decommissioning/rebalance (#18964)
Always disable rename2 optimization in decom/rebalance
2024-02-03 14:03:30 -08:00
Anis Eleuch
6fd63e920a log: Use error log type instead of Application/MinIO type (#18930)
* log: Use error log type instead of Application/MinIO type

Also bump github.com/shirou/gopsutil version to address cross
compilation issues.

* Apply suggestions from code review

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>

---------

Co-authored-by: Anis Eleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2024-02-01 16:13:57 -08:00
Harshavardhana
caac9d216e remove all the frivolous logs, that may or may not be actionable (#18922)
for actionable, inspections we have `mc support inspect`

we do not need double logging, healing will report relevant
errors if any, in terms of quorum lost etc.
2024-01-30 18:11:45 -08:00
Harshavardhana
80ca120088 remove checkBucketExist check entirely to avoid fan-out calls (#18917)
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.

We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
2024-01-30 12:43:25 -08:00
Harshavardhana
1d3bd02089 avoid close 'nil' panics if any (#18890)
brings a generic implementation that
prints a stack trace for 'nil' channel
closes(), if not safely closes it.
2024-01-28 10:04:17 -08:00
Harshavardhana
708cebe7f0 add necessary protection err, fileInfo slice reads and writes (#18854)
protection was in place. However, it covered only some
areas, so we re-arranged the code to ensure we could hold
locks properly.

Along with this, remove the DataShardFix code altogether,
in deployments with many drive replacements, this can affect
and lead to quorum loss.
2024-01-24 01:08:23 -08:00
Harshavardhana
dd2542e96c add codespell action (#18818)
Original work here, #18474,  refixed and updated.
2024-01-17 23:03:17 -08:00
Harshavardhana
38637897ba fix: listing SSE encrypted multipart objects (#18786)
GetActualSize() was heavily relying on o.Parts()
to be non-empty to figure out if the object is multipart or not, 
However, we have many indicators of whether an object is multipart 
or not.

Blindly assuming that o.Parts == nil is not a multipart, is an 
incorrect expectation instead, multipart must be obtained via

- Stored metadata value indicating this is a multipart encrypted object.

- Rely on <meta>-actual-size metadata to get the object's actual size.
  This value is preserved for additional reasons such as these.

- ETag != 32 length
2024-01-15 00:57:49 -08:00
Anis Eleuch
04135fa6cd audit: Add the drives where the dangling object is removed (#18737) 2024-01-05 14:17:24 -08:00
Harshavardhana
a50ea92c64 feat: introduce list_quorum="auto" to prefer quorum drives (#18084)
NOTE: This feature is not retro-active; it will not cater to previous transactions
on existing setups. 

To enable this feature, please set ` _MINIO_DRIVE_QUORUM=on` environment
variable as part of systemd service or k8s configmap. 

Once this has been enabled, you need to also set `list_quorum`. 

```
~ mc admin config set alias/ api list_quorum=auto` 
```

A new debugging tool is available to check for any missing counters.
2023-12-29 15:52:41 -08:00
Anis Eleuch
8a0ba093dd audit: Fix merrs and derrs object dangling message (#18714)
merrs and derrs are empty when a dangling object is deleted. Fix the bug
and adds invalid-meta data for data blocks
2023-12-27 22:27:04 -08:00
Harshavardhana
7c948adf88 allow pre-allocating buffers to reduce frequent GCs during growth (#18686)
This PR also increases per node bpool memory from 1024 entries
to 2048 entries; along with that, it also moves the byte pool
centrally instead of being per pool.
2023-12-21 08:59:38 -08:00
Harshavardhana
196e7e072b allow bitrot files to be healed in MRF (#18618)
bitrot scanMode was ignored in MRF,
allow it to heal relevant content if
needed when seen as an error.
2023-12-08 12:26:01 -08:00
Harshavardhana
45b7253f39 parallelize renameData() cleanup upon error (#18591) 2023-12-04 14:54:34 -08:00
Harshavardhana
8fdfcfb562 upon RenameData() quorum error delete any partial success (#18586)
there is potential for danglingWrites when quorum failed, where
only some drives took a successful write, generally this is left
to the healing routine to pick it up. However it is better that
we delete it right away to avoid potential for quorum issues on
version signature when there are many versions of an object.
2023-12-04 11:33:39 -08:00
Harshavardhana
e7c144eeac avoid double MRF heal when there is versions disparity (#18585) 2023-12-04 11:13:50 -08:00
Anis Eleuch
b7d11141e1 rename Force to Immediate for clarity (#18540) 2023-11-28 22:35:16 -08:00
Klaus Post
dc88865908 fix: shadowed error in getObjectFileInfo() (#18548)
This will result in `done <- err == nil` always returning true
for this path, which seems unintentional.
2023-11-28 09:47:41 -08:00
Harshavardhana
506f121576 remove frivolous logging in transition object (#18526)
AWS S3 closes keep-alive connections frequently
leading to frivolous logs filling up the MinIO
logs when the transition tier is an AWS S3 bucket.

Ignore such transient errors, let MinIO retry
it when it can.
2023-11-26 22:18:09 -08:00
Harshavardhana
fba883839d feat: bring new HDD related performance enhancements (#18239)
Optionally allows customers to enable 

- Enable an external cache to catch GET/HEAD responses 
- Enable skipping disks that are slow to respond in GET/HEAD 
  when we have already achieved a quorum
2023-11-22 13:46:17 -08:00
Harshavardhana
a4cfb5e1ed return errors if dataDir is missing during HeadObject() (#18477)
Bonus: allow replication to attempt Deletes/Puts when
the remote returns quorum errors of some kind, this is
to ensure that MinIO can rewrite the namespace with the
latest version that exists on the source.
2023-11-20 21:33:47 -08:00
Anis Eleuch
22d59e757d Remove stale data in HEAD/GET object (#18460)
Currently if the object does not exist in quorum disks of an erasure
set, the dangling code is never called because the returned error will
be errFileNotFound or errFileVersionNotFound;

With this commit, when errFileNotFound or errFileVersionNotFound is
returning when trying to calculate the quorum of a given object, the
code checks if a disk returned nil, which means a stale object exists in
that disk, that will trigger deleteIfDangling() function
2023-11-16 08:39:53 -08:00
Harshavardhana
0663eb69ed fix: do not preserve mtime during CopyObject() metadata updates (#18316)
mtime must be preserved only if destination mtime is set.

fixes #18314
2023-10-25 14:30:56 -07:00
Harshavardhana
5c8339e1e8 fix: veeam SOS API to higher layers (#18287)
- support populating usage info from scanner info
- support populating quota for the bucket via quota
  settings for the bucket
2023-10-23 13:55:45 -07:00
Aditya Manthramurthy
b3e7de010d Remove usage of errors.Join for go1.19 compat (#18243) 2023-10-13 15:14:16 -07:00
Harshavardhana
77e94087cf fix: calling statfs() call moves the disk head (#18203)
if erasure upgrade is needed rely on the in-memory
values, instead of performing a "DiskInfo()" call.

https://brendangregg.com/blog/2016-09-03/sudden-disk-busy.html

for HDDs these are problematic, lets avoid this because
there is no value in "being" absolutely strict here
in terms of parity. We are okay to increase parity
as we see based on the in-memory online/offline ratio.
2023-10-10 13:47:35 -07:00
Poorna
9dc29d7687 Avoid ILM expiry on deleted versions that are yet to replicate (#18175)
Fixes #18167
2023-10-06 06:55:15 -06:00
Harshavardhana
c34bdc33fb make sure to set Versioned field to ensure rename2 is not called (#18141)
without this the rename2() can rename the previous dataDir
causing issues for different versions of the object, only
latest version is preserved due to this bug.

Added healing code to ensure recovery of such content.
2023-09-29 09:08:24 -07:00
Harshavardhana
3c470a6b8b fix: the inspect script to use scheme per deployment (#18118) 2023-09-27 08:22:50 -07:00
jiuker
9947c01c8e feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Harshavardhana
9f7044aed0 fix: ignore transient errors in read path (#18006)
Errors such as

```
returned an error (context deadline exceeded) (*fmt.wrapError)
```

```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
2023-09-11 15:29:59 -07:00
Aditya Manthramurthy
1c99fb106c Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana
3995355150 avoid repeated large allocations for large parts (#17968)
objects with 10,000 parts and many of them can
cause a large memory spike which can potentially
lead to OOM due to lack of GC.

with previous PR reducing the memory usage significantly
in #17963, this PR reduces this further by 80% under
repeated calls.

Scanner sub-system has no use for the slice of Parts(),
it is better left empty.

```
benchmark                            old ns/op     new ns/op     delta
BenchmarkToFileInfo/ToFileInfo-8     295658        188143        -36.36%

benchmark                            old allocs     new allocs     delta
BenchmarkToFileInfo/ToFileInfo-8     61             60             -1.64%

benchmark                            old bytes     new bytes     delta
BenchmarkToFileInfo/ToFileInfo-8     1097210       227255        -79.29%
```
2023-09-02 07:49:24 -07:00
Harshavardhana
18b3655c99 with xlv2 format we never had to fill in checksumInfo() (#17963)
- this PR avoids sending a large ChecksumInfo slice
  when its not needed

- also for a file with XLV2 format there is no reason
  to allocate Checksum slice while reading
2023-09-01 13:45:58 -07:00
Harshavardhana
8a57b6bced use renameat2 Linux extension syscall (#17757)
this is a faster and safer alternative
on newer kernel versions.
2023-08-27 09:57:11 -07:00
Harshavardhana
124e28578c remove strict persistence requirements for List() .metacache objects (#17917)
.metacache objects are transient in nature, and are better left to
use page-cache effectively to avoid using more IOPs on the disks.

this allows for incoming calls to be not taxed heavily due to
multiple large batch listings.
2023-08-25 07:58:11 -07:00
Harshavardhana
c45bc32d98 skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Harshavardhana
81be718674 fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Harshavardhana
e7b60c4d65 Add slow drive timeouts to match with active disk monitoring (#17701)
allow active disk-monitoring to be configurable, and use
these add deadlines in various call layers for various
syscalls.
2023-07-25 16:58:31 -07:00
Harshavardhana
a566bcf613 treat 0-byte objects to honor same quorum as delete marker (#17633)
on unversioned buckets its possible that 0-byte objects
might lose quorum on flaky systems, allow them to be same
as DELETE markers. Since practically speak they have no
content.
2023-07-11 21:53:49 -07:00
Kaan Kabalak
f64d62b01d Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Poorna
e8c98c3246 Avoid extra GetObjectInfo call in DeleteObject API (#17599)
Optimize DeleteObject API to avoid extra 
GetObjectInfo call on the replicating side.

For receiving side, it is just a regular
DeleteObject call.

Bonus: Fix a corner case where version purged is 
absent on target (either due to replication not yet
complete or target version already deleted in a
one-way replication or when replication was disabled). 

In such cases, mark version purge complete.
2023-07-10 07:57:56 -07:00
Klaus Post
ff5988f4e0 Reduce allocations (#17584)
* Reduce allocations

* Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating.
* Reuse all msgp.Readers
* Reuse metadata buffers when not reading data.

* Make type safe. Make buffer 4K instead of 8.

* Unslice
2023-07-06 16:02:08 -07:00
Kaan Kabalak
21fbe88e1f Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00