Commit Graph

5424 Commits

Author SHA1 Message Date
Harshavardhana dde1a12819
fix: validate incoming uploadID to be base64 encoded (#17865)
Bonus fixes include

- do not have to write final xl.meta (renameData) does this
  already, saves some IOPs.

- make sure to purge the multipart directory properly using
  a recursive delete, otherwise this can easily pile up and
  rely on the stale uploads cleanup.

fixes #17863
2023-08-17 09:37:55 -07:00
Harshavardhana 9ebd10d3f4
Revert "Include SuccessorModTime for FileInfo quorum (#17732)" (#17860)
This reverts commit bf3901342c.

This is to fix a regression caused when there are inconsistent
versions, but one version is in quorum. SuccessorModTime issue
must be fixed differently.
2023-08-16 07:51:33 -07:00
Harshavardhana 3ba927edae
fix: batch status reporting after complete (#17852)
batch status can perpetually wait after completion
due to a race between the MetricsHandler() returning
the active metrics in intervals of 1sec and delete
of metrics after job completion.

this PR ensures that we keep the 'status' around
for a while, i.e upto 24hrs for all the batch jobs.
2023-08-15 12:22:30 -07:00
Harshavardhana c4ca0a5a57
add two more drive metrics when metrics is available (#17854) 2023-08-15 10:55:47 -07:00
Klaus Post 406ea4f281
Fix distributed listing not able to resume (#17855)
Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`.

This meant that in distributed systems listings would never be able to continue and would restart on every call.

Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.
2023-08-15 07:45:25 -07:00
Harshavardhana 64aa7feabd
allow specifying lower disks for Walk() (#17829)
useful when you may want Walk() with
reduced quorum requirements.
2023-08-14 21:32:39 -07:00
Poorna 875f4076ec
site replication: avoid retries when peer is offline (#17853) 2023-08-14 21:31:41 -07:00
Harshavardhana 4643efe6be
fix: add deadline worker pattern for local disk removers (#17845) 2023-08-14 12:28:13 -07:00
Harshavardhana b760137e1d
fix: add proxyByNode for batch jobs as part of their jobId (#17844) 2023-08-11 13:12:35 -07:00
Harshavardhana 5f56f441bf
fix: apply common notification code with content-type (#17843) 2023-08-11 11:34:43 -07:00
Klaus Post 96a22bfcbb
fix: wrapped io.EOF during ListObjects() (#17842)
When listing getObjectFileInfo can return `io.EOF` if file is being written.

When we wrap the error it will *not* retry upstream, since `io.EOF` is a valid return value.

Allow one retry before returning errors and canceling the listing.
2023-08-11 09:47:16 -07:00
Poorna dfaf735073
replication: fix queuing of large uploads (#17831)
Fixes regression from #17687
2023-08-10 15:48:42 -07:00
Anis Eleuch 7fcfde7f07
s3: Pick a pool with >85% if all other pools are in suspended state (#17826) 2023-08-10 11:06:31 -07:00
jiuker b1391d1991
feat: support perf client to show `TX` from client to server (#17718) 2023-08-10 07:14:46 -07:00
Harshavardhana eb55034dfe
optimize deletePrefix, use direct set location via object name (#17827)
* optimize deletePrefix, use direct set location via object name

instead of fanning out the calls for an object force delete
we can assume the set location and not do fan-out calls

* Apply suggestions from code review

Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>

---------

Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>
2023-08-09 16:30:22 -07:00
Harshavardhana c45bc32d98
skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Harshavardhana 6e860b6dc5
count all versions as part of DeleteAllVersionsAction (#17821) 2023-08-09 08:55:19 -07:00
Harshavardhana b732a673dc
reduce logging in bucket replication in retry scenarios (#17820) 2023-08-08 13:27:40 -07:00
Yang Wu 23e4895dfc
Create metrics slice when necessary (#17809) 2023-08-07 02:21:22 -07:00
Harshavardhana 8666c55ca6
fix: do not use PrefixEnabled() logic to ignore valid objects (#17677)
ignoring valid objects with valid replication metadata
after the Prefix was disabled must still honor the older
metadata.

this can lead to unexpected results, allow it during
READ phase always.
2023-08-05 13:56:01 -07:00
Anis Eleuch a3f00c5d5e
batch: Strict unmarshal yaml document to avoid user made typos (#17808)
// UnmarshalStrict is like Unmarshal except that any fields that are found
// in the data that do not have corresponding struct members, or mapping
// keys that are duplicates, will result in
// an error.
2023-08-05 13:51:48 -07:00
Poorna 26c23b30f4
replication: set context timeout for NewMultipartUpload calls (#17807) 2023-08-05 12:27:07 -07:00
Anis Eleuch a436fd513b
track client disconnections properly for all ListObjects calls (#17804)
Currently ListObjects* calls were returning 200 OK for timed-out clients,
this makes debugging via `mc admin trace` very hard.
2023-08-04 15:57:27 -07:00
Harshavardhana 533cd8d6df
fix: batch replication pull must preserve versionID (#17805)
batch replication pull must preserve versionID regardless
of destination bucket versioning configuration.

This is similar to the issue with decommissioning and rebalancing
2023-08-04 12:09:10 -07:00
Harshavardhana cb089dcb52
error out by default beyond 10000 versions per object (#17803)
```
You've exceeded the limit on the number of versions you can create on this object
```
2023-08-04 10:40:21 -07:00
Harshavardhana 239ccc9c40
fix: crash in globalTierJournal when TierConfig is not initialized (#17791) 2023-08-03 14:16:15 -07:00
Poorna b762fbaf21
sts: validate if iam subsystem initialized in handlers (#17796) 2023-08-03 13:24:25 -07:00
Praveen raj Mani 0285df5a02
fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783) 2023-08-03 02:47:07 -07:00
Harshavardhana 45fb375c41
allow healing to prefer local disks over remote (#17788) 2023-08-03 02:18:18 -07:00
Harshavardhana 4a4950fe41
fix: honor requested allow origin settings properly (#17789)
fixes #17778
2023-08-02 20:41:21 -07:00
Anis Eleuch 1664fd8bb1
Avoid logging errors twice during transitioned objects expiration (#17782) 2023-08-02 09:06:03 -07:00
Harshavardhana 21cdd2bf5d
avoid overwriting metrics on success, save it in defer (#17780) 2023-08-01 22:19:56 -07:00
Harshavardhana 0153f96a20
add deadlines for readMetadata() in listing (#17776)
Bonus: also skip spending time looking for xl.json

- Listing()
- Delete()
2023-08-01 21:52:31 -07:00
Harshavardhana a7a7533190
add new errors for Disks with timeouts (#17770) 2023-08-01 12:47:50 -07:00
Poorna 311380f8cb
replication resync: fix queueing (#17775)
Assign resync of all versions of object to the same worker to avoid locking
contention. Fixes parallel resync implementation in #16707
2023-08-01 11:51:15 -07:00
Harshavardhana b0f0e53bba
fix: make sure to correctly initialize health checks (#17765)
health checks were missing for drives replaced since

- HealFormat() would replace the drives without a health check
- disconnected drives when they reconnect via connectEndpoint()
  the loop also loses health checks for local disks and merges
  these into a single code.
- other than this separate cleanUp, health check variables to avoid
  overloading them with similar requirements.
- also ensure that we compete via context selector for disk monitoring
  such that the canceled disks don't linger around longer waiting for
  the ticker to trigger.
- allow disabling active monitoring.
2023-08-01 10:54:26 -07:00
Klaus Post 004f1e2f66
Fix trailing header signature mismatch (#17774)
Seems like clients may omit a newline at the end of the trailer chunk. Each header should end with a newline. Add that if missing.

Fixes #17662
2023-08-01 08:45:57 -07:00
Harshavardhana 2fa561f22e
do not crash on invalid metric values (#17764)
```
minio[1032735]: panic: label value "\xc0.\xc0." is not valid UTF-8
minio[1032735]: goroutine 1781101 [running]:
minio[1032735]: github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
```

log such errors for investigation
2023-08-01 00:55:39 -07:00
Harshavardhana 81be718674
fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Sho Ce 49a1e2f98e
update-notifier.go: misleading version age message (#17750) 2023-07-31 08:36:19 -07:00
Klaus Post 684c46369c
Send events for extracted objects (#17760)
Fixes #17759
2023-07-31 08:33:51 -07:00
Harshavardhana 73edd5b8fd
introduce 'mc admin config set alias/ api odirect=on' (#17753)
change disable_odirect=off -> odirect=on to make it
easier to understand, instead of making it double
negative.
2023-07-31 00:12:53 -07:00
Harshavardhana 5e5bdf5432
capture total errors data availability and any timeout errors (#17748) 2023-07-29 23:26:26 -07:00
Harshavardhana f13cfcb83e
allow disabling O_DIRECT for write ops (#17751)
on really slow systems, O_DIRECT simply kills the drives
allow for a way to disable them.
2023-07-29 15:17:56 -07:00
Harshavardhana 731e03fe5a
add ReadFileStream deadline for disk call (#17745)
timeout the reader side if hung via disk max timeout
2023-07-28 15:37:53 -07:00
Anis Eleuch 7057d00a28
s3: Return invalid bucket name the first thing in all S3 calls (#17742) 2023-07-28 10:49:20 -07:00
Harshavardhana 114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
ruspaul013 a92cb66468
Get the signed headers in the order they were signed (#17690)
use pSignValues to get signed headers in order
2023-07-27 11:45:30 -07:00
ruspaul013 535f97ba61
check if metadata headers/url values are equal with signed headers (#17737) 2023-07-27 11:44:56 -07:00
drivebyer 14ebd82dbd
fix: missing disk metrics when query metric api from peer (#17738) 2023-07-27 11:44:13 -07:00