Commit Graph

609 Commits

Author SHA1 Message Date
Anis Eleuch
4eeb48f8e0
Return cached online/offline status for audit/http loggers (#18083)
To avoid having delays in prometheus scrape and in 'mc admin info' command.
2023-09-21 16:58:24 -07:00
Harshavardhana
1472875670
fix: failed messages counting in audit_http metrics (#18075)
all retries must not be counted as failed messages,
a failed message is a single counter not for all
retries, this PR fixes this.

Also we do not need to retry 10-times, instead we should
retry at max 3 times with some jitter to deliver the
messages.
2023-09-21 11:24:56 -07:00
Harshavardhana
2add57cfed
apply healing per object at 1024 cycles (#18050)
- we already have MRF for most recent failures
- we trigger healing during HEAD/GET operation

These are enough, also change the default max wait
from 5sec to 1sec for default scanner speed.
2023-09-19 09:24:22 -07:00
jiuker
9947c01c8e
feat: SSE-KMS use uuid instead of read all data to md5. (#17958) 2023-09-18 10:00:54 -07:00
Anis Eleuch
419e5baf16
fix: webhook notify endpoint with standard ports (#18016) 2023-09-14 20:10:44 -07:00
Alex
dc48cd841a
Added MINIO_PROMETHEUS_AUTH_TOKEN env support (#18028)
Signed-off-by: Benjamin Perez <benjamin@bexsoft.net>
2023-09-14 17:28:21 -07:00
Aditya Manthramurthy
cbc0ef459b
Fix policy package import name (#18031)
We do not need to rename the import of minio/pkg/v2/policy as iampolicy
any more.
2023-09-14 14:50:16 -07:00
Harshavardhana
32890342ce
introduce MINIO_BROWSER_REDIRECT env to enable/disable auto-redirect (#18025) 2023-09-13 18:43:57 -07:00
Anis Eleuch
41de53996b
heal: calculate the number of workers based on NRRequests (#17945) 2023-09-11 14:48:54 -07:00
Harshavardhana
9878031cfd
fix: change DISK_ to DRIVE_ for some drive related envs (#18005) 2023-09-11 12:19:22 -07:00
Harshavardhana
e3fbcaeb72
allow scanner key cycle to be empty (#18001)
configs from 2020 server throws an
error due to deprecation of the keys
however an attempt is made to parse
them, we should have chosen existing
defaults - this PR fixes that.
2023-09-09 08:53:32 -07:00
Anis Eleuch
b9269151a4
fix: drive rotational calculation status for partitions (#17986)
Fix drive rotational calculation status

If a MinIO drive path is mounted to a partition and not a real disk,
getting the rotational status would fail because Linux does not expose
that status to partition; In other words,
/sys/block/drive-partition-name/queue/rotational does not exist;

To fix the issue, the code will search for the rotational status of the
disk that hosts the partition, and this can be calculated from the
real path of /sys/class/block/<drive-partition-name>
2023-09-06 12:37:57 -07:00
Harshavardhana
5b114b43f7
refactor bandwidth throttling for replication target (#17980)
This refactor is to allow using the bandwidth throttling
for other purposes.
2023-09-05 20:21:59 -07:00
Aditya Manthramurthy
1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Anis Eleuch
6a8d8f34a5
kafka: Do not require key when sending a message (#17962)
Keys are helpful to ensure the strict ordering of messages, however currently the
code uses a random request id for every log, hence using the request-id
as a Kafka key is not serve any purpose;

This commit removes the usage of the key, to also fix the audit issue from
internal subsystem that does not have a request ID.
2023-09-01 08:37:22 -07:00
Harshavardhana
0d1fbef751
fix: a possible crash in event target Close() (#17948)
these are possible crashes when the configured
target is still in init() state and never finished
- however a delete config was initiated.
2023-08-30 07:27:45 -07:00
Harshavardhana
07b1281046 add queue_dir to help message for logger/audit targets 2023-08-29 16:07:35 -07:00
Harshavardhana
adb8be069e
tune-kafka targets to ensure timeout triggers on hung brokers (#17898)
hung brokers can cause slowness to the entire system
when many callers are hung, leading to large goroutine
build-up.
2023-08-22 20:26:35 -07:00
Harshavardhana
3a0125fa1f
remove unexpected logging from peer calls (#17888)
also make sure RequestID is set for system logs
2023-08-21 14:25:24 -07:00
Daniel Valdivia
328cb0a076
Pass environment variable to control session length to console (#17885)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-08-21 11:55:43 -07:00
jiuker
fa2a8d7209
fix: drain the req.body into io.Discard correctly (#17881) 2023-08-21 01:09:07 -07:00
Andreas Auernhammer
8f8f8854f0
update minio/kes-go dep to v0.2.0 (#17850)
This commit updates the minio/kes-go dependency
to v0.2.0 and updates the existing code to work
with the new KES APIs.

The `SetPolicy` handler got removed since it
may not get implemented by KES at all and could
not have been used in the past since stateless KES
is read-only w.r.t. policies and identities.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2023-08-19 07:37:53 -07:00
Harshavardhana
bc7c0d8624
if object is a delete marker it must skip tags filter in ILM (#17861) 2023-08-18 09:36:23 -07:00
Harshavardhana
11dfc817f3
do not log client canceled events (#17838) 2023-08-17 14:53:43 -07:00
Harshavardhana
8a9b886011
update grafana dashboard with disk -> drive rename (#17857) 2023-08-15 16:04:20 -07:00
Klaus Post
406ea4f281
Fix distributed listing not able to resume (#17855)
Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`.

This meant that in distributed systems listings would never be able to continue and would restart on every call.

Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.
2023-08-15 07:45:25 -07:00
Harshavardhana
c45bc32d98
skip disks under scanning when healing disks (#17822)
Bonus:

- avoid calling DiskInfo() calls when missing blocks
  instead heal the object using MRF operation.

- change the max_sleep to 250ms beyond that we will
  not stop healing.
2023-08-09 12:51:47 -07:00
Praveen raj Mani
0285df5a02
fix: prioritize audit_webhook and logger_webhook ENVs over the config KVS (#17783) 2023-08-03 02:47:07 -07:00
Harshavardhana
81be718674
fix: optimize DiskInfo() call avoid metrics when not needed (#17763) 2023-07-31 15:20:48 -07:00
Harshavardhana
73edd5b8fd
introduce 'mc admin config set alias/ api odirect=on' (#17753)
change disable_odirect=off -> odirect=on to make it
easier to understand, instead of making it double
negative.
2023-07-31 00:12:53 -07:00
Harshavardhana
f13cfcb83e
allow disabling O_DIRECT for write ops (#17751)
on really slow systems, O_DIRECT simply kills the drives
allow for a way to disable them.
2023-07-29 15:17:56 -07:00
Anis Eleuch
9c0e8cd15b
logger: Avoid slow calls in http logger Send() function (#17747)
Send() is synchronous and can affect the latency of S3 requests when the
logger buffer is full.

Avoid checking if the HTTP target is online or not and increase the
workers anyway since the buffer is already full.

Also, avoid logs flooding when the audit target is down.
2023-07-29 12:49:18 -07:00
Harshavardhana
731e03fe5a
add ReadFileStream deadline for disk call (#17745)
timeout the reader side if hung via disk max timeout
2023-07-28 15:37:53 -07:00
jiuker
f9d029c8fa
fix: prevCertificate save not correct in loop (#17744)
fix certs save not correct

Co-authored-by: guozhi.li <guozhi.li@daocloud.io>
2023-07-28 12:57:49 -07:00
Harshavardhana
114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
Harshavardhana
e7b60c4d65
Add slow drive timeouts to match with active disk monitoring (#17701)
allow active disk-monitoring to be configurable, and use
these add deadlines in various call layers for various
syscalls.
2023-07-25 16:58:31 -07:00
Harshavardhana
14e1ace552
remove serializing WalkDir() across all buckets/prefixes on SSDs (#17707)
slower drives get knocked off because they are too slow via 
active monitoring, we do not need to block calls arbitrarily.

Serializing adds latencies for already slow calls, remove
it for SSDs/NVMEs

Also, add a selection with context when writing to `out <-`
channel, to avoid any potential blocks.
2023-07-24 09:30:19 -07:00
Harshavardhana
e12ab486a2
avoid using os.Getenv for internal code, use env.Get() instead (#17688) 2023-07-20 07:52:49 -07:00
drivebyer
9b5c2c386a
fix: return error when requested interface has no stats available (#17666) 2023-07-17 01:14:01 -07:00
jiuker
d118031ed6
fix: when Origin: null is set return back '*' for allow origins (#17651) 2023-07-15 12:15:06 -07:00
Shireesh Anjal
bb63375f1b
Do not consider subnet api key as secret (#17643)
As it is required by mc and console to communicate with subnet
2023-07-13 12:24:47 -07:00
jiuker
183428db03
fear: Implement 'mc support top net' (#17598) 2023-07-13 11:41:19 -07:00
Shubhendu
9b9871cfbb
Added endpoint and versions attributes to KMS details (#17350)
Now it would list details of all KMS instances with additional
attributes `endpoint` and `version`. In the case of k8s-based
deployment the list would consist of a single entry.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-12 23:50:38 -07:00
Harshavardhana
2d1cda2061
fix: do not os.Exit(1) while writing goroutines during shutdown (#17640)
Also shutdown poll add jitter, to verify if the shutdown
sequence can finish before 500ms, this reduces the overall
time taken during "restart" of the service.

Provides speedup for `mc admin service restart` during
active I/O, also ensures that systemd doesn't treat the
returned 'error' as a failure, certain configurations in
systemd can cause it to 'auto-restart' the process by-itself
which can interfere with `mc admin service restart`.

It can be observed how now restarting the service is
much snappier.
2023-07-12 07:18:30 -07:00
Poorna
fb49aead9b
replication: add validation API (#17520)
To check if replication is set up properly on a bucket.
2023-07-10 20:09:20 -07:00
Harshavardhana
f6186965c3
honor DeleteAllVersions in list(), head() calls (#17604) 2023-07-08 15:42:10 -07:00
Harshavardhana
28a01f0320
update missing license header in files (#17603) 2023-07-08 10:42:05 -07:00
Klaus Post
45a717a142
Avoid per request URL parsing (#17593)
Every request does a `url.Parse(c.url.String())` to clone a URL.

The host will also be static, so we rewrite that on creation.
2023-07-07 22:07:30 -07:00
Harshavardhana
0bc34952eb
fix: under FanOut API avoid repeated md5sum calculation (#17572)
md5sum calculation has a high CPU overhead, avoid calculating
it repeatedly for similar fanOut calls.

To fix following CPU profiler result
```
(pprof) top10
Showing nodes accounting for 678.68s, 84.67% of 801.54s total
Dropped 1072 nodes (cum <= 4.01s)
Showing top 10 nodes out of 156
      flat  flat%   sum%        cum   cum%
   332.54s 41.49% 41.49%    332.54s 41.49%  runtime/internal/syscall.Syscall6
   228.39s 28.49% 69.98%    228.39s 28.49%  crypto/md5.block
    48.07s  6.00% 75.98%     48.07s  6.00%  runtime.memmove
    28.91s  3.61% 79.59%     28.91s  3.61%  github.com/minio/highwayhash.updateAVX2
     8.25s  1.03% 80.61%      8.25s  1.03%  runtime.futex
     8.25s  1.03% 81.64%     10.81s  1.35%  runtime.step
     6.99s  0.87% 82.52%     22.35s  2.79%  runtime.pcvalue
     6.67s  0.83% 83.35%     38.90s  4.85%  runtime.mallocgc
     5.77s  0.72% 84.07%     32.61s  4.07%  runtime.gentraceback
     4.84s   0.6% 84.67%     10.49s  1.31%  runtime.lock2
```
2023-07-05 03:16:05 -07:00
Harshavardhana
e37c4efc6e
fix: upon DNS refresh() failure use previous values (#17561)
DNS refresh() in-case of MinIO can safely re-use
the previous values on bare-metal setups, since
bare-metal arrangements do not change DNS in any 
manner commonly.

This PR simplifies that, we only ever need DNS caching
on bare-metal setups.

- On containerized setups do not enable DNS
  caching at all, as it may have adverse effects on
  the overall effectiveness of k8s DNS systems.

  k8s DNS systems are dynamic and expect applications
  to avoid managing DNS caching themselves, instead
  provide a cleaner container native caching
  implementations that must be used.

- update IsDocker() detection, including podman runtime

- move to minio/dnscache fork for a simpler package
2023-07-03 12:30:51 -07:00
Harshavardhana
22f5bc643c
fix: honor older scanner settings only if newer has not changed (#17564) 2023-07-03 12:28:36 -07:00
Harshavardhana
7f782983ca
fix: for FTP server driver allow implicit trust of TLS (#17541)
fixes #17535
2023-06-30 08:04:13 -07:00
Aditya Manthramurthy
bde533a9c7
fix: OpenID config initialization (#17544)
This is due to a regression in the handling of the enable key in OpenID
configuration.
2023-06-29 23:38:26 -07:00
Harshavardhana
aae6846413
feat: allow expiration of all versions via ILM Expiration action (#17521)
Following extension allows users to specify immediate purge of
all versions as soon as the latest version of this object has
expired.

```
<LifecycleConfiguration>
    <Rule>
        <ID>ClassADocRule</ID>
        <Filter>
           <Prefix>classA/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
             <Days>3650</Days>
	     <ExpiredObjectAllVersions>true</ExpiredObjectAllVersions>
        </Expiration>
    </Rule>
    ...
```
2023-06-28 22:12:28 -07:00
guangwu
87b6fb37d6
chore: pkg imported more than once (#17444) 2023-06-26 09:21:29 -07:00
Aditya Manthramurthy
f3248a4b37
Redact all secrets from config viewing APIs (#17380)
This change adds a `Secret` property to `HelpKV` to identify secrets
like passwords and auth tokens that should not be revealed by the server
in its configuration fetching APIs. Configuration reporting APIs now do
not return secrets.
2023-06-23 07:45:27 -07:00
Harshavardhana
74759b05a5
make sure to set relevant config entries correctly (#17485)
Bonus: also allow skipping keys properly.
2023-06-22 10:04:02 -07:00
Praveen raj Mani
7c72b25ef0
Add an option to make bucket notifications synchronous (#17406)
With the current asynchronous behaviour in sending notification events
to the targets, we can't provide guaranteed delivery as the systems
might go for restarts.

For such event-driven use-cases, we can provide an option to enable
synchronous events where the APIs wait until the event is successfully
sent or persisted.

This commit adds 'MINIO_API_SYNC_EVENTS' env which when set to 'on'
will enable sending/persisting events to targets synchronously.
2023-06-20 17:38:59 -07:00
Aditya Manthramurthy
5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Harshavardhana
22b7c8cd8a
upgrade pkg and dperf to latest packages (#17448)
- dperf improvements in benchmarking read and write tests
- upgrade mimedb to use latest content-types
2023-06-17 07:31:36 -07:00
jiuker
0474791cf8
fix: set time format right (#17402) 2023-06-14 07:49:13 -07:00
Harshavardhana
f32efd5429
more compliance related fixes (#17408)
- lifecycle must return InvalidArgument for rule errors
- do not return `null` versionId in HTTP header
- reject mixed SSE uploads with correct error message
2023-06-13 13:52:33 -07:00
Anis Eleuch
0f0dcf0c5e
tar: Avoid storing snowball extraction header in extract objects (#17389) 2023-06-12 09:42:06 -07:00
Anis Eleuch
bb24346e04
listen: Only error out if not able to bind any interface (#17353) 2023-06-12 09:09:28 -07:00
Harshavardhana
b829e80ecb
do not disable root for invalid API config values (#17386) 2023-06-08 15:50:06 -07:00
Klaus Post
6e38d0f3ab
Add more bootstrap info in debug mode (#17362) 2023-06-08 08:39:47 -07:00
Harshavardhana
dbd4c2425e
fix: kafka broker pings must not be greater than 1sec (#17376) 2023-06-07 11:47:00 -07:00
Anis Eleuch
eba378e4a1
vrf: Fix testing for loopback coming from the address (#17372) 2023-06-07 09:53:05 -07:00
Harshavardhana
2f9e2147f5
allow quota enforcement to rely on older values (#17351)
PUT calls cannot afford to have large latency build-ups due
to contentious usage.json, or worse letting them fail with
some unexpected error, this can happen when this file is
concurrently being updated via scanner or it is being
healed during a disk replacement heal.

However, these are fairly quick in theory, stressed clusters
can quickly show visible latency this can add up leading to
invalid errors returned during PUT.

It is perhaps okay for us to relax this error return requirement
instead, make sure that we log that we are proceeding to take in
the requests while the quota is using an older value for the quota
enforcement. These things will reconcile themselves eventually,
via scanner making sure to overwrite the usage.json.

Bonus: make sure that storage-rest-client sets ExpectTimeouts to
be 'true', such that DiskInfo() call with contextTimeout does
not prematurely disconnect the servers leading to a longer
healthCheck, back-off routine. This can easily pile up while also
causing active callers to disconnect, leading to quorum loss.

DiskInfo is actively used in the PUT, Multipart call path for
upgrading parity when disks are down, it in-turn shouldn't cause
more disks to go down.
2023-06-05 16:56:35 -07:00
jiuker
8030e12ba5
fix: expMovingAvg is too small when startTime is zero (#17346) 2023-06-03 13:41:51 -07:00
jiuker
fb5ce3b87a
record err time when remote node is offline (#17262) 2023-05-30 10:07:26 -07:00
Klaus Post
6fe028b7c5
Revert s3 select simdjson reuse (#17310) 2023-05-30 10:02:22 -07:00
Anis Eleuch
54c5c88fe6
Add number of offline disks in quorum errors (#16822) 2023-05-25 09:39:06 -07:00
jiuker
443250d135
fix: for Target isActive use net.Dial instead (#17251) 2023-05-25 09:24:11 -07:00
jiuker
d749aaab69
fix: ignore existing target status when adding new targets (#17250) 2023-05-24 22:57:37 -07:00
Krishnan Parthasarathi
62df731006
Add updatedAt for GetBucketLifecycleConfig (#17271) 2023-05-24 22:52:39 -07:00
Harshavardhana
d0a0eb9738
support fan-out objects via PostUpload() (#17233) 2023-05-24 22:51:07 -07:00
Klaus Post
5677f73794
Add PostObject Checksum (#17244) 2023-05-23 07:58:33 -07:00
Krishnan Parthasarathi
55a3310446
logger-http: Don't retry after a succesful send (#17266) 2023-05-22 14:53:18 -07:00
Harshavardhana
fc03be7891
simplify bucket metadata lookups for versioning/object locking (#17253) 2023-05-22 12:05:14 -07:00
jiuker
b1b00a5055
fix: Avoid Income globalStats twice upon error (#17263) 2023-05-22 07:42:27 -07:00
jiuker
41fa8fa2d2
fix: increment counter when entry be skipped (#17237) 2023-05-19 08:36:52 -07:00
jiuker
e94e6adf91
fix: return proper error if OIDC Discoverydoc fails to respond (#17242) 2023-05-19 02:13:33 -07:00
Klaus Post
b06d7bf834
fix: leaking connections in JSON SQL with limited return (#17239) 2023-05-18 11:26:46 -07:00
Aditya Manthramurthy
9d96b18df0
Add "name" and "description" params to service acc (#17172) 2023-05-17 17:05:36 -07:00
Praveen raj Mani
85912985b6
Check for only network errors in audit webhook for reachability (#17228) 2023-05-17 11:10:33 -07:00
Klaus Post
aaf1abc993
simplify HardLimitReader by using LimitReader for internal usage (#17218) 2023-05-16 13:14:37 -07:00
Harshavardhana
ef2fc0f99e
fix: reduce using memory and temporary files. (#17206) 2023-05-15 14:08:54 -07:00
Anis Eleuch
684399433b
lock: Retry locking with an increasing random interval (#17200) 2023-05-13 08:42:21 -07:00
Poorna
e07c2ab868
Use hash.NewLimitReader for internal multipart calls (#17191) 2023-05-12 11:19:08 -07:00
Klaus Post
99c4ffa34f
fix: avoid audit log race protection deadlocks (#17168) 2023-05-09 08:11:32 -07:00
Harshavardhana
a7f266c907
allow JWT parsing on large session policy based tokens (#17167) 2023-05-09 00:53:08 -07:00
Praveen raj Mani
57acacd5a7
Support persistent queue store for loggers (#17121) 2023-05-08 21:20:31 -07:00
Klaus Post
76913a9fd5
Signed trailers for signature v4 (#16484) 2023-05-05 19:53:12 -07:00
Harshavardhana
5569acd95c
disallow EC:0 if not set during server startup (#17141) 2023-05-04 14:44:30 -07:00
Alex
6e24dff26a
Added MINIO_BROWSER_LOGIN_ANIMATION env support for WebUI console (#17123)
Signed-off-by: Benjamin Perez <benjamin@bexsoft.net>
2023-05-03 15:32:50 -07:00
Harshavardhana
9571b0825e
add configurable VRF interface and user-timeout (#17108) 2023-05-03 14:12:25 -07:00
Krishnan Parthasarathi
0ec722bc54
Add tags to NewerNoncurrentVersions audit event (#17110) 2023-05-02 12:56:33 -07:00
Anis Eleuch
4640b13c66
Use expontential backoff algo for internode reconnections (#17052) 2023-05-02 12:35:52 -07:00
Praveen raj Mani
1704abaf6b
fix: store notification events immediately for persistent queues (#17112) 2023-05-02 07:53:13 -07:00