Commit Graph

91 Commits

Author SHA1 Message Date
Harshavardhana b534dc69ab
deprecate unexpected healing failed counters (#19705)
simplify this to avoid verbose metrics, and make
room for valid metrics to be reported for alerting
etc.
2024-05-09 11:04:41 -07:00
Harshavardhana 4f660a8eb7
fix: missing metrics for healed objects (#19392)
all healed successful objects via queueHealTask
in a non-blocking heal weren't being reported
correctly, this PR fixes this comprehensively.
2024-04-01 23:48:36 -07:00
Aditya Manthramurthy 9a4d003ac7
Add common middleware to S3 API handlers (#19171)
The middleware sets up tracing, throttling, gzipped responses and
collecting API stats.

Additionally, this change updates the names of handler functions in
metric labels to be the same as the name derived from Go lang reflection
on the handler name.

The metric api labels are now stored in memory the same as the handler
name - they will be camelcased, e.g. `GetObject` instead of `getobject`.

For compatibility, we lowercase the metric api label values when emitting the metrics.
2024-03-04 10:05:56 -08:00
Harshavardhana 62761a23e6
remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Anis Eleuch 8432fd5ac2
prom: Add online and healing drives metrics per erasure set (#18700) 2023-12-21 16:56:43 -08:00
Harshavardhana 6829ae5b13
completely remove drive caching layer from gateway days (#18217)
This has already been deprecated for close to a year now.
2023-10-11 21:18:17 -07:00
Harshavardhana 9788d85ea3
remove logging for invalid metadata values (#18068) 2023-09-20 15:49:55 -07:00
Aditya Manthramurthy cbc0ef459b
Fix policy package import name (#18031)
We do not need to rename the import of minio/pkg/v2/policy as iampolicy
any more.
2023-09-14 14:50:16 -07:00
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Poorna b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
Anis Eleuch 6d0bc5ab1e
prometheus: Fix internode stats (#17594)
Internode calculation was done inside S3 handlers, fix it by moving it
to internode handlers.

Remove admin stats since it is not used.
2023-07-08 07:35:11 -07:00
Klaus Post d85da9236e
Add Object Version count histogram (#16739) 2023-03-10 08:53:59 -08:00
Harshavardhana 14cf8f1b22
upgrade deps for minio/pkg v1.6.1 to include groups conditions (#16538) 2023-02-06 09:27:29 -08:00
Harshavardhana 2433698372
fix: remove unnecessary logs for client conn errors (#16261) 2022-12-15 08:25:05 -08:00
Anis Elleuch 932d2c3c62
Add X-Amz-Request-Id to internode calls (#16146) 2022-12-06 09:27:26 -08:00
Harshavardhana 5a8df7efb3
re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Harshavardhana 23b329b9df
remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Poorna 6b9fd256e1
Persist in-memory replication stats to disk (#15594)
to avoid relying on scanner-calculated replication metrics.
This will improve the accuracy of the replication stats reported.

This PR also adds on to #15556 by handing replication
traffic that could not be queued by available workers to the 
MRF queue so that entries in `PENDING` status are healed faster.
2022-09-12 12:40:02 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Anis Elleuch 8d98282afd
Better reporting of total/free usable capacity of the cluster (#15230)
The current code uses approximation using a ratio. The approximation 
can skew if we have multiple pools with different disk capacities.

Replace the algorithm with a simpler one which counts data 
disks and ignore parity disks.
2022-07-06 13:29:49 -07:00
Harshavardhana 63ac260bd5
Simplify Prometheus metrics gather (#15210) 2022-07-01 13:18:39 -07:00
Harshavardhana 2420f6c000
fix: make metrics endpoint responsive by reducing the chatter (#15055)
peerOnlineCounter was making NxN calls to many peers, this
can be really long and tedious if there are random servers
that are going down.

Instead we should calculate online peers from the point of
view of "self" and return those online and offline appropriately
by performing a healthcheck.
2022-06-08 02:43:13 -07:00
Harshavardhana f1abb92f0c
feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Harshavardhana 85f3a9f3b0 Remove Azure gateway implementation (#14418)
refer #14331
2022-04-29 12:51:23 -07:00
Harshavardhana 0cc993f403 Remove GCS, HDFS gateway implementations #14418
refer #14331
2022-04-24 10:19:17 -07:00
Harshavardhana 9c846106fa
decouple service accounts from root credentials (#14534)
changing root credentials makes service accounts
in-operable, this PR changes the way sessionToken
is generated for service accounts.

It changes service account behavior to generate
sessionToken claims from its own secret instead
of using global root credential.

Existing credentials will be supported by
falling back to verify using root credential.

fixes #14530
2022-03-14 09:09:22 -07:00
Harshavardhana d6dd17a483
make sure to pass groups for all credentials while verifying policies (#14193)
fixes #14180
2022-01-26 21:53:36 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 914bfb2d9c
fix: allow compaction on replicated buckets (#13711)
currently getReplicationConfig() failure incorrectly
returns error on unexpected buckets upon upgrade, we
should always calculate usage as much as possible.
2021-11-19 14:46:14 -08:00
Anis Elleuch 4caed7cc0d
metrics: Add replication latency metrics (#13515)
Add a new Prometheus metric for bucket replication latency

e.g.:
minio_bucket_replication_latency_ns{
    bucket="testbucket",
    operation="upload",
    range="LESS_THAN_1_MiB",
    server="127.0.0.1:9001",
    targetArn="arn:minio:replication::45da043c-14f5-4da4-9316-aba5f77bf730:testbucket"} 2.2015663e+07

Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-11-17 12:10:57 -08:00
Daniel A. Ochoa 07dd0692b6
Fix hdfs gateway concurrent map writes (#13596)
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-11-08 09:07:58 -08:00
Anis Elleuch 20761e053e
replication: Fix replica stats during crawling (#13499)
Also show replica stats with an ARN in Prometheus output.
2021-10-22 19:13:50 -07:00
Poorna K e7f559c582
Fixes to replication metrics (#13493)
For reporting ReplicaSize and loading initial
replication metrics correctly.
2021-10-21 18:52:55 -07:00
Poorna Krishnamoorthy 19ecdc75a8
replication: Simplify metrics calculation (#13274)
Also doing some code cleanup
2021-09-22 10:48:45 -07:00
Poorna Krishnamoorthy c4373ef290
Add support for multi site replication (#12880) 2021-09-18 13:31:35 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Poorna Krishnamoorthy 3690de0c6b
Drop Pending size and count from replication metrics (#12378)
Real-time metrics calculated in-memory rely on the initial
replication metrics saved with data usage. However, this can
lag behind the actual state of the cluster at the time of server 
restart leading to inaccurate Pending size/counts reported to
Prometheus. Dropping the Pending metrics as this can be more 
reliably monitored by applications with replication notifications.

Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-05-31 20:26:52 -07:00
Harshavardhana fdc2020b10
move to iam, bucket policy from minio/pkg (#12400) 2021-05-29 21:16:42 -07:00
Harshavardhana ebf75ef10d
fix: remove all unused code (#12360) 2021-05-24 09:28:19 -07:00
Harshavardhana 1aa5858543
move madmin to github.com/minio/madmin-go (#12239) 2021-05-06 08:52:02 -07:00
Harshavardhana 069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana d0d67f9de0
feat: allow prometheus for only authorized users (#12121)
allow restrictions on who can access Prometheus
endpoint, additionally add prometheus as part of
diagnostics canned policy.

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-22 18:55:30 -07:00
Poorna Krishnamoorthy 075bccda42
Fix cluster bucket stats API for prometheus (#11970)
Metrics calculation was accumulating inital usage across all nodes
rather than using initial usage only once.

Also fixing:
- bug where all  peer traffic was going to the same node.
- reset counters when replication status changes from
PENDING -> FAILED
2021-04-06 08:36:54 -07:00
Harshavardhana 09ee303244
add cluster support for realtime bucket stats (#11963)
implementation in #11949 only catered from single
node, but we need cluster metrics by capturing
from all peers. introduce bucket stats API that
will be used for capturing in-line bucket usage
as well eventually
2021-04-04 15:34:33 -07:00
Poorna Krishnamoorthy 47c09a1e6f
Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Anis Elleuch 2c296652f7
Simplify access to local node name (#11907)
The local node name is heavily used in tracing, create a new global 
variable to store it. Multiple goroutines can access it since it won't be
changed later.
2021-03-26 11:37:58 -07:00
Klaus Post 749e9c5771
metrics: Add canceled requests (#11881)
Add metric for canceled requests
2021-03-24 10:25:27 -07:00
Harshavardhana c6a120df0e
fix: Prometheus metrics to re-use storage disks (#11647)
also re-use storage disks for all `mc admin server info`
calls as well, implement a new LocalStorageInfo() API
call at ObjectLayer to lookup local disks storageInfo

also fixes bugs where there were double calls to StorageInfo()
2021-03-02 17:28:04 -08:00
Harshavardhana ffea6fcf09
fix: rename crawler as scanner in config (#11549) 2021-02-17 12:04:11 -08:00