Commit Graph

17 Commits

Author SHA1 Message Date
Poorna Krishnamoorthy 47c09a1e6f
Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Ritesh H Shukla 3ddd8b04d1
fix: handle unsupported APIs more granularly (#11674) 2021-03-30 23:19:36 -07:00
Anis Elleuch 2c296652f7
Simplify access to local node name (#11907)
The local node name is heavily used in tracing, create a new global 
variable to store it. Multiple goroutines can access it since it won't be
changed later.
2021-03-26 11:37:58 -07:00
Klaus Post b383522743
fix error could not read /proc ion windows. (#11868)
Bonus: Prealloc reasonable sizes for metrics.
2021-03-25 12:58:43 -07:00
Harshavardhana d7f32ad649 xl: avoid sending Delete() remote call for fully successful runs
an optimization to avoid extra syscalls in PutObject(),
adds up to our PutObject response times.
2021-03-24 17:32:12 -07:00
Klaus Post 749e9c5771
metrics: Add canceled requests (#11881)
Add metric for canceled requests
2021-03-24 10:25:27 -07:00
Anis Elleuch fad7b27f15
metrics: Change type of minio_s3_requests_waiting_total to gauge (#11884) 2021-03-24 09:06:37 -07:00
Ritesh H Shukla 23b03dadb8
Add process uptime metric (#11844) 2021-03-20 21:23:27 -07:00
Ritesh H Shukla b5dcaaccb4
Introduce metrics caching for performant metrics (#11831) 2021-03-19 00:04:29 -07:00
Harshavardhana 2c198ae7b6
fix: prometheus metrics disks_online count when disks are down (#11689)
prometheus metrics was using total disks instead
of online disk count, when disks were down, this
PR fixes this and also adds a new metric for
total_disk_count
2021-03-03 11:18:41 -08:00
Harshavardhana c6a120df0e
fix: Prometheus metrics to re-use storage disks (#11647)
also re-use storage disks for all `mc admin server info`
calls as well, implement a new LocalStorageInfo() API
call at ObjectLayer to lookup local disks storageInfo

also fixes bugs where there were double calls to StorageInfo()
2021-03-02 17:28:04 -08:00
Anis Elleuch e8d8dfa3ae
Add metric for internode RPC calls errors (#11669) 2021-03-01 12:31:33 -08:00
Anis Elleuch 98d3f94996
metrics: Add the number of requests in the waiting queue (#11580)
We can use this metric to check if there are too many S3 clients in the
queue and could explain why some of those S3 clients are timing out.

```
minio_s3_requests_waiting_total{server="127.0.0.1:9000"} 9981
```

If max_requests is 10000 then there is a strong possibility that clients
are timing out because of the queue deadline.
2021-02-20 00:21:55 -08:00
Ritesh H Shukla 67a8f37df0
fix: disk usage capacity metric reporting (#11435) 2021-02-04 12:26:58 -08:00
Ritesh H Shukla c4848f9b4f
Add process start time to cluster metrics. (#11405) 2021-02-01 23:02:18 -08:00
Ritesh H Shukla 7575c24037
Add open FD and FD limit to cluster metrics (#11328) 2021-01-22 18:30:16 -08:00
Ritesh H Shukla b4add82bb6
Updated Prometheus metrics (#11141)
* Add metrics for nodes online and offline
* Add cluster capacity metrics
* Introduce v2 metrics
2021-01-18 20:35:38 -08:00