epoll contention on TCP causes latency build-up when
we have high volume ingress. This PR is an attempt to
relieve this pressure.
upstream issue https://github.com/golang/go/issues/65064
It seems to be a deeper problem; haven't yet tried the fix
provide in this issue, but however this change without
changing the compiler helps.
Of course, this is a workaround for now, hoping for a
more comprehensive fix from Go runtime.
the main reason is to let Go net/http perform necessary
book keeping properly, and in essential from consistency
point of view its GETs all the way.
Deprecate sendFile() as its buggy inside Go runtime.
kms: Expose API available when bucket federation is enabled
When bucket federation feature is enabled, KMS API will not work, such
as `mc admin kms key list`
The commit will fix the issue by disabling bucket forwarding when this
is a KMS request.
This change uses the updated ldap library in minio/pkg (bumped
up to v3). A new config parameter is added for LDAP configuration to
specify extra user attributes to load from the LDAP server and to store
them as additional claims for the user.
A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.
This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
If site replication enabled across sites, replicate the SSE-C
objects as well. These objects could be read from target sites
using the same client encryption keys.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.
This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`
The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:
when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no
resource found error
All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.
This PR adds a WebSocket grid feature that allows servers to communicate via
a single two-way connection.
There are two request types:
* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
roundtrips with small payloads.
* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
which allows for different combinations of full two-way streams with an initial payload.
Only a single stream is created between two machines - and there is, as such, no
server/client relation since both sides can initiate and handle requests. Which server
initiates the request is decided deterministically on the server names.
Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.
If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte
slices or use a higher-level generics abstraction.
There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.
The request path can be changed to a new one for any protocol changes.
First, all servers create a "Manager." The manager must know its address
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it given
the remote address using.
```
func (m *Manager) Connection(host string) *Connection
```
All serverside handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight
requests and responses must also be given for streaming requests.
The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.
* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
performs a single request and returns the result. Any deadline provided on the request is
forwarded to the server, and canceling the context will make the function return at once.
* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
will initiate a remote call and send the initial payload.
```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
//The appropriate error will be returned.
type Stream struct {
// Responses from the remote server.
// Channel will be closed after an error or when the remote closes.
// All responses *must* be read by the caller until either an error is returned or the channel is closed.
// Canceling the context will cause the context cancellation error to be returned.
Responses <-chan Response
// Requests sent to the server.
// If the handler is defined with 0 incoming capacity this will be nil.
// Channel *must* be closed to signal the end of the stream.
// If the request context is canceled, the stream will no longer process requests.
Requests chan<- []byte
}
type Response struct {
Msg []byte
Err error
}
```
There are generic versions of the server/client handlers that allow the use of type
safe implementations for data types that support msgpack marshal/unmarshal.
Add a new endpoint for "resource" metrics `/v2/metrics/resource`
This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.
Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.
Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
health checks were missing for drives replaced since
- HealFormat() would replace the drives without a health check
- disconnected drives when they reconnect via connectEndpoint()
the loop also loses health checks for local disks and merges
these into a single code.
- other than this separate cleanUp, health check variables to avoid
overloading them with similar requirements.
- also ensure that we compete via context selector for disk monitoring
such that the canceled disks don't linger around longer waiting for
the ticker to trigger.
- allow disabling active monitoring.
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.
some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.
Bonus: Added new cluster metrics for usage, objects and histograms
Since `addCustomerHeaders` middleware was after the `httpTracer`
middleware, the request ID was not set in the http tracing context. By
reordering these middleware functions, the request ID header becomes
available. We also avoid setting the tracing context key again in
`newContext`.
Bonus: All middleware functions are renamed with a "Middleware" suffix
to avoid confusion with http Handler functions.
* Reduce allocations
* Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating.
* Reuse all msgp.Readers
* Reuse metadata buffers when not reading data.
* Make type safe. Make buffer 4K instead of 8.
* Unslice
500k is a reasonable limit for any single MinIO
cluster deployment, in future we may increase this
value.
However for now we are going to keep this limit.
also use designated names for internal
calls
- storageREST calls are storageR
- lockREST calls are lockR
- peerREST calls are just peer
Named in this fashion to facilitate wildcard matches
by having prefixes of the same name.
Additionally, also enable funcNames for generic handlers
that return errors, currently we disable '<unknown>'
PR #15052 caused a regression, add the missing metrics back.
Bonus:
- internode information should be only for distributed setups
- update the dashboard to include 4xx and 5xx error panels.
Currently minio_s3_requests_errors_total covers 4xx and
5xx S3 responses which can be confusing when s3 applications
sent a lot of HEAD requests with obvious 404 responses or
when the replication is enabled.
Add
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total
to help users monitor 4xx and 5xx HTTP status codes separately.
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.
Though the federation is deprecated, let's fix this for the time being.
- combine similar looking functionalities into single
handlers, and remove unnecessary proxying of the
requests at handler layer.
- remove bucket forwarding handler as part of default setup
add it only if bucket federation is enabled.
Improvements observed for 1kiB object reads.
```
-------------------
Operation: GET
Operations: 4538555 -> 4595804
* Average: +1.26% (+0.2 MiB/s) throughput, +1.26% (+190.2) obj/s
* Fastest: +4.67% (+0.7 MiB/s) throughput, +4.67% (+739.8) obj/s
* 50% Median: +1.15% (+0.2 MiB/s) throughput, +1.15% (+173.9) obj/s
```
read/writers are not concurrent in handlers
and self contained - no need to use atomics on
them.
avoids unnecessary contentions where it's not
required.