avoids creating new transport for each `isServerResolvable`
request, instead re-use the available global transport and do
not try to forcibly close connections to avoid TIME_WAIT
build upon large clusters.
Never use httpClient.CloseIdleConnections() since that can have
a drastic effect on existing connections on the transport pool.
Remove it everywhere.
This is a side-affect of the optimization done in PR #13544 which
causes a certain type of delete operations on given object versions
can cause lastVersion indication to be skipped, which leads to
an `xl.meta` where Versions[] slice is empty while the entire
file is intact by itself.
This PR tries to ensure that such files are visible and deletable
by regular means of listing as null 'delete-marker' and also
avoid the situation where this potential issue might arise.
small setups do not return appropriate errors when speedtest
cannot run on small tiny setups, allow the tests to fail
appropriately more pro-actively.
many users bring toy setups, this PR simply returns an error
in such situations.
Some users running MinIO claim that their system became slow. One
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow,
due to disk issues for example.
- deleteBucket() should be called for cleanup
if client abruptly disconnects
- out of disk errors should be sent to client
properly and also cancel the calls
- limit concurrency to available MAXPROCS not
32 for auto-tuned setup, if procs are beyond
32 then continue normally. this is to handle
smaller setups.
fixes#13834
totalDrives reported in speedTest result were wrong
for multiple pools, this PR fixes this.
Bonus: add support for configurable storage-class, this
allows us to test REDUCED_REDUNDANCY to see further
maximum throughputs across the cluster.
- New sub-system has "region" and "name" fields.
- `region` subsystem is marked as deprecated, however still works, unless the
new region parameter under `site` is set - in this case, the region subsystem is
ignored. `region` subsystem is hidden from top-level help (i.e. from `mc admin
config set myminio`), but appears when specifically requested (i.e. with `mc
admin config set myminio region`).
- MINIO_REGION, MINIO_REGION_NAME are supported as legacy environment variables for server region.
- Adds MINIO_SITE_REGION as the current environment variable to configure the
server region and MINIO_SITE_NAME for the site name.
an active running speedTest will reject all
new S3 requests to the server, until speedTest
is complete.
this is to ensure that speedTest results are
accurate and trusted.
Co-authored-by: Klaus Post <klauspost@gmail.com>
- Go might reset the internal http.ResponseWriter() to `nil`
after Write() failure if the go-routine has returned, do not
flush() such scenarios and avoid spurious flushes() as
returning handlers always flush.
- fix some racy tests with the console
- avoid ticker leaks in certain situations
This will help other projects like `health-analyzer` to verify that the
struct was indeed populated by the minio server, and is not
default-populated during unmarshalling of the JSON.
Signed-off-by: Shireesh Anjal <shireesh@minio.io>
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
Bonus: if runs have PUT higher then capture it anyways
to display an unexpected result, which provides a way
to understand what might be slowing things down on the
system.
For example on a Data24 WDC setup it is clearly visible
there is a bug in the hardware.
```
./mc admin speedtest wdc/
⠧ Running speedtest (With 64 MiB object size, 32 concurrency) PUT: 31 GiB/s GET: 24 GiB/s
⠹ Running speedtest (With 64 MiB object size, 48 concurrency) PUT: 38 GiB/s GET: 24 GiB/s
MinIO 2021-11-04T06:08:33Z, 6 servers, 48 drives
PUT: 38 GiB/s, 605 objs/s
GET: 24 GiB/s, 383 objs/s
```
Reads are almost 14GiB/sec slower than Writes which
is practically not possible.
Logger targets were not race protected against concurrent updates from for example `HTTPConsoleLoggerSys`.
Restrict direct access to targets and make slices immutable so a returned slice can be processed safely without locks.
In (erasureServerPools).MakeBucketWithLocation deletes the created
buckets if any set returns an error.
Add `NoRecreate` option, which will not recreate the bucket
in `DeleteBucket`, if the operation fails.
Additionally use context.Background() for operations we always want to be performed.
A multi resources lock is a single lock UID with multiple associated
resources. This is created for example by multi objects delete
operation. This commit changes the behavior of Refresh() to iterate over
all locks having the same UID and refresh them.
Bonus: Fix showing top locks for multi delete objects
The intention is to list values of sys config that can potentially
impact the performance of minio.
At present, it will return max value configured for rlimit
Signed-off-by: Shireesh Anjal <shireesh@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
The intention is to provide status of any sys services that can
potentially impact the performance of minio.
At present, it will return information about the `selinux` service
(not-installed/disabled/permissive/enforcing)
Signed-off-by: Shireesh Anjal <shireesh@minio.io>
In case of non-distributed setup, if the server start command contains a
`--console-address` flag and its value contains a hostname, it is not
getting anonymized.
Fixed by replacing the console host also with `server1`
Ensure that hostnames / ip addresses are not printed in the subnet
health report. Anonymize them by replacing them with `servern` where `n`
represents the position of the server in the pool.
This is done by building a `host anonymizer` map that maps every
possible value containing the host e.g. host, host:port,
http://host:port, etc to the corresponding anonymized name and using
this map to replace the values at the time of health report generation.
A different logic is used to anonymize host names in the `procinfo`
data, as the host names are part of an ellipses pattern in the process
start command. Here we just replace the prefix/suffix of the ellipses
pattern with their hashes.
Gzip responses if appropriate, except GetObject requests.
List reponses has an almost 10:1 compression ratio with no
measurable slowdown (in fact it seems a bit faster).
Download files from *any* bucket/path as an encrypted zip file.
The key is included in the response but can be separated so zip
and the key doesn't have to be sent on the same channel.
Requires https://github.com/minio/pkg/pull/6
This commit adds an admin API for fetching
the KMS status information (default key ID, endpoints, ...).
With this commit the server exposes REST endpoint:
```
GET <admin-api>/kms/status
```
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
This commit replaces the custom KES client implementation
with the KES SDK from https://github.com/minio/kes
The SDK supports multi-server client load-balancing and
requests retry out of the box. Therefore, this change reduces
the overall complexity within the MinIO server and there
is no need to maintain two separate client implementations.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
upon errors to acquire lock context would still leak,
since the cancel would never be called. since the lock
is never acquired - proactively clear it before returning.
* lock: Always cancel the returned Get(R)Lock context
There is a leak with cancel created inside the locking mechanism. The
cancel purpose was to cancel operations such erasure get/put that are
holding non-refreshable locks.
This PR will ensure the created context.Cancel is passed to the unlock
API so it will cleanup and avoid leaks.
* locks: Avoid returning nil cancel in local lockers
Since there is no Refresh mechanism in the local locking mechanism, we
do not generate a new context or cancel. Currently, a nil cancel
function is returned but this can cause a crash. Return a dummy function
instead.
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.
Some new additions include:
- Admin APIs remote tier configuration management
- Simple journal to track remote objects to be 'collected'
This is used by object API handlers which 'mutate' object versions by
overwriting/replacing content (Put/CopyObject) or removing the version
itself (e.g DeleteObjectVersion).
- Rework of previous ILM transition to fit the new model
In the new model, a storage class (a.k.a remote tier) is defined by the
'remote' object storage type (one of s3, azure, GCS), bucket name and a
prefix.
* Fixed bugs, review comments, and more unit-tests
- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
This commit changes the config/IAM encryption
process. Instead of encrypting config data
(users, policies etc.) with the root credentials
MinIO now encrypts this data with a KMS - if configured.
Therefore, this PR moves the MinIO-KMS configuration (via
env. variables) to a "top-level" configuration.
The KMS configuration cannot be stored in the config file
since it is used to decrypt the config file in the first
place.
As a consequence, this commit also removes support for
Hashicorp Vault - which has been deprecated anyway.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit introduces a new package `pkg/kms`.
It contains basic types and functions to interact
with various KMS implementations.
This commit also moves KMS-related code from `cmd/crypto`
to `pkg/kms`. Now, it is possible to implement a KMS-based
config data encryption in the `pkg/config` package.
This code is necessary for `mc admin update` command
to work with fips compiled binaries, with fips tags
the releaseInfo will automatically point to fips
specific binaries.
This PR fixes
- close leaking bandwidth report channel leakage
- remove the closer requirement for bandwidth monitor
instead if Read() fails remember the error and return
error for all subsequent reads.
- use locking for usage-cache.bin updates, with inline
data we cannot afford to have concurrent writes to
usage-cache.bin corrupting xl.meta
The local node name is heavily used in tracing, create a new global
variable to store it. Multiple goroutines can access it since it won't be
changed later.
```
mc admin info --json
```
provides these details, for now, we shall eventually
expose this at Prometheus level eventually.
Co-authored-by: Harshavardhana <harsha@minio.io>
* Provide information on *actively* healing, buckets healed/queued, objects healed/failed.
* Add concurrent healing of multiple sets (typically on startup).
* Add bucket level resume, so restarts will only heal non-healed buckets.
* Print summary after healing a disk is done.
also re-use storage disks for all `mc admin server info`
calls as well, implement a new LocalStorageInfo() API
call at ObjectLayer to lookup local disks storageInfo
also fixes bugs where there were double calls to StorageInfo()
The previous code was iterating over replies from peers and assigning
pool numbers to them, thus missing to add it for the local server.
Fixed by iterating over the server properties of all the servers
including the local one.
We can use this metric to check if there are too many S3 clients in the
queue and could explain why some of those S3 clients are timing out.
```
minio_s3_requests_waiting_total{server="127.0.0.1:9000"} 9981
```
If max_requests is 10000 then there is a strong possibility that clients
are timing out because of the queue deadline.
This commit deprecates the native Hashicorp Vault
support and removes the legacy Vault documentation.
The native Hashicorp Vault documentation is marked as
outdated and deprecated for over a year now. We give
another 6 months before we start removing Hashicorp Vault
support and show a deprecation warning when a MinIO server
starts with a native Vault configuration.
```
mc admin config set alias/ storage_class standard=EC:3
```
should only succeed if parity ratio is valid for all
server pools, if not we should fail proactively.
This PR also needs to bring other changes now that
we need to cater for variadic drive counts per pool.
Bonus fixes also various bugs reproduced with
- GetObjectWithPartNumber()
- CopyObjectPartWithOffsets()
- CopyObjectWithMetadata()
- PutObjectPart,PutObject with truncated streams
30 seconds white spaces is long for some setups which time out when no
read activity in short time, reduce the subnet health white space ticker
to 5 seconds, since it has no cost at all.
This refactor is done for few reasons below
- to avoid deadlocks in scenarios when number
of nodes are smaller < actual erasure stripe
count where in N participating local lockers
can lead to deadlocks across systems.
- avoids expiry routines to run 1000 of separate
network operations and routes per disk where
as each of them are still accessing one single
local entity.
- it is ideal to have since globalLockServer
per instance.
- In a 32node deployment however, each server
group is still concentrated towards the
same set of lockers that partipicate during
the write/read phase, unlike previous minio/dsync
implementation - this potentially avoids send
32 requests instead we will still send at max
requests of unique nodes participating in a
write/read phase.
- reduces overall chattiness on smaller setups.
this is needed such that we make sure to heal the
users, policies and bucket metadata right away as
we do listing based on list cache which only lists
'3' sufficiently good drives, to avoid possibly
losing access to these users upon upgrade make
sure to heal them.
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c
Gist of improvements:
* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no
longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.
locker check interval is high enough to safely
cleanup stale locks.
This change tracks bandwidth for a bucket and object
- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.
allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.
In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.