At a customer setup with lots of concurrent calls
it can be observed that in newRetryTimer there
were lots of tiny alloations which are not
relinquished upon retries, in this codepath
we were only interested in re-using the timer
and use it wisely for each locker.
```
(pprof) top
Showing nodes accounting for 8.68TB, 97.02% of 8.95TB total
Dropped 1198 nodes (cum <= 0.04TB)
Showing top 10 nodes out of 79
flat flat% sum% cum cum%
5.95TB 66.50% 66.50% 5.95TB 66.50% time.NewTimer
1.16TB 13.02% 79.51% 1.16TB 13.02% github.com/ncw/directio.AlignedBlock
0.67TB 7.53% 87.04% 0.70TB 7.78% github.com/minio/minio/cmd.xlObjects.putObject
0.21TB 2.36% 89.40% 0.21TB 2.36% github.com/minio/minio/cmd.(*posix).Walk
0.19TB 2.08% 91.49% 0.27TB 2.99% os.statNolog
0.14TB 1.59% 93.08% 0.14TB 1.60% os.(*File).readdirnames
0.10TB 1.09% 94.17% 0.11TB 1.25% github.com/minio/minio/cmd.readDirN
0.10TB 1.07% 95.23% 0.10TB 1.07% syscall.ByteSliceFromString
0.09TB 1.03% 96.27% 0.09TB 1.03% strings.(*Builder).grow
0.07TB 0.75% 97.02% 0.07TB 0.75% path.(*lazybuf).append
```
The documentation states that `nVolumeNameSize` and `nFileSystemNameSize` are:
> The length of a volume name buffer, in TCHARs. The maximum buffer size is MAX_PATH+1.
It seems like we allocated too little for them before, so expand it to 260 wchars.
some clients such as veeam expect the x-amz-meta to
be sent in lower cased form, while this does indeed
defeats the HTTP protocol contract it is harder to
change these applications, while these applications
get fixed appropriately in future.
x-amz-meta is usually sent in lowercased form
by AWS S3 and some applications like veeam
incorrectly end up relying on the case sensitivity
of the HTTP headers.
Bonus fixes
- Fix the iso8601 time format to keep it same as
AWS S3 response
- Increase maxObjectList to 50,000 and use
maxDeleteList as 10,000 whenever multi-object
deletes are needed.
When running a zoned setup simply uploading will run
the system out of memory very fast.
Root cause: nFileSystemNameSize is a DWORD and
not a pointer. No idea how this didn't crash hard.
Furthermore replace poor mans utf16 -> string
conversion to support arbitrary output.
Fixes#9630
Groups information shall be now stored as part of the
credential data structure, this is a more idiomatic
way to support large LDAP groups.
Avoids the complication of setups where LDAP groups
can be in the range of 150+ which may lead to excess
HTTP header size > 8KiB, to reduce such an occurrence
we shall save the group information on the server as
part of the credential data structure.
Bonus change support multiple mapped policies, across
all types of users.
This PR is a continuation from #9586, now the
entire parsing logic is fully merged into
bucket metadata sub-system, simplify the
quota API further by reducing the remove
quota handler implementation.
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.
Additionally also fixes#9396, #9394
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.
This PR fixes the first stage of this
cleanup.
The `ioutil.NopCloser(reader)` was hiding nested hash readers.
We make it an `io.Closer` so it can be attached without wrapping
and allows for nesting, by merging the requests.
s3:HardwareInfo was removed recently. Users having that admin action
stored in the backend will have an issue starting the server.
To fix this, we need to avoid returning an error in Marshal/Unmarshal
when they encounter an invalid action and validate only in specific
location.
Currently the validation is done and in ParseConfig().
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}
This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
We should allow quorum errors to be send upwards
such that caller can retry while reading bucket
encryption/policy configs when server is starting
up, this allows distributed setups to load the
configuration properly.
Current code didn't facilitate this and would have
never loaded the actual configs during rolling,
server restarts.
In large setups this avoids unnecessary data transfer
across nodes and potential locks.
This PR also optimizes heal result channel, which should
be avoided for each queueHealTask as its expensive
to create/close channels for large number of objects.
This PR allows setting a "hard" or "fifo" quota
restriction at the bucket level. Buckets that
have reached the FIFO quota configured, will
automatically be cleaned up in FIFO manner until
bucket usage drops to configured quota.
If a bucket is configured with a "hard" quota
ceiling, all further writes are disallowed.
- elasticsearch client should rely on the SDK helpers
instead of pure HTTP calls.
- webhook shouldn't need to check for IsActive() for
all notifications, failure should be delayed.
- Remove DialHTTP as its never used properly
Fixes#9460
allow generating service accounts for temporary credentials
which have a designated parent, currently OpenID is not yet
supported.
added checks to ensure that service account cannot generate
further service accounts for itself, service accounts can
never be a parent to any credential.
this commit avoids lots of tiny allocations, repeated
channel creates which are performed when filtering
the incoming events, unescaping a key just for matching.
also remove deprecated code which is not needed
anymore, avoids unexpected data structure transformations
from the map to slice.
New value defaults to 100K events by default,
but users can tune this value upto any value
they seem necessary.
* increase the limit to maxint64 while validating
This PR also fixes issues when
deletePolicy, deleteUser is idempotent so can lead to
issues when client can prematurely timeout, so a retry
call error response should be ignored when call returns
http.StatusNotFound
Fixes#9347
Some tests take a long time on CI:
* `--- PASS: TestRWMutex (226.49s)`
* ` --- PASS: TestRWMutex (7.13s)`
Reduce the number of runs.
Before/after locally:
```
--- PASS: TestRWMutex (20.95s)
--- PASS: TestRWMutex (7.13s)
--- PASS: TestMutex (3.01s)
--- PASS: TestMutex (1.65s)
```
This PR fixes couple of behaviors with service accounts
- not need to have session token for service accounts
- service accounts can be generated by any user for themselves
implicitly, with a valid signature.
- policy input for AddNewServiceAccount API is not fully typed
allowing for validation before it is sent to the server.
- also bring in additional context for admin API errors if any
when replying back to client.
- deprecate GetServiceAccount API as we do not need to reply
back session tokens
- Introduced a function `FetchRegisteredTargets` which will return
a complete set of registered targets irrespective to their states,
if the `returnOnTargetError` flag is set to `False`
- Refactor NewTarget functions to return non-nil targets
- Refactor GetARNList() to return a complete list of configured targets
- Removes PerfInfo admin API as its not OBDInfo
- Keep the drive path without the metaBucket in OBD
global latency map.
- Remove all the unused code related to PerfInfo API
- Do not redefined global mib,gib constants use
humanize.MiByte and humanize.GiByte instead always
This PR adds context-based `k=v` splits based
on the sub-system which was obtained, if the
keys are not provided an error will be thrown
during parsing, if keys are provided with wrong
values an error will be thrown. Keys can now
have values which are of a much more complex
form such as `k="v=v"` or `k=" v = v"`
and other variations.
additionally, deprecate unnecessary postgres/mysql
configuration styles, support only
- connection_string for Postgres
- dsn_string for MySQL
All other parameters are removed.
also, bring in an additional policy to ensure that
force delete bucket is only allowed with the right
policy for the user, just DeleteBucketAction
policy action is not enough.
This PR also tries to simplify the approach taken in
object-locking implementation by preferential treatment
given towards full validation.
This in-turn has fixed couple of bugs related to
how policy should have been honored when ByPassGovernance
is provided.
Simplifies code a bit, but also duplicates code intentionally
for clarity due to complex nature of object locking
implementation.
This commit modifies csv parser, a fork of golang csv
parser to support a custom quote escape character.
The quote escape character is used to escape the quote
character when a csv field contains a quote character
as part of data.
Use the *credentials.Credentials implementation method *Get*
```
func (c *Credentials) Get() (Value, error) {
```
which also handles auto-refresh, this allows for chaining
of various implementations together if necessary or simply
initialize with credentials.NewStaticV4(access, secret, token)
Co-authored-by: Klaus Post <klauspost@gmail.com>
Too many deployments come up with an odd number
of hosts or drives, to facilitate even distribution
among those setups allow for odd and prime numbers
based packs.
- Implement a graph algorithm to test network bandwidth from every
node to every other node
- Saturate any network bandwidth adaptively, accounting for slow
and fast network capacity
- Implement parallel drive OBD tests
- Implement a paging mechanism for OBD test to provide periodic updates to client
- Implement Sys, Process, Host, Mem OBD Infos
NAS gateway creates non-multipart-uploads with mode 0666.
But multipart-uploads are created with a differing mode of 0644.
Both modes should be equal! Else it leads to files with different
permissions based on its file-size. This patch solves that by
using 0666 for both cases.
This is to improve responsiveness for all
admin API operations and allowing callers
to cancel any on-going admin operations,
if they happen to be waiting too long.
- avoid setting last heal activity when starting self-healing
This can be confusing to users thinking that the self healing
cycle was already performed.
- add info about the next background healing round
Allow downloading goroutine dump to help detect leaks
or overuse of goroutines.
Extensions are now type dependent.
Change `profiling` -> `profile` prefix, since that is what they are
not the abstract concept.
Change distributed locking to allow taking bulk locks
across objects, reduces usually 1000 calls to 1.
Also allows for situations where multiple clients sends
delete requests to objects with following names
```
{1,2,3,4,5}
```
```
{5,4,3,2,1}
```
will block and ensure that we do not fail the request
on each other.
AWS S3 doesn't enforce the URL in XMLNS, accordingly, removing the
URL in XMLNS for ObjectLegalHold.
This was found while testing https://github.com/minio/minio-go/pull/1226
This commit fixes typos in the displayed server info
w.r.t. the KMS and removes the update status.
For more information about why the update status
is removed see: PR #8943
This commit removes the `Update` functionality
from the admin API. While this is technically
a breaking change I think this will not cause
any harm because:
- The KMS admin API is not complete, yet.
At the moment only the status can be fetched.
- The `mc` integration hasn't been merged yet.
So no `mc` client could have used this API
in the past.
The `Update`/`Rewrap` status is not useful anymore.
It provided a way to migrate from one master key version
to another. However, KES does not support the concept of
key versions. Instead, key migration should be implemented
as migration from one master key to another.
Basically, the `Update` functionality has been implemented just
for Vault.
- pkg/bucket/encryption provides support for handling bucket
encryption configuration
- changes under cmd/ provide support for AES256 algorithm only
Co-Authored-By: Poorna <poornas@users.noreply.github.com>
Co-authored-by: Harshavardhana <harsha@minio.io>
The server info handler makes a http connection to other
nodes to check if they are up but does not load the custom
CAs in ~/.minio/certs/CAs.
This commit fix it.
Co-authored-by: Harshavardhana <harsha@minio.io>
For 'snapshot' type profiles, record a 'before' profile that can be used
as `go tool pprof -base=before ...` to compare before and after.
"Before" profiles are included in the zipped package.
[`runtime.MemProfileRate`](https://golang.org/pkg/runtime/#pkg-variables)
should not be updated while the application is running, so we set it at startup.
Co-authored-by: Harshavardhana <harsha@minio.io>
Use reference format to initialize lockers
during startup, also handle `nil` for NetLocker
in dsync and remove *errorLocker* implementation
Add further tuning parameters such as
- DialTimeout is now 15 seconds from 30 seconds
- KeepAliveTimeout is not 20 seconds, 5 seconds
more than default 15 seconds
- ResponseHeaderTimeout to 10 seconds
- ExpectContinueTimeout is reduced to 3 seconds
- DualStack is enabled by default remove setting
it to `true`
- Reduce IdleConnTimeout to 30 seconds from
1 minute to avoid idleConn build up
Fixes#8773
- Stop spawning store replay routines when testing the notification targets
- Properly honor the target.Close() to clean the resources used
Fixes#8707
Co-authored-by: Harshavardhana <harsha@minio.io>
In certain organizations policy claim names
can be not just 'policy' but also things like
'roles', the value of this field might also
be *string* or *[]string* support this as well
In this PR we are still not supporting multiple
policies per STS account which will require a
more comprehensive change.
Fixes scenario where zones are appropriately
handled, along with supporting overriding set
count. The new fix also ensures that we handle
the various setup types properly.
Update documentation to properly indicate the
behavior.
Fixes#8750
Co-authored-by: Nitish Tiwari <nitish@minio.io>
Currently when connections to vault fail, client
perpetually retries this leads to assumptions that
the server has issues and masks the problem.
Re-purpose *crypto.Error* type to send appropriate
errors back to the client.
In existing functionality we simply return a generic
error such as "MalformedPolicy" which indicates just
a generic string "invalid resource" which is not very
meaningful when there might be multiple types of errors
during policy parsing. This PR ensures that we send
these errors back to client to indicate the actual
error, brings in two concrete types such as
- iampolicy.Error
- policy.Error
Refer #8202
We had messy cyclical dependency problem with `mc`
due to dependencies in pkg/console, moved the pkg/console
to minio for more control and also to avoid any further
cyclical dependencies of `mc` clobbering up the
dependencies on server.
Fixes#8659
This PR fixes the issue where we might allow policy changes
for temporary credentials out of band, this situation allows
privilege escalation for those temporary credentials. We
should disallow any external actions on temporary creds
as a practice and we should clearly differentiate which
are static and which are temporary credentials.
Refer #8667
Admin data usage info API returns the following
(Only FS & XL, for now)
- Number of buckets
- Number of objects
- The total size of objects
- Objects histogram
- Bucket sizes
Final update to all messages across sub-systems
after final review, the only change here is that
NATS now has TLS and TLSSkipVerify to be consistent
for all other notification targets.