Better support of HEAD and listing of zero sized objects with trailing
slash (a.k.a empty directory). For that, isLeafDir function is added
to indicate if the specified object is an empty directory or not. Each
backend (xl, fs) has the responsibility to store that information.
Currently, in both of XL & FS, an empty directory is represented by
an empty directory in the backend.
isLeafDir() checks if the given path is an empty directory or not,
since dir listing is costly if the latter contains too many objects,
readDirN() is added in this PR to list only N number of entries.
In isLeadDir(), we will only list one entry to check if a directory
is empty or not.
This commit fixes a DoS vulnerability in the
request authentication. The root cause is an 'unlimited'
read-into-RAM from the request body.
Since this read happens before the request authentication
is verified the vulnerability can be exploit without any
access privileges.
This commit limits the size of the request body to 3 MB.
This is about the same size as AWS. The limit seems to be
between 1.6 and 3.2 MB - depending on the AWS machine which
is handling the request.
This commit ensures that all tickers are stopped using defer ticker.Stop()
style. This will also fix one bug seen when a client starts to listen to
event notifications and that case will result a leak in tickers.
Current healing has an issue when disks are healed
even when they are offline without knowing if disk
is unformatted. This can lead to issues of pre-maturely
removing the disk from the set just because it was
temporarily offline.
There is an increasing number of `mc admin heal` usage
on a cron or regular basis. It is possible that if healing
code saw disk is offline it might prematurely take it down,
this causes availability issues.
Fixes#5826
Previously we used allow bucket policies without
`Version` field to be set to any given value, but
this behavior is inconsistent with AWS S3.
PR #5790 addressed this by making bucket policies
stricter and cleaner, but this causes a breaking
change causing any existing policies perhaps without
`Version` field or the field to be empty to fail upon
server startup.
This PR brings a code to migrate under these scenarios
as a one time operation.
- remove old bucket policy handling
- add new policy handling
- add new policy handling unit tests
This patch brings support to bucket policy to have more control not
limiting to anonymous. Bucket owner controls to allow/deny any rest
API.
For example server side encryption can be controlled by allowing
PUT/GET objects with encryptions including bucket owner.
This change disables the non-constant-time implementations of P-384 and P-521.
As a consequence a client using just these curves cannot connect to the server.
This should be no real issues because (all) clients at least support P-256.
Further this change also rejects ECDSA private keys of P-384 and P-521.
While non-constant-time implementations for the ECDHE exchange don't expose an
obvious vulnerability, using P-384 or P-521 keys for the ECDSA signature may allow
pratical timing attacks.
Fixes#5844
- getBucketLocation
- headBucket
- deleteBucket
Should return 404 or NoSuchBucket even for invalid bucket names, invalid
bucket names are only validated during MakeBucket operation
This is an effort to remove panic from the source.
Add a new call called CriticialIf, that calls LogIf and exits.
Replace panics with one of CriticalIf, FatalIf and a return of error.
Make sure to apply standard headers such as Content-Type,
Content-Disposition and Content-Language to the correct
GCS object attributes during object upload and copy operations.
Fixes: #5800
As we move to multiple config backends like local disk and etcd,
config file should not be read from the disk, instead the quick
package should load and verify for duplicate entries.
This change adds some security headers like Content-Security-Policy.
It does not set the HSTS header because Content-Security-Policy prevents
mixed HTTP and HTTPS content and the server does not use cookies.
However it is a header which could be added later on.
It also moves some header added by #5805 from a vendored file
to a generic handler.
Fixes ##5813
Also make sure to not modify the underlying errors from
layers, we should return the error as is and one object
layer should translate the errors.
Fixes#5797
This PR introduces ReloadFormat API call at objectlayer
to facilitate this. Previously we repurposed HealFormat
but we never ended up updating our reference format on
peers.
Fixes#5700
This change let the server return the S3 error for a key rotation
if the source key is not valid but equal to the destination key.
This change also fixes the SSE-C error messages since AWS returns error messages
ending with a '.'.
Fixes#5625
This change sets the storage class of the object-info if a storage
class was specified during PUT. The server now replies with the
storage class which was set during uploading the object in FS mode.
Fixes#5777
Default installations of cloned VMs on VMware like env
might experience serious problems with time skewing,
allow for a higher value instead of 3 seconds we are
moving to 15 minutes just like API level skew.
Access to internet and configuring ntp might not be possible,
in such situations providing atleast a 15 minute skew could
cater for majority of situations.
Since we do not re-use storageDisks after moving
the connections to object layer we should close them
appropriately otherwise we have a lot of connection
leaks and these can compound as the time goes by.
This PR also refactors the initialization code to
re-use storageDisks for given set of endpoints until
we have confirmed a valid reference format.
An issue was reproduced when there a no more inodes
available on an existing setup of 4 disks, now we
took one of the disks and reformatted it to relinquish
inodes. Now we attempt to bring the fresh disk back
into setup and perform a heal - at this point creating
new `format.json` fails on existing disks since they
do not have more inodes available.
At this point due to quorum failure, we end up deleting
existing `format.json` as well, this PR removes the code
which deletes existing `format.json` as there is no need
to delete them.
Previous PR 2afd196c83 fixed
the issue of quorum based listing for regular objects, this
PR continues on this idea by extending this support to
object directory prefixes as well.
Fixes#5733
Set GOPATH string to empty in build-constants.go
Check for both compile time GOPATH and default GOPATH
while trimming the file path in the stack trace.
Fixes#5741
This PR fixes two different variant of deadlocks in
notification.
- holding write lock on the bucket competing with read lock
- holding competing locks on read/save notification config
This PR adds disk based edge caching support for minio server.
Cache settings can be configured in config.json to take list of disk drives,
cache expiry in days and file patterns to exclude from cache or via environment
variables MINIO_CACHE_DRIVES, MINIO_CACHE_EXCLUDE and MINIO_CACHE_EXPIRY
Design assumes that Atime support is enabled and the list of cache drives is
fixed.
- Objects are cached on both GET and PUT/POST operations.
- Expiry is used as hint to evict older entries from cache, or if 80% of cache
capacity is filled.
- When object storage backend is down, GET, LIST and HEAD operations fetch
object seamlessly from cache.
Current Limitations
- Bucket policies are not cached, so anonymous operations are not supported in
offline mode.
- Objects are distributed using deterministic hashing among list of cache
drives specified.If one or more drives go offline, or cache drive
configuration is altered - performance could degrade to linear lookup.
Fixes#4026
This is a trival fix to support server level WORM. The feature comes
with an environment variable `MINIO_WORM`.
Usage:
```
$ export MINIO_WORM=on
$ minio server endpoint
```
Object deletion should not be possible if quorum is not
available. This PR updates deleteObject() to check for
quorum errors before proceeding with object deletion.
Fixes#5535
This commit adds the bucket delete and bucket policy functionalities
to the browser.
Part of rewriting the browser code to follow best practices and
guidelines of React (issues #5409 and #5410)
The backend code has been modified by @krishnasrinivas to prevent
issue #4498 from occuring. The relevant changes have been made to the
code according to the latest commit and the unit tests in the backend.
This commit also addresses issue #5449.
Migration regression got introduced in 9083bc152e
adding more unit tests to catch this scenario, we need to fix this by
re-writing the formats after the migration to 'V3'.
This bug only happens when a user is migrating directly from V1 to V3,
not from V1 to V2 and V2 to V3.
Added additional unit tests to cover these situations as well.
Fixes#5667
- Add head method for healthcheck endpoint. Some platforms/users
may use the HTTP Head method to check for health status.
- Add liveness and readiness probe examples in Kubernetes yaml
example docs. Note that readiness probe not added to StatefulSet
example due to https://github.com/kubernetes/kubernetes/issues/27114
With following changes
- Add SSE and refactor encryption API (#942) <Andreas Auernhammer>
- add copyObject test changing metadata and preserving etag (#944) <Harshavardhana>
- Add SSE-C tests for multipart, copy, get range operations (#941) <Harshavardhana>
- Removing conditional check for notificationInfoCh in api-notication (#940) <Matthew Magaldi>
- Honor prefix parameter in ListBucketPolicies API (#929) <kannappanr>
- test for empty objects uploaded with SSE-C headers (#927) <kannappanr>
- Encryption headers should also be set during initMultipart (#930) <Harshavardhana>
- Add support for Content-Language metadata header (#928) <kannappanr>
- Fix check for duplicate notification configuration entries (#917) <kannappanr>
- allow OS to cleanup sockets in TIME_WAIT (#925) <Harshavardhana>
- Sign V2: Fix signature calculation in virtual host style (#921) <A. Elleuch>
- bucket policy: Support json string in Principal field (#919) <A. Elleuch>
- Fix copyobject failure for empty files (#918) <kannappanr>
- Add new constructor NewWithOptions to SDK (#915) <poornas>
- Support redirect headers to sign again with new Host header. (#829) <Harshavardhana>
- Fail in PutObject if invalid user metadata is passed <Harshavadhana>
- PutObjectOptions Header: Don't include invalid header <Isaac Hess>
- increase max retry count to 10 (#913) <poornas>
- Add new regions for Paris and China west. (#905) <Harshavardhana>
- fix s3signer to use req.Host header (#899) <Bartłomiej Nogaś>
This PR adds readiness and liveness endpoints to probe Minio server
instance health. Endpoints can only be accessed without authentication
and the paths are /minio/health/live and /minio/health/ready for
liveness and readiness respectively.
The new healthcheck liveness endpoint is used for Docker healthcheck
now.
Fixes#5357Fixes#5514
In kubernetes statefulset like environments when secrets
are mounted to pods they have sub-directories, we should
ideally be only looking for regular files here and skip
all others.
Fix a compatibility issue with AWS S3 where to do key rotation
we need to replace an existing object's metadata. In such a
scenario "REPLACE" metadata directive is not necessary.
- Data from disk was being read after bitrot verification to return
data for GetObject. Strictly speaking this does not guarantee bitrot
protection, as disks may return bad data even temporarily.
- This fix reads data from disk, verifies data for bitrot and then
returns data to the client directly.
Current code didn't implement the logic to support
decrypting encrypted multiple parts, this PR fixes
by supporting copying encrypted multipart objects.
Currently we reply back `X-Minio-Internal` values
back to the client for an encrypted object, we should
filter these out and only reply AWS compatible headers.
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
Flags like `json, config-dir, quiet` are now honored even if they are
between minio and gateway in the cli, like, `minio --json gateway s3`.
Fixes#5403
Stable sort is needed when we are sorting based on two or more
distinct elements. When equal elements are indistinguishable,
such as with integers, or more generally, any data where the
entire element is the key like `PartNumber`, stability is not
an issue.
Refactor such that metadata and etag are
combined to a single argument `srcInfo`.
This is a precursor change for #5544 making
it easier for us to provide encryption/decryption
functions.
Delete & Multi Delete API should not try to remove the directory content.
The only permitted case is with zero size object with a trailing slash
in its name.
MaxIdleConns limits the total number of connections
kept in the pool for re-use. In addition, MaxIdleConnsPerHost
limits the number for a single host. Since minio gateways
usually connect to the same host, setting `MaxIdleConns = 100`
won't really have much of an impact since the idle connection
pool is limited to 2 anyway.
Now, with the pool set to a limit of 2, and when using
the client heavily from 2+ goroutines, the `http.Transport`
will open a connection, use it, then try to return it to
the idle-pool which often fails since there's a limit of 2.
So it's going to close the connection and new ones will be
opened on demand again, many of which get closed soon after
being used. Since those connections/sockets don't disappear
from the OS immediately, use `MaxIdleConnsPerHost = 100`
which fixes this problem.
Overwriting files is allowed, but since the introduction of
the object directory, we will aslo need to allow overwriting
an empty directory. Putting twice the same object directory
won't fail with 403 error anymore.
TestNewWebHookNotify wasn't passing in my local machine. The reason is
that the test expects the POST handler (as a webhook endpoint) is always
running on port 80, which is not always the case.
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.
This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.
Some design details and restrictions:
- Objects are distributed using consistent ordering
to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
requirement, you can start with multiple
such sets statically.
- Static sets set of disks and cannot be
changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
slower since List happens on all servers,
and is merged at this sets layer.
Fixes#5465Fixes#5464Fixes#5461Fixes#5460Fixes#5459Fixes#5458Fixes#5460Fixes#5488Fixes#5489Fixes#5497Fixes#5496
Since we do not encrypt directories we don't need to send
errors with encryption headers when the directory doesn't
have encryption metadata.
Continuation PR from 4ca10479b5
It can happen such that one of the disks that was down would
return 'errDiskNotFound' but the err is preserved due to
loop shadowing which leads to issues when healing the bucket.
This change adds an object size check such that the server does not
encrypt empty objects (typically folders) for SSE-C. The server still
returns SSE-C headers but the object is not encrypted since there is no
point to encrypt such objects.
Fixes#5493
Currently minio master requires 4 servers, we
have decided to run on a minimum of 2 servers
instead - fixes a regression from previous
releases where 3 server setups were supported.
This PR brings semver capabilities in our RPC layer to
ensure that we can upgrade the servers in rolling fashion
while keeping I/O in progress. This is only a framework change
the functionality remains the same as such and we do not
have any special API changes for now. But in future when
we bring in API changes we will be able to upgrade servers
without a downtime.
Additional change in this PR is to not abort when serverVersions
mismatch in a distributed cluster, instead wait for the quorum
treat the situation as if the server is down. This allows
for administrator to properly upgrade all the servers in the cluster.
Fixes#5393
in-memory caching cannot be cleanly implemented
without the access to GC which Go doesn't naturally
provide. At times we have seen that object caching
is more of an hindrance rather than a boon for
our use cases.
Removing it completely from our implementation
related to #5160 and #5182
This is a generic minimum value. The current reason is to support
Azure blob storage accounts name whose length is less than 5. 3 is the
minimum length for Azure.
Check if the storage class is set in an
non XL setup instead of relying on `globalEndpoints`
value. Also converge the checks for both SS
and RRS parity configuration.
This PR also removes redundant `tt.name` in all
test cases, since each testcase doesn't need to
be numbered explicitly they are numbered implicitly.
* Update the GetConfig admin API to use the latest version of
configuration, along with fixes to the corresponding RPCs.
* Remove mutex inside the configuration struct, and inside
notification struct.
* Use global config mutex where needed.
* Add `serverConfig.ConfigDiff()` that provides a more granular diff
of what is different between two configurations.
- Changes related to moving admin APIs
- admin APIs now have an endpoint under /minio/admin
- admin APIs are now versioned - a new API to server the version is
added at "GET /minio/admin/version" and all API operations have the
path prefix /minio/admin/v1/<operation>
- new service stop API added
- credentials change API is moved to /minio/admin/v1/config/credential
- credentials change API and configuration get/set API now require TLS
so that credentials are protected
- all API requests now receive JSON
- heal APIs are disabled as they will be changed substantially
- Heal API changes
Heal API is now provided at a single endpoint with the ability for a
client to start a heal sequence on all the data in the server, a
single bucket, or under a prefix within a bucket.
When a heal sequence is started, the server returns a unique token
that needs to be used for subsequent 'status' requests to fetch heal
results.
On each status request from the client, the server returns heal result
records that it has accumulated since the previous status request. The
server accumulates upto 1000 records and pauses healing further
objects until the client requests for status. If the client does not
request any further records for a long time, the server aborts the
heal sequence automatically.
A heal result record is returned for each entity healed on the server,
such as system metadata, object metadata, buckets and objects, and has
information about the before and after states on each disk.
A client may request to force restart a heal sequence - this causes
the running heal sequence to be aborted at the next safe spot and
starts a new heal sequence.
In current implementation we used as many dsync clients
as per number of endpoints(along with path) which is not
the expected implementation. The implementation of Dsync
was expected to be just for the endpoint Host alone such
that if you have 4 servers and each with 4 disks we need
to only have 4 dsync clients and 4 dsync servers. But
we currently had 8 clients, servers which in-fact is
unexpected and should be avoided.
This PR brings the implementation back to its original
intention. This issue was found #5160
This change is a simplification over existing
code since it is not required to have a separate
RPCClient structure instead keep authRPCClient can
do the same job.
There is no code which directly uses netRPCClient(),
keeping authRPCClient is better and simpler. This
simplication also allows for removal of multiple
levels of locking code per object.
Observed in #5160
This change adds the HighwayHash256 PRF as bitrot protection / detection
algorithm. Since HighwayHash256 requires a 256 bit we generate a random
key from the first 100 decimals of π - See nothing-up-my-sleeve-numbers.
This key is fixed forever and tied to the HighwayHash256 bitrot algorithm.
Fixes#5358
The problem was after the globalServiceDoneCh receives a
message, we cleanly stop the ticker as expected. But the
go-routine where the `select` loop is running is never
returned from. The stage at which point this may occur
i.e server is being restarted, doesn't seriously affect
servers usage. But any build up like this on server has
consequences as the new functionality would come in future.
With storage class support, the free and total space
reported in Minio XL startup banner should be based on
totalDisks - standardClassParityDisks, instead of totalDisks/2.
fixes#5416
This change replaces all imports of "crypto/sha256" with
"github.com/minio/sha256-simd". The sha256-simd package
is faster on ARM64 (NEON instructions) and can take advantage
of AVX-512 in certain scenarios.
Fixes#5374
Internally, triton-go, what manta minio is built on, changed it's internal
error handling. This means we no longer need to unwrap specific error types
This doesn't change any manta minio functionality - it just changes how errors are
handled internally and adds a wrapper for a 404 error
This change fixes an authentication bypass attack against the
minio Admin-API. Therefore the Admin-API rejects now all types of
requests except valid signature V2 and signature V4 requests - this
includes signature V2/V4 pre-signed requests.
Fixes#5411
This fix removes logrus package dependency and refactors the console
logging as the only logging mechanism by removing file logging support.
It rearranges the log message format and adds stack trace information
whenever trace information is not available in the error structure.
It also adds `--json` flag support for server logging.
When minio server is started with `--json` flag, all log messages are
displayed in json format, with no start-up and informational log
messages.
Fixes#5265#5220#5197
Under any concurrent removeObjects in progress
might have removed the parents of the same prefix
for which there is an ongoing putObject request.
An inconsistent situation may arise as explained
below even under sufficient locking.
PutObject is almost successful at the last stage when
a temporary file is renamed to its actual namespace
at `a/b/c/object1`. Concurrently a RemoveObject is
also in progress at the same prefix for an `a/b/c/object2`.
To create the object1 at location `a/b/c` PutObject has
to create all the parents recursively.
```
a/b/c - os.MkdirAll loops through has now created
'a/' and 'b/' about to create 'c/'
a/b/c/object2 - at this point 'c/' and 'object2'
are deleted about to delete b/
```
Now for os.MkdirAll loop the expected situation is
that top level parent 'a/b/' exists which it created
, such that it can create 'c/' - since removeObject
and putObject do not compete for lock due to holding
locks at different resources. removeObject proceeds
to delete parent 'b/' since 'c/' is not yet present,
once deleted 'os.MkdirAll' would receive an error as
syscall.ENOENT which would fail the putObject request.
This PR tries to address this issue by implementing
a safer/guarded approach where we would retry an operation
such as `os.MkdirAll` and `os.Rename` if both operations
observe syscall.ENOENT.
Fixes#5254
After the addition of Storage Class support, readQuorum
and writeQuorum are decided on a per object basis, instead
of deployment wide static quorums.
This PR updates madmin api to remove readQuorum/writeQuorum
and add Standard storage class and reduced redundancy storage
class parity as return values. Since these parity values are
used to decide the quorum for each object.
Fixes#5378
Since the server performs automatic clean-up of multipart uploads that
have not been resumed for more than a couple of weeks, it was decided
to remove functionality to heal multipart uploads.
If STANDARD storage class is set before starting up Minio server,
but x-amz-storage-class metadata field is not set in a PutObject
request, Minio server defaults to N/2 data and N/2 parity disks.
This PR changes the behaviour to use data and parity disks set in
STANDARD storage class, even if x-amz-storage-class metadata
field is not present in PutObject requests.
- Return error when the config JSON has duplicate keys (fixes#5286)
- Limit size of configuration file provided to 256KiB - this prevents
another form of DoS
Remove the requirement for IssuedAt claims from JWT
for now, since we do not currently have a way to provide
a leeway window for validating the claims. Expiry does
the same checks as IssuedAt with an expiry window.
We do not need it right now since we have clock skew check
in our RPC layer to handle this correctly.
rpc-common.go
```
func isRequestTimeAllowed(requestTime time.Time) bool {
// Check whether request time is within acceptable skew time.
utcNow := UTCNow()
return !(requestTime.Sub(utcNow) > rpcSkewTimeAllowed ||
utcNow.Sub(requestTime) > rpcSkewTimeAllowed)
}
```
Once the PR upstream is merged https://github.com/dgrijalva/jwt-go/pull/139
We can bring in support for leeway later.
Fixes#5237
x-amz-content-sha256 can be optional for any AWS signature v4
requests, make sure to skip sha256 calculation when payload
checksum is not set.
Here is the overall expected behavior
** Signed request **
- X-Amz-Content-Sha256 is set to 'empty' or some 'value' or its
not 'UNSIGNED-PAYLOAD'- use it to validate the incoming payload.
- X-Amz-Content-Sha256 is set to 'UNSIGNED-PAYLOAD' - skip checksum verification
- X-Amz-Content-Sha256 is not set we use emptySHA256
** Presigned request **
- X-Amz-Content-Sha256 is set to 'empty' or some 'value' or its
not 'UNSIGNED-PAYLOAD'- use it to validate the incoming payload
- X-Amz-Content-Sha256 is set to 'UNSIGNED-PAYLOAD' - skip checksum verification
- X-Amz-Content-Sha256 is not set we use 'UNSIGNED-PAYLOAD'
Fixes#5339
This PR updates the behaviour to print relevant error message
if storage class is set in config.json for gateway
This PR also fixes the case where storage class set via
environment variables is not parsed properly into config.json.
Save http trace to a file instead of displaying it onto the console.
the environment variable MINIO_HTTP_TRACE will be a filepath instead
of a boolean.
This to handle the scenario where both json and http tracing are
turned on. In that case, both http trace and json output are displayed
on the screen making the json not parsable. Loging this trace onto
a file helps us avoid that scenario.
Fixes#5263
Manta has the ability to allow users to authenticate with a
username other than the main account. We want to expose
this functionality to minio manta gateway.
This change adds support for password-protected private keys.
If the private key is encrypted the server tries to decrypt
the key with the password provided by the env variable
MINIO_CERT_PASSWD.
Fixes#5302
- Update startup banner to print storage class in capitals. This
makes it easier to identify different storage classes available.
- Update response metadata to not send STANDARD storage class.
This is in accordance with AWS S3 behaviour.
- Update minio-go library to bring in storage class related
changes. This is needed to make transparent translation of
storage class headers for Minio S3 Gateway.
Currently, browser access information is displayed without checking
if browser enabled flag is turned off in config.json. Fixing it to
hide the information if the flag is turned off.
Fixes#5312
This change replaces the non-constant time comparison of
request signatures with a constant time implementation. This
prevents a timing attack which can be used to learn a valid
signature for a request without knowing the secret key.
Fixes#5334
This commit takes the existing remove bucket functionality written by
brendanashworth, integrates it to the current UI with a dropdown for
each bucket, and fixes small issues that were present, like the dropdown
not disappearing after the user clicks on 'Delete' for certain buckets.
This feature only deletes a bucket that is empty (that has no objects).
Fixes#4166
- Add storage class metadata validation for request header
- Change storage class header values to be consistent with AWS S3
- Refactor internal method to take only the reqd argument
HealFile() does not process the case when an empty file is lost in
some disks. Since, Reedsolomon erasure doesn't handle restoring empty
data, HealFile will create empty files similarly to CreateFile().
This adds configurable data and parity options on a per object
basis. To use variable parity
- Users can set environment variables to cofigure variable
parity
- Then add header x-amz-storage-class to putobject requests
with relevant storage class values
Fixes#4997
- Use it to send the Content-MD5 header correctly encoded to S3
Gateway
- Fixes a bug in PutObject (including anonymous PutObject) and
PutObjectPart with S3 Gateway found when testing with Mint.