- Add head method for healthcheck endpoint. Some platforms/users
may use the HTTP Head method to check for health status.
- Add liveness and readiness probe examples in Kubernetes yaml
example docs. Note that readiness probe not added to StatefulSet
example due to https://github.com/kubernetes/kubernetes/issues/27114
With following changes
- Add SSE and refactor encryption API (#942) <Andreas Auernhammer>
- add copyObject test changing metadata and preserving etag (#944) <Harshavardhana>
- Add SSE-C tests for multipart, copy, get range operations (#941) <Harshavardhana>
- Removing conditional check for notificationInfoCh in api-notication (#940) <Matthew Magaldi>
- Honor prefix parameter in ListBucketPolicies API (#929) <kannappanr>
- test for empty objects uploaded with SSE-C headers (#927) <kannappanr>
- Encryption headers should also be set during initMultipart (#930) <Harshavardhana>
- Add support for Content-Language metadata header (#928) <kannappanr>
- Fix check for duplicate notification configuration entries (#917) <kannappanr>
- allow OS to cleanup sockets in TIME_WAIT (#925) <Harshavardhana>
- Sign V2: Fix signature calculation in virtual host style (#921) <A. Elleuch>
- bucket policy: Support json string in Principal field (#919) <A. Elleuch>
- Fix copyobject failure for empty files (#918) <kannappanr>
- Add new constructor NewWithOptions to SDK (#915) <poornas>
- Support redirect headers to sign again with new Host header. (#829) <Harshavardhana>
- Fail in PutObject if invalid user metadata is passed <Harshavadhana>
- PutObjectOptions Header: Don't include invalid header <Isaac Hess>
- increase max retry count to 10 (#913) <poornas>
- Add new regions for Paris and China west. (#905) <Harshavardhana>
- fix s3signer to use req.Host header (#899) <Bartłomiej Nogaś>
This PR adds readiness and liveness endpoints to probe Minio server
instance health. Endpoints can only be accessed without authentication
and the paths are /minio/health/live and /minio/health/ready for
liveness and readiness respectively.
The new healthcheck liveness endpoint is used for Docker healthcheck
now.
Fixes#5357Fixes#5514
In kubernetes statefulset like environments when secrets
are mounted to pods they have sub-directories, we should
ideally be only looking for regular files here and skip
all others.
Fix a compatibility issue with AWS S3 where to do key rotation
we need to replace an existing object's metadata. In such a
scenario "REPLACE" metadata directive is not necessary.
- Data from disk was being read after bitrot verification to return
data for GetObject. Strictly speaking this does not guarantee bitrot
protection, as disks may return bad data even temporarily.
- This fix reads data from disk, verifies data for bitrot and then
returns data to the client directly.
Current code didn't implement the logic to support
decrypting encrypted multiple parts, this PR fixes
by supporting copying encrypted multipart objects.
Currently we reply back `X-Minio-Internal` values
back to the client for an encrypted object, we should
filter these out and only reply AWS compatible headers.
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
Flags like `json, config-dir, quiet` are now honored even if they are
between minio and gateway in the cli, like, `minio --json gateway s3`.
Fixes#5403
Stable sort is needed when we are sorting based on two or more
distinct elements. When equal elements are indistinguishable,
such as with integers, or more generally, any data where the
entire element is the key like `PartNumber`, stability is not
an issue.
Refactor such that metadata and etag are
combined to a single argument `srcInfo`.
This is a precursor change for #5544 making
it easier for us to provide encryption/decryption
functions.
Delete & Multi Delete API should not try to remove the directory content.
The only permitted case is with zero size object with a trailing slash
in its name.
MaxIdleConns limits the total number of connections
kept in the pool for re-use. In addition, MaxIdleConnsPerHost
limits the number for a single host. Since minio gateways
usually connect to the same host, setting `MaxIdleConns = 100`
won't really have much of an impact since the idle connection
pool is limited to 2 anyway.
Now, with the pool set to a limit of 2, and when using
the client heavily from 2+ goroutines, the `http.Transport`
will open a connection, use it, then try to return it to
the idle-pool which often fails since there's a limit of 2.
So it's going to close the connection and new ones will be
opened on demand again, many of which get closed soon after
being used. Since those connections/sockets don't disappear
from the OS immediately, use `MaxIdleConnsPerHost = 100`
which fixes this problem.
Overwriting files is allowed, but since the introduction of
the object directory, we will aslo need to allow overwriting
an empty directory. Putting twice the same object directory
won't fail with 403 error anymore.
TestNewWebHookNotify wasn't passing in my local machine. The reason is
that the test expects the POST handler (as a webhook endpoint) is always
running on port 80, which is not always the case.
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.
This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.
Some design details and restrictions:
- Objects are distributed using consistent ordering
to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
requirement, you can start with multiple
such sets statically.
- Static sets set of disks and cannot be
changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
slower since List happens on all servers,
and is merged at this sets layer.
Fixes#5465Fixes#5464Fixes#5461Fixes#5460Fixes#5459Fixes#5458Fixes#5460Fixes#5488Fixes#5489Fixes#5497Fixes#5496
Since we do not encrypt directories we don't need to send
errors with encryption headers when the directory doesn't
have encryption metadata.
Continuation PR from 4ca10479b5
It can happen such that one of the disks that was down would
return 'errDiskNotFound' but the err is preserved due to
loop shadowing which leads to issues when healing the bucket.
This change adds an object size check such that the server does not
encrypt empty objects (typically folders) for SSE-C. The server still
returns SSE-C headers but the object is not encrypted since there is no
point to encrypt such objects.
Fixes#5493
Currently minio master requires 4 servers, we
have decided to run on a minimum of 2 servers
instead - fixes a regression from previous
releases where 3 server setups were supported.
This PR brings semver capabilities in our RPC layer to
ensure that we can upgrade the servers in rolling fashion
while keeping I/O in progress. This is only a framework change
the functionality remains the same as such and we do not
have any special API changes for now. But in future when
we bring in API changes we will be able to upgrade servers
without a downtime.
Additional change in this PR is to not abort when serverVersions
mismatch in a distributed cluster, instead wait for the quorum
treat the situation as if the server is down. This allows
for administrator to properly upgrade all the servers in the cluster.
Fixes#5393
in-memory caching cannot be cleanly implemented
without the access to GC which Go doesn't naturally
provide. At times we have seen that object caching
is more of an hindrance rather than a boon for
our use cases.
Removing it completely from our implementation
related to #5160 and #5182
This is a generic minimum value. The current reason is to support
Azure blob storage accounts name whose length is less than 5. 3 is the
minimum length for Azure.
Check if the storage class is set in an
non XL setup instead of relying on `globalEndpoints`
value. Also converge the checks for both SS
and RRS parity configuration.
This PR also removes redundant `tt.name` in all
test cases, since each testcase doesn't need to
be numbered explicitly they are numbered implicitly.
* Update the GetConfig admin API to use the latest version of
configuration, along with fixes to the corresponding RPCs.
* Remove mutex inside the configuration struct, and inside
notification struct.
* Use global config mutex where needed.
* Add `serverConfig.ConfigDiff()` that provides a more granular diff
of what is different between two configurations.
- Changes related to moving admin APIs
- admin APIs now have an endpoint under /minio/admin
- admin APIs are now versioned - a new API to server the version is
added at "GET /minio/admin/version" and all API operations have the
path prefix /minio/admin/v1/<operation>
- new service stop API added
- credentials change API is moved to /minio/admin/v1/config/credential
- credentials change API and configuration get/set API now require TLS
so that credentials are protected
- all API requests now receive JSON
- heal APIs are disabled as they will be changed substantially
- Heal API changes
Heal API is now provided at a single endpoint with the ability for a
client to start a heal sequence on all the data in the server, a
single bucket, or under a prefix within a bucket.
When a heal sequence is started, the server returns a unique token
that needs to be used for subsequent 'status' requests to fetch heal
results.
On each status request from the client, the server returns heal result
records that it has accumulated since the previous status request. The
server accumulates upto 1000 records and pauses healing further
objects until the client requests for status. If the client does not
request any further records for a long time, the server aborts the
heal sequence automatically.
A heal result record is returned for each entity healed on the server,
such as system metadata, object metadata, buckets and objects, and has
information about the before and after states on each disk.
A client may request to force restart a heal sequence - this causes
the running heal sequence to be aborted at the next safe spot and
starts a new heal sequence.
In current implementation we used as many dsync clients
as per number of endpoints(along with path) which is not
the expected implementation. The implementation of Dsync
was expected to be just for the endpoint Host alone such
that if you have 4 servers and each with 4 disks we need
to only have 4 dsync clients and 4 dsync servers. But
we currently had 8 clients, servers which in-fact is
unexpected and should be avoided.
This PR brings the implementation back to its original
intention. This issue was found #5160
This change is a simplification over existing
code since it is not required to have a separate
RPCClient structure instead keep authRPCClient can
do the same job.
There is no code which directly uses netRPCClient(),
keeping authRPCClient is better and simpler. This
simplication also allows for removal of multiple
levels of locking code per object.
Observed in #5160
This change adds the HighwayHash256 PRF as bitrot protection / detection
algorithm. Since HighwayHash256 requires a 256 bit we generate a random
key from the first 100 decimals of π - See nothing-up-my-sleeve-numbers.
This key is fixed forever and tied to the HighwayHash256 bitrot algorithm.
Fixes#5358
The problem was after the globalServiceDoneCh receives a
message, we cleanly stop the ticker as expected. But the
go-routine where the `select` loop is running is never
returned from. The stage at which point this may occur
i.e server is being restarted, doesn't seriously affect
servers usage. But any build up like this on server has
consequences as the new functionality would come in future.
With storage class support, the free and total space
reported in Minio XL startup banner should be based on
totalDisks - standardClassParityDisks, instead of totalDisks/2.
fixes#5416
This change replaces all imports of "crypto/sha256" with
"github.com/minio/sha256-simd". The sha256-simd package
is faster on ARM64 (NEON instructions) and can take advantage
of AVX-512 in certain scenarios.
Fixes#5374
Internally, triton-go, what manta minio is built on, changed it's internal
error handling. This means we no longer need to unwrap specific error types
This doesn't change any manta minio functionality - it just changes how errors are
handled internally and adds a wrapper for a 404 error
This change fixes an authentication bypass attack against the
minio Admin-API. Therefore the Admin-API rejects now all types of
requests except valid signature V2 and signature V4 requests - this
includes signature V2/V4 pre-signed requests.
Fixes#5411
This fix removes logrus package dependency and refactors the console
logging as the only logging mechanism by removing file logging support.
It rearranges the log message format and adds stack trace information
whenever trace information is not available in the error structure.
It also adds `--json` flag support for server logging.
When minio server is started with `--json` flag, all log messages are
displayed in json format, with no start-up and informational log
messages.
Fixes#5265#5220#5197
Under any concurrent removeObjects in progress
might have removed the parents of the same prefix
for which there is an ongoing putObject request.
An inconsistent situation may arise as explained
below even under sufficient locking.
PutObject is almost successful at the last stage when
a temporary file is renamed to its actual namespace
at `a/b/c/object1`. Concurrently a RemoveObject is
also in progress at the same prefix for an `a/b/c/object2`.
To create the object1 at location `a/b/c` PutObject has
to create all the parents recursively.
```
a/b/c - os.MkdirAll loops through has now created
'a/' and 'b/' about to create 'c/'
a/b/c/object2 - at this point 'c/' and 'object2'
are deleted about to delete b/
```
Now for os.MkdirAll loop the expected situation is
that top level parent 'a/b/' exists which it created
, such that it can create 'c/' - since removeObject
and putObject do not compete for lock due to holding
locks at different resources. removeObject proceeds
to delete parent 'b/' since 'c/' is not yet present,
once deleted 'os.MkdirAll' would receive an error as
syscall.ENOENT which would fail the putObject request.
This PR tries to address this issue by implementing
a safer/guarded approach where we would retry an operation
such as `os.MkdirAll` and `os.Rename` if both operations
observe syscall.ENOENT.
Fixes#5254
After the addition of Storage Class support, readQuorum
and writeQuorum are decided on a per object basis, instead
of deployment wide static quorums.
This PR updates madmin api to remove readQuorum/writeQuorum
and add Standard storage class and reduced redundancy storage
class parity as return values. Since these parity values are
used to decide the quorum for each object.
Fixes#5378
Since the server performs automatic clean-up of multipart uploads that
have not been resumed for more than a couple of weeks, it was decided
to remove functionality to heal multipart uploads.
If STANDARD storage class is set before starting up Minio server,
but x-amz-storage-class metadata field is not set in a PutObject
request, Minio server defaults to N/2 data and N/2 parity disks.
This PR changes the behaviour to use data and parity disks set in
STANDARD storage class, even if x-amz-storage-class metadata
field is not present in PutObject requests.
- Return error when the config JSON has duplicate keys (fixes#5286)
- Limit size of configuration file provided to 256KiB - this prevents
another form of DoS
Remove the requirement for IssuedAt claims from JWT
for now, since we do not currently have a way to provide
a leeway window for validating the claims. Expiry does
the same checks as IssuedAt with an expiry window.
We do not need it right now since we have clock skew check
in our RPC layer to handle this correctly.
rpc-common.go
```
func isRequestTimeAllowed(requestTime time.Time) bool {
// Check whether request time is within acceptable skew time.
utcNow := UTCNow()
return !(requestTime.Sub(utcNow) > rpcSkewTimeAllowed ||
utcNow.Sub(requestTime) > rpcSkewTimeAllowed)
}
```
Once the PR upstream is merged https://github.com/dgrijalva/jwt-go/pull/139
We can bring in support for leeway later.
Fixes#5237
x-amz-content-sha256 can be optional for any AWS signature v4
requests, make sure to skip sha256 calculation when payload
checksum is not set.
Here is the overall expected behavior
** Signed request **
- X-Amz-Content-Sha256 is set to 'empty' or some 'value' or its
not 'UNSIGNED-PAYLOAD'- use it to validate the incoming payload.
- X-Amz-Content-Sha256 is set to 'UNSIGNED-PAYLOAD' - skip checksum verification
- X-Amz-Content-Sha256 is not set we use emptySHA256
** Presigned request **
- X-Amz-Content-Sha256 is set to 'empty' or some 'value' or its
not 'UNSIGNED-PAYLOAD'- use it to validate the incoming payload
- X-Amz-Content-Sha256 is set to 'UNSIGNED-PAYLOAD' - skip checksum verification
- X-Amz-Content-Sha256 is not set we use 'UNSIGNED-PAYLOAD'
Fixes#5339
This PR updates the behaviour to print relevant error message
if storage class is set in config.json for gateway
This PR also fixes the case where storage class set via
environment variables is not parsed properly into config.json.
Save http trace to a file instead of displaying it onto the console.
the environment variable MINIO_HTTP_TRACE will be a filepath instead
of a boolean.
This to handle the scenario where both json and http tracing are
turned on. In that case, both http trace and json output are displayed
on the screen making the json not parsable. Loging this trace onto
a file helps us avoid that scenario.
Fixes#5263
Manta has the ability to allow users to authenticate with a
username other than the main account. We want to expose
this functionality to minio manta gateway.
This change adds support for password-protected private keys.
If the private key is encrypted the server tries to decrypt
the key with the password provided by the env variable
MINIO_CERT_PASSWD.
Fixes#5302
- Update startup banner to print storage class in capitals. This
makes it easier to identify different storage classes available.
- Update response metadata to not send STANDARD storage class.
This is in accordance with AWS S3 behaviour.
- Update minio-go library to bring in storage class related
changes. This is needed to make transparent translation of
storage class headers for Minio S3 Gateway.
Currently, browser access information is displayed without checking
if browser enabled flag is turned off in config.json. Fixing it to
hide the information if the flag is turned off.
Fixes#5312
This change replaces the non-constant time comparison of
request signatures with a constant time implementation. This
prevents a timing attack which can be used to learn a valid
signature for a request without knowing the secret key.
Fixes#5334
This commit takes the existing remove bucket functionality written by
brendanashworth, integrates it to the current UI with a dropdown for
each bucket, and fixes small issues that were present, like the dropdown
not disappearing after the user clicks on 'Delete' for certain buckets.
This feature only deletes a bucket that is empty (that has no objects).
Fixes#4166
- Add storage class metadata validation for request header
- Change storage class header values to be consistent with AWS S3
- Refactor internal method to take only the reqd argument
HealFile() does not process the case when an empty file is lost in
some disks. Since, Reedsolomon erasure doesn't handle restoring empty
data, HealFile will create empty files similarly to CreateFile().
This adds configurable data and parity options on a per object
basis. To use variable parity
- Users can set environment variables to cofigure variable
parity
- Then add header x-amz-storage-class to putobject requests
with relevant storage class values
Fixes#4997
- Use it to send the Content-MD5 header correctly encoded to S3
Gateway
- Fixes a bug in PutObject (including anonymous PutObject) and
PutObjectPart with S3 Gateway found when testing with Mint.
Manta is an Object Storage by [Joyent](https://www.joyent.com/)
This PR adds initial support for Manta. It is intended as non-production
ready so that feedback can be obtained.
This PR allows 'minio update' to not only shows update banner
but also allows for in-place upgrades.
Updates are done safely by validating the downloaded
sha256 of the binary.
Fixes#4781
This PR handles following situations
- secure endpoints provided, server should fail to start
if TLS is not configured
- insecure endpoints provided, server starts ignoring
if TLS is configured or not.
Fixes#5251
- Adds a metadata argument to the CopyObjectPart API to facilitate
implementing encryption for copying APIs too.
- Update vendored minio-go - this version implements the
CopyObjectPart client API for use with the S3 gateway.
Fixes#4885
This check incorrectly rejects most valid filenames. The only filenames Sia
forbids are leading forward slashes and path traversal characters, but it's
better to simply allow Sia to reject invalid names on its own rather than try
to anticipate errors from Sia:
https://github.com/NebulousLabs/Sia/blob/master/doc/api/Renter.md#path-parameters-4
The problem in existing code was the following line
```
start := int(keyCrc%uint32(cardinality)) | 1
```
A given a value of N cardinality the ending result
because of the the bitwise '|' would lead to always
higher affinity to odd sequences.
As can be seen from the test cases that this can
lead to many objects being allocated the same set
of disks or atleast the first disk is an odd disk
always. This introduces a performance problem
for majority of the objects under concurrent load.
Remove `| 1` to provide a more cleaner distribution
and the new code will be.
```
start := int(keyCrc % uint32(cardinality))
```
Thanks to Krishna Srinivas for pointing out the bitwise
situation here.
This change introduces following simplified steps to follow
during config migration.
```
// Steps to move from version N to version N+1
// 1. Add new struct serverConfigVN+1 in config-versions.go
// 2. Set configCurrentVersion to "N+1"
// 3. Set serverConfigCurrent to serverConfigVN+1
// 4. Add new migration function (ex. func migrateVNToVN+1()) in config-migrate.go
// 5. Call migrateVNToVN+1() from migrateConfig() in config-migrate.go
// 6. Make changes in config-current_test.go for any test change
```
Current implementation we faked the makeBucket operations
to allow for s3 clients to behave properly. But instead
we can create a placeholder zero byte file instead, which
is a hexadecimal representation of the bucket name itself.
The Sia gateway had a bug with uploading that prevented the user's uploads
from reaching the Sia backend. The PutObject function called fsRemoveFile at
the end of the function, which didn't give the Sia backend enough time to
upload the file to the Sia network.
This adds a goroutine that watches the file upload progress and doesn't delete
the file until the upload reaches 100% complete.
Note that this solution has the limitation where if the minio process dies in
the middle of upload, it will leave orphaned files in the SIA_TEMP directory
that the user will need to remove manually.
This PR changes the behavior of DecryptRequest.
Instead of returning `object-tampered` if the client provided
key is wrong DecryptRequest will return `access-denied`.
This is AWS S3 behavior.
Fixes#5202
Apache Spark sends getObject requests with trailing "/".
This PR updates the getObjectInfo to stat for files
even if they are sent with trailing "/".
Fixes#2965
Previously ListenBucketNotificationHandler could deadlock with
PutObjectHandler's eventNotify call when a client closes its
connection. This change removes the cyclic dependency between the
channel and map of ARN to channels by using a separate done channel to
signal that the client has quit.
This change brings public data-types such that
we can ask projects to implement gateway projects
externally than maintaining in our repo.
All publicly exported structs are maintained in object-api-datatypes.go
completePart --> CompletePart
uploadMetadata --> MultipartInfo
All other exported errors are at object-api-errors.go
S3 spec requires that MethodNotAllowed error be return if object name is part
of the URL.
Fix postpolicy related unit tests to not set object name as part of target URL.
Fixes#5141
On windows having a preceding "/" will cause problems, if the
command line already has C:/<export-folder/ in it. Final resulting
path on windows might become C:/C:/ this will cause problems
of starting minio server properly in distributed mode on windows.
As a special case make sure to trim off the separator.
NOTE: It is also perfectly fine for windows users to have a path
without C:/ since at that point we treat it as relative path
and obtain the full filesystem path as well. Providing C:/
style is necessary to provide paths other than C:/,
such as F:/, D:/ etc.
Another additional benefit here is that this style also
supports providing UNC paths as well.
Fixes#5136
This chnage replaces the current SSE-C key derivation scheme. The 'old'
scheme derives an unique object encryption key from the client provided key.
This key derivation was not invertible. That means that a client cannot change
its key without changing the object encryption key.
AWS S3 allows users to update there SSE-C keys by executing a SSE-C COPY with
source == destination. AWS probably updates just the metadata (which is a very
cheap operation). The old key derivation scheme would require a complete copy
of the object because the minio server would not be able to derive the same
object encryption key from a different client provided key (without breaking
the crypto. hash function).
This change makes the key derivation invertible.
This change adds server-side-encryption support for HEAD, GET and PUT
operations. This PR only addresses single-part PUTs and GETs without
HTTP ranges.
Further this change adds the concept of reserved object metadata which is required
to make encrypted objects tamper-proof and provide API compatibility to AWS S3.
This PR adds the following reserved metadata entries:
- X-Minio-Internal-Server-Side-Encryption-Iv ('guarantees' tamper-proof property)
- X-Minio-Internal-Server-Side-Encryption-Kdf (makes Key-MAC computation negotiable in future)
- X-Minio-Internal-Server-Side-Encryption-Key-Mac (provides AWS S3 API compatibility)
The prefix `X-Minio_Internal` specifies an internal metadata entry which must not
send to clients. All client requests containing a metadata key starting with `X-Minio-Internal`
must also rejected. This is implemented by a generic-handler.
This PR implements SSE-C separated from client-side-encryption (CSE). This cannot decrypt
server-side-encrypted objects on the client-side. However, clients can encrypted the same object
with CSE and SSE-C.
This PR does not address:
- SSE-C Copy and Copy part
- SSE-C GET with HTTP ranges
- SSE-C multipart PUT
- SSE-C Gateway
Each point must be addressed in a separate PR.
Added to vendor dir:
- x/crypto/chacha20poly1305
- x/crypto/poly1305
- github.com/minio/sio
It is possible that x-amz-content-sha256 is set through
the query params in case of presigned PUT calls, make sure
that we validate the incoming x-amz-content-sha256 properly.
Current code simply just allows this without honoring the
set x-amz-content-sha256, fix it.
Previously ID/ETag from backend service is used as is which causes
failure on s3cmd like tools where those tools use ETag as checksum to
validate data. This is fixed by prepending "-1".
Refer minio/mint#193minio/mint#201
When MINIO_TRACE_DIR is provided, create a new log file and store all
HTTP requests + responses data, body are excluded to reduce memory
consumption. MINIO_HTTP_TRACE=1 enables logging. Use non mem
consuming http req/resp recorders, the maximum is about 32k per request.
This logs to STDOUT, body logging is disabled for PutObject PutObjectPart
GetObject.
Verify() was being called by caller after the data
has been successfully read after io.EOF. This disconnection
opens a race under concurrent access to such an object.
Verification is not necessary outside of Read() call,
we can simply just do checksum verification right inside
Read() call at io.EOF.
This approach simplifies the usage.
In some cases, Cache manager returns ErrCacheFull error when creating a
new cache buffer but the code still sends object data to nil cache buffer data.
Dont print the error errFileNotFound, as it is expected that concurrent
complete-multipart-uploads or abort-multipart-uploads would have deleted
the file, and the file may not be found
Fixes: https://github.com/minio/minio/issues/5056
Every so often we get requirements for creating
directories/prefixes and we end up rejecting
such requirements. This PR implements this and
allows empty directories without any new file
addition to backend.
Existing lower APIs themselves are leveraged to provide
this behavior. Only FS backend supports this for
the time being as desired.
s3cmd cli fails when trying to upload a file to azure gateway.
Previous fixes in azure to handle client side encryption alone
did not completely address the problem.
We need to possibilly convert all the x-amz-meta-<name>
, i.e specifically <name> should be converted into a
C# identifier as mentioned in the docs for `put-blob`.
https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob
```
s3cmd put README.md s3://myanis/
upload: 'README.md' -> 's3://myanis/README.md' [1 of 1]
4598 of 4598 100% in 0s 47.24 kB/s done
upload: 'README.md' -> 's3://myanis/README.md' [1 of 1]
4598 of 4598 100% in 0s 50.47 kB/s done
ERROR: S3 error: 400 (InvalidArgument): Your metadata headers are not supported.
```
There is a separate issue with s3cmd after this fix is applied where
the ETag is wronly validated https://github.com/s3tools/s3cmd/issues/880
But that is an upstream s3cmd problem which wrongly interprets ETag
to be md5sum of the content that was uploaded.
This PR addresses a long standing dependency on
`gopkg.in/check.v1` project used for our tests.
All tests are re-written to use the go default
testing framework instead.
There was no reason for us to use an external
package where Go tools are sufficient for this.
This is done to avoid repeated declaration of not-implemented
functions for each gateway. It also avoids a possible bug in go
https://github.com/golang/go/issues/18468 which is triggered on
our multiple PRs already.
- Add release-time conversion helpers
- Split GetCurrentReleaseTime() into two simpler functions.
- Avoid appending strings when assembling user-agent string.
- Reorder release info URLs to check the newer URLs earlier.
- Remove trivial low-level functions created solely for the purpose of
writing tests.
- Remove some unnecessary tests.
Amazon S3 API expects all incoming stream has a content-length
set it was superflous for us to support object layer which supports
unknown sized stream as well, this PR removes such requirements
and explicitly error out if input stream is less than zero.
* Enable ListMultipartUploads and ListObjectParts for FS.
Previously we had disabled ListMultipartUploads and ListObjectParts
to see if any clients break. Docker registry broke. This patch
enables ListMultipartUploads and ListObjectParts, however
ListMultipartUploads with prefix based listing is not
supported (which is not used by docker registry anyway).
i.e ListMultipartUploads will need exact object name.
Gateway implementation of ListObjectsV1 does not validate maxKeys range.
Raise an InvalidArgument when maxKeys is negative so that ListObjects
call is compatible with S3 on all gateways.
Gateway interface implementations of GetBucketInfo() under
azure and s3 gateway did not perform any bucketname input
validation resulting in incorrect responses when the tests
are expecting InvalidBucketName.
Fixes#4983
When running `make test` in docker, two test cases cause hanging.
This Patch fixes the problem by removing those test cases.
Thanks to @ws141 for identifying the problem.
The reedsolomon library now avoids allocations during reconstruction.
This change exploits that to reduce memory allocs and GC preasure during
healing and reading.
Previously we were wrongly adding `?` as part
of the resource name, add a test case to check
if this is handled properly.
Thanks to @kannappanr for reproducing this.
Without this change presigned URL generated with following
command would fail with signature mismatch.
```
aws s3 presign s3://testbucket/functional-tests.sh
```
It can happen that an incoming PutObject() request might
have inputs of following form eg:-
- bucketName is 'testbucket'
- objectName is '/'
bucketName exists and was previously created but there
are no other objects in this bucket. In a situation like
this parentDirIsObject() goes into an infinite loop.
Verifying that if '/' is an object fails on both backends
but the resulting `path.Dir('/')` returns `'/'` this causes
the closure to loop onto itself.
Fixes#4940
This change removes the ReadFileWithVerify function from the
StorageAPI. The ReadFile was basically a redirection to ReadFileWithVerify.
This change removes the redirection and moves the logic of
ReadFileWithVerify directly into ReadFile.
This removes a lot of unnecessary code in all StorageAPI implementations.
Fixes#4946
* review: fix doc and typos
Previously init multipart upload stores metadata of an object which is
used for complete multipart. This patch makes azure gateway to store
metadata information of init multipart object in azure in the name of
'minio.sys.tmp/multipart/v1/<UPLOAD-ID>/meta.json' and uses this
information on complete multipart.
This change refactor the ObjectLayer PutObject and PutObjectPart
functions. Instead of passing an io.Reader and a size to PUT operations
ObejectLayer expects an HashReader.
A HashReader verifies the MD5 sum (and SHA256 sum if required) of the object.
This change updates all all PutObject(Part) calls and removes unnecessary code
in all ObjectLayer implementations.
Fixes#4923
This is an improvement upon existing implementation
by avoiding transfer of access and secret keys over
the network. This change only exchanges JWT tokens
generated by an rpc client. Even if the JWT can be
traced over the network on a non-TLS connection, this
change makes sure that we never really expose the
secret key over the network.
Previously minio gateway returns invalid bucket name error for invalid
meta data. This is fixed by returning BadRequest with 'Unsupported
metadata' in response.
Fixes#4891
When servers are started simultaneously across multiple
nodes or simulating a local setup, it can happen such
that one of the servers in setup reaches a following
situation where it observes
- Some servers are formatted
- Some servers are unformatted
- Some servers are offline
Current state machine doesn't handle this correctly, to fix
this situation where we have unformatted, formatted and
disks offline we do not decisively know the course of
action. So we wait for the offline disks to change their state.
Once the offline disks change their state to either one of these
states we can decisively move forward.
- nil (formatted disk)
- errUnformattedDisk
- Or any other error such as errCorruptedDisk.
Fixes#4903
The default timeout of 30secs is not enough for high latency
environments, change these values to use 15 minutes instead.
With 30secs I/O timeouts seem to be quite common, this leads
to pretty much most SDKs and clients reconnect. This in-turn
causes significant performance problems. On a low latency
interconnect this can be quite challenging to transfer large
amounts of data. Setting this value to 15minutes covers
pretty much all known cases.
This PR was tested with `wondershaper <NIC> 20000 20000` by
limiting the network bandwidth to 20Mbit/sec. Default timeout
caused a significant amount of I/O timeouts, leading to
constant retires from the client. This seems to be more common
with tools like rclone, restic which have high concurrency set
by default. Once the value was fixed to 15minutes i/o timeouts
stopped and client could steadily upload data to the server
even while saturating the network.
Fixes#4670
Previously if any multipart part size > 100MiB is uploaded, azure
gateway returns error.
This patch fixes the issue by creating sub parts sizing each 100MiB of
given multipart part. On complete multipart, it fetches all uploaded
azure block ids for each parts and performs completion.
Fixes#4868
- Region handling can now use region endpoints directly.
- All uploads are streaming no more large buffer needed.
- Major API overhaul for CopyObject(dst, src)
- Fixes bugs present in existing code for copying
- metadata replace directive CopyObject
- PutObjectPart doesn't require md5Sum and sha256
All `net/rpc` requests go to `/minio`, so the existing
generic handler for reserved bucket check would essentially
erroneously send errors leading to distributed setups to
wait infinitely.
For `net/rpc` requests alone we should skip this check and
allow resource bucket names to be from `/minio` .
Current code was just using io.ReadAll() on an fd()
which might have moved underneath due to a concurrent
read operation. Subsequent read will result in EOF
We should always seek back and read again. pread()
is allowed on all platforms use io.SectionReader to
read from the beginning of the file.
Fixes#4842
Bcrypt is not neccessary and not used properly. This change
replace the whole bcrypt hash computation through a constant time
compare and removes bcrypt from the code base.
Fixes#4813
If a TopicConfiguration element or CloudFunction element is found in
configuration submitted to PutBucketNotification API, an BadRequest
error is returned.
S3 only allows http headers with a size of 8 KB and user-defined metadata
with a size of 2 KB. This change adds a new API error and returns this
error to clients which sends to large http requests.
Fixes#4634
We don't need to typecast identifiers from
their base to type to same type again. This
is not a bug and compiler is fine to skip
it but it is better to avoid if not needed.
This change provides new implementations of the XL backend operations:
- create file
- read file
- heal file
Further this change adds table based tests for all three operations.
This affects also the bitrot algorithm integration. Algorithms are now
integrated in an idiomatic way (like crypto.Hash).
Fixes#4696Fixes#4649Fixes#4359
Wait for remote hosts to resolve instead of failing on first host
resolution error, when running in Kubernetes or Docker environment.
Note that
- Waiting is based on exponential back-off mechanism
- If run as a binary, server fails if remote host is not resolvable
This is needed because in orchestration platforms like Kubernetes, remote
hosts are started sequentially and all the hosts are not up initially,
though they are expected to come up in a short time frame
It is difficult to identify a cap on the waiting time due to
non-deterministic nature of infrastructure platforms, so the server waits
infinitely for the hosts to come up, while logging the error messages to
the console.
Fixes: https://github.com/minio/minio/issues/4669
Since go1.8 os.RemoveAll and os.MkdirAll both support long
path names i.e UNC path on windows. The code we are carrying
was directly borrowed from `pkg/os` package and doesn't need
to be in our repo anymore. As a side affect this also
addresses our codecoverage issue.
Refer #4658
* Prevent unnecessary verification of parity blocks while reading erasure
coded file.
* Update klauspost/reedsolomon and just only reconstruct data blocks while
reading (prevent unnecessary parity block reconstruction)
* Remove Verification of (all) reconstructed Data and Parity blocks since
in our case we are protected by bit rot protection. And even if the
verification would fail (essentially impossible) there is no way to
definitively say whether the data is still correct or not, so this call
make no sense for our use case.
Implement an offline mode for remote storage to cache the
offline status of a node in order to prevent network calls
that are bound to fail. After a time interval an attempt
will be made to restore the connection and mark the node
as online if successful.
Fixes#4183
It is possible at times due to a typo when distributed mode was intended
a user might end up starting standalone erasure mode causing confusion.
Add code to check this based on some standard heuristic guess work and
report an error to the user.
Fixes#4686
Under the call flow
```
Readdir
+
|
|
| path-entry
|
|
v
StatDir
```
Existing code was written in a manner where say
a bucket/top-level directory was indeed deleted
between Readdir() and before StatDir() we would
ignore certain errors. This is not a plausible
situation and might not happen in almost all
practical cases. We do not have to look for
or interpret these errors returned by StatDir()
instead we can just collect the successful
values and return back to the client. We do not
need to pre-maturely decide on bucket access
we just let filesystem decide subsequently for
real I/O operations.
Refer #4658
This is in preparation for updated admin heal API.
* Improve case analysis of healFormatXL() - fixes a case where disks
could have unhandled errors.
* Simplify healFormatXLFreshDisks() and healFormatXLCorruptedDisks()
to share more code and handle fewer cases for improved simplicity
and reduced code repetition.
* Fix test cases.
This commit changes posix's deleteFile() to not upstream errors from
removing parent directories. This fixes a race condition.
The race condition occurs when multiple deleteFile()s are called on the
same parent directory, but different child files. Because deleteFile()
recursively removes parent directories if they are empty, but
deleteFile() errors if the selected deletePath does not exist, there was
an opportunity for a race condition. The two processes would remove the
child directories successfully, then depend on the parent directory
still existing. In some cases this is an invalid assumption, because
other processes can remove the parent directory beforehand. This commit
changes deleteFile() to not upstream an error if one occurs, because the
only required error should be from the immediate deletePath, not from a
parent path.
In the specific bug report, multiple CompleteMultipartUpload requests
would launch multiple deleteFile() requests. Because they chain up on
parent directories, ultimately at the end, there would be multiple
remove files for the ultimate parent directory,
.minio.sys/multipart/{bucket}. Because only one will succeed and one
will fail, an error would be upstreamed saying that the file does not
exist, and the CompleteMultipartUpload code interpreted this as
NoSuchKey, or that the object/part id doesn't exist. This was faulty
behavior and is now fixed.
The added test fails before this change and passes after this change.
Fixes: https://github.com/minio/minio/issues/4727
This commit adds a new test for isDirEmpty (for code coverage) and
changes around the error conditional. Previously, there was a `return
nil` statement that would only be triggered under a race condition and
would trip up our test coverage for no real reason. With this new error
conditional, there's no awkward 'else'-esque condition, which means test
coverage will not change between runs for no reason in this specific
test. It's also a cleaner read.
This commit makes fsDeleteFile() simply call deleteFile() after calling
the relevant path length checking functions. This DRYs the code base.
This commit removes the Stat() call from deleteFile(). This improves
performance and removes any possibility of a race condition.
This additionally adds tests and a benchmark for said function. The
results aren't very consistent, although I'd expect this commit to make
it faster.
This commit fixes a potential security issue, whereby a full-access
token to the server would be available in the GET URL of a download
request. This fixes that issue by introducing short-expiry tokens, which
are only valid for one minute, and are regenerated for every download
request.
This commit specifically introduces the short-lived tokens, adds tests
for the tokens, adds an RPC call for generating a token given a
full-access token, updates the browser to use the new tokens for
requests where the token is passed as a GET parameter, and adds some
tests with the new temporary tokens.
Refs: https://github.com/minio/minio/pull/4673
This PR fixes the issue of cleaning up in-memory state
properly. Without this PR we can lead to security
situations where new bucket would inherit wrong
permissions on bucket and expose objects erroneously.
Fixes#4714
* Refactor HTTP server to address bugs
* Remove unnecessary goroutine to start multiple TCP listeners.
* HTTP server waits for shutdown to maximum of Server.ShutdownTimeout
than per serverShutdownPoll.
* Handles new connection errors properly.
* Handles read and write timeout properly.
* Handles error on start of HTTP server properly by exiting minio
process.
Fixes#4494#4476 & fixed review comments
This PR serves to fix following things in GCS gateway.
- fixes leaks in object reader and writer, not getting closed
under certain situations. This led to go-routine leaks.
- apparent confusing issue in case of complete multipart upload,
where it is currently possible for an entirely different
object name to concatenate parts of a different object name
if you happen to know the upload-id and parts of the object.
This is a very rare scenario but it is possible.
- succint usage of certain parts of code base and re-use.
Fixed header-to-metadat extraction. The extractMetadataFromHeader function should return an error if the http.Header contains a non-canonicalized key. The reason is that the keys can be manually set (through a map access) which can lead to ugly bugs.
Also fixed header-to-metadata extraction. Return a InternalError if a non-canonicalized key is found in a http.Header. Also log the error.
This is needed to avoid proxies buffering the connection
this is also a HTTP standard way to handle this situation
where server is sending back events in asynchronously.
For more details read https://goo.gl/RCML9f
Fixes - https://github.com/minio/minio-go/issues/731
When the browser asks for a GET presigned url, this latter is not
encoded and can be confusing when the user copies-pastes it somewhere,
especially when the path contains a space.
Current state-machine didn't honor a situation
which can arise when there is a combination of
- formatted
- unformatted
- corrupted
disks - this combination invariably goes into a
mode where all servers are waiting perpetually
forever thinking we will get quorum in future.
At this point there is a distant possibility of
ever getting a quorum since we don't even have
quorum number of disks offline.
We should exit and print a proper message per disk
to indicate what went wrong and what was detected
by the server.
Refer #4477
The ETag is constructed from md5 atttribute of object attributes
returned by the vendor's Composer. The md5 attribute comes back
as nil for large uploads. Instead the CRC32C should be used.
Refer to https://cloud.google.com/storage/docs/hashes-etagsFixes#4397
This implementation is similar to AMQP notifications:
* Notifications are published on a single topic as a JSON feed
* Topic is configurable, as is the QoS. Uses the paho.mqtt.golang
library for the mqtt connection, and supports connections over tcp
and websockets, with optional secure tls support.
* Additionally the minio server configuration has been bumped up
so mqtt configuration can be added.
* Configuration migration code is added with tests.
MQTT is an ISO standard M2M/IoT messaging protocol and was
originally designed for applications for limited bandwidth
networks. Today it's use is growing in the IoT space.
xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in
xl.storageDisks are set to nil when they encounter some errors. This means all
elements in xl.storageDisks will be nil after some time which lead to an unusable XL.
Looks like if we follow pattern such as
```
_ = rlk
```
Go can potentially kick in GC and close the fd when
the reference is lost, only speculation is that
the cause here is `SetFinalizer` which is set on
`os.close()` internally in `os` stdlib.
This is unexpected and unsual endeavour for Go, but
we have to make sure the reference is never lost
and always dies with the server.
Fixes#4530
This patch also reverts previous changes which were
merged for migration to the newer disk format. We will
be bringing these changes in subsequent releases. But
we wish to add protection in this release such that
future release migrations are protected.
Revert "fs: Migration should handle bucketConfigs as regular objects. (#4482)"
This reverts commit 976870a391.
Revert "fs: Migrate object metadata to objects directory. (#4195)"
This reverts commit 76f4f20609.
isDocker was currently reading from `/proc/cgroup` file. But
this file alone is rather not conclusive evidence. Docker
internally has `.dockerenv` as a special file which we should
use instead.
Fixes#4456
Current code failed to anticipate the existence of files
which could have been created to corrupt the namespace such
as `policy.json` file created at the bucket top level.
In the current release creating such as file conflicts
with the namespace for future bucket policy operations.
We implemented migration of backend format to avoid situations
such as these.
This PR handles this situation, makes sure that the
erroneous files should have been moved properly.
Fixes#4478
Current code allowed it wrongly to generate secret key upto 100
we should only use 100 as a value to validate but for generating
it should be 40.
Fixes#4470
This makes lock RPCs similar to other RPCs where requests to the local
server bypass the network. Requests to the local lock-subsystem may
bypass the network layer and directly access the locking
data-structures.
This incidentally fixes#4451.
Currently redirection doesn't work in following scenarios
- server started with port ":80" and TLS is configured
client requested insecure request on port "80"
gets redirected to port 443 and fails.
The following commit f44f2e341c
fix was incomplete and we still had presigned URLs printing
in query strings in wrong fashion.
This PR fixes this properly. Avoid double encoding
percent encoded strings such as
`s3%!!(MISSING)A(MISSING)`
Print properly as json encoded.
`s3%3AObjectCreated%3A%2A`
Currently even when bucket doesn't exist we wrongly
return success, when an object is a directory prefix with
'/' as suffix and is of size 0.
This PR fixes this behavior.
Sending envVars along with access and secret
exposes the entire minio server's sensitive
information. This will be an unexpected
situation for all users.
If at all we need to look for things like if
credentials are set through env, we should
only have access to only this information
not the entire set of system envs.
This is an enhancement to the XL/distributed-XL mode. FS mode is
unaffected.
The ReadFileWithVerify storage-layer call is similar to ReadFile with
the additional functionality of performing bit-rot checking. It
accepts additional parameters for a hashing algorithm to use and the
expected hex-encoded hash string.
This patch provides significant performance improvement because:
1. combines the step of reading the file (during
erasure-decoding/reconstruction) with bit-rot verification;
2. limits the number of file-reads; and
3. avoids transferring the file over the network for bit-rot
verification.
ReadFile API is implemented as ReadFileWithVerify with empty hashing
arguments.
Credits to AB and Harsha for the algorithmic improvement.
Fixes#4236.
This PR also does backend format change to 1.0.1
from 1.0.0. Backward compatible changes are still
kept to read the 'md5Sum' key. But all new objects
will be stored with the same details under 'etag'.
Fixes#4312
Previous message
```
Migration from version ‘17’ to ‘18’ completed successfully.
```
For example didn't provide any meaningful insights.
This PR attempts to improve this message as below
```
Configuration file '/home/harsha/.minio/config.json' migrated from version '17' to '18' successfully.
```
Fixes#4199
This change adopts the upstream fix in this regard at
https://go-review.googlesource.com/#/c/41834/ for Minio's
purposes.
Go's current os.Stat() lacks support for lot of strange
windows files such as
- share symlinks on SMB2
- symlinks on docker nanoserver
- de-duplicated files on NTFS de-duplicated volume.
This PR attempts to incorporate the change mentioned here
https://blogs.msdn.microsoft.com/oldnewthing/20100212-00/?p=14963/
The article suggests to use Windows I/O manager to
dereference the symbolic link.
Fixes#4122
We need to have local peer initialized properly
for listen bucket to work, current code did initialize
properly but the resulting code was initializing
peer on a wrong target v/s what listen bucket expected
it to be.
This regression came in de204a0a52Fixes#4158
Avoid using `time.Now()` instead rely on UTC time
for the final deadline, this is to be consistent with
all our internal functions.
Reduce the default read timeout to 15 seconds
in lieu with a newly discovered issue
- https://github.com/minio/minio/issues/4139
Additionally also change the Read() conn wrapper
to set deadline only upon successful Reads().
Current log prints in this form
```
ERRO[8150] Lock maintenance failed to remove entry for write
lock (should never happen)%!!(MISSING)(EXTRA ....
```
Fix this by using proper formatting directive.
Duration for which a lock was held can be computed from the `Since`
field of `OpsLockState`. It is the difference between current time and
time at which the namespace lock was held. This change avoids
superfluous instrumentation.
Previous value was set to avoid large cache value build
up but we can clearly see this can cause lots of GC
pauses which can lead to significant drop in performance.
Change this value to 50% and decrease the value to 25%
once the 75% cache size is used. To have a larger
window for GC pauses.
Another change is to only allow caching if a server has
more than 24GB of RAM instead of 8GB.