This commit adds key-rotation for SSE-S3 objects.
To execute a key-rotation a SSE-S3 client must
- specify the `X-Amz-Server-Side-Encryption: AES256` header
for the destination
- The source == destination for the COPY operation.
Fixes#6754
This PR adds support
- Request query params
- Request headers
- Response headers
AuditLogEntry is exported and versioned as well
starting with this PR.
On Windows erasure coding setup if
```
~ minio server V:\ W:\ X:\ Z:\
```
is not possible due to NTFS creating couple of
hidden folders, this PR allows minio to use
the entire drive.
Execute method in s3Select package makes a response.WriteHeader call.
Not calling it again in SelectObjectContentHandler function in case of
error in s3Select.Execute call.
On a heavily loaded server, getBucketInfo() becomes slow,
one can easily observe deleting an object causes many
additional network calls.
This PR is to let the underlying call return the actual
error and write it back to the client.
This PR supports two models for etcd certs
- Client-to-server transport security with HTTPS
- Client-to-server authentication with HTTPS client certificates
Endpoint comparisons blindly without looking
if its local is wrong because the actual drive
for a local disk is always going to provide just
the path without the HTTP endpoint.
Add code such that this is taken care properly in
all situations. Without this PR HealBucket() would
wrongly conclude that the healing doesn't have quorum
when there are larger number of local disks involved.
Fixes#6703
Current master didn't support CopyObjectPart when source
was encrypted, this PR fixes this by allowing range
CopySource decryption at different sequence numbers.
Fixes#6698
This PR fixes
- The target object should be compressed even if the
source object is not compressed.
- The actual size for an encrypted object should be the
`decryptedSize`
This commit fixes a wrong assignment to `actualPartSize`.
The `actualPartSize` for an encrypted src object is not `srcInfo.Size`
because that's the encrypted object size which is larger than the
actual object size. So the actual part size for an encrypted
object is the decrypted size of `srcInfo.Size`.
Without this fix we have room for two different type of
errors.
- Source is encrypted and we didn't provide any source encryption keys
This results in Incomplete body error to be returned back to the client
since source is encrypted and we gave the reader as is to the object
layer which was of a decrypted value leading to "IncompleteBody"
- Source is not encrypted and we provided source encryption keys.
This results in a corrupted object on the destination which is
considered encrypted but cannot be read by the server and returns
the following error.
```
<Error><Code>XMinioObjectTampered</Code><Message>The requested object
was modified and may be compromised</Message><Resource>/id-platform-gamma/
</Resource><RequestId>155EDC3E86BFD4DA</RequestId><HostId>3L137</HostId>
</Error>
```
This commit fixes a regression introduced in f187a16962
the regression returned AccessDenied when a client is trying to create an empty
directory on a existing prefix, though it should return 200 OK to be close as
much as possible to S3 specification.
Since refactoring to GetObjectNInfo style, there are many cases
when i/o closed pipe is printed like, downloading an object
with wrong encryption key. This PR removes the log.
This commit moves the check that SSE-C requests
must be made over TLS into a generic HTTP handler.
Since the HTTP server uses custom TCP connection handling
it is not possible to use `http.Request.TLS` to check
for TLS connections. So using `globalIsSSL` is the only
option to detect whether the request is made over TLS.
By extracting this check into a separate handler it's possible
to refactor other parts of the SSE handling code further.
This commit adds two functions for sealing/unsealing the
etag (a.k.a. content MD5) in case of SSE single-part upload.
Sealing the ETag is neccessary in case of SSE-S3 to preserve
the security guarantees. In case of SSE-S3 AWS returns the
content-MD5 of the plaintext object as ETag. However, we
must not store the MD5 of the plaintext for encrypted objects.
Otherwise it becomes possible for an attacker to detect
equal/non-equal encrypted objects. Therefore we encrypt
the ETag before storing on the backend. But we only need
to encrypt the ETag (content-MD5) if the client send it -
otherwise the client cannot verify it anyway.
CopyObject handler forgot to remove multipart encryption flag in metadata
when source is an encrypted multipart object and the target is also encrypted
but single part object.
This PR also simplifies the code to facilitate review.
This PR brings an additional logger implementation
called AuditLog which logs to http targets
The intention is to use AuditLog to log all incoming
requests, this is used as a mechanism by external log
collection entities for processing Minio requests.
This PR introduces two new features
- AWS STS compatible STS API named AssumeRoleWithClientGrants
```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```
This API endpoint returns temporary access credentials, access
tokens signature types supported by this API
- RSA keys
- ECDSA keys
Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.
- External policy engine support, in this case OPA policy engine
- Credentials are stored on disks
- Only require len(disks)/2 to initialize the cluster
- Fix checking of read/write quorm in subsystems init
- Add retry mechanism in policy and notification to avoid aborting in case of read/write quorums errors
in xl.PutObjectPart call, prepareFile detected an error when the storage
is exhausted but we were returning the wrong error.
With this commit, users can see the correct error message when their disks
become full.
Simplify the logic of using rename() in xl. Currently, renaming
doesn't require the source object/dir to be existent in at least
read quorum disks, since there is no apparent reason for it
to be as a general rule, this commit will just simplify the
logic to avoid possible inconsistency in the backend in the future.
This is a major regression introduced in this commit
ce02ab613d is the first bad commit
commit ce02ab613d
Author: Krishna Srinivas <634494+krishnasrinivas@users.noreply.github.com>
Date: Mon Aug 6 15:14:08 2018 -0700
Simplify erasure code by separating bitrot from erasure code (#5959)
:040000 040000 794f58d82ad2201ebfc8 M cmd
This effects all distributed server deployments since this commit
All the following releases are affected
- RELEASE.2018-09-25T21-34-43Z
- RELEASE.2018-09-12T18-49-56Z
- RELEASE.2018-09-11T01-39-21Z
- RELEASE.2018-09-01T00-38-25Z
- RELEASE.2018-08-25T01-56-38Z
- RELEASE.2018-08-21T00-37-20Z
- RELEASE.2018-08-18T03-49-57Z
Thanks to Anis for reproducing the issue
Without this PR minio server is writing an erroneous
response to clients on an idle connections which ends
up printing following message
```
Unsolicited response received on idle HTTP channel
```
This PR would avoid sending responses on idle connections
.i.e routine network errors.
In XL PutObject & CompleteMultipartUpload, the existing object is renamed
to the temporary directory before checking if worm is enabled or not.
Most of the times, this doesn't cause an issue unless two uploads to the
same location occurs at the same time. Since there is no locking in object
handlers, both uploads will reach XL layer. The second client acquiring
write lock in put object or complete upload in XL will rename the object
to the temporary directory before doing the check and returning the error (wrong!).
This commit fixes then the behavior: no rename to temporary directory if
worm is enabled.
Current implementation simply uses all the memory locally
and crashes when a large upload is initiated using Minio
browser UI.
This PR uploads stream in blocks and finally commits the blocks
allowing for low memory footprint while uploading large objects
through Minio browser UI.
This PR also adds ETag compatibility for single PUT operations.
Fixes#6542Fixes#6550
This to ensure that we heal all entries in config/
prefix, we will have IAM and STS related files which
are being introduced in #6168 PR
This is a change to ensure that we heal all of them
properly, not just `config.json`
go test shows the following warning:
```
WARNING: DATA RACE
Write at 0x000002909e18 by goroutine 276:
github.com/minio/minio/cmd.testAdminCmdRunnerSignalService()
/home/travis/gopath/src/github.com/minio/minio/cmd/admin-rpc_test.go:44 +0x94
Previous read at 0x000002909e18 by goroutine 194:
github.com/minio/minio/cmd.testServiceSignalReceiver()
/home/travis/gopath/src/github.com/minio/minio/cmd/admin-handlers_test.go:467 +0x70
```
The reason for this data race is that some admin tests are not waiting for go routines
that they created to be properly exited, which triggers the race detector.
When download profiling data API fails to gather profiling data
from all nodes for any reason (including profiler not enabled),
return 400 http code with the appropriate json message.
This commit adds two functions for removing
confidential information - like SSE-C keys -
from HTTP headers / object metadata.
This creates a central point grouping all
headers/entries which must be filtered / removed.
See also https://github.com/minio/minio/pull/6489#discussion_r219797993
of #6489
The new call combines GetObjectInfo and GetObject, and returns an
object with a ReadCloser interface.
Also adds a number of end-to-end encryption tests at the handler
level.
Allow minio s3 gateway to use aws environment credentials,
IAM instance credentials, or AWS file credentials.
If AWS_ACCESS_KEY_ID, AWS_SECRET_ACCSES_KEY are set,
or minio is running on an ec2 instance with IAM instance credentials,
or there is a file $HOME/.aws/credentials, minio running as an S3
gateway will authenticate with AWS S3 using those one of credentials.
The lookup order:
1. AWS environment varaibles
2. IAM instance credentials
3. $HOME/.aws/credentials
4. minio environment variables
To authenticate with the minio gateway, you will always use the
minio environment variables MINIO_ACCESS_KEY MINIO_SECRET_KEY.
Two handlers are added to admin API to enable profiling and disable
profiling of a server in a standalone mode, or all nodes in the
distributed mode.
/minio/admin/profiling/start/{cpu,block,mem}:
- Start profiling and return starting JSON results, e.g. one
node is offline.
/minio/admin/profiling/download:
- Stop the on-going profiling task
- Stream a zip file which contains all profiling files that can
be later inspected by go tool pprof
The test TestServerTLSCiphers seems to fail sometimes for
no obvious reason. Actually the test is not needed
(as unit test) since minio/mint tests the server's TLS ciphers
as part of its security tests.
Fixes#5977
Currently, one node in a cluster can fail to boot with the following error message:
```
ERROR Unable to initialize config system: Storage resources are insufficient for the write operation
```
This happens when disks are formatted, read quorum is met but write
quorum is not met. In checkServerConfig(), a insufficient read quorum
error is replaced by errConfigNotFound, the code will generate a
new config json and try to save it, but it will fail because write
quorum is not met.
Replacing read quorum with errConfigNotFound is also wrong because it
can lead, in rare cases, to overwrite the config set by the user.
So, this commit adds a retry mechanism in configuration initialization
to retry only with read or write quorum errors.
This commit will also fix the following cases:
- Read quorum is lost just after the initialization of the object layer.
- Write quorum not met when upgrading configuration version.
ReadFile RPC input argument has been changed in commit a8f5939452959d27674560c6b803daa9,
however, RPC doesn't detect such a change when it calls other nodes with older versions.
Hence, bumping RPC version.
Fixes#6458
It was expected that in gateway mode, we do not know
the backend types whereas in NAS gateway since its
an extension of FS mode (standalone) this leads to
an issue in LivenessCheckHandler() which would perpetually
return 503, this would affect all kubernetes, openshift
deployments of NAS gateway.
This commit fixes an AWS S3 incompatibility issue.
The AccessKeyID may contain one or more `/` which caused
the server to interpret parts of the AccessKeyID as
other `X-Amz-Credential` parameters (like date, region, ...)
This commit fixes this by allowing 5 or more
`X-Amz-Credential` parameter strings and only interpreting
the last 5.
Fixes#6443
This commit will print connection failures to other disks in other nodes
after 5 retries. It is useful for users to understand why the
distribued cluster fails to boot up.
Enhance a little bit the error message that is showing
when access & secret keys are not specified in the
environment when running Minio in gateway and server mode.
This commit also removes a redundant check of access/secret keys.
This commit fixes the Manta gateway client creation flow. We now affix
the endpoint scheme with endpoint URL while creating the Manta client
for gateway.
Also add steps in Manta gateway docs on how to run with custom Manta
endpoint.
Fixes#6408
This commit fixes are regression in the server regarding
handling SSE requests with wrong SSE-C keys.
The server now returns an AWS S3 compatable API error (access denied)
in case of the SSE key does not match the secret key used during upload.
Fixes#6431
This PR adds two new admin APIs in Minio server and madmin package:
- GetConfigKeys(keys []string) ([]byte, error)
- SetConfigKeys(params map[string]string) (err error)
A key is a path in Minio configuration file, (e.g. notify.webhook.1)
The user will always send a string value when setting it in the config file,
the API will know how to convert the value to the appropriate type. The user
is also able to set a raw json.
Before setting a new config, Minio will validate all fields and try to connect
to notification targets if available.
Currently Go http connection pool was not being properly
utilized leading to degrading performance as the number
of concurrent requests increased.
As recommended by Go implementation, we have to drain the
response body and close it.
Removing an empty directory is not working because of xl.DeleteObject()
was only checking if the passed prefix is an actual object but it
should also check if it is an empty directory.
soMaxConn value is 128 on almost all linux systems,
this value is too low for Minio at times when used
against large concurrent workload e.g: spark applications
this causes a sort of SYN flooding observed by the kernel
to allow for large backlog increase this value to 2048.
With this value we do not see anymore SYN flooding
kernel messages.
ListMultipartUploads implementation is meant for docker-registry
use-case only. It lists only the first upload with a prefix matching
the object being uploaded.
* Revert "Encrypted reader wrapped in NewGetObjectReader should be closed (#6383)"
This reverts commit 53a0bbeb5b.
* Revert "Change SelectAPI to use new GetObjectNInfo API (#6373)"
This reverts commit 5b05df215a.
* Revert "Implement GetObjectNInfo object layer call (#6290)"
This reverts commit e6d740ce09.
An issue was reproduced when minio-js client functional
tests are setting lower case http headers, in our current
master branch we specifically look for canonical host header
which may be not necessarily true for all http clients.
This leads to a perpetual hang on the *net.Conn*.
This PR fixes regression caused by #6206 by handling the
case insensitivity.
This combines calling GetObjectInfo and GetObject while returning a
io.ReadCloser for the object's body. This allows the two operations to
be under a single lock, fixing a race between getting object info and
reading the object body.
One typo introduced in a recent commit miscalculates if worm and browser
are enabled or not. A simple test is also added to detect this issue
in the future if it ever happens again.
In current master when you do `mc watch` you can see a
dynamic ARN being listed which exposes the remote IP as well
```
mc watch play/airlines
```
On another terminal
```
mc admin info play
● play.minio.io:9000
Uptime : online since 11 hours ago
Version : 2018-08-22T07:50:45Z
Region :
SQS ARNs : arn:minio:sqs::httpclient+51c39c3f-131d-42d9-b212-c5eb1450b9ee+73.222.245.195:33408
Stats : Incoming 30GiB, Outgoing 7.6GiB
Storage : Used 7.7GiB
```
SQS ARNs listed as part of ServerInfo should be only external targets,
since listing an ARN here is not useful and it cannot be re-purposed in
any manner.
This PR fixes this issue by filtering out httpclient from the ARN list.
This is a regression introduced in #52940e4431725c
This commit adds error handling for SSE-KMS requests to
HEAD, GET, PUT and COPY operations. The server responds
with `not implemented` if a client sends a SSE-KMS
request.
This package provide customizable TCP net.Listener with various
performance-related options:
* SO_REUSEPORT. This option allows linear scaling server performance
on multi-CPU servers.
See https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/ for details.
* TCP_DEFER_ACCEPT. This option expects the server reads from the accepted
connection before writing to them.
* TCP_FASTOPEN. See https://lwn.net/Articles/508865/ for details.
Add support for sse-s3 encryption with vault as KMS.
Also refactoring code to make use of headers and functions defined in
crypto package and clean up duplicated code.
This PR fixes a regression introduced in 8eb838bf91
where hashing technique was used on prefixes to get the right set
to perform the operation, this is not correct since prefixes and
their corresponding keys might hash to a different value depending
on the key length.
For prefixes/directories we should look everywhere to support proper
quorum based listing.
Fixes#6293
This PR is the first set of changes to move the config
to the backend, the changes use the existing `config.json`
allows it to be migrated such that we can save it in on
backend disks.
In future releases, we will slowly migrate out of the
current architecture.
Fixes#6182
Modified the LogIf function to log only if the error passed
is not on the ignored errors list.
Currently, only disk not found error is added to the list.
Added a new function in logger package called LogAlwaysIf,
which will print on any error.
Fixes#5997
This commit adds support for detecting SSE-KMS headers.
The server should be able to detect SSE-KMS headers to
at least fail such S3 requests with not implemented.
When a S3 client sends a GET Object with a range header, 206 http
code is returned indicating success, however the call of the object
layer's GetObject() inside the handler can return an error and will lead
to writing an XML error message, which is obviously wrong since
we already sent 206 http code. So in the case, we just stop sending
data to the S3 client, this latter can still detect if there is no
error when comparing received data with Content-Length header
in the Get Object response.
ANSI colors do not work on dumb terminals, in situations
when minio is running as a service under systemd.
This PR ensures we turn off color in those situations.
This is to avoid serializing RPC contention on ongoing
parallel operations, the blocking profile indicating
all calls were being serialized through setRetryTicker.
This commit adds the crypto.* errors to the
`toAPIErrorCode` switch. Further this commit adds an S3
API error code returned whenever the client specifes a
SSE-S3 request with an invalid algorithm parameter.
Fixes#6238
globalPolicySys used to be initialized in fs/xl layer. The referenced
commit moved this logic to server/gateway initialization,but a check
to avoid double initialization prevented globalPolicySys to be loaded
from disk for NAS.
fixes regression from commit be1700f595
No locks are ever left in memory, we also
have a periodic interval of clearing stale locks
anyways. The lock instrumentation was not complete
and was seldom used.
Deprecate this for now and bring it back later if
it is really needed. This also in-turn seems to improve
performance slightly.
POST mime/multipart upload style can have filename value optional
which leads to implementation issues in Go releases in their
standard mime/multipart library.
When `filename` doesn't exist Go doesn't update `form.File` which
we rely on to extract the incoming file data, strangely when `filename`
is not specified this data is buffered in memory and is now part of
`form.Value` instead of `form.File` which creates an inconsistent
behavior.
This PR tries to fix this in our code for the time being, but ideal PR
would be to fix the upstream mime/multipart library to handle the
above situation consistently.
This commit adds a `fmt.Stringer` implementation for
SSE-S3 and SSE-C. The string representation is the
domain used for object key sealing.
See: `ObjectKey.Seal(...)` and `ObjectKey.Unseal(...)`
Continuing from PR 157ed65c35
Our posix.go implementation did not handle I/O errors
properly on the disks, this led to situations where
top-level callers such as ListObjects might return early
without even verifying all the available disks.
This commit tries to address this in Kubernetes, drbd/nbd based
persistent volumes which can disconnect under load and
result in the situations with disks return I/O errors.
This commit also simplifies listing operation, listing
never returns any error. We can avoid this since we pretty
much ignore most of the errors anyways. When objects are
accessed directly we return proper errors.
* crypto: add support for parsing SSE-C/SSE-S3 metadata
This commit adds support for detecting and parsing
SSE-C/SSE-S3 object metadata. With the `IsEncrypted`
functions it is possible to determine whether an object
seems to be encrypted. With the `ParseMetadata` functions
it is possible to validate such metadata and extract the
SSE-C/SSE-S3 related values.
It also fixes some naming issues.
* crypto: add functions for creating SSE object metadata
This commit adds functions for creating SSE-S3 and
SSE-C metadata. It also adds a `CreateMultipartMetadata`
for creating multipart metadata.
For all functions unit tests are included.
Since implementing `pwrite` like implementation would
require a more complex code than background append
implementation, it is better to keep the current code
as is and not implement `pwrite` based functionality.
Closes#4881
Healthcheck handler in current implementation was
performing ListBuckets() to check for liveness of Minio
service. ListBuckets() implementation on the other hand
doesn't do quorum based listing and if one of the disks
returned error, an I/O error it would be lead to kubernetes
taking the minio pod down prematurely even if the disk
is not local to that minio server.
The reason is ListBuckets() call cannot be trusted to
provide us the valid information that we need, Minio is a
clustered application which is designed to handle disk
failures. Error on one of the disks doesn't mean the pod
should become fully non-operational.
This PR attempts to fix this by only checking for alive
disks which are local to each setup and also by simply
performing a Stat() operation, if the Stat() returned
error on all disks local to a particular server then
we can let kubernetes safely take it down, until then
we should be operational.
The current code for deleting 1000 objects simultaneously
causes significant random I/O, which on slower drives
leads to servers disconnecting in a distributed setup.
Simplify this by serially deleting and reducing the
chattiness of this operation.
Currently, requestid field in logEntry is not populated, as the
requestid field gets set at the very end.
It is now set before regular handler functions. This is also
useful in setting it as part of the XML error response.
Travis build for ppc64le has been quite inconsistent and stays queued
for most of the time. Removing this build as part of Travis.yml for
the time being.
- Add console target logging, enabled by default.
- Add http target logging, which supports an endpoint
with basic authentication (username/password are passed
in the endpoint url itself)
- HTTP target logging is asynchronous and some logs can be
dropped if channel buffer (10000) is full
In a small window, UI error tries to split lines for an eye candy
error message. However, since we show some docs.minio.io links in some
error messages, these links are actually broken and not easily selected
in a X terminal. This PR changes the behavior and won't split lines
anymore.
This commit adds basic support for SSE-C / SSE-C copy.
This includes functions for determining whether SSE-C
is requested by the S3 client and functions for parsing
such HTTP headers.
All S3 SSE-C parsing errors are exported such that callers
can pattern-match to forward the correct error to S3
clients.
Further the SSE-C related internal metadata entry-keys
are added by this commit.
This commit adds a basic KMS implementation for an
operator-specified SSE-S3 master key. The master key
is wrapped as KMS such that using SSE-S3 with master key
and SSE-S3 with KMS can use the same code.
Bindings for a remote / true KMS (like hashicorp vault)
will be added later on.
This commit updates the key derivation to reflect the
latest change of crypto/doc.go. This includes handling
the insecure legacy KDF.
Since #6064 is fixed, the 3. test case for object key
generation is enabled again.
With CoreDNS now supporting etcdv3 as the DNS backend, we
can update our federation target to etcdv3. Users will now be
able to use etcdv3 server as the federation backbone.
Minio will update bucket data to etcdv3 and CoreDNS can pick
that data up and serve it as bucket style DNS path.
This commit fixes the size calculation for multipart
objects. The decrypted size of an encrypted multipart
object is the sum of the decrypted part sizes.
Also fixes the key derivation in CopyObjectPart.
Instead of using the same object-encryption-key for each
part now an unique per-part key is derived.
Updates #6139
Minio server was preventing itself to start when any notification
target is down and not running. The PR changes the behavior by
avoiding startup abort in that case, so the user will still
be able to access Minio server using mc admin commands after
a restart or set config commands.
This commit fixes a weakness of the key-encryption-key
derivation for SSE-C encrypted objects. Before this
change the key-encryption-key was not bound to / didn't
depend on the object path. This allows an attacker to
repalce objects - encrypted with the same
client-key - with each other.
This change fixes this issue by updating the
key-encryption-key derivation to include:
- the domain (in this case SSE-C)
- a canonical object path representation
- the encryption & key derivation algorithm
Changing the object path now causes the KDF to derive a
different key-encryption-key such that the object-key
unsealing fails.
Including the domain (SSE-C) and encryption & key
derivation algorithm is not directly neccessary for this
fix. However, both will be included for the SSE-S3 KDF.
So they are included here to avoid updating the KDF
again when we add SSE-S3.
The leagcy KDF 'DARE-SHA256' is only used for existing
objects and never for new objects / key rotation.
This PR simplifies the code to avoid tracking
any running usage events. This PR also brings
in an upper threshold of upto 1 minute suspend
the usage function after which the usage would
proceed without waiting any longer.
This commit introduces a new crypto package providing
AWS S3 related cryptographic building blocks to implement
SSE-S3 (master key or KMS) and SSE-C.
This change only adds some basic functionallity esp.
related to SSE-S3 and documents the general approach
for SSE-S3 and SSE-C.
disk usage crawling is not needed when a tenant
is not sharing the same disk for multiple other
tenants. This PR adds an optimization when we
see a setup uses entire disk, we simply rely on
statvfs() to give us total usage.
This PR also additionally adds low priority
scheduling for usage check routine, such that
other go-routines blocked will be automatically
unblocked and prioritized before usage.
Minio server returns 403 (access denied) for head requests to prefixes
without trailing "/", this is different from S3 behaviour. S3 returns
404 in such cases.
Fixes#6080
This commit prevents complete server failures caused by
`logger.CriticalIf` calls. Instead of calling `os.Exit(1)`
the function now executes a panic with a special value
indicating that a critical error happend. At the top HTTP
handler layer panics are recovered and if its a critical
error the client gets an InternalServerError status code.
Further this allows unit tests to cover critical-error code
paths.
Add compile time GOROOT path to the list of prefix
of file paths to be removed.
Add webhandler function names to the slice that
stores function names to terminate logging.
During startup until the object layer is initialized
logger is disabled to provide for a cleaner UI error
message. CriticalIf is disabled, use FatalIf instead.
Also never call os.Exit(1) on running servers where
you can return error to client in handlers.
This commit limits the amount of memory allocated by the
S3 Multi-Object-Delete-API. The server used to allocate as
many bytes as provided by the client using Content-Length.
S3 specifies that the S3 Multi-Object-Delete-API can delete
at most 1000 objects using a single request.
(See: https://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html)
Since the maximum S3 object name is limited to 1024 bytes the
XML body sent by the client can only contain up to 1000 * 1024
bytes (excluding XML format overhead).
This commit limits the size of the parsed XML for the S3
Multi-Object-Delete-API to 2 MB. This fixes a DoS
vulnerability since (auth.) clients, MitM-adversaries
(without TLS) and un-auth. users accessing buckets allowing
multi-delete by policy can kill the server.
This behavior is similar to the AWS-S3 implementation.
This PR adds CopyObject support for objects residing in buckets
in different Minio instances (where Minio instances are part of
a federated setup).
Also, added support for multiple Minio domain IPs. This is required
for distributed deployments, where one deployment may have multiple
nodes, each with a different public IP.
Buckets already present on a Minio server before it joins a
bucket federated deployment will now be added to etcd during
startup. In case of a bucket name collision, admin is informed
via Minio server console message.
Added configuration migration for configuration stored in etcd
backend.
Also, environment variables are updated and ListBucket path style
request is no longer forwarded.
Added support for new RPC support using HTTP POST. RPC's
arguments and reply are Gob encoded and sent as HTTP
request/response body.
This patch also removes Go RPC based implementation.
With the implementation of dummy GET ACL handlers,
tools like s3cmd perform few operations which causes
the ACL call to be invoked. Make sure that in our
router configuration GET?acl comes before actual
GET call to facilitate this dummy call.
tests were written in the manner by editing internal
variables of fsObjects to mimic certain behavior from
APIs, but this is racy when an active go-routine is
reading from the same variable.
Make sure to terminate the go-routine if possible for
these tests.
The current problem is that when you invoke
```
mc admin info myminio | head -1
● localhost:9000
```
This output is incorrect as the expected output should be
```
mc admin info myminio | head -1
● 192.168.1.17:9000
```
This commit adds a check to the server's admin-API such that it only
accepts Admin-API requests with authenticated bodies. Further this
commit updates the `madmin` package to always add the
`X-Amz-Content-Sha256` header.
This change improves the Admin-API security since the server does not
accept unauthenticated request bodies anymore.
After this commit `mc` must be updated to the new `madmin` api because
requests over TLS connections will fail.
This commit fixes a DoS vulnerability for certain APIs using
signature V4 by verifying the content-md5 and/or content-sha56 of
the request body in a streaming mode.
The issue was caused by reading the entire body of the request into
memory to verify the content-md5 or content-sha56 checksum if present.
The vulnerability could be exploited by either replaying a V4 request
(in the 15 min time frame) or sending a V4 presigned request with a
large body.
Removed field minio_http_requests_total as it was redundant with
minio_http_requests_duration_seconds_count
Also removed field minio_server_start_time_seconds as it was
redundant with process_start_time_seconds
GetBucketACL call returns empty for all GET in ACL requests,
the primary purpose of this PR is to provide legacy API support
for legacy applications.
Fixes#5706
Better support of HEAD and listing of zero sized objects with trailing
slash (a.k.a empty directory). For that, isLeafDir function is added
to indicate if the specified object is an empty directory or not. Each
backend (xl, fs) has the responsibility to store that information.
Currently, in both of XL & FS, an empty directory is represented by
an empty directory in the backend.
isLeafDir() checks if the given path is an empty directory or not,
since dir listing is costly if the latter contains too many objects,
readDirN() is added in this PR to list only N number of entries.
In isLeadDir(), we will only list one entry to check if a directory
is empty or not.
This commit fixes a DoS vulnerability in the
request authentication. The root cause is an 'unlimited'
read-into-RAM from the request body.
Since this read happens before the request authentication
is verified the vulnerability can be exploit without any
access privileges.
This commit limits the size of the request body to 3 MB.
This is about the same size as AWS. The limit seems to be
between 1.6 and 3.2 MB - depending on the AWS machine which
is handling the request.
This commit ensures that all tickers are stopped using defer ticker.Stop()
style. This will also fix one bug seen when a client starts to listen to
event notifications and that case will result a leak in tickers.
Current healing has an issue when disks are healed
even when they are offline without knowing if disk
is unformatted. This can lead to issues of pre-maturely
removing the disk from the set just because it was
temporarily offline.
There is an increasing number of `mc admin heal` usage
on a cron or regular basis. It is possible that if healing
code saw disk is offline it might prematurely take it down,
this causes availability issues.
Fixes#5826
Previously we used allow bucket policies without
`Version` field to be set to any given value, but
this behavior is inconsistent with AWS S3.
PR #5790 addressed this by making bucket policies
stricter and cleaner, but this causes a breaking
change causing any existing policies perhaps without
`Version` field or the field to be empty to fail upon
server startup.
This PR brings a code to migrate under these scenarios
as a one time operation.
- remove old bucket policy handling
- add new policy handling
- add new policy handling unit tests
This patch brings support to bucket policy to have more control not
limiting to anonymous. Bucket owner controls to allow/deny any rest
API.
For example server side encryption can be controlled by allowing
PUT/GET objects with encryptions including bucket owner.
This change disables the non-constant-time implementations of P-384 and P-521.
As a consequence a client using just these curves cannot connect to the server.
This should be no real issues because (all) clients at least support P-256.
Further this change also rejects ECDSA private keys of P-384 and P-521.
While non-constant-time implementations for the ECDHE exchange don't expose an
obvious vulnerability, using P-384 or P-521 keys for the ECDSA signature may allow
pratical timing attacks.
Fixes#5844
- getBucketLocation
- headBucket
- deleteBucket
Should return 404 or NoSuchBucket even for invalid bucket names, invalid
bucket names are only validated during MakeBucket operation
This is an effort to remove panic from the source.
Add a new call called CriticialIf, that calls LogIf and exits.
Replace panics with one of CriticalIf, FatalIf and a return of error.
Make sure to apply standard headers such as Content-Type,
Content-Disposition and Content-Language to the correct
GCS object attributes during object upload and copy operations.
Fixes: #5800
As we move to multiple config backends like local disk and etcd,
config file should not be read from the disk, instead the quick
package should load and verify for duplicate entries.
This change adds some security headers like Content-Security-Policy.
It does not set the HSTS header because Content-Security-Policy prevents
mixed HTTP and HTTPS content and the server does not use cookies.
However it is a header which could be added later on.
It also moves some header added by #5805 from a vendored file
to a generic handler.
Fixes ##5813
Also make sure to not modify the underlying errors from
layers, we should return the error as is and one object
layer should translate the errors.
Fixes#5797
This PR introduces ReloadFormat API call at objectlayer
to facilitate this. Previously we repurposed HealFormat
but we never ended up updating our reference format on
peers.
Fixes#5700
This change let the server return the S3 error for a key rotation
if the source key is not valid but equal to the destination key.
This change also fixes the SSE-C error messages since AWS returns error messages
ending with a '.'.
Fixes#5625
This change sets the storage class of the object-info if a storage
class was specified during PUT. The server now replies with the
storage class which was set during uploading the object in FS mode.
Fixes#5777
Default installations of cloned VMs on VMware like env
might experience serious problems with time skewing,
allow for a higher value instead of 3 seconds we are
moving to 15 minutes just like API level skew.
Access to internet and configuring ntp might not be possible,
in such situations providing atleast a 15 minute skew could
cater for majority of situations.
Since we do not re-use storageDisks after moving
the connections to object layer we should close them
appropriately otherwise we have a lot of connection
leaks and these can compound as the time goes by.
This PR also refactors the initialization code to
re-use storageDisks for given set of endpoints until
we have confirmed a valid reference format.
An issue was reproduced when there a no more inodes
available on an existing setup of 4 disks, now we
took one of the disks and reformatted it to relinquish
inodes. Now we attempt to bring the fresh disk back
into setup and perform a heal - at this point creating
new `format.json` fails on existing disks since they
do not have more inodes available.
At this point due to quorum failure, we end up deleting
existing `format.json` as well, this PR removes the code
which deletes existing `format.json` as there is no need
to delete them.
Previous PR 2afd196c83 fixed
the issue of quorum based listing for regular objects, this
PR continues on this idea by extending this support to
object directory prefixes as well.
Fixes#5733
Set GOPATH string to empty in build-constants.go
Check for both compile time GOPATH and default GOPATH
while trimming the file path in the stack trace.
Fixes#5741
This PR fixes two different variant of deadlocks in
notification.
- holding write lock on the bucket competing with read lock
- holding competing locks on read/save notification config
This PR adds disk based edge caching support for minio server.
Cache settings can be configured in config.json to take list of disk drives,
cache expiry in days and file patterns to exclude from cache or via environment
variables MINIO_CACHE_DRIVES, MINIO_CACHE_EXCLUDE and MINIO_CACHE_EXPIRY
Design assumes that Atime support is enabled and the list of cache drives is
fixed.
- Objects are cached on both GET and PUT/POST operations.
- Expiry is used as hint to evict older entries from cache, or if 80% of cache
capacity is filled.
- When object storage backend is down, GET, LIST and HEAD operations fetch
object seamlessly from cache.
Current Limitations
- Bucket policies are not cached, so anonymous operations are not supported in
offline mode.
- Objects are distributed using deterministic hashing among list of cache
drives specified.If one or more drives go offline, or cache drive
configuration is altered - performance could degrade to linear lookup.
Fixes#4026
This is a trival fix to support server level WORM. The feature comes
with an environment variable `MINIO_WORM`.
Usage:
```
$ export MINIO_WORM=on
$ minio server endpoint
```
Object deletion should not be possible if quorum is not
available. This PR updates deleteObject() to check for
quorum errors before proceeding with object deletion.
Fixes#5535
This commit adds the bucket delete and bucket policy functionalities
to the browser.
Part of rewriting the browser code to follow best practices and
guidelines of React (issues #5409 and #5410)
The backend code has been modified by @krishnasrinivas to prevent
issue #4498 from occuring. The relevant changes have been made to the
code according to the latest commit and the unit tests in the backend.
This commit also addresses issue #5449.
Migration regression got introduced in 9083bc152e
adding more unit tests to catch this scenario, we need to fix this by
re-writing the formats after the migration to 'V3'.
This bug only happens when a user is migrating directly from V1 to V3,
not from V1 to V2 and V2 to V3.
Added additional unit tests to cover these situations as well.
Fixes#5667
- Add head method for healthcheck endpoint. Some platforms/users
may use the HTTP Head method to check for health status.
- Add liveness and readiness probe examples in Kubernetes yaml
example docs. Note that readiness probe not added to StatefulSet
example due to https://github.com/kubernetes/kubernetes/issues/27114
With following changes
- Add SSE and refactor encryption API (#942) <Andreas Auernhammer>
- add copyObject test changing metadata and preserving etag (#944) <Harshavardhana>
- Add SSE-C tests for multipart, copy, get range operations (#941) <Harshavardhana>
- Removing conditional check for notificationInfoCh in api-notication (#940) <Matthew Magaldi>
- Honor prefix parameter in ListBucketPolicies API (#929) <kannappanr>
- test for empty objects uploaded with SSE-C headers (#927) <kannappanr>
- Encryption headers should also be set during initMultipart (#930) <Harshavardhana>
- Add support for Content-Language metadata header (#928) <kannappanr>
- Fix check for duplicate notification configuration entries (#917) <kannappanr>
- allow OS to cleanup sockets in TIME_WAIT (#925) <Harshavardhana>
- Sign V2: Fix signature calculation in virtual host style (#921) <A. Elleuch>
- bucket policy: Support json string in Principal field (#919) <A. Elleuch>
- Fix copyobject failure for empty files (#918) <kannappanr>
- Add new constructor NewWithOptions to SDK (#915) <poornas>
- Support redirect headers to sign again with new Host header. (#829) <Harshavardhana>
- Fail in PutObject if invalid user metadata is passed <Harshavadhana>
- PutObjectOptions Header: Don't include invalid header <Isaac Hess>
- increase max retry count to 10 (#913) <poornas>
- Add new regions for Paris and China west. (#905) <Harshavardhana>
- fix s3signer to use req.Host header (#899) <Bartłomiej Nogaś>
This PR adds readiness and liveness endpoints to probe Minio server
instance health. Endpoints can only be accessed without authentication
and the paths are /minio/health/live and /minio/health/ready for
liveness and readiness respectively.
The new healthcheck liveness endpoint is used for Docker healthcheck
now.
Fixes#5357Fixes#5514
In kubernetes statefulset like environments when secrets
are mounted to pods they have sub-directories, we should
ideally be only looking for regular files here and skip
all others.
Fix a compatibility issue with AWS S3 where to do key rotation
we need to replace an existing object's metadata. In such a
scenario "REPLACE" metadata directive is not necessary.
- Data from disk was being read after bitrot verification to return
data for GetObject. Strictly speaking this does not guarantee bitrot
protection, as disks may return bad data even temporarily.
- This fix reads data from disk, verifies data for bitrot and then
returns data to the client directly.
Current code didn't implement the logic to support
decrypting encrypted multiple parts, this PR fixes
by supporting copying encrypted multipart objects.
Currently we reply back `X-Minio-Internal` values
back to the client for an encrypted object, we should
filter these out and only reply AWS compatible headers.
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
Flags like `json, config-dir, quiet` are now honored even if they are
between minio and gateway in the cli, like, `minio --json gateway s3`.
Fixes#5403
Stable sort is needed when we are sorting based on two or more
distinct elements. When equal elements are indistinguishable,
such as with integers, or more generally, any data where the
entire element is the key like `PartNumber`, stability is not
an issue.
Refactor such that metadata and etag are
combined to a single argument `srcInfo`.
This is a precursor change for #5544 making
it easier for us to provide encryption/decryption
functions.
Delete & Multi Delete API should not try to remove the directory content.
The only permitted case is with zero size object with a trailing slash
in its name.
MaxIdleConns limits the total number of connections
kept in the pool for re-use. In addition, MaxIdleConnsPerHost
limits the number for a single host. Since minio gateways
usually connect to the same host, setting `MaxIdleConns = 100`
won't really have much of an impact since the idle connection
pool is limited to 2 anyway.
Now, with the pool set to a limit of 2, and when using
the client heavily from 2+ goroutines, the `http.Transport`
will open a connection, use it, then try to return it to
the idle-pool which often fails since there's a limit of 2.
So it's going to close the connection and new ones will be
opened on demand again, many of which get closed soon after
being used. Since those connections/sockets don't disappear
from the OS immediately, use `MaxIdleConnsPerHost = 100`
which fixes this problem.
Overwriting files is allowed, but since the introduction of
the object directory, we will aslo need to allow overwriting
an empty directory. Putting twice the same object directory
won't fail with 403 error anymore.
TestNewWebHookNotify wasn't passing in my local machine. The reason is
that the test expects the POST handler (as a webhook endpoint) is always
running on port 80, which is not always the case.
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.
This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.
Some design details and restrictions:
- Objects are distributed using consistent ordering
to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
requirement, you can start with multiple
such sets statically.
- Static sets set of disks and cannot be
changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
slower since List happens on all servers,
and is merged at this sets layer.
Fixes#5465Fixes#5464Fixes#5461Fixes#5460Fixes#5459Fixes#5458Fixes#5460Fixes#5488Fixes#5489Fixes#5497Fixes#5496
Since we do not encrypt directories we don't need to send
errors with encryption headers when the directory doesn't
have encryption metadata.
Continuation PR from 4ca10479b5
It can happen such that one of the disks that was down would
return 'errDiskNotFound' but the err is preserved due to
loop shadowing which leads to issues when healing the bucket.
This change adds an object size check such that the server does not
encrypt empty objects (typically folders) for SSE-C. The server still
returns SSE-C headers but the object is not encrypted since there is no
point to encrypt such objects.
Fixes#5493
Currently minio master requires 4 servers, we
have decided to run on a minimum of 2 servers
instead - fixes a regression from previous
releases where 3 server setups were supported.
This PR brings semver capabilities in our RPC layer to
ensure that we can upgrade the servers in rolling fashion
while keeping I/O in progress. This is only a framework change
the functionality remains the same as such and we do not
have any special API changes for now. But in future when
we bring in API changes we will be able to upgrade servers
without a downtime.
Additional change in this PR is to not abort when serverVersions
mismatch in a distributed cluster, instead wait for the quorum
treat the situation as if the server is down. This allows
for administrator to properly upgrade all the servers in the cluster.
Fixes#5393
in-memory caching cannot be cleanly implemented
without the access to GC which Go doesn't naturally
provide. At times we have seen that object caching
is more of an hindrance rather than a boon for
our use cases.
Removing it completely from our implementation
related to #5160 and #5182
This is a generic minimum value. The current reason is to support
Azure blob storage accounts name whose length is less than 5. 3 is the
minimum length for Azure.