There is no written specification about how to encode key names
when url encoding type is passed.
However, this change will encode URLs as url.QueryEscape() does
while considering AWS S3 exceptions.
This commit adds a unit test for the vault
config verification (which covers also `IsEmpty()`).
Vault-related code is hard to test with unit tests
since a Vault service would be necessary. Therefore
this commit only adds tests for a fraction of the code.
Fixes#7409
Most hadoop distributions hortonworks, cloudera all
depend on aws-sdk-java 1.7.x to 1.10.x - the releases
which have bugs related case sensitive check for
ETag header. Go changes the case of the headers set
to be canonical but only preserves them when set
through a direct map.
This fixes most compatibility issues we have had
in the past supporting older hadoop distributions.
This commit fixes a privilege escalation issue against
the S3 and web handlers. An authenticated IAM user
can:
- Read from or write to the internal '.minio.sys'
bucket by simply sending a properly signed
S3 GET or PUT request. Further, the user can
- Read from or write to the internal '.minio.sys'
bucket using the 'Upload'/'Download'/'DownloadZIP'
API by sending a "browser" request authenticated
with its JWT token.
This commit fixes another privilege escalation issue
abusing the inter-node communication of distributed
servers to obtain/modify the server configuration.
The inter-node communication is authenticated using
JWT-Tokens. Further, IAM users accessing the cluster
via the web UI also get a JWT token and the browser
will add this "user" JWT token to each the request.
Now, a user can extract that JWT token an can craft
HTTP POST requests for the inter-node communication
API endpoint. Since the server accepts ANY valid
JWT token it also accepts inter-node commands from
an authenticated user such that the user can execute
arbitrary commands bypassing the IAM policy engine
and impersonate other users, change its own IAM policy
or extract the admin access/secret key.
This is fixed by only accepting "admin" JWT tokens
(tokens containing the admin access key - and therefore
were generated with the admin secret key). Consequently,
only the admin user can execute such inter-node commands.
Simplify the cmd/http package overall by removing
custom plain text v/s tls connection detection, by
migrating to go1.12 and choose minimum version
to be go1.12
Also remove all the vendored deps, since they
are not useful anymore.
A race is detected between a bytes.Buffer generated with cmd/rpc.Pool
and http2 module. An issue is raised in golang (https://github.com/golang/go/issues/31192).
Meanwhile, this commit disables Pool in RPC code and it generates a
new 1kb of bytes.Buffer for each RPC call.
Before this commit, nodes wait indefinitely without showing any
indicate error message when a node is started with different access
and secret keys.
This PR will show '401 Unauthorized' in this case.
Currently message is set to error type value.
Message field is not used in error logs. it is used only in the case of info logs.
This PR sets error message field to store error type correctly.
Copying an encrypted SSEC object when this latter is uploaded using
multipart mechanism was failing because ETag in case of encrypted
multipart upload is not encrypted.
This PR fixes the behavior.
This fixes varying pids for server-respawns. And avoids duplicate process
creating multiple pids when the server restart signal is triggered with
service restart enabled.
Fixes#7350
It is required to set the environment variable in the case of distributed
minio. LoadCredentials is used to notify peers of the change and will not work if
environment variable is set. so, this function will never be called.
In scenario 1
```
- bucket/object-prefix
- bucket/object-prefix/object
```
Server responds with `XMinioParentIsObject`
In scenario 2
```
- bucket/object-prefix/object
- bucket/object-prefix
```
Server responds with `XMinioObjectExistsAsDirectory`
Fixes#6566
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
We should internally handle when http2 input stream has smaller
content than its content-length header
Upstream issue reported https://github.com/golang/go/issues/30648
This a change which we need to handle internally until Go fixes it
correctly, till now our code doesn't expect a custom error to be returned.
CopyObject precondition checks into GetObjectReader
in order to perform SSE-C pre-condition checks using the
last 32 bytes of encrypted ETag rather than the decrypted
ETag
This also necessitates moving precondition checks for
gateways to gateway layer rather than object handler check
if a bucket with `Captialized letters` is created, `InvalidBucketName` error
will be returned.
In the case of pre-existing buckets, it will be listed.
Fixes#6938
Prevents deferred close functions from being called while still
attempting to copy reader to snappyWriter.
Reduces code duplication when compressing objects.
This change allows indefinitely running go-routines to cleanup
gracefully.
This channel is now closed at the beginning of each test so that
long-running go-routines quit and a new one is assigned.
The side affect of this change memory
increase, but this is a trade-off between
performance and actual memory usage.
For all practical scenarios this should be
an adequate change.
- The events will be persisted in queueStore if `queueDir` is set.
- Else, if queueDir is not set events persist in memory.
The events are replayed back when the mqtt broker is back online.
Clients like AWS SDK Java and AWS cli XML parsers are
unable to handle on `\r\n` characters to avoid these
errors send XML header first and write white space characters
instead.
Also handle cases to avoid double WriteHeader calls
- Current implementation was spawning renewer goroutines
without waiting for the lease duration to end. Remove vault renewer
and call vault.RenewToken directly and manage reauthentication if
lease expired.
We should change the logic for both isObject()
and isObjectDir() leaf detection to be done
with quorum, due to how our directory navigation
works - this allows for properly deleting all
the dangling directories or objects if any.
This commit fixes a nil pointer dereference issue
that can occur when the Vault KMS returns e.g. a 404
with an empty HTTP response. The Vault client SDK
does not treat that as error and returns nil for
the error and the secret.
Further it simplifies the token renewal and
re-authentication mechanism by using a single
background go-routine.
The control-flow of Vault authentications looks
like this:
1. `authenticate()`: Initial login and start of background job
2. Background job starts a `vault.Renewer` to renew the token
3. a) If this succeeds the token gets updated
b) If this fails the background job tries to login again
4. If the login in 3b. succeeded goto 2. If it fails
goto 3b.
Currently, we were sending errors in Select binary format,
which is incompatible with AWS S3 behavior, errors in binary
are sent after HTTP status code is already 200 OK - i.e it
happens during the evaluation of the record reader.
This commit increases storage REST requests to 5 minutes, this includes
the opening TCP connection, and sending/receiving data. This will reduce
clients receiving errors when the server is under high load.
Different gateway implementations due to different backend
API errors, might return different unsupported errors at
our handler layer. Current code posed a problem for us because
this information was lost and we would convert it to InternalError
in this situation all S3 clients end up retrying the request.
To avoid this unexpected situation implement a way to support
this cleanly such that the underlying information is not lost
which is returned by gateway.
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.
Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.
We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.
This PR fixes all of the above items.
This PR also adds some comments and simplifies
the code. Primary handling is done to ensure
that we make sure to honor cached buffer.
Added unit tests as well
Fixes#7141
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.
But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
ListObjectParts is using xl.readXLMetaParts which picks the first
xl meta found in any disk, which is an inconsistent information.
E.g.: In a middle of a multipart upload, one node can go offline
and get back later with an outdated multipart information.
This commit fixes the computation of Before/After healing state
for empty directories.
Issues before the commit:
- Before state doesn't reflect the real status (no StatVol() called)
- For any MakeVol() error, healObjectDir is exited directly, which is
wrong.
Currently during a heal of a bucket, if one disk is offline an empty endpoint entry is added.
Then another entry with the missing endpoint is also added.
This results in more entries than disks being added.
Code that adds empty endpoint has been removed.
Collect historic cpu and mem stats. Also, use actual values
instead of formatted strings while returning to the client. The string
formatting prevents values from being processed by the server or
by the client without parsing it.
This change will allow the values to be processed (eg.
compute rolling-average over the lifetime of the minio server)
and offloads the formatting to the client.
We made a change previously in #7111 which moved support
for AWS envs only for AWS S3 endpoint. Some users requested
that this be added back to Non-AWS endpoints as well as
they require separate credentials for backend authentication
from security point of view.
More than one client can't use the same clientID for MQTT connection.
This causes problem in distributed deployments where config is shared
across nodes, as each Minio instance tries to connect to MQTT using the
same clientID.
This commit removes the clientID field in config, and allows
MQTT client to create random clientID for each node.
- New parser written from scratch, allows easier and complete parsing
of the full S3 Select SQL syntax. Parser definition is directly
provided by the AST defined for the SQL grammar.
- Bring support to parse and interpret SQL involving JSON path
expressions; evaluation of JSON path expressions will be
subsequently added.
- Bring automatic type inference and conversion for untyped
values (e.g. CSV data).
This situation happens only in gateway nas which supports
etcd based `config.json` to support all FS mode features.
The issue was we would try to migrate something which doesn't
exist when etcd is configured which leads to inconsistent
server configs in memory.
This PR fixes this situation by properly loading config after
initialization, avoiding backend disk config migration to be
done only if etcd is not configured.
If it does happen that we have a lot files in '.minio.sys/tmp',
minio startup might block deleting this folder. Rename and
delete in background instead to allow Minio to start serving
requests.
To avoid a large number of concurrent connections between minio
servers and to reduce CPU pressure, it is better to limit the number
of objects healed in parallel to number_of_CPUs.
Requirements like being able to run minio gateway in ec2
pointing to a Minio deployment wouldn't work properly
because IAM creds take precendence on ec2.
Add checks such that we only enable AWS specific features
if our backend URL points to actual AWS S3 not S3 compatible
endpoints.
Returning unexpected errors can cause problems for config handling,
which is what led gateway deployments with etcd to misbehave and
had stopped working properly
Deployment ID is not copied into new formats after healing format. Although,
this is not critical since a new deployment ID will be generated and set in the
next cluster restart, it is still much better if we don't change the deployment
id of a cluster for a better tracking.
Fix regexp matcher for special assets for the browser to clash with
less of the object namespace.
Assets should now be loaded with the /minio/ prefix. Previously,
favicon.ico (and others) could be loaded at any path matching
/minio/*/favicon.ico. This clashes with a large part of the object
namespace. With this change, /minio/favicon.ico will serve the favicon
but not /minio/mybucket/favicon.ico
Fixes#7077
This changes causes `getRootCAs` to always load system-wide CAs.
Any additional custom CAs (at `certs/CA/`) are added to the certificate pool
of system CAs.
The previous behavior was incorrect since all no system-wide CAs were
loaded if either there were CAs under `certs/CA` or the `certs/CA`
directory didn't exist at all.
Also add a cross compile script to test always cross
compilation for some well known platforms and architectures
, we support out of box compilation of these platforms even
if we don't make an official release build.
This script is to avoid regressions in this area when we
add platform dependent code.
This PR supports iam and bucket policies to have
policy variable replacements in resource and
condition key values.
For example
- ${aws:username}
- ${aws:userid}
Health checking programs very frequently use /minio/health/live
to check health, hence we can avoid doing StorageInfo() and
ListBuckets() for FS/Erasure backend.
* Use 0-byte file for bitrot verification of whole-file-bitrot files
Also pass the right checksum information for bitrot verification
* Copy xlMeta info from latest meta except []checksums and []Parts while healing
Simplify parallelReader.Read() which also fixes previous
implementation where it was returning before all the parallel
reading go-routines had terminated which caused race conditions.
When auto-encryption is turned on, we pro-actively add SSEHeader
for all PUT, POST operations. This is unusual for V2 signature
calculation because V2 signature doesn't have a pre-defined set
of signed headers in the request like V4 signature. According to
V2 we should canonicalize all incoming supported HTTP headers.
Make sure to validate signatures before we mutate http headers
Deprecate the use of Admin Peers concept and migrate all peer
communication to Notification subsystem. This finally allows
for a common subsystem for all peer notification in case of
distributed server deployments.
Before this change the CopyObjectHandler and the CopyObjectPartHandler
both looked for a `versionId` parameter on the `X-Amz-Copy-Source` URL
for the version of the object to be copied on the URL unescaped version
of the header. This meant that files that had question marks in were
truncated after the question mark so that files with `?` in their
names could not be server side copied.
After this change the URL unescaping is done during the parsing of the
`versionId` parameter which fixes the problem.
This change also introduces the same logic for the
`X-Amz-Copy-Source-Version-Id` header field which was previously
ignored, namely returning an error if it is present and not `null`
since minio does not currently support versions.
S3 Docs:
- https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
- https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html
When source is encrypted multipart object and the parts are not
evenly divisible by DARE package block size, target encrypted size
will not necessarily be the same as encrypted source object.
This commit removes old code preventing PUT requests with '/' as a path,
because this is not needed anymore after the introduction of the virtual
host style in Minio server code.
'PUT /' when global domain is not configured already returns 405 Method
Not Allowed http error.
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).
If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.
When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.
Fixes#6323, #6696
This is part of implementation for mc admin health command. The
ServerDrivesPerfInfo() admin API returns read and write speed
information for all the drives (local and remote) in a given Minio
server deployment.
Part of minio/mc#2606
It can happen with erroneous clients which do not send `Host:`
header until 4k worth of header bytes have been read. This can lead
to Peek() method of bufio to fail with ErrBufferFull.
To avoid this we should make sure that Peek buffer is as large as
our maxHeaderBytes count.
minio-java tests were failing under multiple places when
auto encryption was turned on, handle all the cases properly
This PR fixes
- CopyObject should decrypt ETag before it does if-match
- CopyObject should not try to preserve metadata of source
when rotating keys, unless explicitly asked by the user.
- We should not try to decrypt Compressed object etag, the
potential case was if user sets encryption headers along
with compression enabled.
Especially in gateway IAM admin APIs are not enabled
if etcd is not enabled, we should enable admin API though
but only enable IAM and Config APIs with etcd configured.
By default when we listen on all interfaces, we print all the
endpoints that at local to all interfaces including IPv6
addresses. Remove IPv6 addresses in endpoint list to be
printed in endpoints unless explicitly specified with '--address'
This commit adds an auto-encryption feature which allows
the Minio operator to ensure that uploaded objects are
always encrypted.
This change adds the `autoEncryption` configuration option
as part of the KMS conifguration and the ENV. variable
`MINIO_SSE_AUTO_ENCRYPTION:{on,off}`.
It also updates the KMS documentation according to the
changes.
Fixes#6502
Currently we use GetObject to check if we are allowed to list,
this might be a security problem since there are many users now
who actively disable a publicly readable listing, anyone who
can guess the browser URL can list the objects.
This PR turns off this behavior and provides a more expected way
based on the policies.
This PR also additionally improves the Download() object
implementation to use a more streamlined code.
These are precursor changes to facilitate federation and web
identity support in browser.
One user reported having discovered the following error:
API: SYSTEM()
Time: 20:06:17 UTC 12/06/2018
Error: xml: encoding "US-ASCII" declared but Decoder.CharsetReader is nil
1: cmd/handler-utils.go:43:cmd.parseLocationConstraint()
2: cmd/auth-handler.go:250:cmd.checkRequestAuthType()
3: cmd/bucket-handlers.go:411:cmd.objectAPIHandlers.PutBucketHandler()
4: cmd/api-router.go100cmd.(objectAPIHandlers).PutBucketHandler-fm()
5: net/http/server.go:1947:http.HandlerFunc.ServeHTTP()
Hence, adding support of different xml encoding. Although there
is no clear specification about it, even setting "GARBAGE" as an xml
encoding won't change the behavior of AWS, hence the encoding seems
to be ignored.
This commit will follow that behavior and will ignore encoding field
and consider all xml as utf8 encoded.
This refactors the vault configuration by moving the
vault-related environment variables to `environment.go`
(Other ENV should follow in the future to have a central
place for adding / handling ENV instead of magic constants
and handling across different files)
Further this commit adds master-key SSE-S3 support.
The operator can specify a SSE-S3 master key using
`MINIO_SSE_MASTER_KEY` which will be used as master key
to derive and encrypt per-object keys for SSE-S3
requests.
This commit is also a pre-condition for SSE-S3
auto-encyption support.
Fixes#6329
This PR implements one of the pending items in issue #6286
in S3 API a user can request CSV output for a JSON document
and a JSON output for a CSV document. This PR refactors
the code a little bit to bring this feature.
guessIsRPCReq() considers all POST requests as RPC but doesn't
check if this is an object operation API or not, which is actually
confusing bucket forwarder handler when it receives a new multipart
upload API which is a POST http request.
Due to this bug, users having a federated setup are not able to
upload a multipart object using an endpoint which doesn't actually
contain the specified bucket that will store the object.
Hence this commit will fix the described issue.
registering notFound handler more than once causes
gorilla mux to return error for all registered paths
greater than > 8. This might be a bug in the gorilla/mux
but we shouldn't be using it this way. NotFound handler
should be only registered once per root router.
Fixes#6915
When MINIO_PUBLIC_IPS is not specified and no endpoints are passed
as arguments, fallback to the address of non loop-back interfaces.
This is useful so users can avoid setting MINIO_PUBLIC_IPS in docker
or orchestration scripts, ince users naturally setup an internal
network that connects all instances.
This commit renames the env variable for vault namespaces
such that it begins with `MINIO_SSE_`. This is the prefix
for all Minio SSE related env. variables (like KMS).
clientID must be a unique `UUID` for each connections. Now, the
server generates it, rather considering the config.
Removing it as it is non-beneficial right now.
Fixes#6364
When migrating configs it happens often that some
servers fail to start due to version mismatch etc.
Hold a transaction lock such that all servers get
serialized.
This can create inconsistencies i.e Parts might have
lesser number of parts than ChecksumInfos. This will
result in object to be not readable.
This PR also allows for deleting previously created
corrupted objects.
globalMinioPort is used in federation which stores the address
and the port number of the server hosting the specified bucket,
this latter uses globalMinioPort but this latter is not set in
startup of the gateway mode.
This commit fixes the behavior.
Calling /minio/prometheuses/metrics calls xlSets.StorageInfo() which creates a new
storage REST client and closes it. However, currently, closing does nothing
to the underlying opened http client.
This commit introduces a closing behavior by calling CloseIdleConnections
provided by http.Transport upon the initialization of this latter.
This refactor brings a change which allows
targets to be added in a cleaner way and also
audit is now moved out.
This PR also simplifies logger dependency for auditing
The current code triggers a timeout to cleanup a heal seq from
healSeqMap, but we don't know if the user did or not launch a new
healing sequence with the same path.
Add endTime to healSequence struct and add a periodic heal-sequence
cleaner to remove heal sequences only if this latter is older than
10 minutes.
Rolling update doesn't work properly because Storage REST API has
a new API WriteAll() but without API version number increase.
Also be sure to return 404 for unknown http paths.
To conform with AWS S3 Spec on ETag for SSE-S3 encrypted objects,
encrypt client sent MD5Sum and store it on backend as ETag.Extend
this behavior to SSE-C encrypted objects.
This improves the performance of certain queries dramatically,
such as 'count(*)' etc.
Without this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762
real 0m42.464s
user 0m0.071s
sys 0m0.010s
```
With this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762
real 0m17.603s
user 0m0.093s
sys 0m0.008s
```
Almost a 250% improvement in performance. This PR avoids a lot of type
conversions and instead relies on raw sequences of data and interprets
them lazily.
```
benchcmp old new
benchmark old ns/op new ns/op delta
BenchmarkSQLAggregate_100K-4 551213 259782 -52.87%
BenchmarkSQLAggregate_1M-4 6981901985 2432413729 -65.16%
BenchmarkSQLAggregate_2M-4 13511978488 4536903552 -66.42%
BenchmarkSQLAggregate_10M-4 68427084908 23266283336 -66.00%
benchmark old allocs new allocs delta
BenchmarkSQLAggregate_100K-4 2366 485 -79.50%
BenchmarkSQLAggregate_1M-4 47455492 21462860 -54.77%
BenchmarkSQLAggregate_2M-4 95163637 43110771 -54.70%
BenchmarkSQLAggregate_10M-4 476959550 216906510 -54.52%
benchmark old bytes new bytes delta
BenchmarkSQLAggregate_100K-4 1233079 1086024 -11.93%
BenchmarkSQLAggregate_1M-4 2607984120 557038536 -78.64%
BenchmarkSQLAggregate_2M-4 5254103616 1128149168 -78.53%
BenchmarkSQLAggregate_10M-4 26443524872 5722715992 -78.36%
```
xl.json is the source of truth for all erasure
coded objects, without which we won't be able to
read the objects properly. This PR enables sync
mode for writing `xl.json` such all writes go hit
the disk and are persistent under situations such
as abrupt power failures on servers running Minio.
This change will allow users to enter the endpoint of the
storage account if this latter belongs to a different Azure
cloud environment, such as US gov cloud.
e.g:
`MINIO_ACCESS_KEY=testaccount \
MINIO_SECRET_KEY=accountsecretkey \
minio gateway azure https://testaccount.blob.usgovcloudapi.net`
In many situations, while testing we encounter
ErrInternalError, to reduce logging we have
removed logging from quite a few places which
is acceptable but when ErrInternalError occurs
we should have a facility to log the corresponding
error, this helps to debug Minio server.
Multipart object final size is not a contiguous
encrypted object representation, so trying to
decrypt this size will lead to an error in some
cases. The multipart object should be detected first
and then decoded with its respective parts instead.
This PR handles this situation properly, added a
test as well to detect these in the future.
This commit adds key-rotation for SSE-S3 objects.
To execute a key-rotation a SSE-S3 client must
- specify the `X-Amz-Server-Side-Encryption: AES256` header
for the destination
- The source == destination for the COPY operation.
Fixes#6754
This PR adds support
- Request query params
- Request headers
- Response headers
AuditLogEntry is exported and versioned as well
starting with this PR.
On Windows erasure coding setup if
```
~ minio server V:\ W:\ X:\ Z:\
```
is not possible due to NTFS creating couple of
hidden folders, this PR allows minio to use
the entire drive.
Execute method in s3Select package makes a response.WriteHeader call.
Not calling it again in SelectObjectContentHandler function in case of
error in s3Select.Execute call.
On a heavily loaded server, getBucketInfo() becomes slow,
one can easily observe deleting an object causes many
additional network calls.
This PR is to let the underlying call return the actual
error and write it back to the client.
This PR supports two models for etcd certs
- Client-to-server transport security with HTTPS
- Client-to-server authentication with HTTPS client certificates
Endpoint comparisons blindly without looking
if its local is wrong because the actual drive
for a local disk is always going to provide just
the path without the HTTP endpoint.
Add code such that this is taken care properly in
all situations. Without this PR HealBucket() would
wrongly conclude that the healing doesn't have quorum
when there are larger number of local disks involved.
Fixes#6703
Current master didn't support CopyObjectPart when source
was encrypted, this PR fixes this by allowing range
CopySource decryption at different sequence numbers.
Fixes#6698
This PR fixes
- The target object should be compressed even if the
source object is not compressed.
- The actual size for an encrypted object should be the
`decryptedSize`
This commit fixes a wrong assignment to `actualPartSize`.
The `actualPartSize` for an encrypted src object is not `srcInfo.Size`
because that's the encrypted object size which is larger than the
actual object size. So the actual part size for an encrypted
object is the decrypted size of `srcInfo.Size`.
Without this fix we have room for two different type of
errors.
- Source is encrypted and we didn't provide any source encryption keys
This results in Incomplete body error to be returned back to the client
since source is encrypted and we gave the reader as is to the object
layer which was of a decrypted value leading to "IncompleteBody"
- Source is not encrypted and we provided source encryption keys.
This results in a corrupted object on the destination which is
considered encrypted but cannot be read by the server and returns
the following error.
```
<Error><Code>XMinioObjectTampered</Code><Message>The requested object
was modified and may be compromised</Message><Resource>/id-platform-gamma/
</Resource><RequestId>155EDC3E86BFD4DA</RequestId><HostId>3L137</HostId>
</Error>
```
This commit fixes a regression introduced in f187a16962
the regression returned AccessDenied when a client is trying to create an empty
directory on a existing prefix, though it should return 200 OK to be close as
much as possible to S3 specification.
Since refactoring to GetObjectNInfo style, there are many cases
when i/o closed pipe is printed like, downloading an object
with wrong encryption key. This PR removes the log.
This commit moves the check that SSE-C requests
must be made over TLS into a generic HTTP handler.
Since the HTTP server uses custom TCP connection handling
it is not possible to use `http.Request.TLS` to check
for TLS connections. So using `globalIsSSL` is the only
option to detect whether the request is made over TLS.
By extracting this check into a separate handler it's possible
to refactor other parts of the SSE handling code further.
This commit adds two functions for sealing/unsealing the
etag (a.k.a. content MD5) in case of SSE single-part upload.
Sealing the ETag is neccessary in case of SSE-S3 to preserve
the security guarantees. In case of SSE-S3 AWS returns the
content-MD5 of the plaintext object as ETag. However, we
must not store the MD5 of the plaintext for encrypted objects.
Otherwise it becomes possible for an attacker to detect
equal/non-equal encrypted objects. Therefore we encrypt
the ETag before storing on the backend. But we only need
to encrypt the ETag (content-MD5) if the client send it -
otherwise the client cannot verify it anyway.
CopyObject handler forgot to remove multipart encryption flag in metadata
when source is an encrypted multipart object and the target is also encrypted
but single part object.
This PR also simplifies the code to facilitate review.
This PR brings an additional logger implementation
called AuditLog which logs to http targets
The intention is to use AuditLog to log all incoming
requests, this is used as a mechanism by external log
collection entities for processing Minio requests.
This PR introduces two new features
- AWS STS compatible STS API named AssumeRoleWithClientGrants
```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```
This API endpoint returns temporary access credentials, access
tokens signature types supported by this API
- RSA keys
- ECDSA keys
Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.
- External policy engine support, in this case OPA policy engine
- Credentials are stored on disks
- Only require len(disks)/2 to initialize the cluster
- Fix checking of read/write quorm in subsystems init
- Add retry mechanism in policy and notification to avoid aborting in case of read/write quorums errors
in xl.PutObjectPart call, prepareFile detected an error when the storage
is exhausted but we were returning the wrong error.
With this commit, users can see the correct error message when their disks
become full.
Simplify the logic of using rename() in xl. Currently, renaming
doesn't require the source object/dir to be existent in at least
read quorum disks, since there is no apparent reason for it
to be as a general rule, this commit will just simplify the
logic to avoid possible inconsistency in the backend in the future.
This is a major regression introduced in this commit
ce02ab613d is the first bad commit
commit ce02ab613d
Author: Krishna Srinivas <634494+krishnasrinivas@users.noreply.github.com>
Date: Mon Aug 6 15:14:08 2018 -0700
Simplify erasure code by separating bitrot from erasure code (#5959)
:040000 040000 794f58d82ad2201ebfc8 M cmd
This effects all distributed server deployments since this commit
All the following releases are affected
- RELEASE.2018-09-25T21-34-43Z
- RELEASE.2018-09-12T18-49-56Z
- RELEASE.2018-09-11T01-39-21Z
- RELEASE.2018-09-01T00-38-25Z
- RELEASE.2018-08-25T01-56-38Z
- RELEASE.2018-08-21T00-37-20Z
- RELEASE.2018-08-18T03-49-57Z
Thanks to Anis for reproducing the issue
Without this PR minio server is writing an erroneous
response to clients on an idle connections which ends
up printing following message
```
Unsolicited response received on idle HTTP channel
```
This PR would avoid sending responses on idle connections
.i.e routine network errors.
In XL PutObject & CompleteMultipartUpload, the existing object is renamed
to the temporary directory before checking if worm is enabled or not.
Most of the times, this doesn't cause an issue unless two uploads to the
same location occurs at the same time. Since there is no locking in object
handlers, both uploads will reach XL layer. The second client acquiring
write lock in put object or complete upload in XL will rename the object
to the temporary directory before doing the check and returning the error (wrong!).
This commit fixes then the behavior: no rename to temporary directory if
worm is enabled.
Current implementation simply uses all the memory locally
and crashes when a large upload is initiated using Minio
browser UI.
This PR uploads stream in blocks and finally commits the blocks
allowing for low memory footprint while uploading large objects
through Minio browser UI.
This PR also adds ETag compatibility for single PUT operations.
Fixes#6542Fixes#6550
This to ensure that we heal all entries in config/
prefix, we will have IAM and STS related files which
are being introduced in #6168 PR
This is a change to ensure that we heal all of them
properly, not just `config.json`
go test shows the following warning:
```
WARNING: DATA RACE
Write at 0x000002909e18 by goroutine 276:
github.com/minio/minio/cmd.testAdminCmdRunnerSignalService()
/home/travis/gopath/src/github.com/minio/minio/cmd/admin-rpc_test.go:44 +0x94
Previous read at 0x000002909e18 by goroutine 194:
github.com/minio/minio/cmd.testServiceSignalReceiver()
/home/travis/gopath/src/github.com/minio/minio/cmd/admin-handlers_test.go:467 +0x70
```
The reason for this data race is that some admin tests are not waiting for go routines
that they created to be properly exited, which triggers the race detector.
When download profiling data API fails to gather profiling data
from all nodes for any reason (including profiler not enabled),
return 400 http code with the appropriate json message.
This commit adds two functions for removing
confidential information - like SSE-C keys -
from HTTP headers / object metadata.
This creates a central point grouping all
headers/entries which must be filtered / removed.
See also https://github.com/minio/minio/pull/6489#discussion_r219797993
of #6489
The new call combines GetObjectInfo and GetObject, and returns an
object with a ReadCloser interface.
Also adds a number of end-to-end encryption tests at the handler
level.
Allow minio s3 gateway to use aws environment credentials,
IAM instance credentials, or AWS file credentials.
If AWS_ACCESS_KEY_ID, AWS_SECRET_ACCSES_KEY are set,
or minio is running on an ec2 instance with IAM instance credentials,
or there is a file $HOME/.aws/credentials, minio running as an S3
gateway will authenticate with AWS S3 using those one of credentials.
The lookup order:
1. AWS environment varaibles
2. IAM instance credentials
3. $HOME/.aws/credentials
4. minio environment variables
To authenticate with the minio gateway, you will always use the
minio environment variables MINIO_ACCESS_KEY MINIO_SECRET_KEY.
Two handlers are added to admin API to enable profiling and disable
profiling of a server in a standalone mode, or all nodes in the
distributed mode.
/minio/admin/profiling/start/{cpu,block,mem}:
- Start profiling and return starting JSON results, e.g. one
node is offline.
/minio/admin/profiling/download:
- Stop the on-going profiling task
- Stream a zip file which contains all profiling files that can
be later inspected by go tool pprof
The test TestServerTLSCiphers seems to fail sometimes for
no obvious reason. Actually the test is not needed
(as unit test) since minio/mint tests the server's TLS ciphers
as part of its security tests.
Fixes#5977
Currently, one node in a cluster can fail to boot with the following error message:
```
ERROR Unable to initialize config system: Storage resources are insufficient for the write operation
```
This happens when disks are formatted, read quorum is met but write
quorum is not met. In checkServerConfig(), a insufficient read quorum
error is replaced by errConfigNotFound, the code will generate a
new config json and try to save it, but it will fail because write
quorum is not met.
Replacing read quorum with errConfigNotFound is also wrong because it
can lead, in rare cases, to overwrite the config set by the user.
So, this commit adds a retry mechanism in configuration initialization
to retry only with read or write quorum errors.
This commit will also fix the following cases:
- Read quorum is lost just after the initialization of the object layer.
- Write quorum not met when upgrading configuration version.
ReadFile RPC input argument has been changed in commit a8f5939452959d27674560c6b803daa9,
however, RPC doesn't detect such a change when it calls other nodes with older versions.
Hence, bumping RPC version.
Fixes#6458
It was expected that in gateway mode, we do not know
the backend types whereas in NAS gateway since its
an extension of FS mode (standalone) this leads to
an issue in LivenessCheckHandler() which would perpetually
return 503, this would affect all kubernetes, openshift
deployments of NAS gateway.
This commit fixes an AWS S3 incompatibility issue.
The AccessKeyID may contain one or more `/` which caused
the server to interpret parts of the AccessKeyID as
other `X-Amz-Credential` parameters (like date, region, ...)
This commit fixes this by allowing 5 or more
`X-Amz-Credential` parameter strings and only interpreting
the last 5.
Fixes#6443
This commit will print connection failures to other disks in other nodes
after 5 retries. It is useful for users to understand why the
distribued cluster fails to boot up.
Enhance a little bit the error message that is showing
when access & secret keys are not specified in the
environment when running Minio in gateway and server mode.
This commit also removes a redundant check of access/secret keys.
This commit fixes the Manta gateway client creation flow. We now affix
the endpoint scheme with endpoint URL while creating the Manta client
for gateway.
Also add steps in Manta gateway docs on how to run with custom Manta
endpoint.
Fixes#6408
This commit fixes are regression in the server regarding
handling SSE requests with wrong SSE-C keys.
The server now returns an AWS S3 compatable API error (access denied)
in case of the SSE key does not match the secret key used during upload.
Fixes#6431
This PR adds two new admin APIs in Minio server and madmin package:
- GetConfigKeys(keys []string) ([]byte, error)
- SetConfigKeys(params map[string]string) (err error)
A key is a path in Minio configuration file, (e.g. notify.webhook.1)
The user will always send a string value when setting it in the config file,
the API will know how to convert the value to the appropriate type. The user
is also able to set a raw json.
Before setting a new config, Minio will validate all fields and try to connect
to notification targets if available.
Currently Go http connection pool was not being properly
utilized leading to degrading performance as the number
of concurrent requests increased.
As recommended by Go implementation, we have to drain the
response body and close it.
Removing an empty directory is not working because of xl.DeleteObject()
was only checking if the passed prefix is an actual object but it
should also check if it is an empty directory.
soMaxConn value is 128 on almost all linux systems,
this value is too low for Minio at times when used
against large concurrent workload e.g: spark applications
this causes a sort of SYN flooding observed by the kernel
to allow for large backlog increase this value to 2048.
With this value we do not see anymore SYN flooding
kernel messages.
ListMultipartUploads implementation is meant for docker-registry
use-case only. It lists only the first upload with a prefix matching
the object being uploaded.
* Revert "Encrypted reader wrapped in NewGetObjectReader should be closed (#6383)"
This reverts commit 53a0bbeb5b.
* Revert "Change SelectAPI to use new GetObjectNInfo API (#6373)"
This reverts commit 5b05df215a.
* Revert "Implement GetObjectNInfo object layer call (#6290)"
This reverts commit e6d740ce09.
An issue was reproduced when minio-js client functional
tests are setting lower case http headers, in our current
master branch we specifically look for canonical host header
which may be not necessarily true for all http clients.
This leads to a perpetual hang on the *net.Conn*.
This PR fixes regression caused by #6206 by handling the
case insensitivity.
This combines calling GetObjectInfo and GetObject while returning a
io.ReadCloser for the object's body. This allows the two operations to
be under a single lock, fixing a race between getting object info and
reading the object body.
One typo introduced in a recent commit miscalculates if worm and browser
are enabled or not. A simple test is also added to detect this issue
in the future if it ever happens again.
In current master when you do `mc watch` you can see a
dynamic ARN being listed which exposes the remote IP as well
```
mc watch play/airlines
```
On another terminal
```
mc admin info play
● play.minio.io:9000
Uptime : online since 11 hours ago
Version : 2018-08-22T07:50:45Z
Region :
SQS ARNs : arn:minio:sqs::httpclient+51c39c3f-131d-42d9-b212-c5eb1450b9ee+73.222.245.195:33408
Stats : Incoming 30GiB, Outgoing 7.6GiB
Storage : Used 7.7GiB
```
SQS ARNs listed as part of ServerInfo should be only external targets,
since listing an ARN here is not useful and it cannot be re-purposed in
any manner.
This PR fixes this issue by filtering out httpclient from the ARN list.
This is a regression introduced in #52940e4431725c
This commit adds error handling for SSE-KMS requests to
HEAD, GET, PUT and COPY operations. The server responds
with `not implemented` if a client sends a SSE-KMS
request.
This package provide customizable TCP net.Listener with various
performance-related options:
* SO_REUSEPORT. This option allows linear scaling server performance
on multi-CPU servers.
See https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/ for details.
* TCP_DEFER_ACCEPT. This option expects the server reads from the accepted
connection before writing to them.
* TCP_FASTOPEN. See https://lwn.net/Articles/508865/ for details.
Add support for sse-s3 encryption with vault as KMS.
Also refactoring code to make use of headers and functions defined in
crypto package and clean up duplicated code.
This PR fixes a regression introduced in 8eb838bf91
where hashing technique was used on prefixes to get the right set
to perform the operation, this is not correct since prefixes and
their corresponding keys might hash to a different value depending
on the key length.
For prefixes/directories we should look everywhere to support proper
quorum based listing.
Fixes#6293
This PR is the first set of changes to move the config
to the backend, the changes use the existing `config.json`
allows it to be migrated such that we can save it in on
backend disks.
In future releases, we will slowly migrate out of the
current architecture.
Fixes#6182
Modified the LogIf function to log only if the error passed
is not on the ignored errors list.
Currently, only disk not found error is added to the list.
Added a new function in logger package called LogAlwaysIf,
which will print on any error.
Fixes#5997
This commit adds support for detecting SSE-KMS headers.
The server should be able to detect SSE-KMS headers to
at least fail such S3 requests with not implemented.
When a S3 client sends a GET Object with a range header, 206 http
code is returned indicating success, however the call of the object
layer's GetObject() inside the handler can return an error and will lead
to writing an XML error message, which is obviously wrong since
we already sent 206 http code. So in the case, we just stop sending
data to the S3 client, this latter can still detect if there is no
error when comparing received data with Content-Length header
in the Get Object response.
ANSI colors do not work on dumb terminals, in situations
when minio is running as a service under systemd.
This PR ensures we turn off color in those situations.
This is to avoid serializing RPC contention on ongoing
parallel operations, the blocking profile indicating
all calls were being serialized through setRetryTicker.
This commit adds the crypto.* errors to the
`toAPIErrorCode` switch. Further this commit adds an S3
API error code returned whenever the client specifes a
SSE-S3 request with an invalid algorithm parameter.
Fixes#6238
globalPolicySys used to be initialized in fs/xl layer. The referenced
commit moved this logic to server/gateway initialization,but a check
to avoid double initialization prevented globalPolicySys to be loaded
from disk for NAS.
fixes regression from commit be1700f595
No locks are ever left in memory, we also
have a periodic interval of clearing stale locks
anyways. The lock instrumentation was not complete
and was seldom used.
Deprecate this for now and bring it back later if
it is really needed. This also in-turn seems to improve
performance slightly.
POST mime/multipart upload style can have filename value optional
which leads to implementation issues in Go releases in their
standard mime/multipart library.
When `filename` doesn't exist Go doesn't update `form.File` which
we rely on to extract the incoming file data, strangely when `filename`
is not specified this data is buffered in memory and is now part of
`form.Value` instead of `form.File` which creates an inconsistent
behavior.
This PR tries to fix this in our code for the time being, but ideal PR
would be to fix the upstream mime/multipart library to handle the
above situation consistently.
This commit adds a `fmt.Stringer` implementation for
SSE-S3 and SSE-C. The string representation is the
domain used for object key sealing.
See: `ObjectKey.Seal(...)` and `ObjectKey.Unseal(...)`
Continuing from PR 157ed65c35
Our posix.go implementation did not handle I/O errors
properly on the disks, this led to situations where
top-level callers such as ListObjects might return early
without even verifying all the available disks.
This commit tries to address this in Kubernetes, drbd/nbd based
persistent volumes which can disconnect under load and
result in the situations with disks return I/O errors.
This commit also simplifies listing operation, listing
never returns any error. We can avoid this since we pretty
much ignore most of the errors anyways. When objects are
accessed directly we return proper errors.
* crypto: add support for parsing SSE-C/SSE-S3 metadata
This commit adds support for detecting and parsing
SSE-C/SSE-S3 object metadata. With the `IsEncrypted`
functions it is possible to determine whether an object
seems to be encrypted. With the `ParseMetadata` functions
it is possible to validate such metadata and extract the
SSE-C/SSE-S3 related values.
It also fixes some naming issues.
* crypto: add functions for creating SSE object metadata
This commit adds functions for creating SSE-S3 and
SSE-C metadata. It also adds a `CreateMultipartMetadata`
for creating multipart metadata.
For all functions unit tests are included.
Since implementing `pwrite` like implementation would
require a more complex code than background append
implementation, it is better to keep the current code
as is and not implement `pwrite` based functionality.
Closes#4881
Healthcheck handler in current implementation was
performing ListBuckets() to check for liveness of Minio
service. ListBuckets() implementation on the other hand
doesn't do quorum based listing and if one of the disks
returned error, an I/O error it would be lead to kubernetes
taking the minio pod down prematurely even if the disk
is not local to that minio server.
The reason is ListBuckets() call cannot be trusted to
provide us the valid information that we need, Minio is a
clustered application which is designed to handle disk
failures. Error on one of the disks doesn't mean the pod
should become fully non-operational.
This PR attempts to fix this by only checking for alive
disks which are local to each setup and also by simply
performing a Stat() operation, if the Stat() returned
error on all disks local to a particular server then
we can let kubernetes safely take it down, until then
we should be operational.
The current code for deleting 1000 objects simultaneously
causes significant random I/O, which on slower drives
leads to servers disconnecting in a distributed setup.
Simplify this by serially deleting and reducing the
chattiness of this operation.