This commit fixes a DoS issue that is caused by an incorrect
SHA-256 content verification during STS requests.
Before that fix clients could write arbitrary many bytes
to the server memory. This commit fixes this by limiting the
request body size.
This change adds admin APIs and IAM subsystem APIs to:
- add or remove members to a group (group addition and deletion is
implicit on add and remove)
- enable/disable a group
- list and fetch group info
When checking if federation is necessary, the code compares
the SRV record stored in etcd against the list of endpoints
that the MinIO server is exposing. If there is an intersection
in this list the request is forwarded.
The SRV record includes both the host and the port, but the
intersection check previously only looked at the IP address. This
would prevent federation from working in situations where the endpoint
IP is the same for multiple MinIO servers. Some examples of where this
can occur are:
- running mulitiple copies of MinIO on the same host
- using multiple MinIO servers behind a NAT with port-forwarding
This commit adds a new method `UpdateKey` to the KMS
interface.
The purpose of `UpdateKey` is to re-wrap an encrypted
data key (the key generated & encrypted with a master key by e.g.
Vault).
For example, consider Vault with a master key ID: `master-key-1`
and an encrypted data key `E(dk)` for a particular object. The
data key `dk` has been generated randomly when the object was created.
Now, the KMS operator may "rotate" the master key `master-key-1`.
However, the KMS cannot forget the "old" value of that master key
since there is still an object that requires `dk`, and therefore,
the `D(E(dk))`.
With the `UpdateKey` method call MinIO can ask the KMS to decrypt
`E(dk)` with the old key (internally) and re-encrypted `dk` with
the new master key value: `E'(dk)`.
However, this operation only works for the same master key ID.
When rotating the data key (replacing it with a new one) then
we perform a `UnsealKey` operation with the 1st master key ID
and then a `GenerateKey` operation with the 2nd master key ID.
This commit also updates the KMS documentation and removes
the `encrypt` policy entry (we don't use `encrypt`) and
add a policy entry for `rewarp`.
After some extensive refactors, it turned out empty directories
are not healed and heal status is also not reported correctly.
This commit fixes it and adds the appropriate unit tests
This PR fixes relying on r.Context().Done()
by setting
```
Connection: "close"
```
HTTP Header, this has detrimental issues for
client side connection pooling. Since this
header explicitly tells clients to turn-off
connection pooling. This causing pro-active
connections to be closed leaving many conn's
in TIME_WAIT state. This can be observed with
`mc admin trace -a` when running distributed
setup.
This PR also fixes tracing filtering issue
when bucket names have `minio` as prefixes,
trace was erroneously ignoring them.
Golang proactively prints this error
`http: proxy error: context canceled`
when a request arrived to the current deployment and
redirected to another deployment in a federated setup.
Since this error can confuse users, this commit will
just hide it.
Allow renaming/editing a notification config. By replying with
a successful GetBucketNotification response, without checking
for any missing config ARN in targetList.
Fixes#7650
Related to #7982, this PR refactors the code
such that we validate the OPA or JWKS in a
common place.
This is also a refactor which is already done
in the new config migration change. Attempt
to avoid any network I/O during Unmarshal of
JSON from disk, instead do it later when
updating the in-memory data structure.
In situations such as when client uploading data,
prematurely disconnects from server such as pressing
ctrl-c before uploading all the data. Under this
situation in distributed setup we prematurely
disconnect disks causing a reconnect loop. This has
an adverse affect we end up leaving a lot of files
in temporary location which ideally should have been
cleaned up when Put() prematurely fails.
This is also a regression which got introduced in #7610
- Policy mapping is now at `config/iam/policydb/users/myuser1.json`
and includes version.
- User identity file is now versioned.
- Migrate old data to the new format.
Calling ListMultipartUploads fails if an upload is aborted while a
part is being uploaded because the directory for the upload exists
(since fsRenameFile ends up calling os.MkdirAll) but the meta JSON file
doesn't. To fix this we make sure an upload hasn't been aborted during
PutObjectPart by checking the existence of the directory for the upload
while moving the temporary part file into it.
This PR is based off @sinhaashish's PR for object lifecycle
management, which includes support only for,
- Expiration of object
- Filter using object prefix (_not_ object tags)
N B the code for actual expiration of objects will be included in a
subsequent PR.
Current code already allows users to GetPolicy/SetPolicy
there was a missing code in ListAllBucketPolicies to allow
access, this fixes this behavior.
Fixes#7913
The fix in #7646 introduced a regression which
was left unnoticed, the fix didn't work for
sub-commands unfortunately. This fixes it
by moving v1.21.0 version of the minio/cli
package.
Fixes#7924
posix.VerifyFile() doesn't know how to check if a file
is corrupted if that file is empty. We do have the part
size in xl.json so we pass it to VerifyFile to return
an error so healing empty parts can work properly.
When MinIO is behind a proxy, proxies end up killing
clients when no data is seen on the connection, adding
the right content-type ensures that proxies do not come
in the way.
Update prompt shows some weird characters under Windows, the reason
is that go-prompt is used to show a yes/no prompt, since go-prompt
does not seem to have a way to support color/fatih, this PR will
implements its own yes/no prompt with the correct text coloration.
This allows for canonicalization of the strings
throughout our code and provides a common space
for all these constants to reside.
This list is rather non-exhaustive but captures
all the headers used in AWS S3 API operations
r.URL.Path is empty when HEAD bucket with virtual
DNS requests come in since bucket is now part of
r.Host, we should use our domain names and fetch
the right bucket/object names.
This fixes an really old issue in our federation
setups.
This API returns the information related to the self healing routine.
For the moment, it returns:
- The total number of objects that are scanned
- The last time when an item was scanned
- return all objects in web-handlers listObjects response
- added local pagination to object list ui
- also fixed infinite loader and removed unused fields
While checking for port availability, ip address should be included.
When a machine has multiple ip addresses, multiple minio instances
or some other applications can be run on same port but different
ip address.
Fixes#7685
This PR adds support for adding session policies
for further restrictions on STS credentials, useful
in situations when applications want to generate
creds for multiple interested parties with different
set of policy restrictions.
This session policy is not mandatory, but optional.
Fixes#7732
This commit relaxes the restriction that the MinIO gateway
does not accept SSE-KMS headers. Now, the S3 gateway allows
SSE-KMS headers for PUT and MULTIPART PUT requests and forwards them
to the S3 gateway backend (AWS). This is considered SSE pass-through
mode.
Fixes#7753
Previously the read/write lock applied both for gateway use cases as
well the object store use case. Nothing from sys is touched or looked
at in the gateway usecase though, so we don't need to lock. Don't lock
to make the gateway policy getting a little more efficient, particularly
as where this is called from (checkRequestAuthType) is quite common.
etcd when used in federated setups, currently
mandates that all clusters should have same
config.json, which is too restrictive and makes
federation a restrictive environment.
This change makes it apparent that each cluster
needs to be independently managed if necessary
from `mc admin info` command line.
Each cluster with in federation can have their
own root credentials and as well as separate
regions. This way buckets get further restrictions
and allows for root creds to be not common
across clusters/data centers.
Existing data in etcd gets migrated to backend
on each clusters, upon start. Once done
users can change their config entries
independently.
- Background Heal routine receives heal requests from a channel, either to
heal format, buckets or objects
- Daily sweeper lists all objects in all buckets, these objects
don't necessarly have read quorum so they can be removed if
these objects are unhealable
- Heal daily ops receives objects from the daily sweeper
and send them to the heal routine.
Inconsistencies can arise after applying bucket policies in
gateway mode, since all gateway instances do not share a
common shared state. This is by design to keep gateway as
shared nothing architecture.
This PR fixes such inconsistencies by reloading policy
if any from the backend.
Fixes#7723
Consider errors returned by httpClient.Do() as network errors. This is because
the http clients returns different types of errors and it is hard to catch
all the error types.
With these changes we are now able to peak performances
for all Write() operations across disks HDD and NVMe.
Also adds readahead for disk reads, which also increases
performance for reads by 3x.
IsTruncated should not be set to true if there is no further
possible entries beyond maxKeys.
This commit will also move wide testing on object API from xl
to xl sets.
The problem in current code was we were removing
an entry from a lock lockerMap without considering
the fact that different entry for same resource is
a possibility due the nature of locks that can be
acquired in parallel before we decide if the lock
is considered stale
A sequence of events is as follows
- Lock("resource")
- lockMaintenance(finds a long lived lock in this "resource")
- Owner node rebooted which now retruns Expired() as true for
this "resource"
- Unlock("resource") which succeeded in quorum
- Now by this time application retried and acquired a new
Lock() on the same "resource"
- Now that we have Expired() true from the previous call,
we proceed to purge the entry from the local lockMap()
local lockMap reports a different entry for the expired
UID which results in a spurious log entry.
This PR removes this logging as this situation is an
expected scenario.
This will allow cache to consistently work for
server and gateways. Range GET requests will
be cached in the background after the request
is served from the backend.
Fixes: #7458, #7573, #6265, #6630
This is necessary to avoid connection build up between servers
unexpectedly for example in a situation where 16 servers are
talking to each other and one server now allows a maximum of
15*4096 = 61440 idle connections
Will be kept in pool. Such a large pool is perhaps inefficient for
many reasons and also affects overall system resources.
This PR also reduces idleConnection timeout from 120 secs to 60 secs.
errs was passed to many goroutines but they are all allowed
to update errs if any error happens during deletion, which
can cause a data race.
This commit will avoid issuing bulk delete operations in parallel
to avoid the warning race.
We broke into parts previously as we had checksum for the entire file
to tax less on memory and to have better TTFB. We dont need to now,
after the streaming-bitrot change.
Bulk delete at storage level in Multiple Delete Objects API
In order to accelerate bulk delete in Multiple Delete objects API,
a new bulk delete is introduced in storage layer, which will accept
a list of objects to delete rather than only one. Consequently,
a new API is also need to be added to Object API.
PR #7595 fixed part of the regression, but did not handle the
scenario, where in docker, the internal port is different from
the port on the host.
This PR modifies the regular expression such that all the
scenarios are handled.
Fixes#7619
After recent listing refactor, recursive list doesn't return empty
directories, this commit will fix the behavior and add unit tests
so it won't happen again.
When size is unknown and auto encryption is enabled,
and compression is set to true, putobject API is failing.
Moving adding the SSE-S3 header as part of the request to before
checking if compression can be done, otherwise the size is set to -1
and that seems to cause problems.
Currently we used to reload users every five minutes,
regardless of etcd is configured or not. But with etcd
configured we can do this more asynchronously to trigger
a refresh by using the watch API
Fixes#7515
One user has seen this following error log:
API: CompleteMultipartUpload(bucket=vertica, object=perf-dss-v03/cc2/02596813aecd4e476d810148586c2a3300d00000013557ef_0.gt)
Time: 15:44:07 UTC 04/11/2019
RequestID: 159475EFF4DEDFFB
RemoteHost: 172.26.87.184
UserAgent: vertica-v9.1.1-5
Error: open /data/.minio.sys/tmp/100bb3ec-6c0d-4a37-8b36-65241050eb02/xl.json: file exists
1: cmd/xl-v1-metadata.go:448:cmd.writeXLMetadata()
2: cmd/xl-v1-metadata.go:501:cmd.writeUniqueXLMetadata.func1()
This can happen when CompleteMultipartUpload fails with write quorum,
the S3 client will retry (since write quorum is 500 http response),
however the second call of CompleteMultipartUpload will fail because
this latter doesn't truly use a random uuid under .minio.sys/tmp/
directory but pick the upload id.
This commit fixes the behavior to choose a random uuid for generating
xl.json
Since AssumeRole API was introduced we have a wrong route
match which results in certain clients failing to upload objects
using multipart because, multipart POST conflicts with STS POST
AssumeRole API.
Write a proper matcher function which verifies the route more
appropriately such that both can co-exist.
Other listing optimizations include
- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
Allow server to start if one of the local nodes in docker/kubernetes setup is successfully resolved
- The rule is that we need atleast one local node to work. We dont need to resolve the
rest at that point.
- In a non-orchestrational setup, we fail if we do not have atleast one local node up
and running.
- In an orchestrational setup (docker-swarm and kubernetes), We retry with a sleep of 5
seconds until any one local node shows up.
Fixes#6995
In distributed mode, use REST API to acquire and manage locks instead
of RPC.
RPC has been completely removed from MinIO source.
Since we are moving from RPC to REST, we cannot use rolling upgrades as the
nodes that have not yet been upgraded cannot talk to the ones that have
been upgraded.
We expect all minio processes on all nodes to be stopped and then the
upgrade process to be completed.
Also force http1.1 for inter-node communication
common prefixes in bucket name if already created
are disallowed when etcd is configured due to the
prefix matching issue. Make sure that when we look
for bucket we are only interested in exact bucket
name not the prefix.
- [x] Support bucket and regular object operations
- [x] Supports Select API on HDFS
- [x] Implement multipart API support
- [x] Completion of ListObjects support
There is no written specification about how to encode key names
when url encoding type is passed.
However, this change will encode URLs as url.QueryEscape() does
while considering AWS S3 exceptions.
This commit adds a unit test for the vault
config verification (which covers also `IsEmpty()`).
Vault-related code is hard to test with unit tests
since a Vault service would be necessary. Therefore
this commit only adds tests for a fraction of the code.
Fixes#7409
Most hadoop distributions hortonworks, cloudera all
depend on aws-sdk-java 1.7.x to 1.10.x - the releases
which have bugs related case sensitive check for
ETag header. Go changes the case of the headers set
to be canonical but only preserves them when set
through a direct map.
This fixes most compatibility issues we have had
in the past supporting older hadoop distributions.
This commit fixes a privilege escalation issue against
the S3 and web handlers. An authenticated IAM user
can:
- Read from or write to the internal '.minio.sys'
bucket by simply sending a properly signed
S3 GET or PUT request. Further, the user can
- Read from or write to the internal '.minio.sys'
bucket using the 'Upload'/'Download'/'DownloadZIP'
API by sending a "browser" request authenticated
with its JWT token.
This commit fixes another privilege escalation issue
abusing the inter-node communication of distributed
servers to obtain/modify the server configuration.
The inter-node communication is authenticated using
JWT-Tokens. Further, IAM users accessing the cluster
via the web UI also get a JWT token and the browser
will add this "user" JWT token to each the request.
Now, a user can extract that JWT token an can craft
HTTP POST requests for the inter-node communication
API endpoint. Since the server accepts ANY valid
JWT token it also accepts inter-node commands from
an authenticated user such that the user can execute
arbitrary commands bypassing the IAM policy engine
and impersonate other users, change its own IAM policy
or extract the admin access/secret key.
This is fixed by only accepting "admin" JWT tokens
(tokens containing the admin access key - and therefore
were generated with the admin secret key). Consequently,
only the admin user can execute such inter-node commands.
Simplify the cmd/http package overall by removing
custom plain text v/s tls connection detection, by
migrating to go1.12 and choose minimum version
to be go1.12
Also remove all the vendored deps, since they
are not useful anymore.
A race is detected between a bytes.Buffer generated with cmd/rpc.Pool
and http2 module. An issue is raised in golang (https://github.com/golang/go/issues/31192).
Meanwhile, this commit disables Pool in RPC code and it generates a
new 1kb of bytes.Buffer for each RPC call.
Before this commit, nodes wait indefinitely without showing any
indicate error message when a node is started with different access
and secret keys.
This PR will show '401 Unauthorized' in this case.
Currently message is set to error type value.
Message field is not used in error logs. it is used only in the case of info logs.
This PR sets error message field to store error type correctly.
Copying an encrypted SSEC object when this latter is uploaded using
multipart mechanism was failing because ETag in case of encrypted
multipart upload is not encrypted.
This PR fixes the behavior.
This fixes varying pids for server-respawns. And avoids duplicate process
creating multiple pids when the server restart signal is triggered with
service restart enabled.
Fixes#7350
It is required to set the environment variable in the case of distributed
minio. LoadCredentials is used to notify peers of the change and will not work if
environment variable is set. so, this function will never be called.
In scenario 1
```
- bucket/object-prefix
- bucket/object-prefix/object
```
Server responds with `XMinioParentIsObject`
In scenario 2
```
- bucket/object-prefix/object
- bucket/object-prefix
```
Server responds with `XMinioObjectExistsAsDirectory`
Fixes#6566
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
We should internally handle when http2 input stream has smaller
content than its content-length header
Upstream issue reported https://github.com/golang/go/issues/30648
This a change which we need to handle internally until Go fixes it
correctly, till now our code doesn't expect a custom error to be returned.
CopyObject precondition checks into GetObjectReader
in order to perform SSE-C pre-condition checks using the
last 32 bytes of encrypted ETag rather than the decrypted
ETag
This also necessitates moving precondition checks for
gateways to gateway layer rather than object handler check
if a bucket with `Captialized letters` is created, `InvalidBucketName` error
will be returned.
In the case of pre-existing buckets, it will be listed.
Fixes#6938
Prevents deferred close functions from being called while still
attempting to copy reader to snappyWriter.
Reduces code duplication when compressing objects.
This change allows indefinitely running go-routines to cleanup
gracefully.
This channel is now closed at the beginning of each test so that
long-running go-routines quit and a new one is assigned.
The side affect of this change memory
increase, but this is a trade-off between
performance and actual memory usage.
For all practical scenarios this should be
an adequate change.
- The events will be persisted in queueStore if `queueDir` is set.
- Else, if queueDir is not set events persist in memory.
The events are replayed back when the mqtt broker is back online.
Clients like AWS SDK Java and AWS cli XML parsers are
unable to handle on `\r\n` characters to avoid these
errors send XML header first and write white space characters
instead.
Also handle cases to avoid double WriteHeader calls
- Current implementation was spawning renewer goroutines
without waiting for the lease duration to end. Remove vault renewer
and call vault.RenewToken directly and manage reauthentication if
lease expired.
We should change the logic for both isObject()
and isObjectDir() leaf detection to be done
with quorum, due to how our directory navigation
works - this allows for properly deleting all
the dangling directories or objects if any.
This commit fixes a nil pointer dereference issue
that can occur when the Vault KMS returns e.g. a 404
with an empty HTTP response. The Vault client SDK
does not treat that as error and returns nil for
the error and the secret.
Further it simplifies the token renewal and
re-authentication mechanism by using a single
background go-routine.
The control-flow of Vault authentications looks
like this:
1. `authenticate()`: Initial login and start of background job
2. Background job starts a `vault.Renewer` to renew the token
3. a) If this succeeds the token gets updated
b) If this fails the background job tries to login again
4. If the login in 3b. succeeded goto 2. If it fails
goto 3b.
Currently, we were sending errors in Select binary format,
which is incompatible with AWS S3 behavior, errors in binary
are sent after HTTP status code is already 200 OK - i.e it
happens during the evaluation of the record reader.
This commit increases storage REST requests to 5 minutes, this includes
the opening TCP connection, and sending/receiving data. This will reduce
clients receiving errors when the server is under high load.
Different gateway implementations due to different backend
API errors, might return different unsupported errors at
our handler layer. Current code posed a problem for us because
this information was lost and we would convert it to InternalError
in this situation all S3 clients end up retrying the request.
To avoid this unexpected situation implement a way to support
this cleanly such that the underlying information is not lost
which is returned by gateway.
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.
Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.
We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.
This PR fixes all of the above items.
This PR also adds some comments and simplifies
the code. Primary handling is done to ensure
that we make sure to honor cached buffer.
Added unit tests as well
Fixes#7141
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.
But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
ListObjectParts is using xl.readXLMetaParts which picks the first
xl meta found in any disk, which is an inconsistent information.
E.g.: In a middle of a multipart upload, one node can go offline
and get back later with an outdated multipart information.
This commit fixes the computation of Before/After healing state
for empty directories.
Issues before the commit:
- Before state doesn't reflect the real status (no StatVol() called)
- For any MakeVol() error, healObjectDir is exited directly, which is
wrong.
Currently during a heal of a bucket, if one disk is offline an empty endpoint entry is added.
Then another entry with the missing endpoint is also added.
This results in more entries than disks being added.
Code that adds empty endpoint has been removed.
Collect historic cpu and mem stats. Also, use actual values
instead of formatted strings while returning to the client. The string
formatting prevents values from being processed by the server or
by the client without parsing it.
This change will allow the values to be processed (eg.
compute rolling-average over the lifetime of the minio server)
and offloads the formatting to the client.
We made a change previously in #7111 which moved support
for AWS envs only for AWS S3 endpoint. Some users requested
that this be added back to Non-AWS endpoints as well as
they require separate credentials for backend authentication
from security point of view.
More than one client can't use the same clientID for MQTT connection.
This causes problem in distributed deployments where config is shared
across nodes, as each Minio instance tries to connect to MQTT using the
same clientID.
This commit removes the clientID field in config, and allows
MQTT client to create random clientID for each node.
- New parser written from scratch, allows easier and complete parsing
of the full S3 Select SQL syntax. Parser definition is directly
provided by the AST defined for the SQL grammar.
- Bring support to parse and interpret SQL involving JSON path
expressions; evaluation of JSON path expressions will be
subsequently added.
- Bring automatic type inference and conversion for untyped
values (e.g. CSV data).
This situation happens only in gateway nas which supports
etcd based `config.json` to support all FS mode features.
The issue was we would try to migrate something which doesn't
exist when etcd is configured which leads to inconsistent
server configs in memory.
This PR fixes this situation by properly loading config after
initialization, avoiding backend disk config migration to be
done only if etcd is not configured.
If it does happen that we have a lot files in '.minio.sys/tmp',
minio startup might block deleting this folder. Rename and
delete in background instead to allow Minio to start serving
requests.
To avoid a large number of concurrent connections between minio
servers and to reduce CPU pressure, it is better to limit the number
of objects healed in parallel to number_of_CPUs.
Requirements like being able to run minio gateway in ec2
pointing to a Minio deployment wouldn't work properly
because IAM creds take precendence on ec2.
Add checks such that we only enable AWS specific features
if our backend URL points to actual AWS S3 not S3 compatible
endpoints.
Returning unexpected errors can cause problems for config handling,
which is what led gateway deployments with etcd to misbehave and
had stopped working properly
Deployment ID is not copied into new formats after healing format. Although,
this is not critical since a new deployment ID will be generated and set in the
next cluster restart, it is still much better if we don't change the deployment
id of a cluster for a better tracking.
Fix regexp matcher for special assets for the browser to clash with
less of the object namespace.
Assets should now be loaded with the /minio/ prefix. Previously,
favicon.ico (and others) could be loaded at any path matching
/minio/*/favicon.ico. This clashes with a large part of the object
namespace. With this change, /minio/favicon.ico will serve the favicon
but not /minio/mybucket/favicon.ico
Fixes#7077
This changes causes `getRootCAs` to always load system-wide CAs.
Any additional custom CAs (at `certs/CA/`) are added to the certificate pool
of system CAs.
The previous behavior was incorrect since all no system-wide CAs were
loaded if either there were CAs under `certs/CA` or the `certs/CA`
directory didn't exist at all.
Also add a cross compile script to test always cross
compilation for some well known platforms and architectures
, we support out of box compilation of these platforms even
if we don't make an official release build.
This script is to avoid regressions in this area when we
add platform dependent code.
This PR supports iam and bucket policies to have
policy variable replacements in resource and
condition key values.
For example
- ${aws:username}
- ${aws:userid}
Health checking programs very frequently use /minio/health/live
to check health, hence we can avoid doing StorageInfo() and
ListBuckets() for FS/Erasure backend.
* Use 0-byte file for bitrot verification of whole-file-bitrot files
Also pass the right checksum information for bitrot verification
* Copy xlMeta info from latest meta except []checksums and []Parts while healing
Simplify parallelReader.Read() which also fixes previous
implementation where it was returning before all the parallel
reading go-routines had terminated which caused race conditions.
When auto-encryption is turned on, we pro-actively add SSEHeader
for all PUT, POST operations. This is unusual for V2 signature
calculation because V2 signature doesn't have a pre-defined set
of signed headers in the request like V4 signature. According to
V2 we should canonicalize all incoming supported HTTP headers.
Make sure to validate signatures before we mutate http headers
Deprecate the use of Admin Peers concept and migrate all peer
communication to Notification subsystem. This finally allows
for a common subsystem for all peer notification in case of
distributed server deployments.
Before this change the CopyObjectHandler and the CopyObjectPartHandler
both looked for a `versionId` parameter on the `X-Amz-Copy-Source` URL
for the version of the object to be copied on the URL unescaped version
of the header. This meant that files that had question marks in were
truncated after the question mark so that files with `?` in their
names could not be server side copied.
After this change the URL unescaping is done during the parsing of the
`versionId` parameter which fixes the problem.
This change also introduces the same logic for the
`X-Amz-Copy-Source-Version-Id` header field which was previously
ignored, namely returning an error if it is present and not `null`
since minio does not currently support versions.
S3 Docs:
- https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
- https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html
When source is encrypted multipart object and the parts are not
evenly divisible by DARE package block size, target encrypted size
will not necessarily be the same as encrypted source object.
This commit removes old code preventing PUT requests with '/' as a path,
because this is not needed anymore after the introduction of the virtual
host style in Minio server code.
'PUT /' when global domain is not configured already returns 405 Method
Not Allowed http error.
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).
If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.
When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.
Fixes#6323, #6696
This is part of implementation for mc admin health command. The
ServerDrivesPerfInfo() admin API returns read and write speed
information for all the drives (local and remote) in a given Minio
server deployment.
Part of minio/mc#2606
It can happen with erroneous clients which do not send `Host:`
header until 4k worth of header bytes have been read. This can lead
to Peek() method of bufio to fail with ErrBufferFull.
To avoid this we should make sure that Peek buffer is as large as
our maxHeaderBytes count.
minio-java tests were failing under multiple places when
auto encryption was turned on, handle all the cases properly
This PR fixes
- CopyObject should decrypt ETag before it does if-match
- CopyObject should not try to preserve metadata of source
when rotating keys, unless explicitly asked by the user.
- We should not try to decrypt Compressed object etag, the
potential case was if user sets encryption headers along
with compression enabled.
Especially in gateway IAM admin APIs are not enabled
if etcd is not enabled, we should enable admin API though
but only enable IAM and Config APIs with etcd configured.
By default when we listen on all interfaces, we print all the
endpoints that at local to all interfaces including IPv6
addresses. Remove IPv6 addresses in endpoint list to be
printed in endpoints unless explicitly specified with '--address'
This commit adds an auto-encryption feature which allows
the Minio operator to ensure that uploaded objects are
always encrypted.
This change adds the `autoEncryption` configuration option
as part of the KMS conifguration and the ENV. variable
`MINIO_SSE_AUTO_ENCRYPTION:{on,off}`.
It also updates the KMS documentation according to the
changes.
Fixes#6502
Currently we use GetObject to check if we are allowed to list,
this might be a security problem since there are many users now
who actively disable a publicly readable listing, anyone who
can guess the browser URL can list the objects.
This PR turns off this behavior and provides a more expected way
based on the policies.
This PR also additionally improves the Download() object
implementation to use a more streamlined code.
These are precursor changes to facilitate federation and web
identity support in browser.
One user reported having discovered the following error:
API: SYSTEM()
Time: 20:06:17 UTC 12/06/2018
Error: xml: encoding "US-ASCII" declared but Decoder.CharsetReader is nil
1: cmd/handler-utils.go:43:cmd.parseLocationConstraint()
2: cmd/auth-handler.go:250:cmd.checkRequestAuthType()
3: cmd/bucket-handlers.go:411:cmd.objectAPIHandlers.PutBucketHandler()
4: cmd/api-router.go100cmd.(objectAPIHandlers).PutBucketHandler-fm()
5: net/http/server.go:1947:http.HandlerFunc.ServeHTTP()
Hence, adding support of different xml encoding. Although there
is no clear specification about it, even setting "GARBAGE" as an xml
encoding won't change the behavior of AWS, hence the encoding seems
to be ignored.
This commit will follow that behavior and will ignore encoding field
and consider all xml as utf8 encoded.
This refactors the vault configuration by moving the
vault-related environment variables to `environment.go`
(Other ENV should follow in the future to have a central
place for adding / handling ENV instead of magic constants
and handling across different files)
Further this commit adds master-key SSE-S3 support.
The operator can specify a SSE-S3 master key using
`MINIO_SSE_MASTER_KEY` which will be used as master key
to derive and encrypt per-object keys for SSE-S3
requests.
This commit is also a pre-condition for SSE-S3
auto-encyption support.
Fixes#6329
This PR implements one of the pending items in issue #6286
in S3 API a user can request CSV output for a JSON document
and a JSON output for a CSV document. This PR refactors
the code a little bit to bring this feature.
guessIsRPCReq() considers all POST requests as RPC but doesn't
check if this is an object operation API or not, which is actually
confusing bucket forwarder handler when it receives a new multipart
upload API which is a POST http request.
Due to this bug, users having a federated setup are not able to
upload a multipart object using an endpoint which doesn't actually
contain the specified bucket that will store the object.
Hence this commit will fix the described issue.
registering notFound handler more than once causes
gorilla mux to return error for all registered paths
greater than > 8. This might be a bug in the gorilla/mux
but we shouldn't be using it this way. NotFound handler
should be only registered once per root router.
Fixes#6915
When MINIO_PUBLIC_IPS is not specified and no endpoints are passed
as arguments, fallback to the address of non loop-back interfaces.
This is useful so users can avoid setting MINIO_PUBLIC_IPS in docker
or orchestration scripts, ince users naturally setup an internal
network that connects all instances.
This commit renames the env variable for vault namespaces
such that it begins with `MINIO_SSE_`. This is the prefix
for all Minio SSE related env. variables (like KMS).
clientID must be a unique `UUID` for each connections. Now, the
server generates it, rather considering the config.
Removing it as it is non-beneficial right now.
Fixes#6364
When migrating configs it happens often that some
servers fail to start due to version mismatch etc.
Hold a transaction lock such that all servers get
serialized.
This can create inconsistencies i.e Parts might have
lesser number of parts than ChecksumInfos. This will
result in object to be not readable.
This PR also allows for deleting previously created
corrupted objects.
globalMinioPort is used in federation which stores the address
and the port number of the server hosting the specified bucket,
this latter uses globalMinioPort but this latter is not set in
startup of the gateway mode.
This commit fixes the behavior.
Calling /minio/prometheuses/metrics calls xlSets.StorageInfo() which creates a new
storage REST client and closes it. However, currently, closing does nothing
to the underlying opened http client.
This commit introduces a closing behavior by calling CloseIdleConnections
provided by http.Transport upon the initialization of this latter.
This refactor brings a change which allows
targets to be added in a cleaner way and also
audit is now moved out.
This PR also simplifies logger dependency for auditing
The current code triggers a timeout to cleanup a heal seq from
healSeqMap, but we don't know if the user did or not launch a new
healing sequence with the same path.
Add endTime to healSequence struct and add a periodic heal-sequence
cleaner to remove heal sequences only if this latter is older than
10 minutes.
Rolling update doesn't work properly because Storage REST API has
a new API WriteAll() but without API version number increase.
Also be sure to return 404 for unknown http paths.
To conform with AWS S3 Spec on ETag for SSE-S3 encrypted objects,
encrypt client sent MD5Sum and store it on backend as ETag.Extend
this behavior to SSE-C encrypted objects.
This improves the performance of certain queries dramatically,
such as 'count(*)' etc.
Without this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762
real 0m42.464s
user 0m0.071s
sys 0m0.010s
```
With this PR
```
~ time mc select --query "select count(*) from S3Object" myminio/sjm-airlines/star2000.csv.gz
2173762
real 0m17.603s
user 0m0.093s
sys 0m0.008s
```
Almost a 250% improvement in performance. This PR avoids a lot of type
conversions and instead relies on raw sequences of data and interprets
them lazily.
```
benchcmp old new
benchmark old ns/op new ns/op delta
BenchmarkSQLAggregate_100K-4 551213 259782 -52.87%
BenchmarkSQLAggregate_1M-4 6981901985 2432413729 -65.16%
BenchmarkSQLAggregate_2M-4 13511978488 4536903552 -66.42%
BenchmarkSQLAggregate_10M-4 68427084908 23266283336 -66.00%
benchmark old allocs new allocs delta
BenchmarkSQLAggregate_100K-4 2366 485 -79.50%
BenchmarkSQLAggregate_1M-4 47455492 21462860 -54.77%
BenchmarkSQLAggregate_2M-4 95163637 43110771 -54.70%
BenchmarkSQLAggregate_10M-4 476959550 216906510 -54.52%
benchmark old bytes new bytes delta
BenchmarkSQLAggregate_100K-4 1233079 1086024 -11.93%
BenchmarkSQLAggregate_1M-4 2607984120 557038536 -78.64%
BenchmarkSQLAggregate_2M-4 5254103616 1128149168 -78.53%
BenchmarkSQLAggregate_10M-4 26443524872 5722715992 -78.36%
```
xl.json is the source of truth for all erasure
coded objects, without which we won't be able to
read the objects properly. This PR enables sync
mode for writing `xl.json` such all writes go hit
the disk and are persistent under situations such
as abrupt power failures on servers running Minio.
This change will allow users to enter the endpoint of the
storage account if this latter belongs to a different Azure
cloud environment, such as US gov cloud.
e.g:
`MINIO_ACCESS_KEY=testaccount \
MINIO_SECRET_KEY=accountsecretkey \
minio gateway azure https://testaccount.blob.usgovcloudapi.net`
In many situations, while testing we encounter
ErrInternalError, to reduce logging we have
removed logging from quite a few places which
is acceptable but when ErrInternalError occurs
we should have a facility to log the corresponding
error, this helps to debug Minio server.
Multipart object final size is not a contiguous
encrypted object representation, so trying to
decrypt this size will lead to an error in some
cases. The multipart object should be detected first
and then decoded with its respective parts instead.
This PR handles this situation properly, added a
test as well to detect these in the future.
This commit adds key-rotation for SSE-S3 objects.
To execute a key-rotation a SSE-S3 client must
- specify the `X-Amz-Server-Side-Encryption: AES256` header
for the destination
- The source == destination for the COPY operation.
Fixes#6754
This PR adds support
- Request query params
- Request headers
- Response headers
AuditLogEntry is exported and versioned as well
starting with this PR.
On Windows erasure coding setup if
```
~ minio server V:\ W:\ X:\ Z:\
```
is not possible due to NTFS creating couple of
hidden folders, this PR allows minio to use
the entire drive.
Execute method in s3Select package makes a response.WriteHeader call.
Not calling it again in SelectObjectContentHandler function in case of
error in s3Select.Execute call.
On a heavily loaded server, getBucketInfo() becomes slow,
one can easily observe deleting an object causes many
additional network calls.
This PR is to let the underlying call return the actual
error and write it back to the client.
This PR supports two models for etcd certs
- Client-to-server transport security with HTTPS
- Client-to-server authentication with HTTPS client certificates
Endpoint comparisons blindly without looking
if its local is wrong because the actual drive
for a local disk is always going to provide just
the path without the HTTP endpoint.
Add code such that this is taken care properly in
all situations. Without this PR HealBucket() would
wrongly conclude that the healing doesn't have quorum
when there are larger number of local disks involved.
Fixes#6703
Current master didn't support CopyObjectPart when source
was encrypted, this PR fixes this by allowing range
CopySource decryption at different sequence numbers.
Fixes#6698
This PR fixes
- The target object should be compressed even if the
source object is not compressed.
- The actual size for an encrypted object should be the
`decryptedSize`
This commit fixes a wrong assignment to `actualPartSize`.
The `actualPartSize` for an encrypted src object is not `srcInfo.Size`
because that's the encrypted object size which is larger than the
actual object size. So the actual part size for an encrypted
object is the decrypted size of `srcInfo.Size`.
Without this fix we have room for two different type of
errors.
- Source is encrypted and we didn't provide any source encryption keys
This results in Incomplete body error to be returned back to the client
since source is encrypted and we gave the reader as is to the object
layer which was of a decrypted value leading to "IncompleteBody"
- Source is not encrypted and we provided source encryption keys.
This results in a corrupted object on the destination which is
considered encrypted but cannot be read by the server and returns
the following error.
```
<Error><Code>XMinioObjectTampered</Code><Message>The requested object
was modified and may be compromised</Message><Resource>/id-platform-gamma/
</Resource><RequestId>155EDC3E86BFD4DA</RequestId><HostId>3L137</HostId>
</Error>
```
This commit fixes a regression introduced in f187a16962
the regression returned AccessDenied when a client is trying to create an empty
directory on a existing prefix, though it should return 200 OK to be close as
much as possible to S3 specification.
Since refactoring to GetObjectNInfo style, there are many cases
when i/o closed pipe is printed like, downloading an object
with wrong encryption key. This PR removes the log.
This commit moves the check that SSE-C requests
must be made over TLS into a generic HTTP handler.
Since the HTTP server uses custom TCP connection handling
it is not possible to use `http.Request.TLS` to check
for TLS connections. So using `globalIsSSL` is the only
option to detect whether the request is made over TLS.
By extracting this check into a separate handler it's possible
to refactor other parts of the SSE handling code further.
This commit adds two functions for sealing/unsealing the
etag (a.k.a. content MD5) in case of SSE single-part upload.
Sealing the ETag is neccessary in case of SSE-S3 to preserve
the security guarantees. In case of SSE-S3 AWS returns the
content-MD5 of the plaintext object as ETag. However, we
must not store the MD5 of the plaintext for encrypted objects.
Otherwise it becomes possible for an attacker to detect
equal/non-equal encrypted objects. Therefore we encrypt
the ETag before storing on the backend. But we only need
to encrypt the ETag (content-MD5) if the client send it -
otherwise the client cannot verify it anyway.
CopyObject handler forgot to remove multipart encryption flag in metadata
when source is an encrypted multipart object and the target is also encrypted
but single part object.
This PR also simplifies the code to facilitate review.
This PR brings an additional logger implementation
called AuditLog which logs to http targets
The intention is to use AuditLog to log all incoming
requests, this is used as a mechanism by external log
collection entities for processing Minio requests.
This PR introduces two new features
- AWS STS compatible STS API named AssumeRoleWithClientGrants
```
POST /?Action=AssumeRoleWithClientGrants&Token=<jwt>
```
This API endpoint returns temporary access credentials, access
tokens signature types supported by this API
- RSA keys
- ECDSA keys
Fetches the required public key from the JWKS endpoints, provides
them as rsa or ecdsa public keys.
- External policy engine support, in this case OPA policy engine
- Credentials are stored on disks
- Only require len(disks)/2 to initialize the cluster
- Fix checking of read/write quorm in subsystems init
- Add retry mechanism in policy and notification to avoid aborting in case of read/write quorums errors
in xl.PutObjectPart call, prepareFile detected an error when the storage
is exhausted but we were returning the wrong error.
With this commit, users can see the correct error message when their disks
become full.
Simplify the logic of using rename() in xl. Currently, renaming
doesn't require the source object/dir to be existent in at least
read quorum disks, since there is no apparent reason for it
to be as a general rule, this commit will just simplify the
logic to avoid possible inconsistency in the backend in the future.
This is a major regression introduced in this commit
ce02ab613d is the first bad commit
commit ce02ab613d
Author: Krishna Srinivas <634494+krishnasrinivas@users.noreply.github.com>
Date: Mon Aug 6 15:14:08 2018 -0700
Simplify erasure code by separating bitrot from erasure code (#5959)
:040000 040000 794f58d82ad2201ebfc8 M cmd
This effects all distributed server deployments since this commit
All the following releases are affected
- RELEASE.2018-09-25T21-34-43Z
- RELEASE.2018-09-12T18-49-56Z
- RELEASE.2018-09-11T01-39-21Z
- RELEASE.2018-09-01T00-38-25Z
- RELEASE.2018-08-25T01-56-38Z
- RELEASE.2018-08-21T00-37-20Z
- RELEASE.2018-08-18T03-49-57Z
Thanks to Anis for reproducing the issue
Without this PR minio server is writing an erroneous
response to clients on an idle connections which ends
up printing following message
```
Unsolicited response received on idle HTTP channel
```
This PR would avoid sending responses on idle connections
.i.e routine network errors.
In XL PutObject & CompleteMultipartUpload, the existing object is renamed
to the temporary directory before checking if worm is enabled or not.
Most of the times, this doesn't cause an issue unless two uploads to the
same location occurs at the same time. Since there is no locking in object
handlers, both uploads will reach XL layer. The second client acquiring
write lock in put object or complete upload in XL will rename the object
to the temporary directory before doing the check and returning the error (wrong!).
This commit fixes then the behavior: no rename to temporary directory if
worm is enabled.
Current implementation simply uses all the memory locally
and crashes when a large upload is initiated using Minio
browser UI.
This PR uploads stream in blocks and finally commits the blocks
allowing for low memory footprint while uploading large objects
through Minio browser UI.
This PR also adds ETag compatibility for single PUT operations.
Fixes#6542Fixes#6550
This to ensure that we heal all entries in config/
prefix, we will have IAM and STS related files which
are being introduced in #6168 PR
This is a change to ensure that we heal all of them
properly, not just `config.json`
go test shows the following warning:
```
WARNING: DATA RACE
Write at 0x000002909e18 by goroutine 276:
github.com/minio/minio/cmd.testAdminCmdRunnerSignalService()
/home/travis/gopath/src/github.com/minio/minio/cmd/admin-rpc_test.go:44 +0x94
Previous read at 0x000002909e18 by goroutine 194:
github.com/minio/minio/cmd.testServiceSignalReceiver()
/home/travis/gopath/src/github.com/minio/minio/cmd/admin-handlers_test.go:467 +0x70
```
The reason for this data race is that some admin tests are not waiting for go routines
that they created to be properly exited, which triggers the race detector.
When download profiling data API fails to gather profiling data
from all nodes for any reason (including profiler not enabled),
return 400 http code with the appropriate json message.
This commit adds two functions for removing
confidential information - like SSE-C keys -
from HTTP headers / object metadata.
This creates a central point grouping all
headers/entries which must be filtered / removed.
See also https://github.com/minio/minio/pull/6489#discussion_r219797993
of #6489
The new call combines GetObjectInfo and GetObject, and returns an
object with a ReadCloser interface.
Also adds a number of end-to-end encryption tests at the handler
level.
Allow minio s3 gateway to use aws environment credentials,
IAM instance credentials, or AWS file credentials.
If AWS_ACCESS_KEY_ID, AWS_SECRET_ACCSES_KEY are set,
or minio is running on an ec2 instance with IAM instance credentials,
or there is a file $HOME/.aws/credentials, minio running as an S3
gateway will authenticate with AWS S3 using those one of credentials.
The lookup order:
1. AWS environment varaibles
2. IAM instance credentials
3. $HOME/.aws/credentials
4. minio environment variables
To authenticate with the minio gateway, you will always use the
minio environment variables MINIO_ACCESS_KEY MINIO_SECRET_KEY.
Two handlers are added to admin API to enable profiling and disable
profiling of a server in a standalone mode, or all nodes in the
distributed mode.
/minio/admin/profiling/start/{cpu,block,mem}:
- Start profiling and return starting JSON results, e.g. one
node is offline.
/minio/admin/profiling/download:
- Stop the on-going profiling task
- Stream a zip file which contains all profiling files that can
be later inspected by go tool pprof