- Go might reset the internal http.ResponseWriter() to `nil`
after Write() failure if the go-routine has returned, do not
flush() such scenarios and avoid spurious flushes() as
returning handlers always flush.
- fix some racy tests with the console
- avoid ticker leaks in certain situations
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
Borrowed idea from Go's usage of this
optimization for ReadFrom() on client
side, we should re-use the 32k buffers
io.Copy() allocates for generic copy
from a reader to writer.
the performance increase for reads for
really tiny objects is at this range
after this change.
> * Fastest: +7.89% (+1.3 MiB/s) throughput, +7.89% (+1308.1) obj/s
Removes RLock/RUnlock for updating metadata,
since we already take a write lock to update
metadata, this change removes reading of xl.meta
as well as an additional lock, the performance gain
should increase 3x theoretically for
- PutObjectRetention
- PutObjectLegalHold
This optimization is mainly for Veeam like
workloads that require a certain level of iops
from these API calls, we were losing iops.
- Supports object locked buckets that require
PutObject() to set content-md5 always.
- Use SSE-S3 when S3 gateway is being used instead
of SSE-KMS for auto-encryption.
Replication was not working properly for encrypted
objects in single PUT object for preserving etag,
We need to make sure to preserve etag such that replication
works properly and not gets into infinite loops of copying
due to ETag mismatches.
* reduce extra getObjectInfo() calls during ILM transition
This PR also changes expiration logic to be non-blocking,
scanner is now free from additional costs incurred due
to slower object layer calls and hitting the drives.
* move verifying expiration inside locks
- deletes should always Sweep() for tiering at the
end and does not need an extra getObjectInfo() call
- puts, copy and multipart writes should conditionally
do getObjectInfo() when tiering targets are configured
- introduce 'TransitionedObject' struct for ease of usage
and understanding.
- multiple-pools optimization deletes don't need to hold
read locks verifying objects across namespace and pools.
This happens because of a change added where any sub-credential
with parentUser == rootCredential i.e (MINIO_ROOT_USER) will
always be an owner, you cannot generate credentials with lower
session policy to restrict their access.
This doesn't affect user service accounts created with regular
users, LDAP or OpenID
Before, the gateway will complain that it found KMS configured in the
environment but the gateway mode does not support encryption. This
commit will allow starting of the gateway but ensure that S3 operations
with encryption headers will fail when the gateway doesn't support
encryption. That way, the user can use etcd + KMS and have IAM data
encrypted in the etcd store.
Co-authored-by: Anis Elleuch <anis@min.io>
if object was uploaded with multipart. This is to ensure that
GetObject calls with partNumber in URI request parameters
have same behavior on source and replication target.
Bonus: remove kms_kes as sub-system, since its ENV only.
- also fixes a crash with etcd cluster without KMS
configured and also if KMS decryption is missing.
This feature also changes the default port where
the browser is running, now the port has moved
to 9001 and it can be configured with
```
--console-address ":9001"
```
Always use `GetActualSize` to get the part size, not just when encrypted.
Fixes mint test io.minio.MinioClient.uploadPartCopy,
error "Range specified is not valid for source object".
Also adding an API to allow resyncing replication when
existing object replication is enabled and the remote target
is entirely lost. With the `mc replicate reset` command, the
objects that are eligible for replication as per the replication
config will be resynced to target if existing object replication
is enabled on the rule.
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
This commit fixes a bug causing the MinIO server to compute
the ETag of a single-part object as MD5 of the compressed
content - not as MD5 of the actual content.
This usually does not affect clients since the MinIO appended
a `-1` to indicate that the ETag belongs to a multipart object.
However, this behavior was problematic since:
- A S3 client being very strict should reject such an ETag since
the client uploaded the object via single-part API but got
a multipart ETag that is not the content MD5.
- The MinIO server leaks (via the ETag) that it compressed the
object.
This commit addresses both cases. Now, the MinIO server returns
an ETag equal to the content MD5 for single-part objects that got
compressed.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit adds the `X-Amz-Server-Side-Encryption-Aws-Kms-Key-Id`
response header to the GET, HEAD, PUT and Download API.
Based on AWS documentation [1] AWS S3 returns the KMS key ID as part
of the response headers.
[1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/specifying-kms-encryption.html
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit adds support for SSE-KMS bucket configurations.
Before, the MinIO server did not support SSE-KMS, and therefore,
it was not possible to specify an SSE-KMS bucket config.
Now, this is possible. For example:
```
mc encrypt set sse-kms some-key <alias>/my-bucket
```
Further, this commit fixes an issue caused by not supporting
SSE-KMS bucket configuration and switching to SSE-KMS as default
SSE method.
Before, the server just checked whether an SSE bucket config was
present (not which type of SSE config) and applied the default
SSE method (which was switched from SSE-S3 to SSE-KMS).
This caused objects to get encrypted with SSE-KMS even though a
SSE-S3 bucket config was present.
This issue is fixed as a side-effect of this commit.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
when bidirectional replication is set up.
If ReplicaModifications is enabled in the replication
configuration, sync metadata updates to source if
replication rules are met. By default, if this
configuration is unset, MinIO automatically sync's
metadata updates on replica back to the source.
This commit replaces the custom KES client implementation
with the KES SDK from https://github.com/minio/kes
The SDK supports multi-server client load-balancing and
requests retry out of the box. Therefore, this change reduces
the overall complexity within the MinIO server and there
is no need to maintain two separate client implementations.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit adds basic SSE-KMS support.
Now, a client can specify the SSE-KMS headers
(algorithm, optional key-id, optional context)
such that the object gets encrypted using the
SSE-KMS method. Further, auto-encryption now
defaults to SSE-KMS.
This commit does not try to do any refactoring
and instead tries to implement SSE-KMS as a minimal
change to the code base. However, refactoring the entire
crypto-related code is planned - but needs a separate
effort.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit adds basic SSE-KMS support.
Now, a client can specify the SSE-KMS headers
(algorithm, optional key-id, optional context)
such that the object gets encrypted using the
SSE-KMS method. Further, auto-encryption now
defaults to SSE-KMS.
This commit does not try to do any refactoring
and instead tries to implement SSE-KMS as a minimal
change to the code base. However, refactoring the entire
crypto-related code is planned - but needs a separate
effort.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
Co-authored-by: Klaus Post <klauspost@gmail.com>
https://github.com/minio/console takes over the functionality for the
future object browser development
Signed-off-by: Harshavardhana <harsha@minio.io>
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.
Some new additions include:
- Admin APIs remote tier configuration management
- Simple journal to track remote objects to be 'collected'
This is used by object API handlers which 'mutate' object versions by
overwriting/replacing content (Put/CopyObject) or removing the version
itself (e.g DeleteObjectVersion).
- Rework of previous ILM transition to fit the new model
In the new model, a storage class (a.k.a remote tier) is defined by the
'remote' object storage type (one of s3, azure, GCS), bucket name and a
prefix.
* Fixed bugs, review comments, and more unit-tests
- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
This commit introduces a new package `pkg/fips`
that bundles functionality to handle and configure
cryptographic protocols in case of FIPS 140.
If it is compiled with `--tags=fips` it assumes
that a FIPS 140-2 cryptographic module is used
to implement all FIPS compliant cryptographic
primitives - like AES, SHA-256, ...
In "FIPS mode" it excludes all non-FIPS compliant
cryptographic primitives from the protocol parameters.
only in case of S3 gateway we have a case where we
need to allow for SSE-S3 headers as passthrough,
If SSE-C headers are passed then they are rejected
if KMS is not configured.
This commit fixes a bug in the put-part
implementation. The SSE headers should be
set as specified by AWS - See:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html
Now, the MinIO server should set SSE-C headers,
like `x-amz-server-side-encryption-customer-algorithm`.
Fixes#11991
Current implementation heavily relies on readAllFileInfo
but with the advent of xl.meta inlined with data, we cannot
easily avoid reading data when we are only interested is
updating metadata, this leads to invariably write
amplification during metadata updates, repeatedly reading
data when we are only interested in updating metadata.
This PR ensures that we implement a metadata only update
API at storage layer, that handles updates to metadata alone
for any given version - given the version is valid and
present.
This helps reduce the chattiness for following calls..
- PutObjectTags
- DeleteObjectTags
- PutObjectLegalHold
- PutObjectRetention
- ReplicateObject (updates metadata on replication status)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.
- add API to get replication metrics
- add MRF worker to handle spill-over replication operations
- multiple issues found with replication
- fixes an issue when client sends a bucket
name with `/` at the end from SetRemoteTarget
API call make sure to trim the bucket name to
avoid any extra `/`.
- hold write locks in GetObjectNInfo during replication
to ensure that object version stack is not overwritten
while reading the content.
- add additional protection during WriteMetadata() to
ensure that we always write a valid FileInfo{} and avoid
ever writing empty FileInfo{} to the lowest layers.
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
replication didn't work as expected when deletion of
delete markers was requested in DeleteMultipleObjects
API, this is due to incorrect lookup elements being
used to look for delete markers.
This feature brings in support for auto extraction
of objects onto MinIO's namespace from an incoming
tar gzipped stream, the only expected metadata sent
by the client is to set `snowball-auto-extract`.
All the contents from the tar stream are saved as
folders and objects on the namespace.
fixes#8715
Cases where we have applications making request
for `//` in object names make sure that all
are normalized to `/` and all such requests that
are prefixed '/' are removed. To ensure a
consistent view from all operations.
This commit adds the `FromContentMD5` function to
parse a client-provided content-md5 as ETag.
Further, it also adds multipart ETag computation
for future needs.
There was an io.LimitReader was missing for the 'length'
parameter for ranged requests, that would cause client to
get truncated responses and errors.
fixes#11651
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
Skip notifications on objects that might have had
an error during deletion, this also avoids unnecessary
replication attempt on such objects.
Refactor some places to make sure that we have notified
the client before we
- notify
- schedule for replication
- lifecycle etc.
continuation of PR#11491 for multiple server pools and
bi-directional replication.
Moving proxying for GET/HEAD to handler level rather than
server pool layer as this was also causing incorrect proxying
of HEAD.
Also fixing metadata update on CopyObject - minio-go was not passing
source version ID in X-Amz-Copy-Source header
This change moves away from a unified constructor for plaintext and encrypted
usage. NewPutObjReader is simplified for the plain-text reader use. For
encrypted reader use, WithEncryption should be called on an initialized PutObjReader.
Plaintext:
func NewPutObjReader(rawReader *hash.Reader) *PutObjReader
The hash.Reader is used to provide payload size and md5sum to the downstream
consumers. This is different from the previous version in that there is no need
to pass nil values for unused parameters.
Encrypted:
func WithEncryption(encReader *hash.Reader,
key *crypto.ObjectKey) (*PutObjReader, error)
This method sets up encrypted reader along with the key to seal the md5sum
produced by the plain-text reader (already setup when NewPutObjReader was
called).
Usage:
```
pReader := NewPutObjReader(rawReader)
// ... other object handler code goes here
// Prepare the encrypted hashed reader
pReader, err = pReader.WithEncryption(encReader, objEncKey)
```
When lifecycle decides to Delete an object and not a version in a
versioned bucket, the code should create a delete marker and not
removing the scanned version.
This commit fixes the issue.
- using miniogo.ObjectInfo.UserMetadata is not correct
- using UserTags from Map->String() can change order
- ContentType comparison needs to be removed.
- Compare both lowercase and uppercase key names.
- do not silently error out constructing PutObjectOptions
if tag parsing fails
- avoid notification for empty object info, failed operations
should rely on valid objInfo for notification in all
situations
- optimize copyObject implementation, also introduce a new
replication event
- clone ObjectInfo() before scheduling for replication
- add additional headers for comparison
- remove strings.EqualFold comparison avoid unexpected bugs
- fix pool based proxying with multiple pools
- compare only specific metadata
Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>
This commit refactors the SSE implementation and add
S3-compatible SSE-KMS context handling.
SSE-KMS differs from SSE-S3 in two main aspects:
1. The client can request a particular key and
specify a KMS context as part of the request.
2. The ETag of an SSE-KMS encrypted object is not
the MD5 sum of the object content.
This commit only focuses on the 1st aspect.
A client can send an optional SSE context when using
SSE-KMS. This context is remembered by the S3 server
such that the client does not have to specify the
context again (during multipart PUT / GET / HEAD ...).
The crypto. context also includes the bucket/object
name to prevent renaming objects at the backend.
Now, AWS S3 behaves as following:
- If the user does not provide a SSE-KMS context
it does not store one - resp. does not include
the SSE-KMS context header in the response (e.g. HEAD).
- If the user specifies a SSE-KMS context without
the bucket/object name then AWS stores the exact
context the client provided but adds the bucket/object
name internally. The response contains the KMS context
without the bucket/object name.
- If the user specifies a SSE-KMS context with
the bucket/object name then AWS again stores the exact
context provided by the client. The response contains
the KMS context with the bucket/object name.
This commit implements this behavior w.r.t. SSE-KMS.
However, as of now, no such object can be created since
the server rejects SSE-KMS encryption requests.
This commit is one stepping stone for SSE-KMS support.
Co-authored-by: Harshavardhana <harsha@minio.io>
```
mc admin config set alias/ storage_class standard=EC:3
```
should only succeed if parity ratio is valid for all
server pools, if not we should fail proactively.
This PR also needs to bring other changes now that
we need to cater for variadic drive counts per pool.
Bonus fixes also various bugs reproduced with
- GetObjectWithPartNumber()
- CopyObjectPartWithOffsets()
- CopyObjectWithMetadata()
- PutObjectPart,PutObject with truncated streams
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.
This PR also adds proxying on GET/HEAD to another node in a
active-active replication setup in the event of a 404 on the current node.
This commit refactors the code in `cmd/crypto`
and separates SSE-S3, SSE-C and SSE-KMS.
This commit should not cause any behavior change
except for:
- `IsRequested(http.Header)`
which now returns the requested type {SSE-C, SSE-S3,
SSE-KMS} and does not consider SSE-C copy headers.
However, SSE-C copy headers alone are anyway not valid.
additionally also configure http2 healthcheck
values to quickly detect unstable connections
and let them timeout.
also use single transport for proxying requests
X-Minio-Replication-Delete-Status header shows the
status of the replication of a permanent delete of a version.
All GETs are disallowed and return 405 on this object version.
In the case of replicating delete markers.
X-Minio-Replication-DeleteMarker-Status shows the status
of replication, and would similarly return 405.
Additionally, this PR adds reporting of delete marker event completion
and updates documentation
This PR adds transition support for ILM
to transition data to another MinIO target
represented by a storage class ARN. Subsequent
GET or HEAD for that object will be streamed from
the transition tier. If PostRestoreObject API is
invoked, the transitioned object can be restored for
duration specified to the source cluster.
Delete marker replication is implemented for V2
configuration specified in AWS spec (though AWS
allows it only in the V1 configuration).
This PR also brings in a MinIO only extension of
replicating permanent deletes, i.e. deletes specifying
version id are replicated to target cluster.
A new field called AccessKey is added to the ReqInfo struct and populated.
Because ReqInfo is added to the context, this allows the AccessKey to be
accessed from 3rd-party code, such as a custom ObjectLayer.
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Kaloyan Raev <kaloyan@storj.io>
The entire encryption layer is dependent on the fact that
KMS should be configured for S3 encryption to work properly
and we only support passing the headers as is to the backend
for encryption only if KMS is configured.
Make sure that this predictability is maintained, currently
the code was allowing encryption to go through and fail
at later to indicate that KMS was not configured. We should
simply reply "NotImplemented" if KMS is not configured, this
allows clients to simply proceed with their tests.
This is to ensure that Go contexts work properly, after some
interesting experiments I found that Go net/http doesn't
cancel the context when Body is non-zero and hasn't been
read till EOF.
The following gist explains this, this can lead to pile up
of go-routines on the server which will never be canceled
and will die at a really later point in time, which can
simply overwhelm the server.
https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150
To avoid this refactor the locking such that we take locks after we
have started reading from the body and only take locks when needed.
Also, remove contextReader as it's not useful, doesn't work as expected
context is not canceled until the body reaches EOF so there is no point
in wrapping it with context and putting a `select {` on it which
can unnecessarily increase the CPU overhead.
We will still use the context to cancel the lockers etc.
Additional simplification in the locker code to avoid timers
as re-using them is a complicated ordeal avoid them in
the hot path, since locking is very common this may avoid
lots of allocations.
configurable remote transport timeouts for some special cases
where this value needs to be bumped to a higher value when
transferring large data between federated instances.
This PR adds a DNS target that ensures to update an entry
into Kubernetes operator when a bucket is created or deleted.
See minio/operator#264 for details.
Co-authored-by: Harshavardhana <harsha@minio.io>
Generalize replication target management so
that remote targets for a bucket can be
managed with ARNs. `mc admin bucket remote`
command will be used to manage targets.
- copyObject in-place decryption failed
due to incorrect verification of headers
- do not decode ETag when object is encrypted
with SSE-C, so that pre-conditions don't fail
prematurely.
In federated NAS gateway setups, multiple hosts in srvRecords
was picked at random which could mean that if one of the
host was down the request can indeed fail and if client
retries it would succeed. Instead allow server to figure
out the current online host quickly such that we can
exclude the host which is down.
At the max the attempt to look for a downed node is to
300 millisecond, if the node is taking longer to respond
than this value we simply ignore and move to the node,
total attempts are equal to number of srvRecords if no
server is online we simply fallback to last dialed host.
Use a separate client for these calls that can take a long time.
Add request context to these so they are canceled when the client
disconnects instead except for ListObject which doesn't have any equivalent.
object KMS is configured with auto-encryption,
there were issues when using docker registry -
this has been left unnoticed for a while.
This PR fixes an issue with compatibility.
Additionally also fix the continuation-token implementation
infinite loop issue which was missed as part of #9939
Also fix the heal token to be generated as a client
facing value instead of what is remembered by the
server, this allows for the server to be stateless
regarding the token's behavior.
- x-amz-storage-class specified CopyObject
should proceed regardless, its not a precondition
- sourceVersionID is specified CopyObject should
proceed regardless, its not a precondition
Just like GET/DELETE APIs it is possible to preserve
client supplied versionId's, of course the versionIds
have to be uuid, if an existing versionId is found
it is overwritten if no object locking policies
are found.
- PUT /bucketname/objectname?versionId=<id>
- POST /bucketname/objectname?uploads=&versionId=<id>
- PUT /bucketname/objectname?verisonId=<id> (with x-amz-copy-source)
- Implement a new xl.json 2.0.0 format to support,
this moves the entire marshaling logic to POSIX
layer, top layer always consumes a common FileInfo
construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
for object placements.
Fixes#2111
CopyObject was not correctly figuring out the correct
destination object location and would end up creating
duplicate objects on two different zones, reproduced
by doing encryption based key rotation.
Advantages avoids 100's of stats which are needed for each
upload operation in FS/NAS gateway mode when uploading a large
multipart object, dramatically increases performance for
multipart uploads by avoiding recursive calls.
For other gateway's simplifies the approach since
azure, gcs, hdfs gateway's don't capture any specific
metadata during upload which needs handler validation
for encryption/compression.
Erasure coding was already optimized, additionally
just avoids small allocations of large data structure.
Fixes#7206
some clients such as veeam expect the x-amz-meta to
be sent in lower cased form, while this does indeed
defeats the HTTP protocol contract it is harder to
change these applications, while these applications
get fixed appropriately in future.
x-amz-meta is usually sent in lowercased form
by AWS S3 and some applications like veeam
incorrectly end up relying on the case sensitivity
of the HTTP headers.
Bonus fixes
- Fix the iso8601 time format to keep it same as
AWS S3 response
- Increase maxObjectList to 50,000 and use
maxDeleteList as 10,000 whenever multi-object
deletes are needed.
size calculation in crawler was using the real size
of the object instead of its actual size i.e either
a decrypted or uncompressed size.
this is needed to make sure all other accounting
such as bucket quota and mcs UI to display the
correct values.
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.
Additionally also fixes#9396, #9394
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.
This PR fixes the first stage of this
cleanup.
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}
This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
This PR allows setting a "hard" or "fifo" quota
restriction at the bucket level. Buckets that
have reached the FIFO quota configured, will
automatically be cleaned up in FIFO manner until
bucket usage drops to configured quota.
If a bucket is configured with a "hard" quota
ceiling, all further writes are disallowed.
global WORM mode is a complex piece for which
the time has passed, with the advent of S3 compatible
object locking and retention implementation global
WORM is sort of deprecated, this has been mentioned
in our documentation for some time, now the time
has come for this to go.