Commit Graph

436 Commits

Author SHA1 Message Date
Harshavardhana 124544d834
add pre-conditions support for PUT calls during replication (#15674)
PUT shall only proceed if pre-conditions are met, the new
code uses

- x-minio-source-mtime
- x-minio-source-etag

to verify if the object indeed needs to be replicated
or not, allowing us to avoid StatObject() call.
2022-09-14 18:44:04 -07:00
Klaus Post 8e4a45ec41
fix: encrypt checksums in metadata (#15620) 2022-08-31 08:13:23 -07:00
Klaus Post a9f1ad7924
Add extended checksum support (#15433) 2022-08-29 16:57:16 -07:00
Harshavardhana 8902561f3c
use new xxml for XML responses to support rare control characters (#15511)
use new xxml/XML responses to support rare control characters

fixes #15023
2022-08-23 17:04:11 -07:00
Poorna 21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
Poorna 426c902b87
site replication: fix healing of bucket deletes. (#15377)
This PR changes the handling of bucket deletes for site 
replicated setups to hold on to deleted bucket state until 
it syncs to all the clusters participating in site replication.
2022-07-25 17:51:32 -07:00
Klaus Post f939d1c183
Independent Multipart Uploads (#15346)
Do completely independent multipart uploads.

In distributed mode, a lock was held to merge each multipart 
upload as it was added. This lock was highly contested and 
retries are expensive (timewise) in distributed mode.

Instead, each part adds its metadata information uniquely. 
This eliminates the per object lock required for each to merge.
The metadata is read back and merged by "CompleteMultipartUpload" 
without locks when constructing final object.

Co-authored-by: Harshavardhana <harsha@minio.io>
2022-07-19 08:35:29 -07:00
Andreas Auernhammer 242d06274a
kms: add `context.Context` to KMS API calls (#15327)
This commit adds a `context.Context` to the
the KMS `{Stat, CreateKey, GenerateKey}` API
calls.

The context will be used to terminate external calls
as soon as the client requests gets canceled.

A follow-up PR will add a `context.Context` to
the remaining `DecryptKey` API call.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-07-18 18:54:27 -07:00
Klaus Post 0149382cdc
Add padding to compressed+encrypted files (#15282)
Add up to 256 bytes of padding for compressed+encrypted files.

This will obscure the obvious cases of extremely compressible content 
and leave a similar output size for a very wide variety of inputs.

This does *not* mean the compression ratio doesn't leak information 
about the content, but the outcome space is much smaller, 
so often *less* information is leaked.
2022-07-13 07:52:15 -07:00
Harshavardhana 0a8b78cb84
fix: simplify passing auditLog eventType (#15278)
Rename Trigger -> Event to be a more appropriate
name for the audit event.

Bonus: fixes a bug in AddMRFWorker() it did not
cancel the waitgroup, leading to waitgroup leaks.
2022-07-12 10:43:32 -07:00
Klaus Post 9f02f51b87
Add 4K minimum compressed size (#15273)
There is no point in compressing very small files.

Typically the effective size on disk will be the same due to disk blocks.

So don't waste resources on extremely small files.

We don't check on multipart. 1) because we don't know and 2) this is very likely a big object anyway.
2022-07-12 07:42:04 -07:00
Klaus Post 911a17b149
Add compressed file index (#15247) 2022-07-11 17:30:56 -07:00
Minio Trusted e60b67d246 Revert "Tighten enforcement of object retention (#14993)"
This reverts commit 5e3010d455.

This commit causes regression on object locked buckets causine
delete-markers to be not created.
2022-06-30 13:06:32 -07:00
Anis Elleuch b7c7e59dac
Revert proxying requests with precondition errors (#15180)
In a replicated setup, when an object is updated in one cluster but
still waiting to be replicated to the other cluster, GET requests with
if-match, and range headers will likely fail. It is better to proxy
requests instead.

Also, this commit avoids printing verbose logs about precondition &
range errors.
2022-06-27 14:03:44 -07:00
Harshavardhana 699cf6ff45
perform object sweep after equeue the latest CopyObject() (#15183)
keep it similar to PutObject/CompleteMultipart
2022-06-27 12:11:33 -07:00
Poorna cb097e6b0a
CopyObject: fix read/write err on closed pipe (#15135)
Fixes: #15128
Regression from PR#14971
2022-06-21 19:20:11 -07:00
Poorna 1cfb03fb74
replication: Avoid proxying when precondition failed (#15134)
Proxying is not required when content is on this cluster and
does not meet pre-conditions specified in the request.

Fixes #15124
2022-06-21 14:11:35 -07:00
Andreas Auernhammer cd7a0a9757
fips: simplify TLS configuration (#15127)
This commit simplifies the TLS configuration.
It inlines the FIPS / non-FIPS code.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-06-21 07:54:48 -07:00
Anis Elleuch 98ddc3596c
Avoid CompleteMultipart freeze with unexpected network issue (#15102)
If sending a white space during a long S3 handler call fails,
the whitespace goroutine forgets to return a result to the caller.
Therefore, the complete multipart handler will be blocked.

Remember to send the header written result to the caller 
or/and close the channel.
2022-06-17 10:41:25 -07:00
Harshavardhana 31c4fdbf79
fix: resyncing 'null' version on pre-existing content (#15043)
PR #15041 fixed replicating 'null' version however
due to a regression from #14994 caused the target
versions for these 'null' versioned objects to have
different 'versions', this may cause confusion with
bi-directional replication and cause double replication.

This PR fixes this properly by making sure we replicate
the correct versions on the objects.
2022-06-06 15:14:56 -07:00
Poorna 5e3010d455
Tighten enforcement of object retention (#14993)
Ref issue#14991 - in the rare case that object in bucket under
retention has null version, make sure to enforce retention
rules.
2022-05-28 02:21:19 -07:00
Harshavardhana 38caddffe7
fix: copyObject on versioned bucket when updating metadata (#14971)
updating metadata with CopyObject on a versioned bucket
causes the latest version to be not readable, this PR fixes
this properly by handling the inline data bug fix introduced
in PR #14780.

This bug affects only inlined data.
2022-05-24 17:27:45 -07:00
Harshavardhana 62aa42cccf
avoid replication proxy on version excluded paths (#14878)
no need to attempt proxying objects that were
never replicated, but do have local `null`
versions on them.
2022-05-08 16:50:31 -07:00
Krishnan Parthasarathi ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads which use Hadoop MR 
Committer v1/v2 algorithm upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on Job commit. 
Object storage admins are forced to configure 
separate ILM policies to expire these objects 
and their versions to reclaim space.

Our solution:

This can be avoided by simply marking objects 
under these prefixes to be excluded from versioning, 
as shown below. Consequently, these objects are 
excluded from replication, and don't require ILM 
policies to prune unnecessary versions.

-  MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
from versioning. This is required to prevent the parent 
folders from accumulating delete markers, especially
those which are shared across spark workloads 
spanning projects/teams.

- To enable version exclusion on a list of prefixes

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
2022-05-06 19:05:28 -07:00
Harshavardhana 73a6a60785
fix: replication deleteObject() regression and CopyObject() behavior (#14780)
This PR fixes two issues

- The first fix is a regression from #14555, the fix itself in #14555
  is correct but the interpretation of that information by the
  object layer code for "replication" was not correct. This PR
  tries to fix this situation by making sure the "Delete" replication
  works as expected when "VersionPurgeStatus" is already set.

  Without this fix, there is a DELETE marker created incorrectly on
  the source where the "DELETE" was triggered.

- The second fix is perhaps an older problem started since we inlined-data
  on the disk for small objects, CopyObject() incorrectly inline's
  a non-inlined data. This is due to the fact that we have code where
  we read the `part.1` under certain conditions where the size of the
  `part.1` is less than the specific "threshold".

  This eventually causes problems when we are "deleting" the data that
  is only inlined, which means dataDir is ignored leaving such
  dataDir on the disk, that looks like an inconsistent content on
  the namespace.

fixes #14767
2022-04-20 10:22:05 -07:00
Aditya Manthramurthy e8e48e4c4a
S3 select switch to new parquet library and reduce locking (#14731)
- This change switches to a new parquet library
- SelectObjectContent now takes a single lock at the beginning and holds it
during the operation. Previously the operation took a lock every time the
parquet library performed a Seek on the underlying object stream.
- Add basic support for LogicalType annotations for timestamps.
2022-04-14 06:54:47 -07:00
Harshavardhana 153a612253
fetch bucket retention config once for ILM evalAction (#14727)
This is mainly an optimization, does not change any
existing functionality.
2022-04-11 13:25:32 -07:00
Andreas Auernhammer ba17d46f15
ListObjectParts: simplify ETag decryption and size adjustment (#14653)
This commit simplifies the ETag decryption and size adjustment
when listing object parts.

When listing object parts, MinIO has to decrypt the ETag of all
parts if and only if the object resp. the parts is encrypted using
SSE-S3.
In case of SSE-KMS and SSE-C, MinIO returns a pseudo-random ETag.
This is inline with AWS S3 behavior.

Further, MinIO has to adjust the size of all encrypted parts due to
the encryption overhead.

The ListObjectParts does specifically not use the KMS bulk decryption
API (4d2fc530d0) since the ETags of all
parts are encrypted using the same object encryption key. Therefore,
MinIO only has to connect to the KMS once, even if there are multiple
parts resp. ETags. It can simply reuse the same object encryption key.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-30 15:23:25 -07:00
Poorna 4d13ddf6b3
Avoid shadowing error during replication proxy check (#14655)
Fixes #14652
2022-03-29 10:53:09 -07:00
Andreas Auernhammer b0a4beb66a
PutObjectPart: set SSE-KMS headers and truncate ETags. (#14578)
This commit fixes two bugs in the `PutObjectPartHandler`.
First, `PutObjectPart` should return SSE-KMS headers
when the object is encrypted using SSE-KMS.
Before, this was not the case.

Second, the ETag should always be a 16 byte hex string,
perhaps followed by a `-X` (where `X` is the number of parts).
However, `PutObjectPart` used to return the encrypted ETag
in case of SSE-KMS. This leaks MinIO internal etag details
through the S3 API.

The combination of both bugs causes clients that use SSE-KMS
to fail when trying to validate the ETag. Since `PutObjectPart`
did not send the SSE-KMS response headers, the response looked
like a plaintext `PutObjectPart` response. Hence, the client
tries to verify that the ETag is the content-md5 of the part.
This could never be the case, since MinIO used to return the
encrypted ETag.

Therefore, clients behaving as specified by the S3 protocol
tried to verify the ETag in a situation they should not.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-19 10:15:12 -07:00
Klaus Post c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
Poorna 1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Klaus Post 5ec57a9533
Add GetObject gzip option (#14226)
Enabled with `mc admin config set alias/ api gzip_objects=on`

Standard filtering applies (1K response minimum, not compressed content 
type, not range request, gzip accepted by client).
2022-02-14 09:19:01 -08:00
Harshavardhana 84b121bbe1
return error with empty x-amz-copy-source-range headers (#14249)
fixes #14246
2022-02-03 16:58:27 -08:00
Harshavardhana dbd05d6e82
remove FIFO bucket quota, use ILM expiration instead (#14206) 2022-01-31 11:07:04 -08:00
Harshavardhana a60ac7ca17
fix: audit log to support object names in multipleObjectNames() handler (#14017) 2022-01-03 01:28:52 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Klaus Post 038fdeea83
snowball: return errors on failures (#13836)
Return errors when untar fails at once.

Current error handling was quite a mess. Errors are written 
to the stream, but processing continues.

Instead, return errors when they occur and transform 
internal errors to bad request errors, since it is likely a 
problem with the input.

Fixes #13832
2021-12-06 09:45:23 -08:00
Harshavardhana be34fc9134
fix: kms-id header should have arn:aws:kms: prefix (#13833)
arn:aws:kms: is a must for KMS keyID.
2021-12-06 00:39:32 -08:00
Harshavardhana 8591d17d82
return appropriate errors upon parseErrors (#13831) 2021-12-05 11:36:26 -08:00
Aditya Manthramurthy 4ce6d35e30
Add new `site` config sub-system intended to replace `region` (#13672)
- New sub-system has "region" and "name" fields.

- `region` subsystem is marked as deprecated, however still works, unless the
new region parameter under `site` is set - in this case, the region subsystem is
ignored. `region` subsystem is hidden from top-level help (i.e. from `mc admin
config set myminio`), but appears when specifically requested (i.e. with `mc
admin config set myminio region`).

- MINIO_REGION, MINIO_REGION_NAME are supported as legacy environment variables for server region.

- Adds MINIO_SITE_REGION as the current environment variable to configure the
server region and MINIO_SITE_NAME for the site name.
2021-11-25 13:06:25 -08:00
Harshavardhana fb268add7a
do not flush if Write() failed (#13597)
- Go might reset the internal http.ResponseWriter() to `nil`
  after Write() failure if the go-routine has returned, do not
  flush() such scenarios and avoid spurious flushes() as
  returning handlers always flush.
- fix some racy tests with the console 
- avoid ticker leaks in certain situations
2021-11-18 17:19:58 -08:00
Harshavardhana 661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Harshavardhana 14d8a931fe
re-use io.Copy buffers with 32k pools (#13553)
Borrowed idea from Go's usage of this
optimization for ReadFrom() on client
side, we should re-use the 32k buffers
io.Copy() allocates for generic copy
from a reader to writer.

the performance increase for reads for
really tiny objects is at this range
after this change.

> * Fastest: +7.89% (+1.3 MiB/s) throughput, +7.89% (+1308.1) obj/s
2021-11-02 08:11:50 -07:00
Poorna K 15dcacc1fc
Add support for caching multipart in writethrough mode (#13507) 2021-11-01 08:11:58 -07:00
Harshavardhana 4ed0eb7012
remove double reads updating object metadata (#13542)
Removes RLock/RUnlock for updating metadata,
since we already take a write lock to update
metadata, this change removes reading of xl.meta
as well as an additional lock, the performance gain
should increase 3x theoretically for

- PutObjectRetention
- PutObjectLegalHold

This optimization is mainly for Veeam like
workloads that require a certain level of iops
from these API calls, we were losing iops.
2021-10-30 08:22:04 -07:00
Klaus Post 7bdf9005e5
Remove HTTP flushes for returning handlers (#13528)
When handlers return they are automatically flushed. Manual flushing can force responsewriters to use suboptimal paths and generally just wastes CPU.
2021-10-28 07:36:34 -07:00
Poorna K e7f559c582
Fixes to replication metrics (#13493)
For reporting ReplicaSize and loading initial
replication metrics correctly.
2021-10-21 18:52:55 -07:00
Harshavardhana 1e117b780a
fix: validate exclusivity with partNumber regardless of valid Range (#13418)
To mimic an exact AWS S3 behavior this fix is needed.
2021-10-12 09:24:19 -07:00
Krishnan Parthasarathi f3aeed77e5
Add immediate inline tiering support (#13298) 2021-10-01 11:58:17 -07:00