Spark/Hadoop workloads which use the Hadoop MR
Committer v1/v2 algorithms upload objects to a
temporary prefix in a bucket. These objects are
'renamed' to a different prefix on job commit.
Object storage admins are forced to configure
separate ILM policies to expire these objects
and their versions in order to reclaim space.
Our solution:
This can be avoided by simply marking objects
under these prefixes to be excluded from versioning,
as shown below. Consequently, these objects are
excluded from replication, and don't require ILM
policies to prune unnecessary versions.
- MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Status>Enabled</Status>
  <ExcludeFolders>true</ExcludeFolders>
  <ExcludedPrefixes>
    <Prefix>app1-jobs/*/_temporary/</Prefix>
  </ExcludedPrefixes>
  <ExcludedPrefixes>
    <Prefix>app2-jobs/*/__magic/</Prefix>
  </ExcludedPrefixes>
  <!-- .. up to 10 prefixes in all -->
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket
from versioning. This is required to prevent the parent
folders from accumulating delete markers, especially
those which are shared across Spark workloads
spanning projects/teams.
- To enable version exclusion on a list of prefixes
```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
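For reference, the extended versioning configuration above can also be produced programmatically. Below is a minimal Go sketch using `encoding/xml`; the struct and field names are illustrative assumptions for the sketch, not MinIO's or any SDK's actual types:
```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Illustrative types mirroring the extended VersioningConfiguration
// shown above; these are assumptions, not MinIO's API.
type ExcludedPrefix struct {
	Prefix string
}

type VersioningConfiguration struct {
	XMLName          xml.Name         `xml:"VersioningConfiguration"`
	Xmlns            string           `xml:"xmlns,attr"`
	Status           string           `xml:"Status"`
	ExcludeFolders   bool             `xml:"ExcludeFolders"`
	ExcludedPrefixes []ExcludedPrefix `xml:"ExcludedPrefixes"`
}

func main() {
	cfg := VersioningConfiguration{
		Xmlns:          "http://s3.amazonaws.com/doc/2006-03-01/",
		Status:         "Enabled",
		ExcludeFolders: true,
		ExcludedPrefixes: []ExcludedPrefix{
			{Prefix: "app1-jobs/*/_temporary/"},
			{Prefix: "app2-jobs/*/__magic/"},
		},
	}
	out, err := xml.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```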
The S3 spec returns the x-amz-restore header in HEAD/GET Object
responses with the following format:
```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```
This commit adds the quotes, as the current code does not support
them. It also supports the old format saved on disk (in xl.meta) for
backward compatibility.
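As an illustration of accepting both forms, here is a hypothetical Go helper (not the code from this commit) that parses the quoted S3 format as well as the older unquoted values stored in xl.meta:
```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseRestoreHeader is a hypothetical helper that accepts both the quoted
// S3 form of x-amz-restore and the older unquoted form stored in xl.meta.
func parseRestoreHeader(v string) (ongoing bool, expiry time.Time, err error) {
	// ongoing-request may appear as "true" (quoted) or true (unquoted).
	ongoing = strings.Contains(v, `ongoing-request="true"`) ||
		strings.Contains(v, "ongoing-request=true")

	// The expiry date itself contains a comma, so cut on the key instead
	// of splitting the whole header value on commas.
	if _, rest, found := strings.Cut(v, "expiry-date="); found {
		rest = strings.Trim(strings.TrimSpace(rest), `"`)
		expiry, err = time.Parse(time.RFC1123, rest)
		if err != nil {
			return false, time.Time{}, err
		}
	}
	return ongoing, expiry, nil
}

func main() {
	ok, exp, err := parseRestoreHeader(`ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"`)
	fmt.Println(ok, exp, err)
}
```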
- Rename MaxNoncurrentVersions tag to NewerNoncurrentVersions
Note: We apply overlapping NewerNoncurrentVersions rules such that
we honor the highest among the applicable limits, e.g. if two overlapping
rules are configured with 2 and 3 noncurrent versions to be retained, we
will retain 3 (see the sketch after the ilm.json example below).
- Expire newer noncurrent versions after noncurrent days
- MinIO extension: allow noncurrent days to be zero, allowing expiry
of noncurrent versions as soon as more than the configured
NewerNoncurrentVersions are present.
- Allow NewerNoncurrentVersions rules on object-locked buckets
- No x-amz-expiration when NewerNoncurrentVersions configured
- ComputeAction should skip rules with NewerNoncurrentVersions > 0
- Add unit tests for lifecycle.ComputeAction
- Support lifecycle rules with MaxNoncurrentVersions
- Extend ExpectedExpiryTime to work with zero days
- Fix all time comparisons to be relative to UTC
This feature allows users to limit the maximum number of noncurrent
versions of an object.
To enable this rule, import an *ilm.json* like the following:
```
cat >> ilm.json <<EOF
{
  "Rules": [
    {
      "ID": "test-max-noncurrent",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "user-uploads/"
      },
      "NoncurrentVersionExpiration": {
        "MaxNoncurrentVersions": 5
      }
    }
  ]
}
EOF
mc ilm import myminio/mybucket < ilm.json
```
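To illustrate how overlapping NewerNoncurrentVersions rules are resolved (as noted above, the highest applicable limit wins), a minimal Go sketch with simplified stand-in types, not the lifecycle package itself:
```go
package main

import (
	"fmt"
	"strings"
)

// rule is a simplified stand-in for a lifecycle rule.
type rule struct {
	Prefix                  string
	NewerNoncurrentVersions int
}

// retainedNoncurrentVersions returns the highest NewerNoncurrentVersions
// among rules whose prefix matches the object, mirroring the behavior
// described above for overlapping rules.
func retainedNoncurrentVersions(object string, rules []rule) int {
	retain := 0
	for _, r := range rules {
		if strings.HasPrefix(object, r.Prefix) && r.NewerNoncurrentVersions > retain {
			retain = r.NewerNoncurrentVersions
		}
	}
	return retain
}

func main() {
	rules := []rule{
		{Prefix: "user-uploads/", NewerNoncurrentVersions: 2},
		{Prefix: "user-uploads/images/", NewerNoncurrentVersions: 3},
	}
	// Both rules overlap for this object; the higher limit (3) is honored.
	fmt.Println(retainedNoncurrentVersions("user-uploads/images/a.png", rules))
}
```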
- Supports object locked buckets that require
PutObject() to always set content-md5.
- Use SSE-S3 instead of SSE-KMS for auto-encryption
when the S3 gateway is being used.
* reduce extra getObjectInfo() calls during ILM transition
This PR also changes expiration logic to be non-blocking;
the scanner is now free from additional costs incurred due
to slower object layer calls and hitting the drives.
* move verifying expiration inside locks
- deletes should always Sweep() for tiering at the
end and do not need an extra getObjectInfo() call
- puts, copies and multipart writes should conditionally
do getObjectInfo() when tiering targets are configured
- introduce a 'TransitionedObject' struct for ease of usage
and understanding.
- multiple-pools optimization: deletes don't need to hold
read locks while verifying objects across the namespace and pools.
This method is used to add expected expiration and transition time
for an object in GET/HEAD Object response headers.
Also fixes bugs in lifecycle.PredictTransitionTime and
getLifecycleTransitionTier when handling current and
non-current versions.
This allows remote bucket admin to identify the origin of transitioned
objects by simply inspecting the object prefixes.
e.g. consider a remote tier TIER-1 pointing to a remote bucket (prefix)
testbucket/testprefix-1. The remote bucket admin can list all transitioned objects
from a MinIO deployment identified by '2e78e906-1c5d-4f94-8689-9df44cafde39' and
source bucket 'mybucket' like so:
```
$ ./mc ls -r minio-tier-target/testbucket/testprefix-1/2e78e906-1c5d-4f94-8689-9df44cafde39/mybucket/
[2021-07-12 17:15:50 PDT] 160B 48/fb/48fbc0e6-3a73-458b-9337-8e722c619ca4
[2021-07-12 16:58:46 PDT] 160B 7d/1c/7d1c96bd-031a-48d4-99ea-b1304e870830
```
auditLog should be attempted right before the
return of the function and not multiple times
per function; this ensures that we only trigger
it once per function call.
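A minimal sketch of this pattern with illustrative names: the audit call is deferred, so it runs exactly once, right before the function returns:
```go
package main

import "log"

// auditLog is a stand-in for the real audit logger.
func auditLog(api string, err error) {
	log.Printf("audit: api=%s err=%v", api, err)
}

// handler shows the pattern: one deferred audit call per function,
// instead of calling auditLog at every return site.
func handler() (err error) {
	defer func() {
		auditLog("GetObject", err)
	}()

	// ... multiple early returns are fine; the deferred call
	// still runs exactly once with the final error value.
	return nil
}

func main() {
	_ = handler()
}
```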
- Adds versioning support for S3-based remote tiers that have versioning
enabled. This ensures that when reading or deleting, we specify the specific
version ID of the object (see the minio-go sketch after this list). In case of
deletion, this is important to ensure that the object version is actually
deleted instead of simply being marked for deletion.
- Stores the remote object's version id in the tier-journal. The tier-journal
file version is not bumped up, as serializing the new struct version is
compatible with old journals without the remote object version id.
- `storageRESTVersion` is bumped up as FileInfo struct now includes a
`TransitionRemoteVersionID` member.
- Azure and GCS support for this feature will be added subsequently.
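For context, a version-aware delete against an S3-compatible remote looks roughly like the following minio-go sketch; the endpoint, credentials, bucket, object and version ID are placeholders, and this is not the tiering code itself:
```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Placeholder endpoint and credentials for the remote (tier) bucket.
	client, err := minio.New("remote-tier.example.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS-KEY", "SECRET-KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// When the remote bucket is versioned, pass the stored remote version ID
	// so the exact version is removed, not just hidden by a delete marker.
	err = client.RemoveObject(context.Background(), "testbucket", "testprefix-1/obj",
		minio.RemoveObjectOptions{VersionID: "REMOTE-VERSION-ID"})
	if err != nil {
		log.Fatal(err)
	}
}
```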
Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
cleanup functions should never be run before the reader is
instantiated; this type of design leads to situations where the
ordering of lockers and the places where they are used become confusing.
Allow WithCleanupFuncs() if the caller wishes to add cleanupFns
to be run upon close() or on an error during initialization of the
reader.
Also make sure streams are closed before we unlock the resources;
this allows for ordered cleanup of resources.
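A rough sketch of the intended ordering, with hypothetical type and method names: cleanup functions are registered only after the reader exists, and they run after the stream is closed:
```go
package main

import (
	"io"
	"strings"
)

// reader wraps an io.Reader and runs registered cleanup functions on Close;
// the names here are hypothetical, mirroring the description above.
type reader struct {
	io.Reader
	cleanupFns []func()
}

// WithCleanupFuncs registers functions to run when the reader is closed
// (or when initialization of the reader fails).
func (r *reader) WithCleanupFuncs(fns ...func()) *reader {
	r.cleanupFns = append(r.cleanupFns, fns...)
	return r
}

func (r *reader) Close() error {
	// Close the underlying stream first, then release resources in order.
	if c, ok := r.Reader.(io.Closer); ok {
		c.Close()
	}
	for _, fn := range r.cleanupFns {
		fn()
	}
	return nil
}

func main() {
	r := (&reader{Reader: strings.NewReader("hello")}).WithCleanupFuncs(func() {
		// e.g. unlock the resource only after the stream is closed
	})
	io.ReadAll(r)
	r.Close()
}
```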
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage (including MinIO) and Google Cloud Storage as remote tier storage backends.
Some new additions include:
- Admin APIs for remote tier configuration management
- Simple journal to track remote objects to be 'collected'
This is used by object API handlers which 'mutate' object versions by
overwriting/replacing content (Put/CopyObject) or removing the version
itself (e.g. DeleteObjectVersion).
- Rework of previous ILM transition to fit the new model
In the new model, a storage class (a.k.a. remote tier) is defined by the
'remote' object storage type (one of s3, azure, GCS), bucket name and a
prefix.
* Fixed bugs, review comments, and more unit-tests
- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
also make sure to close the channel on the producer
side, not in a separate goroutine; otherwise this can
lead to races between a writer and a closer.
fixes #12073
locks can get relinquished when Read() sees io.EOF,
leading to premature closing of the readers;
concurrent writes on the same object can have
undesired consequences here when these locks
are relinquished.
This commit adds a new package `etag` for dealing
with S3 ETags.
Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.
Worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.
Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.
One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.
This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.
The `etag` package provides a mechanism to wrap `io.Reader` implementations
such that the ETag can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
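A generic sketch of the wrapping technique (the actual `etag` package API may differ): the low-level reader computes the ETag while data flows through it, and callers that only hold an `io.Reader` recover it with a type assertion:
```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"strings"
)

// Tagger is implemented by readers that can report a computed ETag.
type Tagger interface {
	ETag() string
}

// tagReader computes an MD5-based ETag while the content is read.
type tagReader struct {
	r io.Reader
	h hash.Hash
}

func NewTagReader(r io.Reader) *tagReader {
	h := md5.New()
	return &tagReader{r: io.TeeReader(r, h), h: h}
}

func (t *tagReader) Read(p []byte) (int, error) { return t.r.Read(p) }
func (t *tagReader) ETag() string               { return hex.EncodeToString(t.h.Sum(nil)) }

func main() {
	var r io.Reader = NewTagReader(strings.NewReader("content"))
	io.ReadAll(r)

	// Higher-level code only sees an io.Reader; the ETag is recovered
	// with a type-check if the wrapper (or a forwarding wrapper) supports it.
	if t, ok := r.(Tagger); ok {
		fmt.Println("etag:", t.ETag())
	}
}
```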
This change moves away from a unified constructor for plaintext and encrypted
usage. NewPutObjReader is simplified for the plain-text reader use. For
encrypted reader use, WithEncryption should be called on an initialized PutObjReader.
Plaintext:
func NewPutObjReader(rawReader *hash.Reader) *PutObjReader
The hash.Reader is used to provide payload size and md5sum to the downstream
consumers. This is different from the previous version in that there is no need
to pass nil values for unused parameters.
Encrypted:
func WithEncryption(encReader *hash.Reader,
key *crypto.ObjectKey) (*PutObjReader, error)
This method sets up encrypted reader along with the key to seal the md5sum
produced by the plain-text reader (already setup when NewPutObjReader was
called).
Usage:
```
pReader := NewPutObjReader(rawReader)
// ... other object handler code goes here
// Prepare the encrypted hashed reader
pReader, err = pReader.WithEncryption(encReader, objEncKey)
```
When lifecycle decides to Delete an object and not a version in a
versioned bucket, the code should create a delete marker and not
remove the scanned version.
This commit fixes the issue.
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.
This PR also adds proxying on GET/HEAD to another node in an
active-active replication setup in the event of a 404 on the current node.
This commit refactors the code in `cmd/crypto`
and separates SSE-S3, SSE-C and SSE-KMS.
This commit should not cause any behavior change
except for:
- `IsRequested(http.Header)`
which now returns the requested type {SSE-C, SSE-S3,
SSE-KMS} and does not consider SSE-C copy headers.
However, SSE-C copy headers alone are not valid anyway.
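For reference, the three types are distinguished by standard S3 request headers: SSE-C uses `X-Amz-Server-Side-Encryption-Customer-Algorithm`, while SSE-S3 and SSE-KMS use `X-Amz-Server-Side-Encryption` with values `AES256` and `aws:kms` respectively. A hedged sketch of that dispatch, not the `cmd/crypto` implementation:
```go
package main

import (
	"fmt"
	"net/http"
)

// sseType reports which SSE type a request asks for, based on the standard
// S3 headers; this is an illustration, not the cmd/crypto API.
func sseType(h http.Header) string {
	switch {
	case h.Get("X-Amz-Server-Side-Encryption-Customer-Algorithm") != "":
		return "SSE-C"
	case h.Get("X-Amz-Server-Side-Encryption") == "aws:kms":
		return "SSE-KMS"
	case h.Get("X-Amz-Server-Side-Encryption") == "AES256":
		return "SSE-S3"
	default:
		return "none"
	}
}

func main() {
	h := http.Header{}
	h.Set("X-Amz-Server-Side-Encryption", "aws:kms")
	fmt.Println(sseType(h)) // SSE-KMS
}
```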