minio/internal
Krishnan Parthasarathi ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads that use the Hadoop MR 
Committer v1/v2 algorithms upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on job commit. 
On a versioned bucket, each rename leaves noncurrent 
versions and delete markers behind under the 
temporary prefix, so object storage admins are 
forced to configure separate ILM policies to expire 
these objects and their versions to reclaim space.

Our solution:

This can be avoided by marking the objects under 
these prefixes as excluded from versioning, as 
shown below. Such objects are also excluded from 
replication and do not need ILM policies to prune 
unnecessary versions.

- MinIO extension to the bucket versioning configuration:
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
(objects whose names end in `/`) from versioning. This is 
required to prevent parent folders from accumulating 
delete markers, especially folders that are shared across 
Spark workloads spanning projects/teams.
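
To make the shape of this configuration concrete, the sketch below decodes it with Go's standard `encoding/xml` package and enforces the 10-prefix limit mentioned in the XML comment. The type and function names (`versioningConfig`, `excludedPrefix`, `parseVersioningConfig`) are hypothetical and chosen for illustration; this is not the code added by the commit.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// maxExcludedPrefixes mirrors the "up to 10 prefixes" limit noted above.
const maxExcludedPrefixes = 10

// excludedPrefix is a hypothetical type holding one <ExcludedPrefixes> entry.
type excludedPrefix struct {
	Prefix string `xml:"Prefix"`
}

// versioningConfig is a hypothetical type for the extended configuration.
type versioningConfig struct {
	XMLName          xml.Name         `xml:"VersioningConfiguration"`
	Status           string           `xml:"Status"`
	ExcludeFolders   bool             `xml:"ExcludeFolders"`
	ExcludedPrefixes []excludedPrefix `xml:"ExcludedPrefixes"`
}

// parseVersioningConfig decodes the XML and rejects oversized prefix lists.
func parseVersioningConfig(data []byte) (*versioningConfig, error) {
	var cfg versioningConfig
	if err := xml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	if len(cfg.ExcludedPrefixes) > maxExcludedPrefixes {
		return nil, errors.New("too many excluded prefixes")
	}
	return &cfg, nil
}

// sampleConfig repeats the configuration shown above.
const sampleConfig = `<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Status>Enabled</Status>
  <ExcludeFolders>true</ExcludeFolders>
  <ExcludedPrefixes>
    <Prefix>app1-jobs/*/_temporary/</Prefix>
  </ExcludedPrefixes>
  <ExcludedPrefixes>
    <Prefix>app2-jobs/*/__magic/</Prefix>
  </ExcludedPrefixes>
</VersioningConfiguration>`

func main() {
	cfg, err := parseVersioningConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("status:", cfg.Status, "exclude folders:", cfg.ExcludeFolders)
	for _, p := range cfg.ExcludedPrefixes {
		fmt.Println("excluded prefix:", p.Prefix)
	}
}
```

Each `<ExcludedPrefixes>` element carries a single `<Prefix>`, so the list is modelled as a slice of one-field structs rather than a slice of strings.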

- To enable versioning exclusion on a list of prefixes:

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/__magic/" --exclude-prefix-marker myminio/test
```
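
As an illustration of how such prefix patterns could be evaluated against object keys, here is a minimal sketch. It assumes that `*` matches any run of characters (including `/`) and that a pattern covers every key beneath it. The names `wildcardMatch`, `versioningRules` and `versioned` are hypothetical, and this is not necessarily how the commit implements the check.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardMatch reports whether name matches pattern, where '*' matches
// any (possibly empty) run of characters. It backtracks to the last '*'.
func wildcardMatch(pattern, name string) bool {
	p, n := 0, 0
	star, mark := -1, 0
	for n < len(name) {
		switch {
		case p < len(pattern) && pattern[p] == '*':
			// Remember the wildcard position and try matching zero characters.
			star, mark = p, n
			p++
		case p < len(pattern) && pattern[p] == name[n]:
			p++
			n++
		case star >= 0:
			// Mismatch: let the last '*' absorb one more character.
			mark++
			p, n = star+1, mark
		default:
			return false
		}
	}
	// Only trailing wildcards may remain in the pattern.
	for p < len(pattern) && pattern[p] == '*' {
		p++
	}
	return p == len(pattern)
}

// versioningRules is a hypothetical in-memory form of the configuration.
type versioningRules struct {
	Enabled          bool
	ExcludeFolders   bool
	ExcludedPrefixes []string
}

// versioned reports whether an object key would be versioned under the rules.
func (r versioningRules) versioned(key string) bool {
	if !r.Enabled {
		return false
	}
	if r.ExcludeFolders && strings.HasSuffix(key, "/") {
		return false
	}
	for _, prefix := range r.ExcludedPrefixes {
		// Appending '*' makes the pattern cover everything under the prefix.
		if wildcardMatch(prefix+"*", key) {
			return false
		}
	}
	return true
}

func main() {
	rules := versioningRules{
		Enabled:          true,
		ExcludeFolders:   true,
		ExcludedPrefixes: []string{"app1-jobs/*/_temporary/", "app2-jobs/*/__magic/"},
	}
	keys := []string{
		"app1-jobs/job42/_temporary/0/part-0000",
		"app1-jobs/job42/part-0000",
		"app1-jobs/job42/",
	}
	for _, key := range keys {
		fmt.Println(key, "versioned:", rules.versioned(key))
	}
}
```

Under these assumptions, only the committed output `app1-jobs/job42/part-0000` is versioned; the temporary object and the folder marker are not.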
2022-05-06 19:05:28 -07:00
..

| Name | Last commit | Date |
| --- | --- | --- |
| arn | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| auth | add gocritic/ruleguard checks back again, cleanup code. (#13665) | 2021-11-16 09:28:29 -08:00 |
| bpool | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| bucket | feat: implement prefix-level versioning exclusion (#14828) | 2022-05-06 19:05:28 -07:00 |
| color | rename all remaining packages to internal/ (#12418) | 2021-06-01 14:59:40 -07:00 |
| config | Reorganize OpenID config (#14871) | 2022-05-05 13:40:06 -07:00 |
| crypto | listing: improve listing of encrypted objects (#14667) | 2022-04-04 11:42:03 -07:00 |
| disk | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| dsync | tests: Clean up dsync package (#14415) | 2022-03-01 11:14:28 -08:00 |
| etag | etag: add `Format` and `Decrypt` functions (#14659) | 2022-04-03 13:29:13 -07:00 |
| event | re-use transport for AdminInfo() call (#14571) | 2022-03-17 16:20:10 -07:00 |
| fips | tls: add TLS 1.3 ciphers to the list of supported ciphers (#13158) | 2021-09-07 09:57:32 -07:00 |
| handlers | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| hash | fix: enable go1.17 github ci/cd (#12997) | 2021-08-18 18:35:22 -07:00 |
| http | allow forcibly creating metadata on buckets (#14820) | 2022-04-27 04:44:07 -07:00 |
| init | Disable AVX512 on Darwin (#13550) | 2021-11-01 08:03:16 -07:00 |
| ioutil | Add local disk health checks (#14447) | 2022-03-09 11:38:54 -08:00 |
| jwt | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| kernel | update gofumpt -w - new changes | 2022-04-13 12:00:11 -07:00 |
| kms | listing: improve listing of encrypted objects (#14667) | 2022-04-04 11:42:03 -07:00 |
| lock | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| logger | Add "enable" to config help (#14866) | 2022-05-05 04:17:04 -07:00 |
| lsync | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| mountinfo | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| pubsub | rename all remaining packages to internal/ (#12418) | 2021-06-01 14:59:40 -07:00 |
| rest | cleanup dsync tests and remove net/rpc references (#14118) | 2022-01-18 12:44:38 -08:00 |
| s3select | start using t.SetEnv instead of os.Setenv (#14787) | 2022-04-23 15:33:45 -07:00 |
| smart | run gofumpt cleanup across code-base (#14015) | 2022-01-02 09:15:06 -08:00 |
| sync/errgroup | rename all remaining packages to internal/ (#12418) | 2021-06-01 14:59:40 -07:00 |