Commit Graph

185 Commits

Author SHA1 Message Date
Harshavardhana e9055e9ef7
fix: walk() should cancel itself upon context cancellation (#15553)
This PR fixes possible leaks that may emanate from not
listening on context cancelation or timeouts.

```
goroutine 60957610 [chan send, 16 minutes]:
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1.1(...)
        github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368
github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},...
        github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1()
        github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1
        github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7
```
2022-08-18 17:49:08 -07:00
Poorna 21fe14201f
replication: centralize healthcheck for remote targets (#15516)
This PR moves health check from minio-go client to being
managed on the server.

Additionally integrating health check into site replication
2022-08-16 17:46:22 -07:00
Poorna 21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Poorna 5e0776e96a
replication: Include replica object versions for resync (#15427) 2022-07-28 13:43:02 -07:00
Harshavardhana 5e763b71dc
use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Poorna cab8d3d568
feat: add API to return list of objects waiting to be replicated (#15091) 2022-07-21 11:05:44 -07:00
Poorna b4f6901903
resync: Avoid concurrent access/write on map (#15286)
fixes a crash

```
fatal error: concurrent map iteration and map write
minio[19309]: goroutine 18640 [running]:
minio[19309]: runtime.throw({0x27a3399?, 0x1785?})
minio[19309]: runtime/panic.go:992 +0x71 fp=0xc0062f1c80 sp=0xc0062f1c50 pc=0x438671
minio[19309]: runtime.mapiternext(0xc0062f1e90?)
minio[19309]: runtime/map.go:871 +0x4eb fp=0xc0062f1cf0 sp=0xc0062f1c80 pc=0x41002b
minio[19309]: github.com/minio/minio/cmd.(*ReplicationPool).periodicResyncMetaSave(0xc0056c00c0, {0x4d06a48, 0xc0005b2480}, {0x4d22fc0, 0xc0015ea0
```
2022-07-13 16:29:10 -07:00
Harshavardhana 0a8b78cb84
fix: simplify passing auditLog eventType (#15278)
Rename Trigger -> Event to be a more appropriate
name for the audit event.

Bonus: fixes a bug in AddMRFWorker() it did not
cancel the waitgroup, leading to waitgroup leaks.
2022-07-12 10:43:32 -07:00
Harshavardhana 31c4fdbf79
fix: resyncing 'null' version on pre-existing content (#15043)
PR #15041 fixed replicating 'null' version however
due to a regression from #14994 caused the target
versions for these 'null' versioned objects to have
different 'versions', this may cause confusion with
bi-directional replication and cause double replication.

This PR fixes this properly by making sure we replicate
the correct versions on the objects.
2022-06-06 15:14:56 -07:00
Harshavardhana 48e367ff7d
reject resync start on misconfigured replication rules (#15041)
we expect resync to start on buckets with replication
rule ExistingObjects enabled, if not we reject such
calls.
2022-06-06 02:54:39 -07:00
Harshavardhana 52221db7ef
fix: for unexpected errors in reading versioning config panic (#14994)
We need to make sure if we cannot read bucket metadata
for some reason, and bucket metadata is not missing and
returning corrupted information we should panic such
handlers to disallow I/O to protect the overall state
on the system.

In-case of such corruption we have a mechanism now
to force recreate the metadata on the bucket, using
`x-minio-force-create` header with `PUT /bucket` API
call.

Additionally fix the versioning config updated state
to be set properly for the site replication healing
to trigger correctly.
2022-05-31 02:57:57 -07:00
Harshavardhana f1abb92f0c
feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Poorna 5c81d0d89a
site replication: heal missing/invalid replication config (#14979)
Validate remote target ARNs and heal any stale rules in
the replication config
2022-05-26 17:57:23 -07:00
Harshavardhana f8650a3493
fetch bucket replication stats across peers in single call (#14956)
current implementation relied on recursively calling one bucket
at a time across all peers, this would be very slow and chatty
when there are 100's of buckets which would mean 100*peerCount
amount of network operations.

This PR attempts to reduce this entire call into `peerCount`
amount of network calls only. This functionality addresses also a
concern where the Prometheus metrics would significantly slow
down when one of the peers is offline.
2022-05-23 09:15:30 -07:00
Harshavardhana 6cfb1cb6fd
fix: timer usage across codebase (#14935)
it seems in some places we have been wrongly using the
timer.Reset() function, nicely exposed by an example
shared by @donatello https://go.dev/play/p/qoF71_D1oXD

this PR fixes all the usage comprehensively
2022-05-17 22:42:59 -07:00
Harshavardhana 62aa42cccf
avoid replication proxy on version excluded paths (#14878)
no need to attempt proxying objects that were
never replicated, but do have local `null`
versions on them.
2022-05-08 16:50:31 -07:00
Harshavardhana 5cffd3780a
fix: multiple fixes in prefix exclude implementation (#14877)
- do not need to restrict prefix exclusions that do not
  have `/` as suffix, relax this requirement as spark may
  have staging folders with other autogenerated characters
  , so we are better off doing full prefix March and skip. 

- multiple delete objects was incorrectly creating a
  null delete marker on a versioned bucket instead of
  creating a proper versioned delete marker.

- do not suspend paths on the excluded prefixes during
  delete operations to avoid creating `null` delete markers,
  honor suspension of versioning only at bucket level for
  delete markers.
2022-05-07 22:06:44 -07:00
Krishnan Parthasarathi ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads which use Hadoop MR 
Committer v1/v2 algorithm upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on Job commit. 
Object storage admins are forced to configure 
separate ILM policies to expire these objects 
and their versions to reclaim space.

Our solution:

This can be avoided by simply marking objects 
under these prefixes to be excluded from versioning, 
as shown below. Consequently, these objects are 
excluded from replication, and don't require ILM 
policies to prune unnecessary versions.

-  MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
from versioning. This is required to prevent the parent 
folders from accumulating delete markers, especially
those which are shared across spark workloads 
spanning projects/teams.

- To enable version exclusion on a list of prefixes

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
2022-05-06 19:05:28 -07:00
Poorna 3a64580663
Add support for site replication healing (#14572)
heal bucket metadata and IAM entries for
sites participating in site replication from
the site with the most updated entry.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <aditya@minio.io>
2022-04-24 02:36:31 -07:00
Krishnan Parthasarathi 7b81967a3c
Fix handling of object versions pending purge (#14555)
- GetObject() with vid should return 405
- GetObject() without vid should return 404
- ListObjects() should ignore this object if this is the "latest" version of the object
- ListObjectVersions() should list this object as "DELETE marker"
- Remove data parts before sync'ing the version pending purge
2022-03-16 16:59:43 -07:00
Poorna 1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Poorna ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Poorna 288e276abe
Specify tags in options while selecting replication targets (#14126)
When the replication rule is based on tag matches, the replication process
should pick up targets matching the tags specified in the replication
rule.

Fixing regression due to #12880
2022-01-19 10:45:42 -08:00
Harshavardhana cc3f139d1f
replication: attempt abort multipart-upload at max 3 times on remote (#14087)
this is mainly an attempt to relinquish space on the remote
site, if this still doesn't do it we give and let the admin
know with a log message.
2022-01-11 22:32:29 -08:00
Poorna 54a98773f8
fix: replication of tag removal (#14056)
Currently tag removal leaves replication state as `PENDING` 
because the `HEAD` api returns just a tag count but not the 
actual tags, and this is treated as a no-op
2022-01-10 19:06:10 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Aditya Manthramurthy 997e808088
fix; race in bucket replication stats (#13942)
- r.ulock was not locked when r.UsageCache was being modified

Bonus:

- simplify code by removing some unnecessary clone methods - we can 
do this because go arrays are values (not pointers/references) that are 
automatically copied on assignment.

- remove some unnecessary map allocation calls
2021-12-17 15:33:13 -08:00
Poorna K e270ab65b3
fix: healing of replication delete markers (#13933)
A corner case can occur where the delete-marker was propagated 
but the metadata could not be updated on the primary. Sending 
a RemoveObject call with the Delete marker version would end 
up permanently deleting the version on target. Instead, perform 
a Stat on the delete-marker version on target and redo replication 
only if the delete-marker is missing on target.
2021-12-16 15:34:55 -08:00
Poorna K d422d24278
replication: warn if insufficient workers (#13899)
This should give an early warning if configured replication 
workers are insufficient to meet application workload.
2021-12-13 18:22:56 -08:00
Harshavardhana 914bfb2d9c
fix: allow compaction on replicated buckets (#13711)
currently getReplicationConfig() failure incorrectly
returns error on unexpected buckets upon upgrade, we
should always calculate usage as much as possible.
2021-11-19 14:46:14 -08:00
Klaus Post faf013ec84
Improve performance on multiple versions (#13573)
Existing:

```go
type xlMetaV2 struct {
    Versions []xlMetaV2Version `json:"Versions" msg:"Versions"`
}
```

Serialized as regular MessagePack.

```go
//msgp:tuple xlMetaV2VersionHeader
type xlMetaV2VersionHeader struct {
	VersionID [16]byte
	ModTime   int64
	Type      VersionType
	Flags     xlFlags
}
```

Serialize as streaming MessagePack, format:

```
int(headerVersion)
int(xlmetaVersion)
int(nVersions)
for each version {
    binary blob, xlMetaV2VersionHeader, serialized
    binary blob, xlMetaV2Version, serialized.
}
```

xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct 
can easily be reused and does not contain pointers, so efficient as a 
slice (single allocation)

This allows quickly parsing everything as slices of bytes (no copy).

Versions are always *saved* sorted by modTime, newest *first*. 
No more need to sort on load.

* Allows checking if a version exists.
* Allows reading single version without unmarshal all.
* Allows reading latest version of type without unmarshal all.
* Allows reading latest version without unmarshal of all.
* Allows checking if the latest is deleteMarker by reading first entry.
* Allows adding/updating/deleting a version with only header deserialization.
* Reduces allocations on conversion to FileInfo(s).
2021-11-18 12:15:22 -08:00
Anis Elleuch 4caed7cc0d
metrics: Add replication latency metrics (#13515)
Add a new Prometheus metric for bucket replication latency

e.g.:
minio_bucket_replication_latency_ns{
    bucket="testbucket",
    operation="upload",
    range="LESS_THAN_1_MiB",
    server="127.0.0.1:9001",
    targetArn="arn:minio:replication::45da043c-14f5-4da4-9316-aba5f77bf730:testbucket"} 2.2015663e+07

Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-11-17 12:10:57 -08:00
Harshavardhana 661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Harshavardhana 4ed0eb7012
remove double reads updating object metadata (#13542)
Removes RLock/RUnlock for updating metadata,
since we already take a write lock to update
metadata, this change removes reading of xl.meta
as well as an additional lock, the performance gain
should increase 3x theoretically for

- PutObjectRetention
- PutObjectLegalHold

This optimization is mainly for Veeam like
workloads that require a certain level of iops
from these API calls, we were losing iops.
2021-10-30 08:22:04 -07:00
Poorna K e7f559c582
Fixes to replication metrics (#13493)
For reporting ReplicaSize and loading initial
replication metrics correctly.
2021-10-21 18:52:55 -07:00
Poorna Krishnamoorthy 7f6ed35347
Allow null versions to be replicated (#13310)
for pre-existing objects present in a bucket
prior to enabling existing object replication.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-09-28 10:26:12 -07:00
Poorna Krishnamoorthy 19ecdc75a8
replication: Simplify metrics calculation (#13274)
Also doing some code cleanup
2021-09-22 10:48:45 -07:00
Poorna Krishnamoorthy 806b10b934
fix: improve error messages returned during replication setup (#13261) 2021-09-21 13:03:20 -07:00
Poorna Krishnamoorthy c4373ef290
Add support for multi site replication (#12880) 2021-09-18 13:31:35 -07:00
Harshavardhana 0892f1e406
fix: multipart replication and encrypted etag for sse-s3 (#13171)
Replication was not working properly for encrypted
objects in single PUT object for preserving etag,

We need to make sure to preserve etag such that replication
works properly and not gets into infinite loops of copying
due to ETag mismatches.
2021-09-08 22:25:23 -07:00
Poorna Krishnamoorthy 9af4e7b1da
Add healthcheck back for replication targets (#13168)
This will allow objects to relinquish read lock held during
replication earlier if the target is known to be down
without waiting for connection timeout when replication 
is attempted.
2021-09-08 15:34:50 -07:00
Poorna Krishnamoorthy a366143c5b
Remove replication permission check (#13135)
Fixes #13105
2021-09-02 09:31:13 -07:00
Poorna Krishnamoorthy 6a7e22386e
Use part sizes correctly in multipart replication (#13061)
fixes #13057
2021-08-24 14:41:05 -07:00
Poorna Krishnamoorthy 674c6f7a7b
fix: resync of replication of delete markers (#12932)
Fixes #12919
2021-08-23 14:48:22 -07:00
Klaus Post 63f3e5c3fc
replication: Lock object while replicating (#13014)
Introduce a replication lock that will ensure that only one replication 
operation will run for any given object at any time.

Fixes #13013
2021-08-23 08:16:18 -07:00
Harshavardhana 3c34e18a4e
allow multipart uploads for single part multipart (#12821)
its possible that some multipart uploads would have
uploaded only single parts so relying on `len(o.Parts)`
alone is not sufficient, we need to look for ETag
pattern to be absolutely sure.
2021-07-28 22:11:55 -07:00
Poorna Krishnamoorthy b6cd54779c
Increase context timeout for bandwidth throttled reader (#12820)
increase default timeout up to one hour for toy setups.

fixes #12812
2021-07-28 15:20:01 -07:00
Anis Elleuch b8f95fb3d4
fix: Use correct replication status in replication healing (#12711)
In case of replication healing, we always store completed status in the
object metadata, which is wrong because replication could fail in the
further retries.
2021-07-14 09:58:46 -07:00
Harshavardhana 4f6c74a257
simplify audit logging for replication and ILM (#12610)
auditLog should be attempted right before the
return of the function and not multiple times
per function, this ensures that we only trigger
it once per function call.
2021-07-01 14:02:44 -07:00
Poorna Krishnamoorthy a3f0288262
Use multipart call for replication (#12535)
if object was uploaded with multipart. This is to ensure that
GetObject calls with partNumber in URI request parameters
have same behavior on source and replication target.
2021-06-30 07:44:24 -07:00
Poorna Krishnamoorthy a69c2a2fb3
Change replication to use read lock instead of writelock (#12581)
Fixes #12573

This PR also adding audit logging for replication activity
2021-06-28 23:58:08 -07:00
Poorna Krishnamoorthy d00783c923
Use rate.Limiter for bandwidth monitoring (#12506)
Bonus: fixes a hang when bandwidth caps are enabled for
synchronous replication
2021-06-24 18:29:30 -07:00
Harshavardhana 41caf89cf4
fix: apply pre-conditions first on object metadata (#12545)
This change in error flow complies with AWS S3 behavior
for applications depending on specific error conditions.

fixes #12543
2021-06-24 09:44:00 -07:00
Harshavardhana cdeccb5510
feat: Deprecate embedded browser and import console (#12460)
This feature also changes the default port where
the browser is running, now the port has moved
to 9001 and it can be configured with

```
--console-address ":9001"
```
2021-06-17 20:27:04 -07:00
Poorna Krishnamoorthy dbea8d2ee0
Add support for existing object replication. (#12109)
Also adding an API to allow resyncing replication when
existing object replication is enabled and the remote target
is entirely lost. With the `mc replicate reset` command, the
objects that are eligible for replication as per the replication
config will be resynced to target if existing object replication
is enabled on the rule.
2021-06-01 19:59:11 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Harshavardhana fdc2020b10
move to iam, bucket policy from minio/pkg (#12400) 2021-05-29 21:16:42 -07:00
Poorna Krishnamoorthy 547bb7d0a1
replication: Init worker kill channel correctly (#12379)
Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-05-28 13:28:37 -07:00
Poorna Krishnamoorthy 951acf561c
Add support for syncing replica modifications (#11104)
when bidirectional replication is set up.

If ReplicaModifications is enabled in the replication
configuration, sync metadata updates to source if
replication rules are met. By default, if this
configuration is unset, MinIO automatically sync's
metadata updates on replica back to the source.
2021-05-13 19:20:45 -07:00
Harshavardhana 1aa5858543
move madmin to github.com/minio/madmin-go (#12239) 2021-05-06 08:52:02 -07:00
Harshavardhana f7a87b30bf Revert "deprecate embedded browser (#12163)"
This reverts commit 736d8cbac4.

Bring contrib files for older contributions
2021-04-30 08:50:39 -07:00
Harshavardhana 0faa4e6187
fix: make sure failed requests only to failed queue (#12196)
failed queue should be used for retried requests to
avoid cascading the failures into incoming queue, this
would allow for a more fair retry for failed replicas.

Additionally also avoid taking context in queue task
to avoid confusion, simplifies its usage.
2021-04-29 18:20:39 -07:00
Poorna Krishnamoorthy 90112b5644
Update ReplicationStatus if metadata not updated correctly (#12191)
There can be situations where replication completed but the
`X-Amz-Replication-Status` metadata update failed such as
when the server returns 503 under high load. This object version will
continue to be picked up by the scanner and replicateObject would perform
no action since the versions match between source and target.
The metadata would never reflect that replication was successful
without this fix, leading to repeated re-queuing.
2021-04-29 16:46:26 -07:00
Harshavardhana c4b21ac7fa
fix: remove healthcheck routine for replication targets (#12192)
Bonus also fix a racy lookup on arnsMap() without a
read lock, hold read locks to avoid such race.

moving the healthcheck logic to minio-go
2021-04-29 16:41:28 -07:00
Poorna Krishnamoorthy 632252ff1d
fix: change SetRemoteTarget API to allow editing remote target granularly (#12175)
Currently, only credentials could be updated with
`mc admin bucket remote edit`. 

Allow updating synchronous replication flag, path, 
bandwidth and healthcheck duration on buckets, and
a flag to disable proxying in active-active replication.
2021-04-28 15:26:20 -07:00
Harshavardhana 736d8cbac4
deprecate embedded browser (#12163)
https://github.com/minio/console takes over the functionality for the
future object browser development

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-27 10:52:12 -07:00
Harshavardhana 82dc6aff1c
add support for configurable replication MRF workers (#12125)
just like replication workers, allow failed replication
workers to be configurable in situations like DR failures
etc to catch up on replication sooner when DR is back
online.

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 21:58:45 -07:00
Poorna Krishnamoorthy 014e419151
fix: ensure pending replication queued to MRF queue (#12138)
Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-04-23 16:52:57 -07:00
Krishnan Parthasarathi c829e3a13b Support for remote tier management (#12090)
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.

Some new additions include:

 - Admin APIs remote tier configuration management

 - Simple journal to track remote objects to be 'collected'
   This is used by object API handlers which 'mutate' object versions by
   overwriting/replacing content (Put/CopyObject) or removing the version
   itself (e.g DeleteObjectVersion).

 - Rework of previous ILM transition to fit the new model
   In the new model, a storage class (a.k.a remote tier) is defined by the
   'remote' object storage type (one of s3, azure, GCS), bucket name and a
   prefix.

* Fixed bugs, review comments, and more unit-tests

- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana 069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana 2ef824bbb2
collapse two distinct calls into single RenameData() call (#12093)
This is an optimization by reducing one extra system call,
and many network operations. This reduction should increase
the performance for small file workloads.
2021-04-20 10:44:39 -07:00
Harshavardhana 0a9d8dfb0b
fix: crash in single drive mode for lifecycle (#12077)
also make sure to close the channel on the producer
side, not in a separate go-routine, this can lead
to races between a writer and a closer.

fixes #12073
2021-04-16 14:09:25 -07:00
Poorna Krishnamoorthy d30c5d1cf0
Avoid metadata update for incoming replication failure (#12054)
This is an optimization to save IOPS. The replication
failures will be re-queued once more to re-attempt
replication. If it still does not succeed, the replication
status is set as `FAILED` and will be caught up on
scanner cycle.
2021-04-15 16:32:00 -07:00
Harshavardhana abb55bd49e
fix: properly close leaking bandwidth monitor channel (#11967)
This PR fixes

- close leaking bandwidth report channel leakage
- remove the closer requirement for bandwidth monitor
  instead if Read() fails remember the error and return
  error for all subsequent reads.
- use locking for usage-cache.bin updates, with inline
  data we cannot afford to have concurrent writes to
  usage-cache.bin corrupting xl.meta
2021-04-05 16:07:53 -07:00
Harshavardhana 09ee303244
add cluster support for realtime bucket stats (#11963)
implementation in #11949 only catered from single
node, but we need cluster metrics by capturing
from all peers. introduce bucket stats API that
will be used for capturing in-line bucket usage
as well eventually
2021-04-04 15:34:33 -07:00
Harshavardhana d46386246f
api: Introduce metadata update APIs to update only metadata (#11962)
Current implementation heavily relies on readAllFileInfo
but with the advent of xl.meta inlined with data, we cannot
easily avoid reading data when we are only interested is
updating metadata, this leads to invariably write
amplification during metadata updates, repeatedly reading
data when we are only interested in updating metadata.

This PR ensures that we implement a metadata only update
API at storage layer, that handles updates to metadata alone
for any given version - given the version is valid and
present.

This helps reduce the chattiness for following calls..

- PutObjectTags
- DeleteObjectTags
- PutObjectLegalHold
- PutObjectRetention
- ReplicateObject (updates metadata on replication status)
2021-04-04 13:32:31 -07:00
Poorna Krishnamoorthy 47c09a1e6f
Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Harshavardhana 8e6e287729
fix: delete/delete marker replication versions consistent (#11932)
replication didn't work as expected when deletion of
delete markers was requested in DeleteMultipleObjects
API, this is due to incorrect lookup elements being
used to look for delete markers.
2021-03-30 17:15:36 -07:00
Poorna Krishnamoorthy 5e003549cc
Replication: Enforce DeleteMarker disable setting (#11720)
This PR also enforces DeleteReplication
disable setting
2021-03-13 10:28:35 -08:00
Poorna Krishnamoorthy 2f29719e6b
resize replication worker pool dynamically after config update (#11737) 2021-03-09 02:56:42 -08:00
Poorna Krishnamoorthy 690434514d
Avoid notification event for replicas (#11683)
Creating notification events for replica creation
is not particularly useful to send as the notification
event generated at source already includes replication
completion events.

For applications using replica cluster as failover, avoiding
duplicate notifications for replica event will allow seamless
failover.
2021-03-03 11:13:31 -08:00
Poorna Krishnamoorthy 85d2187c20
fix: ETag mismatch for large upload in replica (#11587) 2021-02-20 00:22:17 -08:00
Poorna Krishnamoorthy 2dce5d9442
fix: delete marker permanent delete replication (#11581) 2021-02-18 16:35:37 -08:00
Poorna Krishnamoorthy 8e8a792d9d
Allow delete marker replication from replica (#11566)
in the case of active-active replication.

This PR also has the following changes:

- add docs on replication design
- fix corner case of completing versioned delete on a delete marker
  when the target is down and `mc rm --vid` is performed repeatedly. Instead
  the version should still be retained in the `PENDING|FAILED` state until
  replication sync completes.
- remove `s3:Replication:OperationCompletedReplication` and
   `s3:Replication:OperationFailedReplication` from ObjectCreated 
  events type
2021-02-18 00:33:51 -08:00
Harshavardhana 7875d472bc
avoid notification for non-existent delete objects (#11514)
Skip notifications on objects that might have had
an error during deletion, this also avoids unnecessary
replication attempt on such objects.

Refactor some places to make sure that we have notified
the client before we

- notify
- schedule for replication
- lifecycle etc.
2021-02-10 22:00:42 -08:00
Poorna Krishnamoorthy e6b4ea7618
More fixes for delete marker replication (#11504)
continuation of PR#11491 for multiple server pools and
bi-directional replication.

Moving proxying for GET/HEAD to handler level rather than
server pool layer as this was also causing incorrect proxying 
of HEAD.

Also fixing metadata update on CopyObject - minio-go was not passing
source version ID in X-Amz-Copy-Source header
2021-02-10 17:25:04 -08:00
Harshavardhana cbf4bb62e0
fix: getPoolIdx decouple from top level options (#11512)
top-level options shouldn't be passed down for
GetObjectInfo() while verifying the objects in
different pools, this is to make sure that
we always get the value from the pool where
the object exists.
2021-02-10 11:45:02 -08:00
Poorna Krishnamoorthy 93eb549a83
fix: duplicate delete marker attempts in bi-directional replication (#11491) 2021-02-09 15:11:43 -08:00
Harshavardhana 68d299e719
fix: case-insensitive lookups for metadata (#11489)
continuation of #11487, with more changes
2021-02-08 18:12:28 -08:00
Poorna Krishnamoorthy f9c5636c2d
fix: lookup metdata case insensitively (#11487)
while setting replication options
2021-02-08 16:19:05 -08:00
Poorna Krishnamoorthy 8e1bbd989a
replication:alloc UserDefined map before use (#11478) 2021-02-07 22:01:10 -08:00
Harshavardhana f108873c48
fix: replication metadata comparsion and other fixes (#11410)
- using miniogo.ObjectInfo.UserMetadata is not correct
- using UserTags from Map->String() can change order
- ContentType comparison needs to be removed.
- Compare both lowercase and uppercase key names.
- do not silently error out constructing PutObjectOptions
  if tag parsing fails
- avoid notification for empty object info, failed operations
  should rely on valid objInfo for notification in all
  situations
- optimize copyObject implementation, also introduce a new 
  replication event
- clone ObjectInfo() before scheduling for replication
- add additional headers for comparison
- remove strings.EqualFold comparison avoid unexpected bugs
- fix pool based proxying with multiple pools
- compare only specific metadata

Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>
2021-02-03 20:41:33 -08:00
Poorna Krishnamoorthy fe3aca70c3
Make number of replication workers configurable. (#11379)
MINIO_API_REPLICATION_WORKERS env.var and
`mc admin config set api` allow number of replication
workers to be configurable. Defaults to half the number
of cpus available.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-02-02 16:45:06 +05:30
Poorna Krishnamoorthy fd3f02637a
fix: replication regression due to proxying requests (#11356)
In PR #11165 due to incorrect proxying for 2 
way replication even when the object was not 
yet replicated

Additionally, fix metadata comparisons when
deciding to do full replication vs metadata copy.

fixes #11340
2021-01-27 11:22:34 -08:00
Harshavardhana 7e266293e6
fix: notify bucket replication after replication/ilm (#11343) 2021-01-25 14:04:41 -08:00
Poorna Krishnamoorthy feaf8dfb9a
Fix replication status reported on completion (#11273)
Fixes: #11272
2021-01-13 11:52:28 -08:00
Poorna Krishnamoorthy 7824e19d20
Allow synchronous replication if enabled. (#11165)
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.

This PR also adds proxying on GET/HEAD to another node in a
active-active replication setup in the event of a 404 on the current node.
2021-01-11 22:36:51 -08:00
Klaus Post 51dad1d130
Fix missing GetObjectNInfo Closure (#11243)
Review for missing Close of returned value from `GetObjectNInfo`.

This was often obscured by the stuff that auto-unlocks when reaching EOF.
2021-01-08 10:12:26 -08:00
Harshavardhana f0808bb2e5
fix: getObject fd leaks in transition and replication code (#11237) 2021-01-06 16:13:10 -08:00