Commit Graph

234 Commits

Author SHA1 Message Date
Poorna
21fe14201f
replication: centralize healthcheck for remote targets (#15516)
This PR moves health check from minio-go client to being
managed on the server.

Additionally integrating health check into site replication
2022-08-16 17:46:22 -07:00
Poorna
21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
ebozduman
b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Poorna
5e0776e96a
replication: Include replica object versions for resync (#15427) 2022-07-28 13:43:02 -07:00
Harshavardhana
5e763b71dc
use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Poorna
cab8d3d568
feat: add API to return list of objects waiting to be replicated (#15091) 2022-07-21 11:05:44 -07:00
Poorna
b4f6901903
resync: Avoid concurrent access/write on map (#15286)
fixes a crash

```
fatal error: concurrent map iteration and map write
minio[19309]: goroutine 18640 [running]:
minio[19309]: runtime.throw({0x27a3399?, 0x1785?})
minio[19309]: runtime/panic.go:992 +0x71 fp=0xc0062f1c80 sp=0xc0062f1c50 pc=0x438671
minio[19309]: runtime.mapiternext(0xc0062f1e90?)
minio[19309]: runtime/map.go:871 +0x4eb fp=0xc0062f1cf0 sp=0xc0062f1c80 pc=0x41002b
minio[19309]: github.com/minio/minio/cmd.(*ReplicationPool).periodicResyncMetaSave(0xc0056c00c0, {0x4d06a48, 0xc0005b2480}, {0x4d22fc0, 0xc0015ea0
```
2022-07-13 16:29:10 -07:00
Harshavardhana
0a8b78cb84
fix: simplify passing auditLog eventType (#15278)
Rename Trigger -> Event to be a more appropriate
name for the audit event.

Bonus: fixes a bug in AddMRFWorker() it did not
cancel the waitgroup, leading to waitgroup leaks.
2022-07-12 10:43:32 -07:00
Harshavardhana
31c4fdbf79
fix: resyncing 'null' version on pre-existing content (#15043)
PR #15041 fixed replicating 'null' version however
due to a regression from #14994 caused the target
versions for these 'null' versioned objects to have
different 'versions', this may cause confusion with
bi-directional replication and cause double replication.

This PR fixes this properly by making sure we replicate
the correct versions on the objects.
2022-06-06 15:14:56 -07:00
Harshavardhana
48e367ff7d
reject resync start on misconfigured replication rules (#15041)
we expect resync to start on buckets with replication
rule ExistingObjects enabled, if not we reject such
calls.
2022-06-06 02:54:39 -07:00
Harshavardhana
52221db7ef
fix: for unexpected errors in reading versioning config panic (#14994)
We need to make sure if we cannot read bucket metadata
for some reason, and bucket metadata is not missing and
returning corrupted information we should panic such
handlers to disallow I/O to protect the overall state
on the system.

In-case of such corruption we have a mechanism now
to force recreate the metadata on the bucket, using
`x-minio-force-create` header with `PUT /bucket` API
call.

Additionally fix the versioning config updated state
to be set properly for the site replication healing
to trigger correctly.
2022-05-31 02:57:57 -07:00
Harshavardhana
f1abb92f0c
feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Poorna
5c81d0d89a
site replication: heal missing/invalid replication config (#14979)
Validate remote target ARNs and heal any stale rules in
the replication config
2022-05-26 17:57:23 -07:00
Harshavardhana
f8650a3493
fetch bucket replication stats across peers in single call (#14956)
current implementation relied on recursively calling one bucket
at a time across all peers, this would be very slow and chatty
when there are 100's of buckets which would mean 100*peerCount
amount of network operations.

This PR attempts to reduce this entire call into `peerCount`
amount of network calls only. This functionality addresses also a
concern where the Prometheus metrics would significantly slow
down when one of the peers is offline.
2022-05-23 09:15:30 -07:00
Harshavardhana
6cfb1cb6fd
fix: timer usage across codebase (#14935)
it seems in some places we have been wrongly using the
timer.Reset() function, nicely exposed by an example
shared by @donatello https://go.dev/play/p/qoF71_D1oXD

this PR fixes all the usage comprehensively
2022-05-17 22:42:59 -07:00
Harshavardhana
62aa42cccf
avoid replication proxy on version excluded paths (#14878)
no need to attempt proxying objects that were
never replicated, but do have local `null`
versions on them.
2022-05-08 16:50:31 -07:00
Harshavardhana
5cffd3780a
fix: multiple fixes in prefix exclude implementation (#14877)
- do not need to restrict prefix exclusions that do not
  have `/` as suffix, relax this requirement as spark may
  have staging folders with other autogenerated characters
  , so we are better off doing full prefix March and skip. 

- multiple delete objects was incorrectly creating a
  null delete marker on a versioned bucket instead of
  creating a proper versioned delete marker.

- do not suspend paths on the excluded prefixes during
  delete operations to avoid creating `null` delete markers,
  honor suspension of versioning only at bucket level for
  delete markers.
2022-05-07 22:06:44 -07:00
Krishnan Parthasarathi
ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads which use Hadoop MR 
Committer v1/v2 algorithm upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on Job commit. 
Object storage admins are forced to configure 
separate ILM policies to expire these objects 
and their versions to reclaim space.

Our solution:

This can be avoided by simply marking objects 
under these prefixes to be excluded from versioning, 
as shown below. Consequently, these objects are 
excluded from replication, and don't require ILM 
policies to prune unnecessary versions.

-  MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
from versioning. This is required to prevent the parent 
folders from accumulating delete markers, especially
those which are shared across spark workloads 
spanning projects/teams.

- To enable version exclusion on a list of prefixes

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
2022-05-06 19:05:28 -07:00
Poorna
3a64580663
Add support for site replication healing (#14572)
heal bucket metadata and IAM entries for
sites participating in site replication from
the site with the most updated entry.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <aditya@minio.io>
2022-04-24 02:36:31 -07:00
Krishnan Parthasarathi
7b81967a3c
Fix handling of object versions pending purge (#14555)
- GetObject() with vid should return 405
- GetObject() without vid should return 404
- ListObjects() should ignore this object if this is the "latest" version of the object
- ListObjectVersions() should list this object as "DELETE marker"
- Remove data parts before sync'ing the version pending purge
2022-03-16 16:59:43 -07:00
Poorna
1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Poorna
ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Poorna
288e276abe
Specify tags in options while selecting replication targets (#14126)
When the replication rule is based on tag matches, the replication process
should pick up targets matching the tags specified in the replication
rule.

Fixing regression due to #12880
2022-01-19 10:45:42 -08:00
Harshavardhana
cc3f139d1f
replication: attempt abort multipart-upload at max 3 times on remote (#14087)
this is mainly an attempt to relinquish space on the remote
site, if this still doesn't do it we give and let the admin
know with a log message.
2022-01-11 22:32:29 -08:00
Poorna
54a98773f8
fix: replication of tag removal (#14056)
Currently tag removal leaves replication state as `PENDING` 
because the `HEAD` api returns just a tag count but not the 
actual tags, and this is treated as a no-op
2022-01-10 19:06:10 -08:00
Harshavardhana
f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Aditya Manthramurthy
997e808088
fix; race in bucket replication stats (#13942)
- r.ulock was not locked when r.UsageCache was being modified

Bonus:

- simplify code by removing some unnecessary clone methods - we can 
do this because go arrays are values (not pointers/references) that are 
automatically copied on assignment.

- remove some unnecessary map allocation calls
2021-12-17 15:33:13 -08:00
Poorna K
e270ab65b3
fix: healing of replication delete markers (#13933)
A corner case can occur where the delete-marker was propagated 
but the metadata could not be updated on the primary. Sending 
a RemoveObject call with the Delete marker version would end 
up permanently deleting the version on target. Instead, perform 
a Stat on the delete-marker version on target and redo replication 
only if the delete-marker is missing on target.
2021-12-16 15:34:55 -08:00
Poorna K
d422d24278
replication: warn if insufficient workers (#13899)
This should give an early warning if configured replication 
workers are insufficient to meet application workload.
2021-12-13 18:22:56 -08:00
Harshavardhana
914bfb2d9c
fix: allow compaction on replicated buckets (#13711)
currently getReplicationConfig() failure incorrectly
returns error on unexpected buckets upon upgrade, we
should always calculate usage as much as possible.
2021-11-19 14:46:14 -08:00
Klaus Post
faf013ec84
Improve performance on multiple versions (#13573)
Existing:

```go
type xlMetaV2 struct {
    Versions []xlMetaV2Version `json:"Versions" msg:"Versions"`
}
```

Serialized as regular MessagePack.

```go
//msgp:tuple xlMetaV2VersionHeader
type xlMetaV2VersionHeader struct {
	VersionID [16]byte
	ModTime   int64
	Type      VersionType
	Flags     xlFlags
}
```

Serialize as streaming MessagePack, format:

```
int(headerVersion)
int(xlmetaVersion)
int(nVersions)
for each version {
    binary blob, xlMetaV2VersionHeader, serialized
    binary blob, xlMetaV2Version, serialized.
}
```

xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct 
can easily be reused and does not contain pointers, so efficient as a 
slice (single allocation)

This allows quickly parsing everything as slices of bytes (no copy).

Versions are always *saved* sorted by modTime, newest *first*. 
No more need to sort on load.

* Allows checking if a version exists.
* Allows reading single version without unmarshal all.
* Allows reading latest version of type without unmarshal all.
* Allows reading latest version without unmarshal of all.
* Allows checking if the latest is deleteMarker by reading first entry.
* Allows adding/updating/deleting a version with only header deserialization.
* Reduces allocations on conversion to FileInfo(s).
2021-11-18 12:15:22 -08:00
Anis Elleuch
4caed7cc0d
metrics: Add replication latency metrics (#13515)
Add a new Prometheus metric for bucket replication latency

e.g.:
minio_bucket_replication_latency_ns{
    bucket="testbucket",
    operation="upload",
    range="LESS_THAN_1_MiB",
    server="127.0.0.1:9001",
    targetArn="arn:minio:replication::45da043c-14f5-4da4-9316-aba5f77bf730:testbucket"} 2.2015663e+07

Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-11-17 12:10:57 -08:00
Harshavardhana
661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Harshavardhana
4ed0eb7012
remove double reads updating object metadata (#13542)
Removes RLock/RUnlock for updating metadata,
since we already take a write lock to update
metadata, this change removes reading of xl.meta
as well as an additional lock, the performance gain
should increase 3x theoretically for

- PutObjectRetention
- PutObjectLegalHold

This optimization is mainly for Veeam like
workloads that require a certain level of iops
from these API calls, we were losing iops.
2021-10-30 08:22:04 -07:00
Poorna K
e7f559c582
Fixes to replication metrics (#13493)
For reporting ReplicaSize and loading initial
replication metrics correctly.
2021-10-21 18:52:55 -07:00
Poorna Krishnamoorthy
7f6ed35347
Allow null versions to be replicated (#13310)
for pre-existing objects present in a bucket
prior to enabling existing object replication.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-09-28 10:26:12 -07:00
Poorna Krishnamoorthy
19ecdc75a8
replication: Simplify metrics calculation (#13274)
Also doing some code cleanup
2021-09-22 10:48:45 -07:00
Poorna Krishnamoorthy
806b10b934
fix: improve error messages returned during replication setup (#13261) 2021-09-21 13:03:20 -07:00
Poorna Krishnamoorthy
c4373ef290
Add support for multi site replication (#12880) 2021-09-18 13:31:35 -07:00
Harshavardhana
0892f1e406
fix: multipart replication and encrypted etag for sse-s3 (#13171)
Replication was not working properly for encrypted
objects in single PUT object for preserving etag,

We need to make sure to preserve etag such that replication
works properly and not gets into infinite loops of copying
due to ETag mismatches.
2021-09-08 22:25:23 -07:00
Poorna Krishnamoorthy
9af4e7b1da
Add healthcheck back for replication targets (#13168)
This will allow objects to relinquish read lock held during
replication earlier if the target is known to be down
without waiting for connection timeout when replication 
is attempted.
2021-09-08 15:34:50 -07:00
Poorna Krishnamoorthy
a366143c5b
Remove replication permission check (#13135)
Fixes #13105
2021-09-02 09:31:13 -07:00
Poorna Krishnamoorthy
6a7e22386e
Use part sizes correctly in multipart replication (#13061)
fixes #13057
2021-08-24 14:41:05 -07:00
Poorna Krishnamoorthy
674c6f7a7b
fix: resync of replication of delete markers (#12932)
Fixes #12919
2021-08-23 14:48:22 -07:00
Klaus Post
63f3e5c3fc
replication: Lock object while replicating (#13014)
Introduce a replication lock that will ensure that only one replication 
operation will run for any given object at any time.

Fixes #13013
2021-08-23 08:16:18 -07:00
Harshavardhana
3c34e18a4e
allow multipart uploads for single part multipart (#12821)
its possible that some multipart uploads would have
uploaded only single parts so relying on `len(o.Parts)`
alone is not sufficient, we need to look for ETag
pattern to be absolutely sure.
2021-07-28 22:11:55 -07:00
Poorna Krishnamoorthy
b6cd54779c
Increase context timeout for bandwidth throttled reader (#12820)
increase default timeout up to one hour for toy setups.

fixes #12812
2021-07-28 15:20:01 -07:00
Anis Elleuch
b8f95fb3d4
fix: Use correct replication status in replication healing (#12711)
In case of replication healing, we always store completed status in the
object metadata, which is wrong because replication could fail in the
further retries.
2021-07-14 09:58:46 -07:00
Harshavardhana
4f6c74a257
simplify audit logging for replication and ILM (#12610)
auditLog should be attempted right before the
return of the function and not multiple times
per function, this ensures that we only trigger
it once per function call.
2021-07-01 14:02:44 -07:00
Poorna Krishnamoorthy
a3f0288262
Use multipart call for replication (#12535)
if object was uploaded with multipart. This is to ensure that
GetObject calls with partNumber in URI request parameters
have same behavior on source and replication target.
2021-06-30 07:44:24 -07:00
Poorna Krishnamoorthy
a69c2a2fb3
Change replication to use read lock instead of writelock (#12581)
Fixes #12573

This PR also adding audit logging for replication activity
2021-06-28 23:58:08 -07:00
Poorna Krishnamoorthy
d00783c923
Use rate.Limiter for bandwidth monitoring (#12506)
Bonus: fixes a hang when bandwidth caps are enabled for
synchronous replication
2021-06-24 18:29:30 -07:00
Harshavardhana
41caf89cf4
fix: apply pre-conditions first on object metadata (#12545)
This change in error flow complies with AWS S3 behavior
for applications depending on specific error conditions.

fixes #12543
2021-06-24 09:44:00 -07:00
Harshavardhana
cdeccb5510
feat: Deprecate embedded browser and import console (#12460)
This feature also changes the default port where
the browser is running, now the port has moved
to 9001 and it can be configured with

```
--console-address ":9001"
```
2021-06-17 20:27:04 -07:00
Poorna Krishnamoorthy
dbea8d2ee0
Add support for existing object replication. (#12109)
Also adding an API to allow resyncing replication when
existing object replication is enabled and the remote target
is entirely lost. With the `mc replicate reset` command, the
objects that are eligible for replication as per the replication
config will be resynced to target if existing object replication
is enabled on the rule.
2021-06-01 19:59:11 -07:00
Harshavardhana
1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Harshavardhana
fdc2020b10
move to iam, bucket policy from minio/pkg (#12400) 2021-05-29 21:16:42 -07:00
Poorna Krishnamoorthy
547bb7d0a1
replication: Init worker kill channel correctly (#12379)
Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-05-28 13:28:37 -07:00
Poorna Krishnamoorthy
951acf561c
Add support for syncing replica modifications (#11104)
when bidirectional replication is set up.

If ReplicaModifications is enabled in the replication
configuration, sync metadata updates to source if
replication rules are met. By default, if this
configuration is unset, MinIO automatically sync's
metadata updates on replica back to the source.
2021-05-13 19:20:45 -07:00
Harshavardhana
1aa5858543
move madmin to github.com/minio/madmin-go (#12239) 2021-05-06 08:52:02 -07:00
Harshavardhana
f7a87b30bf Revert "deprecate embedded browser (#12163)"
This reverts commit 736d8cbac4.

Bring contrib files for older contributions
2021-04-30 08:50:39 -07:00
Harshavardhana
0faa4e6187
fix: make sure failed requests only to failed queue (#12196)
failed queue should be used for retried requests to
avoid cascading the failures into incoming queue, this
would allow for a more fair retry for failed replicas.

Additionally also avoid taking context in queue task
to avoid confusion, simplifies its usage.
2021-04-29 18:20:39 -07:00
Poorna Krishnamoorthy
90112b5644
Update ReplicationStatus if metadata not updated correctly (#12191)
There can be situations where replication completed but the
`X-Amz-Replication-Status` metadata update failed such as
when the server returns 503 under high load. This object version will
continue to be picked up by the scanner and replicateObject would perform
no action since the versions match between source and target.
The metadata would never reflect that replication was successful
without this fix, leading to repeated re-queuing.
2021-04-29 16:46:26 -07:00
Harshavardhana
c4b21ac7fa
fix: remove healthcheck routine for replication targets (#12192)
Bonus also fix a racy lookup on arnsMap() without a
read lock, hold read locks to avoid such race.

moving the healthcheck logic to minio-go
2021-04-29 16:41:28 -07:00
Poorna Krishnamoorthy
632252ff1d
fix: change SetRemoteTarget API to allow editing remote target granularly (#12175)
Currently, only credentials could be updated with
`mc admin bucket remote edit`. 

Allow updating synchronous replication flag, path, 
bandwidth and healthcheck duration on buckets, and
a flag to disable proxying in active-active replication.
2021-04-28 15:26:20 -07:00
Harshavardhana
736d8cbac4
deprecate embedded browser (#12163)
https://github.com/minio/console takes over the functionality for the
future object browser development

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-27 10:52:12 -07:00
Harshavardhana
82dc6aff1c
add support for configurable replication MRF workers (#12125)
just like replication workers, allow failed replication
workers to be configurable in situations like DR failures
etc to catch up on replication sooner when DR is back
online.

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 21:58:45 -07:00
Poorna Krishnamoorthy
014e419151
fix: ensure pending replication queued to MRF queue (#12138)
Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-04-23 16:52:57 -07:00
Krishnan Parthasarathi
c829e3a13b Support for remote tier management (#12090)
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.

Some new additions include:

 - Admin APIs remote tier configuration management

 - Simple journal to track remote objects to be 'collected'
   This is used by object API handlers which 'mutate' object versions by
   overwriting/replacing content (Put/CopyObject) or removing the version
   itself (e.g DeleteObjectVersion).

 - Rework of previous ILM transition to fit the new model
   In the new model, a storage class (a.k.a remote tier) is defined by the
   'remote' object storage type (one of s3, azure, GCS), bucket name and a
   prefix.

* Fixed bugs, review comments, and more unit-tests

- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana
069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana
2ef824bbb2
collapse two distinct calls into single RenameData() call (#12093)
This is an optimization by reducing one extra system call,
and many network operations. This reduction should increase
the performance for small file workloads.
2021-04-20 10:44:39 -07:00
Harshavardhana
0a9d8dfb0b
fix: crash in single drive mode for lifecycle (#12077)
also make sure to close the channel on the producer
side, not in a separate go-routine, this can lead
to races between a writer and a closer.

fixes #12073
2021-04-16 14:09:25 -07:00
Poorna Krishnamoorthy
d30c5d1cf0
Avoid metadata update for incoming replication failure (#12054)
This is an optimization to save IOPS. The replication
failures will be re-queued once more to re-attempt
replication. If it still does not succeed, the replication
status is set as `FAILED` and will be caught up on
scanner cycle.
2021-04-15 16:32:00 -07:00
Harshavardhana
abb55bd49e
fix: properly close leaking bandwidth monitor channel (#11967)
This PR fixes

- close leaking bandwidth report channel leakage
- remove the closer requirement for bandwidth monitor
  instead if Read() fails remember the error and return
  error for all subsequent reads.
- use locking for usage-cache.bin updates, with inline
  data we cannot afford to have concurrent writes to
  usage-cache.bin corrupting xl.meta
2021-04-05 16:07:53 -07:00
Harshavardhana
09ee303244
add cluster support for realtime bucket stats (#11963)
implementation in #11949 only catered from single
node, but we need cluster metrics by capturing
from all peers. introduce bucket stats API that
will be used for capturing in-line bucket usage
as well eventually
2021-04-04 15:34:33 -07:00
Harshavardhana
d46386246f
api: Introduce metadata update APIs to update only metadata (#11962)
Current implementation heavily relies on readAllFileInfo
but with the advent of xl.meta inlined with data, we cannot
easily avoid reading data when we are only interested is
updating metadata, this leads to invariably write
amplification during metadata updates, repeatedly reading
data when we are only interested in updating metadata.

This PR ensures that we implement a metadata only update
API at storage layer, that handles updates to metadata alone
for any given version - given the version is valid and
present.

This helps reduce the chattiness for following calls..

- PutObjectTags
- DeleteObjectTags
- PutObjectLegalHold
- PutObjectRetention
- ReplicateObject (updates metadata on replication status)
2021-04-04 13:32:31 -07:00
Poorna Krishnamoorthy
47c09a1e6f
Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Harshavardhana
8e6e287729
fix: delete/delete marker replication versions consistent (#11932)
replication didn't work as expected when deletion of
delete markers was requested in DeleteMultipleObjects
API, this is due to incorrect lookup elements being
used to look for delete markers.
2021-03-30 17:15:36 -07:00
Poorna Krishnamoorthy
5e003549cc
Replication: Enforce DeleteMarker disable setting (#11720)
This PR also enforces DeleteReplication
disable setting
2021-03-13 10:28:35 -08:00
Poorna Krishnamoorthy
2f29719e6b
resize replication worker pool dynamically after config update (#11737) 2021-03-09 02:56:42 -08:00
Poorna Krishnamoorthy
690434514d
Avoid notification event for replicas (#11683)
Creating notification events for replica creation
is not particularly useful to send as the notification
event generated at source already includes replication
completion events.

For applications using replica cluster as failover, avoiding
duplicate notifications for replica event will allow seamless
failover.
2021-03-03 11:13:31 -08:00
Poorna Krishnamoorthy
85d2187c20
fix: ETag mismatch for large upload in replica (#11587) 2021-02-20 00:22:17 -08:00
Poorna Krishnamoorthy
2dce5d9442
fix: delete marker permanent delete replication (#11581) 2021-02-18 16:35:37 -08:00
Poorna Krishnamoorthy
8e8a792d9d
Allow delete marker replication from replica (#11566)
in the case of active-active replication.

This PR also has the following changes:

- add docs on replication design
- fix corner case of completing versioned delete on a delete marker
  when the target is down and `mc rm --vid` is performed repeatedly. Instead
  the version should still be retained in the `PENDING|FAILED` state until
  replication sync completes.
- remove `s3:Replication:OperationCompletedReplication` and
   `s3:Replication:OperationFailedReplication` from ObjectCreated 
  events type
2021-02-18 00:33:51 -08:00
Harshavardhana
7875d472bc
avoid notification for non-existent delete objects (#11514)
Skip notifications on objects that might have had
an error during deletion, this also avoids unnecessary
replication attempt on such objects.

Refactor some places to make sure that we have notified
the client before we

- notify
- schedule for replication
- lifecycle etc.
2021-02-10 22:00:42 -08:00
Poorna Krishnamoorthy
e6b4ea7618
More fixes for delete marker replication (#11504)
continuation of PR#11491 for multiple server pools and
bi-directional replication.

Moving proxying for GET/HEAD to handler level rather than
server pool layer as this was also causing incorrect proxying 
of HEAD.

Also fixing metadata update on CopyObject - minio-go was not passing
source version ID in X-Amz-Copy-Source header
2021-02-10 17:25:04 -08:00
Harshavardhana
cbf4bb62e0
fix: getPoolIdx decouple from top level options (#11512)
top-level options shouldn't be passed down for
GetObjectInfo() while verifying the objects in
different pools, this is to make sure that
we always get the value from the pool where
the object exists.
2021-02-10 11:45:02 -08:00
Poorna Krishnamoorthy
93eb549a83
fix: duplicate delete marker attempts in bi-directional replication (#11491) 2021-02-09 15:11:43 -08:00
Harshavardhana
68d299e719
fix: case-insensitive lookups for metadata (#11489)
continuation of #11487, with more changes
2021-02-08 18:12:28 -08:00
Poorna Krishnamoorthy
f9c5636c2d
fix: lookup metdata case insensitively (#11487)
while setting replication options
2021-02-08 16:19:05 -08:00
Poorna Krishnamoorthy
8e1bbd989a
replication:alloc UserDefined map before use (#11478) 2021-02-07 22:01:10 -08:00
Harshavardhana
f108873c48
fix: replication metadata comparsion and other fixes (#11410)
- using miniogo.ObjectInfo.UserMetadata is not correct
- using UserTags from Map->String() can change order
- ContentType comparison needs to be removed.
- Compare both lowercase and uppercase key names.
- do not silently error out constructing PutObjectOptions
  if tag parsing fails
- avoid notification for empty object info, failed operations
  should rely on valid objInfo for notification in all
  situations
- optimize copyObject implementation, also introduce a new 
  replication event
- clone ObjectInfo() before scheduling for replication
- add additional headers for comparison
- remove strings.EqualFold comparison avoid unexpected bugs
- fix pool based proxying with multiple pools
- compare only specific metadata

Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>
2021-02-03 20:41:33 -08:00
Poorna Krishnamoorthy
fe3aca70c3
Make number of replication workers configurable. (#11379)
MINIO_API_REPLICATION_WORKERS env.var and
`mc admin config set api` allow number of replication
workers to be configurable. Defaults to half the number
of cpus available.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-02-02 16:45:06 +05:30
Poorna Krishnamoorthy
fd3f02637a
fix: replication regression due to proxying requests (#11356)
In PR #11165 due to incorrect proxying for 2 
way replication even when the object was not 
yet replicated

Additionally, fix metadata comparisons when
deciding to do full replication vs metadata copy.

fixes #11340
2021-01-27 11:22:34 -08:00
Harshavardhana
7e266293e6
fix: notify bucket replication after replication/ilm (#11343) 2021-01-25 14:04:41 -08:00
Poorna Krishnamoorthy
feaf8dfb9a
Fix replication status reported on completion (#11273)
Fixes: #11272
2021-01-13 11:52:28 -08:00
Poorna Krishnamoorthy
7824e19d20
Allow synchronous replication if enabled. (#11165)
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.

This PR also adds proxying on GET/HEAD to another node in a
active-active replication setup in the event of a 404 on the current node.
2021-01-11 22:36:51 -08:00
Klaus Post
51dad1d130
Fix missing GetObjectNInfo Closure (#11243)
Review for missing Close of returned value from `GetObjectNInfo`.

This was often obscured by the stuff that auto-unlocks when reaching EOF.
2021-01-08 10:12:26 -08:00
Harshavardhana
f0808bb2e5
fix: getObject fd leaks in transition and replication code (#11237) 2021-01-06 16:13:10 -08:00
Poorna Krishnamoorthy
64bddf47d8
Pass deletemarker correctly to replicate opts (#11227)
fixes: #11180
2021-01-05 14:12:37 -08:00
Anis Elleuch
5434088c51
replication: Ensure to always use nano precision source modtime (#11135) 2020-12-18 11:37:28 -08:00
Harshavardhana
bdd094bc39
fix: avoid sending errors on missing objects on locked buckets (#10994)
make sure multi-object delete returned errors that are AWS S3 compatible
2020-11-28 21:15:45 -08:00
Poorna Krishnamoorthy
2ff655a745
Refactor replication, ILM handling in DELETE API (#10945) 2020-11-25 11:24:50 -08:00
Poorna Krishnamoorthy
39f3d5493b
Show Delete replication status header (#10946)
X-Minio-Replication-Delete-Status header shows the
status of the replication of a permanent delete of a version.

All GETs are disallowed and return 405 on this object version.
In the case of replicating delete markers.

X-Minio-Replication-DeleteMarker-Status shows the status 
of replication, and would similarly return 405.

Additionally, this PR adds reporting of delete marker event completion
and updates documentation
2020-11-21 23:48:50 -08:00
Poorna Krishnamoorthy
251c1ef6da Add support for replication of object tags, retention metadata (#10880) 2020-11-19 18:56:09 -08:00
Poorna Krishnamoorthy
0fa430c1da validate service type of target in replication/ilm transition config (#10928) 2020-11-19 18:47:33 -08:00
Harshavardhana
9a34fd5c4a Revert "Revert "Add delete marker replication support (#10396)""
This reverts commit 267d7bf0a9.
2020-11-19 18:43:58 -08:00
Poorna Krishnamoorthy
0b766288ef
fix: send replication completed event notification (#10902) 2020-11-15 22:16:41 -08:00
Harshavardhana
267d7bf0a9 Revert "Add delete marker replication support (#10396)"
This reverts commit 50c10a5087.

PR is moved to origin/dev branch
2020-11-12 11:43:14 -08:00
Poorna Krishnamoorthy
50c10a5087
Add delete marker replication support (#10396)
Delete marker replication is implemented for V2
configuration specified in AWS spec (though AWS
allows it only in the V1 configuration).

This PR also brings in a MinIO only extension of
replicating permanent deletes, i.e. deletes specifying
version id are replicated to target cluster.
2020-11-10 15:24:14 -08:00
Harshavardhana
8df6112204
fix: avoid divide by zero error single node distributed setup (#10862) 2020-11-09 20:40:39 -08:00
Bill Thorp
4a1efabda4
Context based AccessKey passing (#10615)
A new field called AccessKey is added to the ReqInfo struct and populated.
Because ReqInfo is added to the context, this allows the AccessKey to be
accessed from 3rd-party code, such as a custom ObjectLayer.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Kaloyan Raev <kaloyan@storj.io>
2020-11-04 09:13:34 -08:00
Klaus Post
a982baff27
ListObjects Metadata Caching (#10648)
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c

Gist of improvements:

* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no 
  longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
2020-10-28 09:18:35 -07:00
Ritesh H Shukla
8a16a1a1a9
fix: misc fixes for bandwidth reporting amd monitoring (#10683)
* Set peer for fetch bandwidth
* Fix the limit for bandwidth that is reported.
* Reduce CPU burn from bandwidth management.
2020-10-16 09:07:50 -07:00
Ritesh H Shukla
8ceb2a93fd
fix: peer replication bandwidth monitoring in distributed setup (#10652) 2020-10-12 09:04:55 -07:00
Ritesh H Shukla
c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Poorna Krishnamoorthy
907a171edd
Generalize error messages for remote targets (#10638)
This is to allow remote targets to be generalized
for replication/ILM transition

Also adding a field in BucketTarget to identify
a remote target with a label.
2020-10-08 10:54:11 -07:00
Harshavardhana
effe131090
fix: allow read unlocks to be defensive about split brains (#10637) 2020-10-07 09:15:01 -07:00
Poorna Krishnamoorthy
dbbed6f7f0
update minio-go dependency (#10634) 2020-10-06 08:37:09 -07:00
Poorna Krishnamoorthy
7fbfdceba3
Fix replication slowness (#10632)
- Increase channel buffer length
- Avoid blocking wait on replicaCh
2020-10-05 14:45:42 -07:00
poornas
e6ab4db6b8
Fix minimum replication workers started (#10560)
This PR also fixes GetReplicationConfiguration permission
in web-handlers.go to use bucket as resource
2020-09-24 12:25:41 -07:00
poornas
4c54ed8748
Close replica channel only once (#10542)
Also enforce s3:GetReplicationConfiguration permission check as a
bucket level resource.
2020-09-22 12:47:24 -07:00
Harshavardhana
17e17da00d
add parallel workers to perform replication in parallel (#10525)
set the concurrency for replication be to runtime.NumCPU()/2
2020-09-21 13:43:29 -07:00
Harshavardhana
d616d8a857
serialize replication and feed it through task model (#10500)
this allows for eventually controlling the concurrency
of replication and overally control of throughput
2020-09-16 16:04:55 -07:00
Harshavardhana
02c1a08a5b
fix: make sure to lock CopyObject for in-place updates (#10492) 2020-09-15 20:44:48 -07:00
poornas
79e21601b0
fix: web handlers to enforce replication (#10249)
This PR also preserves source ETag for replication
2020-08-12 17:32:24 -07:00
poornas
adcaa6f9de
fix: Change ListBucketTargets handler (#10217)
to list all targets across a tenant.
Also fixing some validations.
2020-08-06 17:10:21 -07:00
poornas
121164db56
fix: relax some replication validations (#10210)
Also inherit storage class from source object
if replication configuration does not have a storage
class specified for destination bucket.
2020-08-05 20:01:20 -07:00
poornas
88daaef76b
Validate object lock when setting replication config. (#10200)
Check if object lock is enabled on
destination bucket while setting replication
configuration on a object lock enabled bucket.
2020-08-04 23:02:27 -07:00
poornas
a8dd7b3eda
Refactor replication target management. (#10154)
Generalize replication target management so
that remote targets for a bucket can be
managed with ARNs. `mc admin bucket remote`
command will be used to manage targets.
2020-07-30 19:55:22 -07:00
poornas
b46ab7e921
Rename replication target handler (#10142)
Rename replication target handler to a generic bucket target handler
2020-07-28 11:50:47 -07:00
poornas
b9be841fd2
Add missing validation for replication API conditions (#10114) 2020-07-22 17:39:40 -07:00
poornas
c43da3005a
Add support for server side bucket replication (#9882) 2020-07-21 17:49:56 -07:00