Commit Graph

76 Commits

Author SHA1 Message Date
Klaus Post
2ca9c533ef
feat: implement in-progress partial bucket updates (#12279) 2021-05-19 14:38:30 -07:00
Klaus Post
229d83bb75
feat: add dynamic usage cache (#12229)
A cache structure will be kept with a tree of usages.
The cache is a tree structure where each keeps track 
of its children.

An uncompacted branch contains a count of the files 
only directly at the branch level, and contains link to 
children branches or leaves.

The leaves are "compacted" based on a number of properties.
A compacted leaf contains the totals of all files beneath it.

A leaf is only scanned once every dataUsageUpdateDirCycles,
rarer if the bloom filter for the path is clean and no lifecycles 
are applied. Skipped leaves have their totals transferred from 
the previous cycle.

A clean leaf will be included once every healFolderIncludeProb 
for partial heal scans. When selected there is a one in 
healObjectSelectProb that any object will be chosen for heal scan.

Compaction happens when either:

- The folder (and subfolders) contains less than dataScannerCompactLeastObject objects.
- The folder itself contains more than dataScannerCompactAtFolders folders.
- The folder only contains objects and no subfolders.
- A bucket root will never be compacted.

Furthermore, if a has more than dataScannerCompactAtChildren recursive 
children (uncompacted folders) the tree will be recursively scanned and the 
branches with the least number of objects will be compacted until the limit 
is reached.

This ensures that any branch will never contain an unreasonable amount 
of other branches, and also that small branches with few objects don't 
take up unreasonable amounts of space.

Whenever a branch is scanned, it is assumed that it will be un-compacted
before it hits any of the above limits. This will make the branch rebalance 
itself when scanned if the distribution of objects has changed.

TLDR; With current values: No bucket will ever have more than 10000 
child nodes recursively. No single folder will have more than 2500 child 
nodes by itself. All subfolders are compacted if they have less than 500 
objects in them recursively.

We accumulate the (non-deletemarker) version count for paths as well, 
since we are changing the structure anyway.
2021-05-11 18:36:15 -07:00
Harshavardhana
1aa5858543
move madmin to github.com/minio/madmin-go (#12239) 2021-05-06 08:52:02 -07:00
Harshavardhana
64f6020854
fix: cleanup locking, cancel context upon lock timeout (#12183)
upon errors to acquire lock context would still leak,
since the cancel would never be called. since the lock
is never acquired - proactively clear it before returning.
2021-04-29 20:55:21 -07:00
Harshavardhana
0faa4e6187
fix: make sure failed requests only to failed queue (#12196)
failed queue should be used for retried requests to
avoid cascading the failures into incoming queue, this
would allow for a more fair retry for failed replicas.

Additionally also avoid taking context in queue task
to avoid confusion, simplifies its usage.
2021-04-29 18:20:39 -07:00
Anis Elleuch
9e797532dc
lock: Always cancel the returned Get(R)Lock context (#12162)
* lock: Always cancel the returned Get(R)Lock context

There is a leak with cancel created inside the locking mechanism. The
cancel purpose was to cancel operations such erasure get/put that are
holding non-refreshable locks.

This PR will ensure the created context.Cancel is passed to the unlock
API so it will cleanup and avoid leaks.

* locks: Avoid returning nil cancel in local lockers

Since there is no Refresh mechanism in the local locking mechanism, we
do not generate a new context or cancel. Currently, a nil cancel
function is returned but this can cause a crash. Return a dummy function
instead.
2021-04-27 16:12:50 -07:00
Harshavardhana
c8050bc079
fix: sleeper behavior in data scanner (#12164)
do not apply healReplication() for ILM
expired, transitioned objects
2021-04-27 08:24:44 -07:00
Krishnan Parthasarathi
c829e3a13b Support for remote tier management (#12090)
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.

Some new additions include:

 - Admin APIs remote tier configuration management

 - Simple journal to track remote objects to be 'collected'
   This is used by object API handlers which 'mutate' object versions by
   overwriting/replacing content (Put/CopyObject) or removing the version
   itself (e.g DeleteObjectVersion).

 - Rework of previous ILM transition to fit the new model
   In the new model, a storage class (a.k.a remote tier) is defined by the
   'remote' object storage type (one of s3, azure, GCS), bucket name and a
   prefix.

* Fixed bugs, review comments, and more unit-tests

- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana
069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Anis Elleuch
c9dfa0d87b
audit: Add field to know who triggered the operation (#12129)
This is for now needed to know if an external S3 request deleted a file
or it was the scanner.

Signed-off-by: Anis Elleuch <anis@min.io>
2021-04-23 09:51:12 -07:00
Harshavardhana
a334554f99
fix: add helper for expected path.Clean behavior (#12068)
current usage of path.Clean returns "." for empty strings
instead we need `""` string as-is, make relevant changes
as needed.
2021-04-15 16:32:13 -07:00
Poorna Krishnamoorthy
d30c5d1cf0
Avoid metadata update for incoming replication failure (#12054)
This is an optimization to save IOPS. The replication
failures will be re-queued once more to re-attempt
replication. If it still does not succeed, the replication
status is set as `FAILED` and will be caught up on
scanner cycle.
2021-04-15 16:32:00 -07:00
Poorna Krishnamoorthy
47c09a1e6f
Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Anis Elleuch
6b484f45c6
crawling: Apply lifecycle then decide healing action (#11563)
It is inefficient to decide to heal an object before checking its
lifecycle for expiration or transition. This commit will just reverse
the order of action: evaluate lifecycle and heal only if asked and
lifecycle resulted a NoneAction.
2021-03-31 02:15:08 -07:00
Harshavardhana
014edd3462
allow configuring scanner cycles dynamically (#11931)
This allows us to speed up or slow down sleeps
between multiple scanner cycles, helps in testing
as well as some deployments might want to run
scanner more frequently.

This change is also dynamic can be applied on
a running cluster, subsequent cycles pickup
the newly set value.
2021-03-30 13:59:02 -07:00
Harshavardhana
21cfc4aa49 Revert "xl: CreateFile shouldn't prematurely timeout (#11854)"
This reverts commit 922c7b57f5.
2021-03-23 23:47:45 -07:00
Harshavardhana
922c7b57f5
xl: CreateFile shouldn't prematurely timeout (#11854)
For large objects taking more than '3 minutes' response
times in a single PUT operation can timeout prematurely
as 'ResponseHeader' timeout hits for 3 minutes. Avoid
this by keeping the connection active during CreateFile
phase.
2021-03-22 18:25:05 -07:00
Poorna Krishnamoorthy
2f29719e6b
resize replication worker pool dynamically after config update (#11737) 2021-03-09 02:56:42 -08:00
Harshavardhana
d971061305
use listPathRaw for HealObjects() instead of expensive WalkVersions() (#11675) 2021-03-06 09:25:48 -08:00
Krishnan Parthasarathi
bcf9825082
Data usage should account for transitioned objects (#11717) 2021-03-05 14:15:53 -08:00
Anis Elleuch
7be7109471
locking: Add Refresh for better locking cleanup (#11535)
Co-authored-by: Anis Elleuch <anis@min.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-03-03 18:36:43 -08:00
Harshavardhana
9171d6ef65
rename all references from crawl -> scanner (#11621) 2021-02-26 15:11:42 -08:00
Andreas Auernhammer
d4b822d697
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.

Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.

In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.

Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.

One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.

This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
   reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.

The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 12:31:53 -08:00
Anis Elleuch
f28b063091
heal: Use healDeleteDangling global const in self healing (#11579)
A small fix, use healDeleteDangling constant instead of 'true' in the
self-healing code.
2021-02-18 15:16:20 -08:00
Harshavardhana
289e1d8b2a
fix: reduce crawler memory usage by orders of magnitude (#11556)
currently crawler waits for an entire readdir call to
return until it processes usage, lifecycle, replication
and healing - instead we should pass the applicator all
the way down to avoid building any special stack for all
the contents in a single directory.

This allows for

- no need to remember the entire list of entries per directory
  before applying the required functions
- no need to wait for entire readdir() call to finish before
  applying the required functions
2021-02-17 15:34:42 -08:00
Harshavardhana
ffea6fcf09
fix: rename crawler as scanner in config (#11549) 2021-02-17 12:04:11 -08:00