Commit Graph

372 Commits

Author SHA1 Message Date
Poorna
3a64580663
Add support for site replication healing (#14572)
heal bucket metadata and IAM entries for
sites participating in site replication from
the site with the most updated entry.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <aditya@minio.io>
2022-04-24 02:36:31 -07:00
Harshavardhana
507f993075
attempt to real resolve when there is a quorum failure on reads (#14613) 2022-04-20 12:49:05 -07:00
Harshavardhana
73a6a60785
fix: replication deleteObject() regression and CopyObject() behavior (#14780)
This PR fixes two issues

- The first fix is a regression from #14555, the fix itself in #14555
  is correct but the interpretation of that information by the
  object layer code for "replication" was not correct. This PR
  tries to fix this situation by making sure the "Delete" replication
  works as expected when "VersionPurgeStatus" is already set.

  Without this fix, there is a DELETE marker created incorrectly on
  the source where the "DELETE" was triggered.

- The second fix is perhaps an older problem started since we inlined-data
  on the disk for small objects, CopyObject() incorrectly inline's
  a non-inlined data. This is due to the fact that we have code where
  we read the `part.1` under certain conditions where the size of the
  `part.1` is less than the specific "threshold".

  This eventually causes problems when we are "deleting" the data that
  is only inlined, which means dataDir is ignored leaving such
  dataDir on the disk, that looks like an inconsistent content on
  the namespace.

fixes #14767
2022-04-20 10:22:05 -07:00
Anis Elleuch
16431d222c
heal: Enable periodic bitrot scan configuration (#14464) 2022-04-07 08:10:40 -07:00
Harshavardhana
cf94d1f1f1
do not crash readXLMetaNoData - if the xl.meta has incorrect content (#14538)
```
tmp = buf[want:]
```

Would potentially crash when `buf` is truncated for some reason
and does not have the expected bytes, this is of course considered
not normal and is an odd situation. But we do not need to crash
here instead allow for errors to be returned and let callers handle
the errors.
2022-03-14 09:07:46 -07:00
Harshavardhana
91d419ee6c
warn issues about large block I/O performance for Linux older than 4.0.0 (#14524)
This PR simply adds a warning message when it detects older kernel
versions and warn's them about potential performance issues on this
kernel.

The issue can be seen only with parallel I/O across all drives
on denser setups such as 90 drives or 45 drives per server configurations.
2022-03-10 17:36:13 -08:00
Klaus Post
b890bbfa63
Add local disk health checks (#14447)
The main goal of this PR is to solve the situation where disks stop 
responding to operations. This generally causes an FD build-up and 
eventually will crash the server.

This adds detection of hung disks, where calls on disk get stuck.

We add functionality to `xlStorageDiskIDCheck` where it keeps 
track of the number of concurrent requests on a given disk.

A total number of 100 operations are allowed. If this limit is reached 
we will block (but not reject) new requests, but we will monitor the 
state of the disk.

If no requests have been completed or updated within a 15-second 
window, we mark the disk as offline. Requests that are blocked will be 
unblocked and return an error as "faulty disk".

New requests will be rejected until the disk is marked OK again.

Once a disk has been marked faulty, a check will run every 5 seconds that 
will attempt to write and read back a file. As long as this fails the disk will 
remain faulty.

To prevent lots of long-running requests to mark the disk faulty we 
implement a callback feature that allows updating the status as parts 
of these operations are running.

We add a reader and writer wrapper that will update the status of each 
successful read/write operation. This should allow fine enough granularity 
that a slow, but still operational disk will not reach 15 seconds where 
50 operations have not progressed.

Note that errors themselves are not enough to mark a disk faulty. 
A nil (or io.EOF) error will mark a disk as "good".

* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.

* de-couple IsOnline() from disk health tracker

The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was

- disconnected from network we need to validate
  if the drive is "correct" and is the same drive
  which belongs to this server.

- drive was replaced we have to format it - we
  support hot swapping of the drives.

IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.

* return errFaultyDisk for DiskInfo() call

Co-authored-by: Harshavardhana <harsha@minio.io>

Possible future Improvements:

* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
2022-03-09 11:38:54 -08:00
Harshavardhana
b0c84e3de7
fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483)
This is a side-affect of the optimization done in PR #13544 which
causes a certain type of delete operations on given object versions
can cause lastVersion indication to be skipped, which leads to
an `xl.meta` where Versions[] slice is empty while the entire
file is intact by itself.

This PR tries to ensure that such files are visible and deletable
by regular means of listing as null 'delete-marker' and also
avoid the situation where this potential issue might arise.
2022-03-04 20:01:26 -08:00
Harshavardhana
66afa16aed
canceled PUTs throw frivolous logs (#14475)
remote drives might throw frivolous logs,
if the caller canceled the PUT operation
in such scenarios there is no reason to log.
2022-03-04 10:31:33 -08:00
Harshavardhana
9d7648f02f
reduce unnecessary logging during speedtest (#14387)
- speedtest logs calls that were canceled
  spuriously, in situations where it should
  be ignored.

- all errors of interest are always sent back
  to the client there is no need to log them
  on the server console.

- PUT failures should negate the increments
  such that GET is not attempted on unsuccessful
  calls.

- do not attempt MRF on speedtest objects.
2022-02-23 11:59:13 -08:00
Anis Elleuch
5dcf1d13a9
ci: Always set disks as non root disks (#14389)
In the testing mode, reformatting disks will fail because the healing
code will complain if one disk is in root mode. This commit will
automatically set all disks as non-root if MINIO_CI_CD is set.
2022-02-23 10:11:33 -08:00
Krishnan Parthasarathi
5a0c0079a1
Don't add free-version on restore-object (#14340) 2022-02-17 15:05:19 -08:00
Harshavardhana
03a6e8aee2
fix: creating steep directory structure on trash folder (#14314)
weird directory structures get created on the '.trash'
folder upon server restarts, this PR fixes this.
2022-02-15 16:34:03 -08:00
Anis Elleuch
1f92fc3fc0
Always check for root disks unless MINIO_CI_CD is set (#14232)
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.

Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
2022-02-13 15:42:07 -08:00
Harshavardhana
6123377e66
speedup getFormatErasureInQuorum use driveCount (#14239)
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
2022-02-04 12:21:21 -08:00
Harshavardhana
24657859a8
when o_direct is disabled do not attempt fadvise call (#14230) 2022-02-02 08:54:52 -08:00
Harshavardhana
57118919d2
cached diskIDs are not needed for scanner healing (#14170)
This PR removes an unnecessary state that gets
passed around for DiskIDs, which is not necessary
since each disk exactly knows which pool and which
set it belongs to on a running system.

Currently cached DiskId's won't work properly
because it always ends up skipping offline disks
and never runs healing when disks are offline, as
it expects all the cached diskIDs to be present
always. This also sort of made things in-flexible
in terms perhaps a new diskID for `format.json`.
(however this is not a big issue)

This is an unnecessary requirement that healing
via scanner needs all drives to be online, instead
healing should trigger even when partial nodes
and drives are available this ensures that we
keep the SLA in-tact on the objects when disks
are offline for a prolonged period of time.
2022-01-26 08:34:56 -08:00
Krishnan Parthasarathi
ebc3627c73
further improvements to newXLStorage (#14166)
- create internal erasure volumes only if the disk is unformatted
- return a copy of format data in xlStorage.ReadAll
- parse env vars only once, to be re-used by xl-storage
2022-01-24 17:09:12 -08:00
Harshavardhana
5a9f133491
speed up startup sequence for all operations (#14148)
This speed-up is intended for faster startup times
for almost all MinIO operations. Changes here are

- Drives are not re-read for 'format.json' on a regular
  basis once read during init is remembered and refreshed
  at 5 second intervals.

- Do not do O_DIRECT tests on drives with existing 'format.json'
  only fresh setups need this check.

- Parallelize initializing erasureSets for multiple sets.

- Avoid re-reading format.json when migrating 'format.json'
  from really old V1->V2->V3

- Keep a copy of local drives for any given server in memory
  for a quick lookup.
2022-01-24 11:28:45 -08:00
Harshavardhana
70e1cbda21
allow disabling O_DIRECT in certain environments for reads (#14115)
repeated reads on single large objects in HPC like
workloads, need the following option to disable
O_DIRECT for a more effective usage of the kernel
page-cache.

However this optional should be used in very specific
situations only, and shouldn't be enabled on all
servers.

NVMe servers benefit always from keeping O_DIRECT on.
2022-01-17 08:34:14 -08:00
Klaus Post
64d4da5a37
Add Put input readahead (#14084)
When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs.

This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers.

We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from.

This improves PutObject and PutObjectParts for these cases.
2022-01-14 10:01:25 -08:00
Klaus Post
a2fd8caa69
Ignore version not found in deleteVersions (#14093)
When deleting multiple versions it "gives" up with an errFileVersionNotFound if 
a version cannot be found. This effectively skips deleting other versions 
sent in the same request. 

This can happen on inconsistent objects. We should ignore errFileVersionNotFound 
and continue with others.

We already ignore these at the caller level, this PR is continuation of 54a9877
2022-01-13 14:28:07 -08:00
Harshavardhana
f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana
38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana
737a3f0bad
fix: decommission bugfixes found during migration of .minio.sys/config (#14078) 2022-01-10 17:26:00 -08:00
Klaus Post
0e31cff762
fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038)
* Process sets concurrently.
* Disconnect context from request.
* Insert context cancellation checks.
* errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.
2022-01-06 10:47:49 -08:00
Harshavardhana
f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana
0e3037631f
skip inconsistent shards if possible (#13945)
data shards were wrong due to a healing bug
reported in #13803 mainly with unaligned object
sizes.

This PR is an attempt to automatically avoid
these shards, with available information about
the `xl.meta` and actually disk mtime.
2021-12-21 10:08:26 -08:00
Harshavardhana
5f7e6d03ff
copy bucket slice to avoid skipping .minio.sys/buckets (#13912)
healing was skipping `.minio.sys/buckets` path so
essentially not healing `.usage.json` - fix this
by making a copy of `buckets` slice.
2021-12-15 09:18:09 -08:00
Harshavardhana
3b79f7e4ae
ignore if volume exists in MakeVolBulk, return other errors (#13866) 2021-12-09 15:55:42 -08:00
Harshavardhana
a7c430355a
fix: throw appropriate errors when all disks fail (#13820)
when all disks fail with same error, fail server
startup anyways - we cannot proceed.

fixes #13818
2021-12-03 09:25:17 -08:00
Klaus Post
3db931dc0e
Improve listing consistency with version merging (#13723) 2021-12-02 11:29:16 -08:00
Harshavardhana
b280a37c4d
add delete-marker proactively in DeleteObject() (#13795)
single object delete was not working properly
on a bucket when versioning was suspended,
current version 'null' object was never removed.

added unit tests to cover the behavior

fixes #13783
2021-11-30 18:30:06 -08:00
Harshavardhana
c791de0e1e
re-implement pickValidInfo dataDir, move to quorum calculation (#13681)
dataDir loosely based on maxima is incorrect and does not
work in all situations such as disks in the following order

- xl.json migration to xl.meta there may be partial xl.json's
  leftover if some disks are not yet connected when the disk
  is yet to come up, since xl.json mtime and xl.meta is
  same the dataDir maxima doesn't work properly leading to
  quorum issues.

- its also possible that XLV1 might be true among the disks
  available, make sure to keep FileInfo based on common quorum
  and skip unexpected disks with the older data format.

Also, this PR tests upgrade from older to a newer release if the 
data is readable and matches the checksum.

NOTE: this is just initial work we can build on top of this to do further tests.
2021-11-21 10:41:30 -08:00
Krishnan Parthasarathi
3da9ee15d3
Add MaxNoncurrentVersions to NoncurrentExpiration action (#13580)
This unit allows users to limit the maximum number of noncurrent 
versions of an object.

To enable this rule you need the following *ilm.json*
```
cat >> ilm.json <<EOF
{
    "Rules": [
        {
            "ID": "test-max-noncurrent",
            "Status": "Enabled",
            "Filter": {
                "Prefix": "user-uploads/"
            },
            "NoncurrentVersionExpiration": {
                "MaxNoncurrentVersions": 5
            }
        }
    ]
}
EOF
mc ilm import myminio/mybucket < ilm.json
```
2021-11-19 17:54:10 -08:00
Klaus Post
faf013ec84
Improve performance on multiple versions (#13573)
Existing:

```go
type xlMetaV2 struct {
    Versions []xlMetaV2Version `json:"Versions" msg:"Versions"`
}
```

Serialized as regular MessagePack.

```go
//msgp:tuple xlMetaV2VersionHeader
type xlMetaV2VersionHeader struct {
	VersionID [16]byte
	ModTime   int64
	Type      VersionType
	Flags     xlFlags
}
```

Serialize as streaming MessagePack, format:

```
int(headerVersion)
int(xlmetaVersion)
int(nVersions)
for each version {
    binary blob, xlMetaV2VersionHeader, serialized
    binary blob, xlMetaV2Version, serialized.
}
```

xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct 
can easily be reused and does not contain pointers, so efficient as a 
slice (single allocation)

This allows quickly parsing everything as slices of bytes (no copy).

Versions are always *saved* sorted by modTime, newest *first*. 
No more need to sort on load.

* Allows checking if a version exists.
* Allows reading single version without unmarshal all.
* Allows reading latest version of type without unmarshal all.
* Allows reading latest version without unmarshal of all.
* Allows checking if the latest is deleteMarker by reading first entry.
* Allows adding/updating/deleting a version with only header deserialization.
* Reduces allocations on conversion to FileInfo(s).
2021-11-18 12:15:22 -08:00
Harshavardhana
886262e58a
heal legacy objects when versioning is enabled after upgrade (#13671)
legacy objects in 'xl.json' after upgrade, should have
following sequence of events - bucket should have versioning
enabled and the object should have been overwritten with
another version of an object.

this situation was not handled, which would lead to older
objects to stay perpetually with "legacy" dataDir, however
these objects were readable by all means - there weren't
converted to newer format.

This PR fixes this situation properly.
2021-11-17 15:49:12 -08:00
Harshavardhana
661b263e77
add gocritic/ruleguard checks back again, cleanup code. (#13665)
- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() use when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2021-11-16 09:28:29 -08:00
Harshavardhana
bb639d9f29
remove double reads delete versions (#13544)
deleting collection of versions belonging
to same object, we can avoid re-reading
the xl.meta from the disk instead purge
all the requested versions in-memory,

the tradeoff is to allocate a map to de-dup
the versions, allow disks to be read only
once per object.

additionally reduce the data transfer between
nodes by shortening msgp data values.
2021-11-01 10:50:07 -07:00
Klaus Post
c603f85488
readAllData: Reuse small file buffers (#13530)
(Re)use small buffers for small readAllData operations.
2021-10-28 17:02:22 -07:00
Krishnan Parthasarathi
939fbb3c38
ilm: Make per-tier stats available via admin-tier-info (#13381) 2021-10-23 18:38:33 -07:00
Klaus Post
23d6770ff9
Inspect: Preserve permission flags (#13490)
Preserve permission from disk files. Can help identify issues.

Refactor GetRawData function to be cleaner.
2021-10-21 11:20:13 -07:00
Harshavardhana
ac36a377b0
fix: remove deprecated jwks_url from config KV (#13477) 2021-10-20 11:31:09 -07:00
Harshavardhana
d693431183
fix: ReadFileStream should return an error when size mismatches (#13435)
offset+length should match the Size() of the individual parts
return 'errFileCorrupt' otherwise, to trigger healing of the individual 
parts do not error out prematurely when healing such bitrot's upon
successful parts being written to the client.

another issue this PR fixes is to not return and error to
the client if we have just triggered a heal on a specific
part of the object, instead continue to read all the content
and let the heal happen asynchronously later.
2021-10-13 19:49:14 -07:00
Harshavardhana
f8c5c24159
force delete should just use rename() (#13417)
use rename() instead of forced blocking
delete call, faster for large namespaces.
2021-10-12 09:24:00 -07:00
Harshavardhana
f5a55c44d4
fix: do not overwrite error on fallback. (#13415)
older content was returning '404' upon headObject()
due to swallowing of the error, make sure the
error is handling independently.

fixes #13397
2021-10-11 19:48:42 -07:00
Klaus Post
75699a3825
Add basic scanner metrics (#13317)
Add number of objects/versions/folders scanned as well as ILM action outcomes.
2021-10-02 09:31:05 -07:00
Klaus Post
bc6067d195
Add admin inspect Glob support (#13328)
* Add admin Glob support

Allow returning multiple files on inspect calls.

```
λ mc admin inspect --json local2/testbucket/nyc-taxi-data-10M.csv.zst/*

...

λ unzip -l inspect.5f0643b2.zip

Archive:  inspect.5f0643b2.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
      802  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
        0  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
      802  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
        0  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
      802  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
        0  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
      802  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
---------                     -------
     3208                     8 files
```

Using fully recursive:

```
λ  mc admin inspect local2/testbucket/nyc-taxi-data-10M.csv.zst/**

...

Archive:  inspect.79c261cb.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/
        0  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.1
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.10
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.11
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.12
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.13
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.14
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.15
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.16
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.17
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.18
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.19
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.2
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.20
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.21
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.22
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.23
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.24
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.25
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.26
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.27
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.28
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.29
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.3
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.30
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.31
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.32
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.33
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.34
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.35
  3439368  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.36
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.4
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.5
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.6
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.7
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.8
  4194816  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.9
      802  2021-09-03 12:50   192.168.1.78:9001/a221edde-48fe-45f5-ad32-3bc7131c7659/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
        0  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/
        0  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.1
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.10
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.11
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.12
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.13
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.14
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.15
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.16
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.17
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.18
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.19
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.2
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.20
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.21
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.22
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.23
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.24
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.25
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.26
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.27
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.28
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.29
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.3
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.30
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.31
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.32
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.33
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.34
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.35
  3439368  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.36
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.4
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.5
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.6
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.7
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.8
  4194816  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.9
      802  2021-09-03 12:50   192.168.1.78:9001/cb7440ef-f0d9-42a8-b137-f00f519276ca/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
        0  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/
        0  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.1
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.10
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.11
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.12
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.13
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.14
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.15
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.16
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.17
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.18
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.19
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.2
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.20
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.21
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.22
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.23
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.24
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.25
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.26
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.27
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.28
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.29
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.3
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.30
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.31
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.32
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.33
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.34
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.35
  3439368  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.36
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.4
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.5
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.6
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.7
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.8
  4194816  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.9
      802  2021-09-03 12:50   192.168.1.78:9001/759cd5ac-7860-4cf3-acad-a375fcbae338/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
        0  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/
        0  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.1
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.10
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.11
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.12
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.13
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.14
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.15
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.16
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.17
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.18
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.19
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.2
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.20
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.21
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.22
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.23
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.24
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.25
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.26
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.27
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.28
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.29
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.3
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.30
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.31
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.32
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.33
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.34
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.35
  3439368  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.36
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.4
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.5
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.6
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.7
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.8
  4194816  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/18a50b3e-3c56-418e-a045-ad5c58c1d44b/part.9
      802  2021-09-09 15:56   192.168.1.78:9001/2b48619c-c2fa-4e69-839e-58fc82c1b43e/testbucket/nyc-taxi-data-10M.csv.zst/xl.meta
---------                     -------
601034920                     156 files

```

Furthermore allow `inspect` to do direct decode from `mc`, for example:

```
λ mc admin inspect --json local2/testbucket/nyc-taxi-data-10M.csv.zst/*|inspect -json
Output decrypted to inspect.5f0643b2.zip
```

- Correct error, forward non-EOF errors.
- Add some extra safety. Log FNF when no files.
- Add `xl-meta` zip support.
For `xl-meta` multiple inputs output object with names as key.
Automatically switches `xl-meta` to single-line output when multiple objects.
Add double-star wildcard support to xl-meta input.

Co-authored-by: Harshavardhana <harsha@minio.io>
2021-10-01 11:50:00 -07:00
Harshavardhana
d00ff3c453
use O_DIRECT for all ReadFileStream (#13324)
This PR also removes #13312 to ensure
that we can use a better mechanism to
handle page-cache, using O_DIRECT
even for Range GETs.
2021-09-29 16:40:28 -07:00
Harshavardhana
38027c8f52
use fadvise to control Linux page-cache (#13312)
This PR brings two optimizations mainly
for page-cache build-up and how to avoid
getting OOM killed in the process. Although
these memories are reclaimable Linux is not
fast enough to reclaim them as needed on a
very busy system. fadvise is a system call
implemented in Linux to advise page-cache to
avoid overload as we get significant amount
of requests on the server.

- FADV_SEQUENTIAL tells that all I/O from now
  is going to be sequential, allowing for more
  resposive throughput.

- FADV_NOREUSE tells kernel to start removing
  things for this 'fd' from page-cache.
2021-09-28 10:02:56 -07:00
Poorna Krishnamoorthy
c4373ef290
Add support for multi site replication (#12880) 2021-09-18 13:31:35 -07:00
Harshavardhana
1a884cd8e1
fix: deleting objects was not working after upgrades (#13242)
DeleteObject() on existing objects before `xl.json` to
`xl.meta` change were not working, not sure when this
regression was added. This PR fixes this properly.

Also this PR ensures that we perform rename of xl.json
to xl.meta only during "write" phase of the call i.e
either during Healing or PutObject() overwrites.

Also handles few other scenarios during migration where
`backendEncryptedFile` was missing deleteConfig() will
fail with `configNotFound` this case was not ignored,
which can lead to failure during upgrades.
2021-09-17 19:34:48 -07:00
Harshavardhana
5ed781a330
check for context canceled after competing for locks (#13239)
once we have competed for locks, verify if the
context is still valid - this is to ensure that
we do not start readdir() or read() calls on the
drives on canceled connections.
2021-09-17 14:11:01 -07:00
Harshavardhana
66fcd02aa2
de-couple walkMu and walkReadMu for some granularity (#13231)
This commit brings two locks instead of single lock for
WalkDir() calls on top of c25816eabc.

The main reason is to avoid contention between readMetadata()
and ListDir() calls, ListDir() can take time on prefixes that
are huge for readdir() but this shouldn't end up blocking
all readMetadata() operations, this allows for more room for
I/O while not overly penalizing all listing operations.
2021-09-17 12:14:12 -07:00
Klaus Post
bf5bfe589f
xlmeta: Recover corrupted metadata (#13205)
When unable to load existing metadata new versions 
would not be written. This would leave objects in a 
permanently unrecoverable state

Instead, start with clean metadata and write the incoming data.
2021-09-14 11:34:25 -07:00
Klaus Post
470553ff5d
Tweak readall allocation and renameData buffer reuse (#13108)
Use a single allocation for reading the file, not the growing buffer of `io.ReadAll`.

Reuse the write buffer if we can when writing metadata in RenameData.
2021-08-30 08:38:11 -07:00
Poorna Krishnamoorthy
27f895cf2c
Check pathlength before reading metadata (#13080)
fixes bug where the server returns 503 instead of 400 if 
objectName is longer than 255 characters

Fixes regression introduced in #12942
2021-08-26 16:23:12 -07:00
Harshavardhana
c11a2ac396
refactor healing to remove certain structs (#13079)
- remove sourceCh usage from healing
  we already have tasks and resp channel

- use read locks to lookup globalHealConfig

- fix healing resolver to pick candidates quickly
  that need healing, without this resolver was
  unexpectedly skipping.
2021-08-26 14:06:04 -07:00
Klaus Post
88d719689c
Synchronize bucket cycle numbers (#13058)
Synchronize bucket cycles so it is much more
likely that the same prefixes will be picked up
for scanning.

Use the global bloom filter cycle for that. 
Bump bloom filter versions to clear those.
2021-08-25 08:25:26 -07:00
Krishnan Parthasarathi
db35bcf2ce
heal: Remove transitioned objects' parts from outdated disks (#13018)
Bonus: check equality for replication and other metadata
2021-08-23 13:14:55 -07:00
Klaus Post
1080609c86
Reuse buffers when writing metadata (#13040)
Simplify returning buffers.

Tested using `warp mixed --duration=1m --obj.size=100K`:

```
Operation: DELETE
Operations: 7148 -> 7642
* Average: +6.77% (+8.1) obj/s
-------------------
Operation: GET
Operations: 32200 -> 34403
* Average: +6.74% (+3.5 MiB/s) throughput, +6.74% (+36.2) obj/s
* First Byte: Average: -105.403µs (-3%), Median: -309µs (-11%), Best: -2.7µs (-0%), Worst: +3.5637ms (+3%)
-------------------
Operation: PUT
Operations: 10741 -> 11475
* Average: +6.78% (+1.2 MiB/s) throughput, +6.78% (+12.1) obj/s
-------------------
Operation: STAT
Operations: 21465 -> 22927
* Average: +6.71% (+24.0) obj/s
```
2021-08-23 11:17:27 -07:00
Harshavardhana
0f01e7ef0f
fix: check for xl.meta as directory fallback (#13023)
Objects uploaded in this format for example

```
mc cp /etc/hosts alias/bucket/foo/bar/xl.meta
mc ls -r alias/bucket/foo/bar
```

Won't list the object, handle this scenario.
2021-08-21 00:12:29 -07:00
Klaus Post
c25816eabc
xl walk: Limit walk concurrent IO (#12885)
We are observing heavy system loads, potentially
locking the system up for periods when concurrent
listing operations are performed.

We place a per-disk lock on walk IO operations.
This will minimize the impact of concurrent listing
operations on the entire system and de-prioritize
them compared to other operations.

Single list operations should remain largely unaffected.
2021-08-18 18:10:36 -07:00
Klaus Post
24722ddd02
Remove inline data hack (#12946)
move the code down to the storage layer,
this logic decouples the inline data from the 
size parameter making it flexible and future
proof.
2021-08-13 08:25:54 -07:00
Klaus Post
89febdb3d6
Reuse small buffers (#12948)
When reading metadata allow reuse of buffers 
in certain cases. Take the low-hanging fruit.

Reduce GC overhead when listing.
2021-08-12 14:27:22 -07:00
Klaus Post
3eac02f676
Use metadata reader in ReadVersion (#12942)
Use `readMetadata` when reading version 
information without data requested. 

Reduces IO on inlined data.

Bonus: Inline compressed data as well when 
compression is enabled.
2021-08-12 10:05:24 -07:00
Harshavardhana
40a2fa8e81
fix: add more optimizations to putMetacacheObject() (#12916)
- avoid extra lookup for 'xl.meta' since we are
  definitely sure that it doesn't exist.

- use this in newMultipartUpload() as well

- also additionally do not write with O_DSYNC
  to avoid loading the drives, instead create
  'xl.meta' for listing operations without
  O_DSYNC since these are ephemeral objects.

- do the same with newMultipartUpload() since
  it gets synced when the PutObjectPart() is
  attempted, we do not need to tax newMultipartUpload()
  instead.
2021-08-10 11:12:22 -07:00
Harshavardhana
035882d292
fix: remove parentIsObject() check (#12851)
we will allow situations such as

```
a/b/1.txt
a/b
```

and

```
a/b
a/b/1.txt
```

we are going to document that this usecase is
not supported and we will never support it, if
any application does this users have to delete
the top level parent to make sure namespace is
accessible at lower level.

rest of the situations where the prefixes get
created across sets are supported as is.
2021-08-03 13:26:57 -07:00
Harshavardhana
bfbdb8f0a8
fix: incorrect O_DIRECT behavior for reads (#12811)
O_DIRECT behavior was broken and it was still
caching all the reads, this change properly fixes
this behavior.
2021-07-28 11:20:16 -07:00
Krishna Srinivas
aa0c28809b
Server side speedtest implementation (#12750) 2021-07-27 12:55:56 -07:00
Harshavardhana
0c666379fe
fix: avoid removing healed parts on dstDataPath (#12795)
destination path and old path will be similar
when healing occurs, this can lead to healed
parts being again purged leading to always an
inconsistent state on an object which might
further cause reduction in quorum eventually.
2021-07-26 15:15:34 -07:00
Harshavardhana
e124d88788
optimize listing operation concurrency (#12728)
- remove use of getOnlineDisks() instead rely on fallbackDisks()
  when disk return errors like diskNotFound, unformattedDisk
  use other fallback disks to list from, instead of paying the
  price for checking getOnlineDisks()

- optimize getDiskID() further to avoid large write locks when
  looking formatLastCheck time window

This new change allows for a more relaxed fallback for listing
allowing for more tolerance and also eventually gain more
consistency in results even if using '3' disks by default.
2021-07-24 22:03:38 -07:00
Krishnan Parthasarathi
29eea52e14
Skip transitioning of object versions if inlined (#12705) 2021-07-16 09:38:27 -07:00
Klaus Post
d6a2fe02d3
Add admin file inspector (#12635)
Download files from *any* bucket/path as an encrypted zip file.

The key is included in the response but can be separated so zip 
and the key doesn't have to be sent on the same channel.

Requires https://github.com/minio/pkg/pull/6
2021-07-09 11:29:16 -07:00
Krishnan Parthasarathi
a1df230518
Add a 'free' version to track deletion of tiered object content (#12470) 2021-06-30 19:32:07 -07:00
Harshavardhana
4669d19f2a
fix: simplify diskMap usage to keep certain checks predictable (#12519)
Bonus: also make sure that we Sanitize() the drives only during
startup of the server, but not during disk reconnects.
2021-06-16 14:26:26 -07:00
Anis Elleuch
f30c996d48
trace: Add bucket/prefix to WalkDir() tracing (#12510)
Bonus, replace os.* API with os-instrumented.go
2021-06-15 14:34:26 -07:00
Harshavardhana
31971906ff
fix: force-delete should just rename to .trash (#12499)
avoid blocking call for force-delete, instead
treat it lazily and delete in background.
2021-06-14 08:04:37 -07:00
Harshavardhana
dd2831c1a0
fix: remove parent dirs in RenameData upon failure (#12452)
- it is possible that during I/O failures we might
  leave partially written directories, make sure
  we purge them after.

- rename current data-dir (null) versionId only after
  the newer xl.meta has been written fully.

- attempt removal once for minioMetaTmpBucket/uuid/
  as this folder is empty if all previous operations
  were successful, this allows avoiding recursive os.Remove()
2021-06-07 09:35:08 -07:00
Anis Elleuch
810af07529
xl: Avoid multi-disks node to exit when one disk fails (#12423)
It makes sense that a node that has multiple disks starts when one
disk fails, returning an i/o error for example. This commit will make this
faulty tolerance available in this specific use case.
2021-06-05 09:10:32 -07:00
Poorna Krishnamoorthy
dbea8d2ee0
Add support for existing object replication. (#12109)
Also adding an API to allow resyncing replication when
existing object replication is enabled and the remote target
is entirely lost. With the `mc replicate reset` command, the
objects that are eligible for replication as per the replication
config will be resynced to target if existing object replication
is enabled on the rule.
2021-06-01 19:59:11 -07:00
Harshavardhana
1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Harshavardhana
81d5688d56
move the dependency to minio/pkg for common libraries (#12397) 2021-05-28 15:17:01 -07:00
Harshavardhana
4840974d7a
fix: inline data upon overwrites should be readable (#12369)
This PR fixes two bugs

- Remove fi.Data upon overwrite of objects from inlined-data to non-inlined-data
- Workaround for an existing bug on disk with latest releases to ignore fi.Data
  and instead read from the disk for non-inlined-data
- Addtionally add a reserved metadata header to indicate data is inlined for
  a given version.
2021-05-25 16:33:06 -07:00
Harshavardhana
ebf75ef10d
fix: remove all unused code (#12360) 2021-05-24 09:28:19 -07:00
Harshavardhana
0287711dc9
fix: implement readMetadata common function for re-use (#12353)
Previous PR #12351 added functions to read from the reader
stream to reduce memory usage, use the same technique in
few other places where we are not interested in reading the
data part.
2021-05-21 11:41:25 -07:00
Klaus Post
9d1b6fb37d
Add XL reader without data (#12351)
Add XL metadata reader that reads metadata only on larger files.

Use for scanning and listing for now.
2021-05-21 09:10:54 -07:00
Klaus Post
2ca9c533ef
feat: implement in-progress partial bucket updates (#12279) 2021-05-19 14:38:30 -07:00
Harshavardhana
2daba018d6
reduce allocations on multi-disk clusters (#12311)
multi-disk clusters initialize buffer pools
per disk, this is perhaps expensive and perhaps
not useful, for a running server instance. As this
may disallow re-use of buffers across sets,

this change ensures that buffers across sets
can be re-used at drive level, this can reduce
quite a lot of memory on large drive setups.
2021-05-17 17:49:48 -07:00
Harshavardhana
2ab9dc7609 do not update bloomFilters for temporary objects 2021-05-15 19:54:07 -07:00
Harshavardhana
4d876d03e8 fix: do not fail upon faulty/non-writable drives
gracefully start the server, if there are other drives
available - print enough information for administrator
to notice the errors in console.

Bonus: for really large streams use larger buffer for
writes.
2021-05-15 12:57:18 -07:00
Klaus Post
229d83bb75
feat: add dynamic usage cache (#12229)
A cache structure will be kept with a tree of usages.
The cache is a tree structure where each keeps track 
of its children.

An uncompacted branch contains a count of the files 
only directly at the branch level, and contains link to 
children branches or leaves.

The leaves are "compacted" based on a number of properties.
A compacted leaf contains the totals of all files beneath it.

A leaf is only scanned once every dataUsageUpdateDirCycles,
rarer if the bloom filter for the path is clean and no lifecycles 
are applied. Skipped leaves have their totals transferred from 
the previous cycle.

A clean leaf will be included once every healFolderIncludeProb 
for partial heal scans. When selected there is a one in 
healObjectSelectProb that any object will be chosen for heal scan.

Compaction happens when either:

- The folder (and subfolders) contains less than dataScannerCompactLeastObject objects.
- The folder itself contains more than dataScannerCompactAtFolders folders.
- The folder only contains objects and no subfolders.
- A bucket root will never be compacted.

Furthermore, if a has more than dataScannerCompactAtChildren recursive 
children (uncompacted folders) the tree will be recursively scanned and the 
branches with the least number of objects will be compacted until the limit 
is reached.

This ensures that any branch will never contain an unreasonable amount 
of other branches, and also that small branches with few objects don't 
take up unreasonable amounts of space.

Whenever a branch is scanned, it is assumed that it will be un-compacted
before it hits any of the above limits. This will make the branch rebalance 
itself when scanned if the distribution of objects has changed.

TLDR; With current values: No bucket will ever have more than 10000 
child nodes recursively. No single folder will have more than 2500 child 
nodes by itself. All subfolders are compacted if they have less than 500 
objects in them recursively.

We accumulate the (non-deletemarker) version count for paths as well, 
since we are changing the structure anyway.
2021-05-11 18:36:15 -07:00
Anis Elleuch
56d4d7b8b1
MRF: Better detection of non stable disks (#12252)
MRF does not detect when a node is disconnected and reconnected quickly
this change will ensure that MRF is alerted by comparing the last disk
reconnection timestamp with the last MRF check time.

Signed-off-by: Anis Elleuch <anis@min.io>

Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-05-11 09:19:15 -07:00
Harshavardhana
764721e2c6
add root_disk threshold detection (#12259)
as there is no automatic way to detect if there
is a root disk mounted on / or /var for the container
environments due to how the root disk information
is masked inside overlay root inside container.

this PR brings an environment variable to set
root disk size threshold manually to detect the
root disks in such situations.
2021-05-08 15:40:29 -07:00
Klaus Post
254698f126
fix: minor allocation improvements in xlMetaV2 (#12133) 2021-05-07 09:11:05 -07:00
Nitish Tiwari
776589f0da Add free inode metric for Prometheus (#12225) 2021-05-06 12:50:48 -07:00
Harshavardhana
c8050bc079
fix: sleeper behavior in data scanner (#12164)
do not apply healReplication() for ILM
expired, transitioned objects
2021-04-27 08:24:44 -07:00
Poorna Krishnamoorthy
4be0f92067
Fix multipart restore to remove part match (#12161)
Part ETags are not available after multipart finalizes, removing this
check as not useful.

Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-26 18:24:06 -07:00
Krishnan Parthasarathi
c829e3a13b Support for remote tier management (#12090)
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.

Some new additions include:

 - Admin APIs remote tier configuration management

 - Simple journal to track remote objects to be 'collected'
   This is used by object API handlers which 'mutate' object versions by
   overwriting/replacing content (Put/CopyObject) or removing the version
   itself (e.g DeleteObjectVersion).

 - Rework of previous ILM transition to fit the new model
   In the new model, a storage class (a.k.a remote tier) is defined by the
   'remote' object storage type (one of s3, azure, GCS), bucket name and a
   prefix.

* Fixed bugs, review comments, and more unit-tests

- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana
069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana
2ef824bbb2
collapse two distinct calls into single RenameData() call (#12093)
This is an optimization by reducing one extra system call,
and many network operations. This reduction should increase
the performance for small file workloads.
2021-04-20 10:44:39 -07:00
Harshavardhana
1456f9f090
fix: preserve shared dataDir during suspend overwrites (#12058)
CopyObject() when shares dataDir needs to be preserved,
and upon versioning suspended overwrites should still
preserve the dataDir.
2021-04-15 08:44:05 -07:00
Harshavardhana
928ee1a7b2
remove null version dataDir upon overwrites (#12023) 2021-04-08 19:55:44 -07:00
Klaus Post
d2ac2f758e
odirectReader: handle EOF correctly (#11998)
EOF may be sent along with data so queue it up and 
return it when the buffer is empty.

Also, when reading data without direct io don't add a buffer 
that only results in extra memcopy.
2021-04-07 08:32:59 -07:00
Klaus Post
788a8bc254
Fix disk info race (#11984)
Protect updated members in xlStorage.

```
WARNING: DATA RACE
Write at 0x00c004b4ee78 by goroutine 1491:
  github.com/minio/minio/cmd.(*xlStorage).GetDiskID()
      d:/minio/minio/cmd/xl-storage.go:590 +0x1078
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).checkDiskStale()
      d:/minio/minio/cmd/xl-storage-disk-id-check.go:195 +0x84
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).StatVol()
      d:/minio/minio/cmd/xl-storage-disk-id-check.go:284 +0x16a
  github.com/minio/minio/cmd.erasureObjects.getBucketInfo.func1()
      d:/minio/minio/cmd/erasure-bucket.go:100 +0x1a5
  github.com/minio/minio/pkg/sync/errgroup.(*Group).Go.func1()
      d:/minio/minio/pkg/sync/errgroup/errgroup.go:122 +0xd7

Previous read at 0x00c004b4ee78 by goroutine 1087:
  github.com/minio/minio/cmd.(*xlStorage).CheckFile.func1()
      d:/minio/minio/cmd/xl-storage.go:1699 +0x384
  github.com/minio/minio/cmd.(*xlStorage).CheckFile()
      d:/minio/minio/cmd/xl-storage.go:1726 +0x13c
  github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).CheckFile()
      d:/minio/minio/cmd/xl-storage-disk-id-check.go:446 +0x23b
  github.com/minio/minio/cmd.erasureObjects.parentDirIsObject.func1()
      d:/minio/minio/cmd/erasure-common.go:173 +0x194
  github.com/minio/minio/pkg/sync/errgroup.(*Group).Go.func1()
      d:/minio/minio/pkg/sync/errgroup/errgroup.go:122 +0xd7
```
2021-04-06 11:33:42 -07:00
Harshavardhana
5cce9361bc
fix: avoid an extra rename when there is no dataDir (#11964)
also perform globalSync() in defer when enabled
for RenameData(), to ensure all calls are flushed
to disk.
2021-04-05 08:52:28 -07:00
Harshavardhana
d46386246f
api: Introduce metadata update APIs to update only metadata (#11962)
Current implementation heavily relies on readAllFileInfo
but with the advent of xl.meta inlined with data, we cannot
easily avoid reading data when we are only interested is
updating metadata, this leads to invariably write
amplification during metadata updates, repeatedly reading
data when we are only interested in updating metadata.

This PR ensures that we implement a metadata only update
API at storage layer, that handles updates to metadata alone
for any given version - given the version is valid and
present.

This helps reduce the chattiness for following calls..

- PutObjectTags
- DeleteObjectTags
- PutObjectLegalHold
- PutObjectRetention
- ReplicateObject (updates metadata on replication status)
2021-04-04 13:32:31 -07:00
Poorna Krishnamoorthy
47c09a1e6f
Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Harshavardhana
434e5c0cfe
allow preserving legacyXLv1 with inline data format (#11951)
current master breaks this important requirement
we need to preserve legacyXLv1 format, this is simply
ignored and overwritten causing a myriad of issues
by leaving stale files on the namespace etc.

for now lets still use the two-phase approach of
writing to `tmp` and then renaming the content to
the actual namespace.
2021-04-01 22:12:03 -07:00
Harshavardhana
204c610d84
do not use dataDir to reference inline data use versionID (#11942)
versionID is the one that needs to be preserved and as
well as overwritten in case of replication, transition
etc - dataDir is an ephemeral entity that changes
during overwrites - make sure that versionID is used
to save the object content.

this would break things if you are already running
the latest master, please wipe your current content
and re-do your setup after this change.
2021-04-01 13:09:23 -07:00
Klaus Post
2623338dc5
Inline small file data in xl.meta file (#11758) 2021-03-29 17:00:55 -07:00
Harshavardhana
d93c6cb9c7
use Access() instead of Lstat() for frequent use (#11911)
using Lstat() is causing tiny memory allocations,
that are usually wasted and never used, instead
we can simply uses Access() call that does 0
memory allocations.
2021-03-29 08:07:23 -07:00
Harshavardhana
d7f32ad649 xl: avoid sending Delete() remote call for fully successful runs
an optimization to avoid extra syscalls in PutObject(),
adds up to our PutObject response times.
2021-03-24 17:32:12 -07:00
Harshavardhana
410e84d273 xl: add checks for minioTmpMetaBucket in CreateFile 2021-03-24 09:36:10 -07:00
Harshavardhana
75741dbf4a
xl: remove cleanupDir instead use Delete() (#11880)
use a single call to remove directly at disk
instead of doing recursively at network layer.
2021-03-24 09:08:05 -07:00
Ritesh H Shukla
6a2ed44095 fix: optionally enable tracing posix calls 2021-03-23 22:23:08 -07:00
Anis Elleuch
98ff91b484
xl: Reduce usage of isDirEmpty() (#11838)
When an object is removed, its parent directory is inspected to check if
it is empty to remove if that is the case.

However, we can use os.Remove() directly since it is only able to remove
a file or an empty directory.
2021-03-19 15:42:01 -07:00
Anis Elleuch
4d86384dc7
xl: Remove non needed check for empty dir (#11835)
RenameData renames xl.meta and data dir and removes the parent directory
if empty, however, there is a duplicate check for empty dir, since the
parent dir of xl.meta is always the same as the data-dir.
2021-03-19 12:26:53 -07:00
Harshavardhana
b92a220db1
fix: handle weird drives sporadic read O_DIRECT behavior (#11832)
on freshReads if drive returns errInvalidArgument, we
should simply turn-off DirectIO and read normally, there
are situations in k8s like environments where the drives
behave sporadically in a single deployment and may not
have been implemented properly to handle O_DIRECT for
reads.
2021-03-18 20:16:50 -07:00
Harshavardhana
51a8619a79
[feat] Add configurable deadline for writers (#11822)
This PR adds deadlines per Write() calls, such
that slow drives are timed-out appropriately and
the overall responsiveness for Writes() is always
up to a predefined threshold providing applications
sustained latency even if one of the drives is slow
to respond.
2021-03-18 14:09:55 -07:00
Harshavardhana
60b0f2324e
storage write call path optimizations (#11805)
- write in o_dsync instead of o_direct for smaller
  objects to avoid unaligned double Write() situations
  that may arise for smaller objects < 128KiB
- avoid fallocate() as its not useful since we do not
  use Append() semantics anymore, fallocate is not useful
  for streaming I/O we can save on a syscall
- createFile() doesn't need to validate `bucket` name
  with a Lstat() call since createFile() is only used
  to write at `minioTmpBucket`
- use io.Copy() when writing unAligned writes to allow
  usage of ReadFrom() from *os.File providing zero
  buffer writes().
2021-03-17 09:38:38 -07:00
Anis Elleuch
0eb146e1b2
add additional metrics per disk API latency, API call counts #11250)
```
mc admin info --json
```

provides these details, for now, we shall eventually 
expose this at Prometheus level eventually. 

Co-authored-by: Harshavardhana <harsha@minio.io>
2021-03-16 20:06:57 -07:00
Klaus Post
fdc2f69218
truncate xl.meta files upon rewrites #11749)
If the destination files exist and is larger - junk data will be left at the end of the file.
2021-03-09 14:42:24 -08:00
Harshavardhana
d971061305
use listPathRaw for HealObjects() instead of expensive WalkVersions() (#11675) 2021-03-06 09:25:48 -08:00
Klaus Post
fa9cf1251b
Imporve healing and reporting (#11312)
* Provide information on *actively* healing, buckets healed/queued, objects healed/failed.
* Add concurrent healing of multiple sets (typically on startup).
* Add bucket level resume, so restarts will only heal non-healed buckets.
* Print summary after healing a disk is done.
2021-03-04 14:36:23 -08:00
Harshavardhana
c6a120df0e
fix: Prometheus metrics to re-use storage disks (#11647)
also re-use storage disks for all `mc admin server info`
calls as well, implement a new LocalStorageInfo() API
call at ObjectLayer to lookup local disks storageInfo

also fixes bugs where there were double calls to StorageInfo()
2021-03-02 17:28:04 -08:00
Harshavardhana
2f4af09c01 fix: alow changes to readAllData to decrement activeCount() 2021-02-28 20:09:23 -08:00
Harshavardhana
37960cbc2f
fix: avoid writing more content on network with O_DIRECT reads (#11659)
There was an io.LimitReader was missing for the 'length'
parameter for ranged requests, that would cause client to
get truncated responses and errors.

fixes #11651
2021-02-28 15:33:03 -08:00
Harshavardhana
9171d6ef65
rename all references from crawl -> scanner (#11621) 2021-02-26 15:11:42 -08:00
Harshavardhana
6386b45c08
[feat] use rename instead of recursive deletes (#11641)
most of the delete calls today spend time in
a blocking operation where multiple calls need
to be recursively sent to delete the objects,
instead we can use rename operation to atomically
move the objects from the namespace to `tmp/.trash`

we can schedule deletion of objects at this
location once in 15, 30mins and we can also add
wait times between each delete operation.

this allows us to make delete's faster as well
less chattier on the drives, each server runs locally
a groutine which would clean this up regularly.
2021-02-26 09:52:27 -08:00
Harshavardhana
b517c791e9
[feat]: use DSYNC for xl.meta writes and NOATIME for reads (#11615)
Instead of using O_SYNC, we are better off using O_DSYNC
instead since we are only ever interested in data to be
persisted to disk not the associated filesystem metadata.

For reads we ask customers to turn off noatime, but instead
we can proactively use O_NOATIME flag to avoid atime updates
upon reads.
2021-02-24 00:14:16 -08:00
Harshavardhana
18ec933085
fix: for containers use root-disk detection cleverly (#11593)
root-disk implemented currently had issues where root
disk partitions getting modified might race and provide
incorrect results, to avoid this lets rely again back on
DeviceID and match it instead.

In-case of containers `/data` is one such extra entity that
needs to be verified for root disk, due to how 'overlay'
filesystem works and the 'overlay' presents a completely
different 'device' id - using `/data` as another entity
for fallback helps because our containers describe 'VOLUME'
parameter that allows containers to automatically have a
virtual `/data` that points to the container root path this
can either be at `/` or `/var/lib/` (on different partition)
2021-02-22 10:32:21 -08:00
Harshavardhana
8778828a03
fix: read metadata in O_DIRECT if configured and supported (#11594)
reduce the page-cache pressure completely by moving
the entire read-phase of our operations to O_DIRECT,
primarily this is going to be very useful for chatty
metadata operations such as listing, scanner, ilm, healing
like operations to avoid filling up the page-cache upon
repeated runs.
2021-02-22 01:36:17 -08:00
Harshavardhana
7875d472bc
avoid notification for non-existent delete objects (#11514)
Skip notifications on objects that might have had
an error during deletion, this also avoids unnecessary
replication attempt on such objects.

Refactor some places to make sure that we have notified
the client before we

- notify
- schedule for replication
- lifecycle etc.
2021-02-10 22:00:42 -08:00
Harshavardhana
0e3211f4ad
fix: server upgrades should have more descriptive error messages (#11476)
during rolling upgrade, provide a more descriptive error
message and discourage rolling upgrade in such situations,
allowing users to take action.

additionally also rename `slashpath -> pathutil` to avoid
a slighly mis-pronounced usage of `path` package.
2021-02-08 10:15:12 -08:00
Harshavardhana
c9b0f595b9
support directory objects in listing in certain scenarios (#11452)
When a directory object is presented as a `prefix`
param our implementation tend to only list objects
present common to the `prefix` than the `prefix` itself,
to mimic AWS S3 like flat key behavior this PR ensures
that if `prefix` is directory object, it should be
automatically considered to be part of the eventual
listing result.

fixes #11370
2021-02-05 10:12:25 -08:00
Harshavardhana
f108873c48
fix: replication metadata comparsion and other fixes (#11410)
- using miniogo.ObjectInfo.UserMetadata is not correct
- using UserTags from Map->String() can change order
- ContentType comparison needs to be removed.
- Compare both lowercase and uppercase key names.
- do not silently error out constructing PutObjectOptions
  if tag parsing fails
- avoid notification for empty object info, failed operations
  should rely on valid objInfo for notification in all
  situations
- optimize copyObject implementation, also introduce a new 
  replication event
- clone ObjectInfo() before scheduling for replication
- add additional headers for comparison
- remove strings.EqualFold comparison avoid unexpected bugs
- fix pool based proxying with multiple pools
- compare only specific metadata

Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>
2021-02-03 20:41:33 -08:00
Anis Elleuch
b3f81e75f6
xl: Make it clear when to create delete marker for a non existant object (#11423) 2021-02-03 10:33:43 -08:00
Anis Elleuch
6ef678663e
xl: Create a delete-marker when no other version exists (#11362)
Currently, it is not possible to create a delete-marker when xl.meta
does not exist (no version is created for that object yet). This makes a
problem for replication and mc mirroring with versioning enabled.

This also follows S3 specification.
2021-02-01 13:23:50 -08:00
Anis Elleuch
65aa2bc614
ilm: Remove object in HEAD/GET if having an applicable ILM rule (#11296)
Remove an object on the fly if there is a lifecycle rule with delete
expiry action for the corresponding object.
2021-02-01 09:52:11 -08:00
Anis Elleuch
e9ac7b0fb7
heal: Remove empty directories (#11354)
Since the introduction of __XLDIR__, an empty directory does not have a
meaning anymore in erasure mode. Make healing removes it wherever it
finds it.
2021-01-27 02:19:28 -08:00
Harshavardhana
43f973c4cf
fix: check for O_DIRECT support for reads and writes (#11331)
In-case user enables O_DIRECT for reads and backend does
not support it we shall proceed to turn it off instead
and print a warning. This validation avoids any unexpected
downtimes that users may incur.
2021-01-22 15:38:21 -08:00
Harshavardhana
d1a8f0b786
fix possible crashes on deleteMarker replication (#11308)
Delete marker can have `metaSys` set to nil, that
can lead to crashes after the delete marker has
been healed.

Additionally also fix isObjectDangling check
for transitioned objects, that do not have parts
should be treated similar to Delete marker.
2021-01-20 13:12:12 -08:00
Harshavardhana
b5049d541f
fix: reduce an extra readdir() attempted on non-legacy setups (#11301)
to verify moving content and preserving legacy content,
we have way to detect the objects through readdir()
this path is not necessary for most common cases on
newer setups, avoid readdir() to save multiple system
calls.

also fix the CheckFile behavior for most common
use case i.e without legacy format.
2021-01-19 10:01:06 -08:00
Harshavardhana
3ca6330661
fix: optimize parentDirIsObject by moving isObject to storage layer (#11291)
For objects with `N` prefix depth, this PR reduces `N` such network
operations by converting `CheckFile` into a single bulk operation.

Reduction in chattiness here would allow disks to be utilized more
cleanly, while maintaining the same functionality along with one
extra volume check stat() call is removed.

Update tests to test multiple sets scenario
2021-01-18 12:25:22 -08:00
Harshavardhana
f903cae6ff
Support variable server pools (#11256)
Current implementation requires server pools to have
same erasure stripe sizes, to facilitate same SLA
and expectations.

This PR allows server pools to be variadic, i.e they
do not have to be same erasure stripe sizes - instead
they should have SLA for parity ratio.

If the parity ratio cannot be guaranteed by the new
server pool, the deployment is rejected i.e server
pool expansion is not allowed.
2021-01-16 12:08:02 -08:00
Harshavardhana
1a5775e2e8
enable small and large file optimization (#11260)
- for large objects we found that 1MiB block for
  r/w respectively.
- for small objects we found that 128KiB block for
  r/w respectively.
2021-01-12 10:20:39 -08:00
Harshavardhana
e4e117faab
fix: enable xl.json to xl.meta only if legacy drive is found (#11255)
another optimization is renameLegacyMetadata() never needs
to validate bucket with os.Stat() again, leading to reduction
in one extra syscall.
2021-01-11 02:27:04 -08:00
Harshavardhana
4593b146be
fix: print errors only when metacache status has errors (#11248) 2021-01-08 16:52:19 +05:30
Harshavardhana
f21d650ed4
fix: readData in bulk call using messagepack byte wrappers (#11228)
This PR refactors the way we use buffers for O_DIRECT and
to re-use those buffers for messagepack reader writer.

After some extensive benchmarking found that not all objects
have this benefit, and only objects smaller than 64KiB see
this benefit overall.

Benefits are seen from almost all objects from

1KiB - 32KiB

Beyond this no objects see benefit with bulk call approach
as the latency of bytes sent over the wire v/s streaming
content directly from disk negate each other with no
remarkable benefits.

All other optimizations include reuse of msgp.Reader,
msgp.Writer using sync.Pool's for all internode calls.
2021-01-07 19:27:31 -08:00
Harshavardhana
76e2713ffe
fix: use buffers only when necessary for io.Copy() (#11229)
Use separate sync.Pool for writes/reads

Avoid passing buffers for io.CopyBuffer()
if the writer or reader implement io.WriteTo or io.ReadFrom
respectively then its useless for sync.Pool to allocate
buffers on its own since that will be completely ignored
by the io.CopyBuffer Go implementation.

Improve this wherever we see this to be optimal.

This allows us to be more efficient on memory usage.
```
   385  // copyBuffer is the actual implementation of Copy and CopyBuffer.
   386  // if buf is nil, one is allocated.
   387  func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
   388  	// If the reader has a WriteTo method, use it to do the copy.
   389  	// Avoids an allocation and a copy.
   390  	if wt, ok := src.(WriterTo); ok {
   391  		return wt.WriteTo(dst)
   392  	}
   393  	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
   394  	if rt, ok := dst.(ReaderFrom); ok {
   395  		return rt.ReadFrom(src)
   396  	}
```

From readahead package
```
// WriteTo writes data to w until there's no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
func (a *reader) WriteTo(w io.Writer) (n int64, err error) {
	if a.err != nil {
		return 0, a.err
	}
	n = 0
	for {
		err = a.fill()
		if err != nil {
			return n, err
		}
		n2, err := w.Write(a.cur.buffer())
		a.cur.inc(n2)
		n += int64(n2)
		if err != nil {
			return n, err
		}
```
2021-01-06 09:36:55 -08:00
Harshavardhana
d0027c3c41
do not use large buffers if not necessary (#11220)
without this change, there is a performance
regression for small objects GETs, this makes
the overall speed to go back to pre '59d363'
commit days.
2021-01-04 18:51:52 -08:00
Harshavardhana
c4b1d394d6
erasure: avoid io.Copy in hotpaths to reduce allocation (#11213) 2021-01-03 16:27:34 -08:00
Harshavardhana
c4131c2798
feat: Small object optimization read data in single bulk call (#11207) 2021-01-03 11:27:57 -08:00
Anis Elleuch
c9d502e6fa
parentDirIsObject() to return quickly with inexistant parent (#11204)
Rewrite parentIsObject() function. Currently if a client uploads
a/b/c/d, we always check if c, b, a are actual objects or not.

The new code will check with the reverse order and quickly quit if 
the segment doesn't exist.

So if a, b, c in 'a/b/c' does not exist in the first place, then returns
false quickly.
2021-01-02 12:01:29 -08:00
Anis Elleuch
677e80c0f8
xl: Remove check-dir in ReadVersion (#11200)
The only purpose of check-dir flag in
ReadVersion is to return 404 when
an object has xl.meta but without data.

This is causing an extract call to the disk 
which can be penalizing in case of busy system
where disks receive many concurrent access.
2021-01-02 10:35:57 -08:00
Anis Elleuch
a317d220ed
xl-storage: Do not stat bucket assuming the object exists (#11201)
In HEAD/GET, only STAT the bucket if the 
object does not exist to return the correct
error response.
2021-01-01 09:44:36 -08:00
Harshavardhana
cc457f1798
fix: enhance logging in crawler use console.Debug instead of logger.Info (#11179) 2020-12-29 01:57:28 -08:00
Harshavardhana
445a9bd827
fix: heal optimizations in crawler to avoid multiple healing attempts (#11173)
Fixes two problems

- Double healing when bitrot is enabled, instead heal attempt
  once in applyActions() before lifecycle is applied.

- If applyActions() is successful and getSize() returns proper
  value, then object is accounted for and should be removed
  from the oldCache namespace map to avoid double heal attempts.
2020-12-28 10:31:00 -08:00
Harshavardhana
c19e6ce773
avoid a crash in crawler when lifecycle is not initialized (#11170)
Bonus for static buffers use bytes.NewReader instead of
bytes.NewBuffer, to use a more reader friendly implementation
2020-12-26 22:58:06 -08:00
Harshavardhana
a773cf48d8
fix: overlapping object and prefix rejected (#11130)
fixes #11129
2020-12-18 08:51:09 -08:00
Harshavardhana
3e83643320
lifecycle improvements and additional debug logging (#11096)
Bonus change fix browser assets
2020-12-13 12:05:54 -08:00
Anis Elleuch
f164085227
xl: Always set root disk to true in test environment (#11094)
Tests environments (go test or manual testing) should always consider
the passed disks are root disks and should not rely on disk.IsRootDisk()
function. The reason is that this latter can return a false negative
when called in a busy system. However, returning a false negative will
only occur in a testing environment and not in a production, so we can
accept this trade-off for now.
2020-12-12 16:10:07 -08:00
Harshavardhana
d8c1f93de6
reject mixed drive situations with drives on root disks (#11057)
till now we used to match the inode number of the root
drive and the drive path minio would use, if they match
we knew that its a root disk.

this may not be true in all situations such as running
inside a container environment where the container might
be mounted from a different partition altogether, root
disk detection might fail.
2020-12-09 00:27:02 -08:00
Ritesh H Shukla
038bcd9079
Add replication capacity metrics support in crawler (#10786) 2020-12-07 13:47:48 -08:00
Klaus Post
a896125490
Add crawler delay config + dynamic config values (#11018) 2020-12-04 09:32:35 -08:00
Harshavardhana
96c0ce1f0c
add support for tuning healing to make healing more aggressive (#11003)
supports `mc admin config set <alias> heal sleep=100ms` to
enable more aggressive healing under certain times.

also optimize some areas that were doing extra checks than
necessary when bitrotscan was enabled, avoid double sleeps
make healing more predictable.

fixes #10497
2020-12-02 11:12:00 -08:00
Harshavardhana
bdd094bc39
fix: avoid sending errors on missing objects on locked buckets (#10994)
make sure multi-object delete returned errors that are AWS S3 compatible
2020-11-28 21:15:45 -08:00
Harshavardhana
df93102235
fix: unwrapping issues with os.Is* functions (#10949)
reduces  3 stat calls, reducing the
overall startup time significantly.
2020-11-23 08:36:49 -08:00
Poorna Krishnamoorthy
39f3d5493b
Show Delete replication status header (#10946)
X-Minio-Replication-Delete-Status header shows the
status of the replication of a permanent delete of a version.

All GETs are disallowed and return 405 on this object version.
In the case of replicating delete markers.

X-Minio-Replication-DeleteMarker-Status shows the status 
of replication, and would similarly return 405.

Additionally, this PR adds reporting of delete marker event completion
and updates documentation
2020-11-21 23:48:50 -08:00
Poorna Krishnamoorthy
1ebf6f146a Add support for ILM transition (#10565)
This PR adds transition support for ILM
to transition data to another MinIO target
represented by a storage class ARN. Subsequent
GET or HEAD for that object will be streamed from
the transition tier. If PostRestoreObject API is
invoked, the transitioned object can be restored for
duration specified to the source cluster.
2020-11-19 18:47:17 -08:00
Harshavardhana
9a34fd5c4a Revert "Revert "Add delete marker replication support (#10396)""
This reverts commit 267d7bf0a9.
2020-11-19 18:43:58 -08:00
Harshavardhana
267d7bf0a9 Revert "Add delete marker replication support (#10396)"
This reverts commit 50c10a5087.

PR is moved to origin/dev branch
2020-11-12 11:43:14 -08:00
Poorna Krishnamoorthy
50c10a5087
Add delete marker replication support (#10396)
Delete marker replication is implemented for V2
configuration specified in AWS spec (though AWS
allows it only in the V1 configuration).

This PR also brings in a MinIO only extension of
replicating permanent deletes, i.e. deletes specifying
version id are replicated to target cluster.
2020-11-10 15:24:14 -08:00
Harshavardhana
fde3299bf3
re-use optimized readdir for isDirEmpty() (#10829)
reduces effective memory usage by an order
of magnitude, also increases performance for
small objects
2020-11-04 13:05:21 -08:00
Harshavardhana
1a1f00fa15
fix: use internode data for DisksInfo, VolsInfo in message pack (#10821)
Similar to #10775 for fewer memory allocations, since we use
getOnlineDisks() extensively for listing we should optimize it
further.

Additionally, remove all unused walkers from the storage layer
2020-11-04 10:10:54 -08:00
Klaus Post
37749f4623
Optimize FileInfo(Version) transfer (#10775)
File Info decoding, in particular, is showing up as a major 
allocator and time consumer for internode data transfers

Switch to message pack for cross-server transfers:

```
MSGP:

Size: 945 bytes

BenchmarkEncodeFileInfoMsgp-32    	 1558444	       866 ns/op	   1.16 MB/s	       0 B/op	       0 allocs/op
BenchmarkDecodeFileInfoMsgp-32    	  479968	      2487 ns/op	   0.40 MB/s	     848 B/op	      18 allocs/op

GOB:

Size: 1409 bytes

BenchmarkEncodeFileInfoGOB-32    	  333339	      3237 ns/op	   0.31 MB/s	     576 B/op	      19 allocs/op
BenchmarkDecodeFileInfoGOB-32    	   20869	     57837 ns/op	   0.02 MB/s	   16439 B/op	     428 allocs/op
```
2020-11-02 17:07:52 -08:00
Klaus Post
86e0d272f3
Reduce WriteAll allocs (#10810)
WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers 
used by `io.CopyBuffer` even if they are pooled.

Since all writers appear to write byte buffers, just send those 
instead and write directly. The files are opened through the `os` 
package so they have no special properties anyway.

This removes the alloc and copy for each operation.

REST sends content length so a precise alloc can be made.
2020-11-02 16:14:31 -08:00
Krishna Srinivas
3a2f89b3c0
fix: add support for O_DIRECT reads for erasure backends (#10718) 2020-10-30 11:04:29 -07:00
Klaus Post
a982baff27
ListObjects Metadata Caching (#10648)
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c

Gist of improvements:

* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no 
  longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
2020-10-28 09:18:35 -07:00
Anis Elleuch
eb95353cb1
fix: Get/HeadObject return 404 on non quorum objects (#10753) 2020-10-26 10:30:46 -07:00
Anis Elleuch
00124c56d9
erasure: Commit data before xl.meta in RenameData() (#10734)
This will reduce the chance to have updated xl.meta without data.
2020-10-23 21:54:58 -07:00
Harshavardhana
2042d4873c
rename crawler config option to heal (#10678) 2020-10-14 13:51:51 -07:00
Klaus Post
03991c5d41
crawler: Remove waitForLowActiveIO (#10667)
Only use dynamic delays for the crawler. Even though the max wait was 1 second the number 
of waits could severely impact crawler speed.

Instead of relying on a global metric, we use the stateless local delays to keep the crawler 
running at a speed more adjusted to current conditions.

The only case we keep it is before bitrot checks when enabled.
2020-10-13 13:45:08 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
736e58dd68
fix: handle concurrent lockers with multiple optimizations (#10640)
- select lockers which are non-local and online to have
  affinity towards remote servers for lock contention

- optimize lock retry interval to avoid sending too many
  messages during lock contention, reduces average CPU
  usage as well

- if bucket is not set, when deleteObject fails make sure
  setPutObjHeaders() honors lifecycle only if bucket name
  is set.

- fix top locks to list out always the oldest lockers always,
  avoid getting bogged down into map's unordered nature.
2020-10-08 12:32:32 -07:00
Harshavardhana
2b4eb87d77
pick disks which are common maximally used (#10600)
further optimization to ensure that good disks
are always used for listing, other than healing
we only use disks that are maximally used.
2020-09-29 22:54:02 -07:00
Harshavardhana
00eb6f6bc9
cache DiskInfo at storage layer for performance (#10586)
`mc admin info` on busy setups will not move HDD
heads unnecessarily for repeated calls, provides
a better responsiveness for the call overall.

Bonus change allow listTolerancePerSet be N-1
for good entries, to avoid skipping entries
for some reason one of the disk went offline.
2020-09-29 09:54:41 -07:00
Harshavardhana
66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
Harshavardhana
7f9498f43f
fix: ignore faulty drives and continue (#10511)
drives might return different types of errors
handle them individually, and for some errors
just log an error and continue
2020-09-18 12:09:05 -07:00
Klaus Post
34859c6d4b
Preallocate (safe) slices when we know the size (#10459) 2020-09-14 20:44:18 -07:00
Klaus Post
fa01e640f5
Continous healing: add optional bitrot check (#10417) 2020-09-12 00:08:12 -07:00
Anis Elleuch
af88772a78
lifecycle: NoncurrentVersionExpiration considers noncurrent version age (#10444)
From https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html#intro-lifecycle-rules-actions

```
When specifying the number of days in the NoncurrentVersionTransition
and NoncurrentVersionExpiration actions in a Lifecycle configuration,
note the following:

It is the number of days from when the version of the object becomes
noncurrent (that is, when the object is overwritten or deleted), that
Amazon S3 will perform the action on the specified object or objects.

Amazon S3 calculates the time by adding the number of days specified in
the rule to the time when the new successor version of the object is
created and rounding the resulting time to the next day midnight UTC.
For example, in your bucket, suppose that you have a current version of
an object that was created at 1/1/2014 10:30 AM UTC. If the new version
of the object that replaces the current version is created at 1/15/2014
10:30 AM UTC, and you specify 3 days in a transition rule, the
transition date of the object is calculated as 1/19/2014 00:00 UTC.
```
2020-09-09 18:11:24 -07:00
Klaus Post
2d58a8d861
Add storage layer contexts (#10321)
Add context to all (non-trivial) calls to the storage layer. 

Contexts are propagated through the REST client.

- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.

The "dangerous" part is that now a caller disconnecting *will* propagate down,  so a 
"delete" operation will now be interrupted. In some cases we might want to disconnect 
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
2020-09-04 09:45:06 -07:00
Harshavardhana
37da0c647e
fix: delete marker compatibility behavior for suspended bucket (#10395)
- delete-marker should be created on a suspended bucket as `null`
- delete-marker should delete any pre-existing `null` versioned
  object and create an entry `null`
2020-09-02 00:19:03 -07:00
Klaus Post
3e1fb17b70
heal: Check for truncated files (#10399)
When checking parts we already do a stat for each part.

Since we have the on disk size check if it is at least what we expect.

When checking metadata check if metadata is 0 bytes.
2020-09-01 12:06:45 -07:00
Harshavardhana
a359e36e35
tolerate listing with only readQuorum disks (#10357)
We can reduce this further in the future, but this is a good
value to keep around. With the advent of continuous healing,
we can be assured that namespace will eventually be
consistent so we are okay to avoid the necessity to
a list across all drives on all sets.

Bonus Pop()'s in parallel seem to have the potential to
wait too on large drive setups and cause more slowness
instead of gaining any performance remove it for now.

Also, implement load balanced reply for local disks,
ensuring that local disks have an affinity for

- cleanupStaleMultipartUploads()
2020-08-26 19:29:35 -07:00
Harshavardhana
d19b434ffc
fix: bring back delayed leaf detection in listing (#10346) 2020-08-25 12:26:48 -07:00
Klaus Post
17a1eda702
Disregard healing disks in crawling (#10349)
When crawling never use a disk we know is healing.

Most of the change involves keeping track of the original endpoint on xlStorage
and this also fixes DiskInfo.Endpoint never being populated.

Heal master will print `data-crawl: Disk "http://localhost:9001/data/mindev/data2/xl1" is 
Healing, skipping` once on a cycle (no more often than every 5m).
2020-08-25 10:55:15 -07:00
Harshavardhana
74116204ce
handle fresh setup with mixed drives (#10273)
fresh drive setups when one of the drive is
a root drive, we should ignore such a root
drive and not proceed to format.

This PR handles this properly by marking
the disks which are root disk and they are
taken offline.
2020-08-18 14:37:26 -07:00