Commit Graph

2844 Commits

Author SHA1 Message Date
Klaus Post 650dccfa9e
cache: Only start at high watermark (#10403)
Currently, cache purges are triggered as soon as the low watermark is exceeded.
To reduce IO this should only be done when reaching the high watermark.
This simplifies checks and reduces all calls for a GC to go through
`dcache.diskSpaceAvailable(size)`. While a comment claims that 
`dcache.triggerGC <- struct{}{}` was non-blocking I don't see how 
that was possible. Instead, we add a 1 size to the queue channel 
and use channel  semantics to avoid blocking when a GC has 
already been requested.

`bytesToClear` now takes the high watermark into account to it will 
not request any bytes to be cleared until that is reached.
2020-09-02 17:48:44 -07:00
Andreas Auernhammer 9a703befe6
crypto: reduce retry delay when retrying KES requests (#10394)
This commit reduces the retry delay when retrying a request
to a KES server by:
 - reducing the max. jitter delay from 3s to 1.5s
 - skipping the random delay when there are more KES endpoints
   available.

If there are more KES endpoints we can directly retry to the request
by sending it to the next endpoint - as pointed out by @krishnasrinivas
2020-09-02 11:04:10 -07:00
Klaus Post 9a1615768d
Fix flaky TestXLStorageVerifyFile (#10398)
`TestXLStorageVerifyFile` would fail 1 in 256 if the first random character was 'a'.

Instead write 256 bytes which has 1 in 256^256 probability.
2020-09-02 09:42:24 -07:00
Harshavardhana 37da0c647e
fix: delete marker compatibility behavior for suspended bucket (#10395)
- delete-marker should be created on a suspended bucket as `null`
- delete-marker should delete any pre-existing `null` versioned
  object and create an entry `null`
2020-09-02 00:19:03 -07:00
Harshavardhana 2acb530ccd
update rulesguard with new rules (#10392)
Co-authored-by: Nitish Tiwari <nitish@minio.io>
Co-authored-by: Praveen raj Mani <praveen@minio.io>
2020-09-01 16:58:13 -07:00
Klaus Post 3e1fb17b70
heal: Check for truncated files (#10399)
When checking parts we already do a stat for each part.

Since we have the on disk size check if it is at least what we expect.

When checking metadata check if metadata is 0 bytes.
2020-09-01 12:06:45 -07:00
Klaus Post a89d6b8e3d
Fix common Windows failure (#10397)
The `getNonLoopBackIP` may grab an IP from an interface that
doesn't allow binding (on Windows), so this test consistently fails.

We exclude that specific error.
2020-09-01 10:11:15 -07:00
Klaus Post 1c085f7d1a
Fix crash on Windows when crawling (#10385)
* readDirN: Check if file is directory

`syscall.FindNextFile` crashes if the handle is a file.

`errFileNotFound` matches 'unix' functionality: d19b434ffc/cmd/os-readdir_unix.go (L106)

Fixes #10384
2020-09-01 09:33:16 -07:00
Harshavardhana 4b6585d249
support 'ldap:user' variable replacement properly (#10391)
also update `ldap.go` examples with latest
minio-go changes

Fixes #10367
2020-09-01 12:26:22 +05:30
Harshavardhana 9ffad7fceb discard empty endpoint in crypto kes
introduced in 18725679c4
2020-08-31 19:35:43 -07:00
Andreas Auernhammer 18725679c4
crypto: allow multiple KES endpoints (#10383)
This commit addresses a maintenance / automation problem when MinIO-KES
is deployed on bare-metal. In orchestrated env. the orchestrator (K8S)
will make sure that `n` KES servers (IPs) are available via the same DNS
name. There it is sufficient to provide just one endpoint.
2020-08-31 18:10:52 -07:00
Anis Elleuch ba8a8ad818
ListObjectsV1 requests unnecessarily fail with offline nodes (#10386)
ListObjectsV1 requests are actually redirected to a specific node, 
depending on the bucket name. The purpose of this behavior was
to optimize listing.

However, the current code sends a Bad Gateway error if the
target node is offline, which is a bad behavior because it means
that the list request will fail, although this is unnecessary since
we can still use the current node to list as well (the default behavior
without using proxying optimization)

Currently, you can see mint fails when there is one offline node, after
this PR, mint will always succeed.
2020-08-31 12:37:31 -07:00
Harshavardhana 102ad60dee
simplify removing temporary files (#10389) 2020-08-31 12:35:40 -07:00
Gaige B Paulsen 859ef52886
update for smartos build (solaris too) (#10378) 2020-08-31 10:19:25 -07:00
Harshavardhana e730da1438
fix: referesh JWKS public keys upon failure (#10368)
fixes #10359
2020-08-28 08:15:12 -07:00
Anis Elleuch 46ee8659b4
fix write quorum calculation for bucket operations (#10364)
When the number of disks is odd, the calculation of quorum 
for bucket operations were not correct, fix it.
2020-08-27 12:55:32 -07:00
Harshavardhana a359e36e35
tolerate listing with only readQuorum disks (#10357)
We can reduce this further in the future, but this is a good
value to keep around. With the advent of continuous healing,
we can be assured that namespace will eventually be
consistent so we are okay to avoid the necessity to
a list across all drives on all sets.

Bonus Pop()'s in parallel seem to have the potential to
wait too on large drive setups and cause more slowness
instead of gaining any performance remove it for now.

Also, implement load balanced reply for local disks,
ensuring that local disks have an affinity for

- cleanupStaleMultipartUploads()
2020-08-26 19:29:35 -07:00
Jorge Israel Peña 0a2e6d58a5
hdfs gateway handle listing single files (#10362) 2020-08-26 16:03:53 -07:00
Klaus Post 1b119557c2
getDisksInfo: Attribute failed disks to correct endpoint (#10360)
If DiskInfo calls failed the information returned was used anyway 
resulting in no endpoint being set.

This would make the drive be attributed to the local system since 
`disk.Endpoint == disk.DrivePath` in that case.

Instead, if the call fails record the endpoint and the error only.
2020-08-26 10:11:26 -07:00
Harshavardhana 7778fef6bb
update continous heal metrics appropriately for scanned items (#10352)
bonus make sure to ignore objectNotFound, and versionNotFound
errors properly at all layers, since HealObjects() returns
objectNotFound error if the bucket or prefix is empty.
2020-08-26 08:53:33 -07:00
飞雪无情 ea1803417f
Use constants for gateway names to avoid bugs caused by spelling. (#10355) 2020-08-26 08:52:46 -07:00
Harshavardhana d19b434ffc
fix: bring back delayed leaf detection in listing (#10346) 2020-08-25 12:26:48 -07:00
Klaus Post 17a1eda702
Disregard healing disks in crawling (#10349)
When crawling never use a disk we know is healing.

Most of the change involves keeping track of the original endpoint on xlStorage
and this also fixes DiskInfo.Endpoint never being populated.

Heal master will print `data-crawl: Disk "http://localhost:9001/data/mindev/data2/xl1" is 
Healing, skipping` once on a cycle (no more often than every 5m).
2020-08-25 10:55:15 -07:00
Daniel Valdivia 7d1734d033
indicate through HTTP header cluster healing in progress (#10342) 2020-08-24 15:20:50 -07:00
Harshavardhana 03ec6adfd0
fix: KES http2.0 communication support (#10341) 2020-08-24 14:37:53 -07:00
Harshavardhana 309b10f201 keep crawler cycle at 5 minutes 2020-08-24 14:05:16 -07:00
Klaus Post c097ce9c32
continous healing based on crawler (#10103)
Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914
2020-08-24 13:47:01 -07:00
Harshavardhana caad314faa
add ruleguard support, fix all the reported issues (#10335) 2020-08-24 12:11:20 -07:00
Klaus Post bc2ebe0021
Only enforce quota on success (#10339)
We should only enforce quotas if no error has been returned.

firstErr is safe to access since all goroutines have exited at this point.

If `firstErr` hasn't been set by something else return the context error if cancelled.
2020-08-24 10:15:46 -07:00
Harshavardhana 11aa393ba7
Allow region errors to be dynamic (#10323)
remove other FIXMEs as we are not planning to fix these, 
instead we will add dynamism case by case basis.

fixes #10250
2020-08-23 22:06:22 -07:00
Praveen raj Mani d0c910a6f3
Support https and basic-auth for elasticsearch notification target (#10332) 2020-08-23 09:43:48 -07:00
kannappanr d15a5ad4cc
S3 Gateway: Check for encryption headers properly (#10309) 2020-08-22 11:41:49 -07:00
Harshavardhana 95411228db
add missing cleanupStaleMultipartUploads (#10325)
fixes #10319
2020-08-21 21:39:54 -07:00
ebozduman 23774353b7
get_object() returns NoSuchKey error when object is a prefix (#10315) 2020-08-21 13:08:01 -07:00
poornas a2a5ec93d3
fix: use global context for filling cache in the background (#10308) 2020-08-20 14:23:24 -07:00
Harshavardhana 27a774cbe9
fix: FS mode should reject putBucketVersioning (#10307) 2020-08-20 13:18:06 -07:00
Klaus Post 8e6787a302
Fix TestDataUpdateTracker hanging (#10302)
Keep dataUpdateTracker while goroutine is starting.

This will ensure the object is updated one `start` returns

Tested with

```
λ go test -cpu=1,2,4,8 -test.run TestDataUpdateTracker -count=1000
PASS
ok      github.com/minio/minio/cmd      8.913s
```

Fixes #10295
2020-08-20 13:17:42 -07:00
Harshavardhana 59352d0ac2
load all blocking metadata in background (#10298)
most of this metadata already has fallbacks
and there is no good reason to load them
in blocking fashion
2020-08-20 10:38:53 -07:00
Harshavardhana 75d44b3bae
add disk for more context in bitrot errors (#10296) 2020-08-20 09:41:15 -07:00
Klaus Post 95ae6c4b49
Fix missing unlock in *healSequence.hasEnded() (#10305)
The background healing sequence would always hang when this function is called.
2020-08-20 08:48:09 -07:00
KevinSmile 0ebb73ee2e
use const instead of literals (#10292) 2020-08-19 16:43:52 -07:00
Harshavardhana c8b84a0e9e
Add nancy vulnerability scanner (#10289) 2020-08-19 14:25:21 -07:00
Ritesh H Shukla 3acb5cff45
Update code comment (#10287) 2020-08-19 14:24:58 -07:00
Harshavardhana 74116204ce
handle fresh setup with mixed drives (#10273)
fresh drive setups when one of the drive is
a root drive, we should ignore such a root
drive and not proceed to format.

This PR handles this properly by marking
the disks which are root disk and they are
taken offline.
2020-08-18 14:37:26 -07:00
Harshavardhana e4a44f6224
fix: commonPrefixes behavior in ListObjectVersions (#10286)
```
$ aws s3api --profile minio --endpoint-url http://localhost:9003 \
    list-object-versions --bucket testbucket \
    --delimiter / --prefix Veeam/Archive/

{
    "CommonPrefixes": [
        {
            "Prefix": "Veeam/Archive/003/"
        }
    ]
}
```

Also add coverage tests similar to ListObjects to
catch errors in future, skip these tests in FS mode
2020-08-18 12:19:44 -07:00
poornas 0272973175
Fix regression in web ui for retention (#10285)
Fixes: #10283 regression from PR #9259
2020-08-18 12:09:42 -07:00
Harshavardhana d2a3f92452
fix: health handler for lockers (#10280) 2020-08-18 07:27:41 -07:00
Harshavardhana ede86845e5
docs: Add policy variables for resource and conditions (#10278)
Bonus fix adds LDAP policy variable and clarifies the
usage of policy variables for temporary credentials.

fixes #10197
2020-08-17 17:39:55 -07:00
Harshavardhana e57c742674
use single dynamic timeout for most locked API/heal ops (#10275)
newDynamicTimeout should be allocated once, in-case
of temporary locks in config and IAM we should
have allocated timeout once before the `for loop`

This PR doesn't fix any issue as such, but provides
enough dynamism for the timeout as per expectation.
2020-08-17 11:29:58 -07:00
Klaus Post bb5976d727
healbucket: Send object version ID (#10263)
Based on our previous conversations I assume we should send the version
 id when healing an object.

Maybe we should even list object versions and heal all?
2020-08-17 08:25:44 -07:00