Commit Graph

155 Commits

Author SHA1 Message Date
Harshavardhana 2ef824bbb2
collapse two distinct calls into single RenameData() call (#12093)
This is an optimization by reducing one extra system call,
and many network operations. This reduction should increase
the performance for small file workloads.
2021-04-20 10:44:39 -07:00
Harshavardhana d46386246f
api: Introduce metadata update APIs to update only metadata (#11962)
Current implementation heavily relies on readAllFileInfo
but with the advent of xl.meta inlined with data, we cannot
easily avoid reading data when we are only interested is
updating metadata, this leads to invariably write
amplification during metadata updates, repeatedly reading
data when we are only interested in updating metadata.

This PR ensures that we implement a metadata only update
API at storage layer, that handles updates to metadata alone
for any given version - given the version is valid and
present.

This helps reduce the chattiness for following calls..

- PutObjectTags
- DeleteObjectTags
- PutObjectLegalHold
- PutObjectRetention
- ReplicateObject (updates metadata on replication status)
2021-04-04 13:32:31 -07:00
Harshavardhana 3c571472e0
avoid network read errors crashing CreateFile call (#11939)
Thanks to @dvaldivia for reproducing this
2021-03-31 18:44:45 -07:00
Harshavardhana 79564656eb
xl: CreateFile shouldn't prematurely timeout (#11878)
For large objects taking more than '3 minutes' response
times in a single PUT operation can timeout prematurely
as 'ResponseHeader' timeout hits for 3 minutes. Avoid
this by keeping the connection active during CreateFile
phase.
2021-03-24 09:05:03 -07:00
Harshavardhana 21cfc4aa49 Revert "xl: CreateFile shouldn't prematurely timeout (#11854)"
This reverts commit 922c7b57f5.
2021-03-23 23:47:45 -07:00
Harshavardhana 922c7b57f5
xl: CreateFile shouldn't prematurely timeout (#11854)
For large objects taking more than '3 minutes' response
times in a single PUT operation can timeout prematurely
as 'ResponseHeader' timeout hits for 3 minutes. Avoid
this by keeping the connection active during CreateFile
phase.
2021-03-22 18:25:05 -07:00
Harshavardhana d971061305
use listPathRaw for HealObjects() instead of expensive WalkVersions() (#11675) 2021-03-06 09:25:48 -08:00
Klaus Post fa9cf1251b
Imporve healing and reporting (#11312)
* Provide information on *actively* healing, buckets healed/queued, objects healed/failed.
* Add concurrent healing of multiple sets (typically on startup).
* Add bucket level resume, so restarts will only heal non-healed buckets.
* Print summary after healing a disk is done.
2021-03-04 14:36:23 -08:00
Harshavardhana 9171d6ef65
rename all references from crawl -> scanner (#11621) 2021-02-26 15:11:42 -08:00
Klaus Post 03172b89e2
Ensure cache has finished deserializing (#11620)
Make sure that response has been fully deserialized before returning.
2021-02-24 02:59:49 -08:00
Harshavardhana c31d2c3fdc
fix: CrawlAndGetDataUsage close pipe() before using a new one (#11600)
also additionally make sure errors during deserializer closes
the reader with right error type such that Write() end
actually see the final error, this avoids a waitGroup usage
and waiting.
2021-02-22 10:04:32 -08:00
Harshavardhana 79b6a43467
fix: avoid timed value for network calls (#11531)
additionally simply timedValue to have RWMutex
to avoid concurrent calls to DiskInfo() getting
serialized, this has an effect on all calls that
use GetDiskInfo() on the same disks.

Such as getOnlineDisks, getOnlineDisksWithoutHealing
2021-02-12 18:17:52 -08:00
Anis Elleuch b3f81e75f6
xl: Make it clear when to create delete marker for a non existant object (#11423) 2021-02-03 10:33:43 -08:00
Klaus Post 2680772d4b
Don't mark remotes online when shutting down (#11368)
Shutting down will mark remotes online when the shutdown has 
started since the context is canceled.

For example:

```
API: SYSTEM()
Time: 16:21:31 CET 01/28/2021
DeploymentID: 313b0065-c5a1-4aa3-9233-07223e77a730
Error: Storage resources are insufficient for the write operation .minio.sys/tmp/ced455c4-3d27-4bdd-95fc-b4707a179b8a/fd934ef3-8fc8-4330-abc1-f039fbbb9700/part.1 (cmd.InsufficientWriteQuorum)
       1: d:\minio\minio\cmd\data-usage.go:56:cmd.storeDataUsageInBackend()
Exiting on signal: INTERRUPT
Client http://127.0.0.1:9002/minio/lock/v5 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d3/v24 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d2/v24 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d1/v24 online
Client http://127.0.0.1:9002/minio/peer/v12 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d4/v24 online
```

Use a fresh context for health checks.
2021-01-28 13:38:12 -08:00
Harshavardhana f21d650ed4
fix: readData in bulk call using messagepack byte wrappers (#11228)
This PR refactors the way we use buffers for O_DIRECT and
to re-use those buffers for messagepack reader writer.

After some extensive benchmarking found that not all objects
have this benefit, and only objects smaller than 64KiB see
this benefit overall.

Benefits are seen from almost all objects from

1KiB - 32KiB

Beyond this no objects see benefit with bulk call approach
as the latency of bytes sent over the wire v/s streaming
content directly from disk negate each other with no
remarkable benefits.

All other optimizations include reuse of msgp.Reader,
msgp.Writer using sync.Pool's for all internode calls.
2021-01-07 19:27:31 -08:00
Harshavardhana d0027c3c41
do not use large buffers if not necessary (#11220)
without this change, there is a performance
regression for small objects GETs, this makes
the overall speed to go back to pre '59d363'
commit days.
2021-01-04 18:51:52 -08:00
Harshavardhana c4131c2798
feat: Small object optimization read data in single bulk call (#11207) 2021-01-03 11:27:57 -08:00
Anis Elleuch c9d502e6fa
parentDirIsObject() to return quickly with inexistant parent (#11204)
Rewrite parentIsObject() function. Currently if a client uploads
a/b/c/d, we always check if c, b, a are actual objects or not.

The new code will check with the reverse order and quickly quit if 
the segment doesn't exist.

So if a, b, c in 'a/b/c' does not exist in the first place, then returns
false quickly.
2021-01-02 12:01:29 -08:00
Anis Elleuch 677e80c0f8
xl: Remove check-dir in ReadVersion (#11200)
The only purpose of check-dir flag in
ReadVersion is to return 404 when
an object has xl.meta but without data.

This is causing an extract call to the disk 
which can be penalizing in case of busy system
where disks receive many concurrent access.
2021-01-02 10:35:57 -08:00
Harshavardhana 2eb52ca5f4
fix: heal bucket metadata right before healing bucket (#11097)
optimization mainly to avoid listing the entire
`.minio.sys/buckets/.minio.sys` directory, this
can get really huge and comes in the way of startup
routines, contents inside `.minio.sys/buckets/.minio.sys`
are rather transient and not necessary to be healed.
2020-12-13 11:57:08 -08:00
Klaus Post 4bca62a0bd
crawler: Stream bucket usage cache data (#11068)
Stream bucket caches to storage and through RPC calls.
2020-12-10 13:03:22 -08:00
Harshavardhana f794fe79e3
fix: network shutdown was not handle properly (#10927)
fixes a regression introduced in #10859, due
to the error returned by rest.Client being typed
i.e *rest.NetworkError - IsNetworkHostDown function
didn't work as expected to detect network issues.

This in-turn aggravated the situations when nodes
are disconnected leading to performance loss.
2020-11-19 13:53:49 -08:00
Harshavardhana 80b8ce89a4
remove context deadline from Delete calls (#10901) 2020-11-17 09:09:45 -08:00
Harshavardhana 17a5ff51ff
fix: move context timeout closer to network for Delete calls (#10897)
allowing for disconnects to be limited to the drive
themselves instead of disconnecting all drives.
2020-11-13 16:56:45 -08:00
Klaus Post 06899210a7
Reduce health check output (#10859)
This will make the health check clients 'silent'.
Use `IsNetworkOrHostDown` determine if network is ok so it mimics the functionality in the actual client.
2020-11-10 09:28:23 -08:00
Harshavardhana 1a1f00fa15
fix: use internode data for DisksInfo, VolsInfo in message pack (#10821)
Similar to #10775 for fewer memory allocations, since we use
getOnlineDisks() extensively for listing we should optimize it
further.

Additionally, remove all unused walkers from the storage layer
2020-11-04 10:10:54 -08:00
Klaus Post 1e11b4629f
Add remote Diskinfo caching (#10824)
Add 1 second remote disk info cache.

Should decrease need for remote calls a great deal due to how actively it is used now.
2020-11-04 08:00:18 -08:00
Klaus Post 37749f4623
Optimize FileInfo(Version) transfer (#10775)
File Info decoding, in particular, is showing up as a major 
allocator and time consumer for internode data transfers

Switch to message pack for cross-server transfers:

```
MSGP:

Size: 945 bytes

BenchmarkEncodeFileInfoMsgp-32    	 1558444	       866 ns/op	   1.16 MB/s	       0 B/op	       0 allocs/op
BenchmarkDecodeFileInfoMsgp-32    	  479968	      2487 ns/op	   0.40 MB/s	     848 B/op	      18 allocs/op

GOB:

Size: 1409 bytes

BenchmarkEncodeFileInfoGOB-32    	  333339	      3237 ns/op	   0.31 MB/s	     576 B/op	      19 allocs/op
BenchmarkDecodeFileInfoGOB-32    	   20869	     57837 ns/op	   0.02 MB/s	   16439 B/op	     428 allocs/op
```
2020-11-02 17:07:52 -08:00
Klaus Post 86e0d272f3
Reduce WriteAll allocs (#10810)
WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers 
used by `io.CopyBuffer` even if they are pooled.

Since all writers appear to write byte buffers, just send those 
instead and write directly. The files are opened through the `os` 
package so they have no special properties anyway.

This removes the alloc and copy for each operation.

REST sends content length so a precise alloc can be made.
2020-11-02 16:14:31 -08:00
Harshavardhana 4c773f7068
re-use remote transports in Peer,Storage,Locker clients (#10788)
use one transport for internode communication
2020-11-02 07:43:11 -08:00
Klaus Post e63a44b734
rest client: Expect context timeouts for locks (#10782)
Add option for rest clients to not mark a remote offline for context timeouts.

This can be used if context timeouts are expected on the call.
2020-10-29 09:52:11 -07:00
Klaus Post a982baff27
ListObjects Metadata Caching (#10648)
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c

Gist of improvements:

* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no 
  longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
2020-10-28 09:18:35 -07:00
Anis Elleuch eb95353cb1
fix: Get/HeadObject return 404 on non quorum objects (#10753) 2020-10-26 10:30:46 -07:00
Harshavardhana 029758cb20
fix: retain the previous UUID for newly replaced drives (#10759)
only newly replaced drives get the new `format.json`,
this avoids disks reloading their in-memory reference
format, ensures that drives are online without
reloading the in-memory reference format.

keeping reference format in-tact means UUIDs
never change once they are formatted.
2020-10-26 10:29:29 -07:00
Harshavardhana 71b97fd3ac
fix: connect disks pre-emptively during startup (#10669)
connect disks pre-emptively upon startup, to ensure we have
enough disks are connected at startup rather than wait
for them.

we need to do this to avoid long wait times for server to
be online when we have servers come up in rolling upgrade
fashion
2020-10-13 18:28:42 -07:00
Harshavardhana 1f9abbee4d
make sure to release locks upon timeout (#10596)
fixes #10418
2020-09-29 15:18:34 -07:00
Harshavardhana 00eb6f6bc9
cache DiskInfo at storage layer for performance (#10586)
`mc admin info` on busy setups will not move HDD
heads unnecessarily for repeated calls, provides
a better responsiveness for the call overall.

Bonus change allow listTolerancePerSet be N-1
for good entries, to avoid skipping entries
for some reason one of the disk went offline.
2020-09-29 09:54:41 -07:00
Harshavardhana 66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
Harshavardhana 81caf35926
fix: reduce healthcheck interval for storage rest client (#10544) 2020-09-23 10:43:42 -07:00
Anis Elleuch 4c81201f95
fix: healing delete marker on versioned buckets (#10530)
Healing was not working correctly in the distributed mode because
errFileVersionNotFound was not properly converted in storage rest
client.

Besides, fixing the healing delete marker is not working as expected.
2020-09-21 15:16:16 -07:00
Klaus Post 2d58a8d861
Add storage layer contexts (#10321)
Add context to all (non-trivial) calls to the storage layer. 

Contexts are propagated through the REST client.

- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.

The "dangerous" part is that now a caller disconnecting *will* propagate down,  so a 
"delete" operation will now be interrupted. In some cases we might want to disconnect 
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
2020-09-04 09:45:06 -07:00
Klaus Post c097ce9c32
continous healing based on crawler (#10103)
Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914
2020-08-24 13:47:01 -07:00
Harshavardhana e7ba78beee
use GlobalContext instead of context.Background when possible (#10254) 2020-08-13 09:16:01 -07:00
Harshavardhana 1e2ebc9945
feat: time to bring back http2.0 support (#10230)
Bonus move our CI/CD to go1.14
2020-08-10 09:02:29 -07:00
Harshavardhana 0b8255529a
fix: proxies set keep-alive timeouts to be system dependent (#10199)
Split the DialContext's one for internode and another
for all other external communications especially
proxy forwarders, gateway transport etc.
2020-08-04 14:55:53 -07:00
Harshavardhana a880283593
Send the lower level error directly from GetDiskID() (#10095)
this is to detect situations of corruption disk
format etc errors quickly and keep the disk online
in such scenarios for requests to fail appropriately.
2020-07-21 13:54:06 -07:00
Klaus Post d84fc58cac
fix: CheckParts endpoint call to correct API (#10073)
CheckParts is calling the wrong endpoint, so instead of 
checking parts, it is writing metadata.
2020-07-17 10:17:59 -07:00
Harshavardhana e7d7d5232c
fix: admin info output and improve overall performance (#10015)
- admin info node offline check is now quicker
- admin info now doesn't duplicate the code
  across doing the same checks for disks
- rely on StorageInfo to return appropriate errors
  instead of calling locally.
- diskID checks now return proper errors when
  disk not found v/s format.json missing.
- add more disk states for more clarity on the
  underlying disk errors.
2020-07-13 09:51:07 -07:00
Harshavardhana 3b9fbf80ad
fix: make sure to use new restClient for healthcheck (#10026)
Without instantiating a new rest client we can
have a recursive error which can lead to
healthcheck returning always offline, this can
prematurely take the servers offline.
2020-07-11 22:19:38 -07:00
Klaus Post abd999f64a
fix: list object versions in distributed setup (#9958)
Remove calls to `WalkVersions` was calling the wrong endpoint, 
so unless quorum could be reached with local disks no results 
would ever be returned.
2020-07-02 10:29:50 -07:00
Harshavardhana f7f12b8604
fix: crash in storage rest client due to spurious query params (#9924)
regression got introduced in dee3cf2d7f
when the DeleteVersion API was changed, but the corresponding query
params were left in-tact.
2020-06-26 16:49:49 -07:00
Harshavardhana dee3cf2d7f
fix: preserve modTime for DeleteMarker on remote disks (#9905) 2020-06-23 10:20:31 -07:00
Harshavardhana 7ed1077879
Add a custom healthcheck function for online status (#9858)
- Add changes to ensure remote disks are not
  incorrectly taken online if their order has
  changed or are incorrect disks.
- Bring changes to peer to detect disconnection
  with separate Health handler, to avoid a
  rather expensive call GetLocakDiskIDs()
- Follow up on the same changes for Lockers
  as well
2020-06-17 14:49:26 -07:00
Klaus Post 3ba4804d6c
Move online status to REST client (#9808) 2020-06-16 18:59:32 -07:00
Harshavardhana 4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Klaus Post 43d6e3ae06
merge object lifecycle checks into usage crawler (#9579) 2020-06-12 10:28:21 -07:00
Harshavardhana b2db8123ec
Preserve errors returned by diskInfo to detect disk errors (#9727)
This PR basically reverts #9720 and re-implements it differently
2020-05-28 13:03:04 -07:00
Anis Elleuch 375b79f11b
storage: Implement GetDiskID request in REST server side (#9720)
GetDiskID() in storage rest client does not really issue a REST request
to the remote disk, but returns an in-memory value instead.

However, GetDiskID() should return an error when format.json is not
found or for other similar issues (unmounted disks, etc..)

GetDiskID() is only called when formatting disks and getting storage
informatio, hence this commit should not have a performance degradation.
2020-05-28 08:17:42 -07:00
Anis Elleuch 9baeda781a
fix storage info output with unordered endpoints arguments (#9610)
Shuffling arguments that we pass to MinIO server are supported. However,
when that happens, Prometheus returns wrong information about disks usage
and online/offline status.

The commit fixes the issue by avoiding relying on xl.endpoints since
it is not ordered.
2020-05-19 14:27:20 -07:00
Klaus Post c4464e36c8
fix: limit HTTP transport tuables to affordable values (#9383)
Close connections pro-actively in transient calls
2020-04-17 11:20:56 -07:00
Harshavardhana f44cfb2863
use GlobalContext whenever possible (#9280)
This change is throughout the codebase to
ensure that all codepaths honor GlobalContext
2020-04-09 09:30:02 -07:00
Bala FA 2c3e34f001
add force delete option of non-empty bucket (#9166)
passing HTTP header `x-minio-force-delete: true` would 
allow standard S3 API DeleteBucket to delete a non-empty
bucket forcefully.
2020-03-27 21:52:59 -07:00
Harshavardhana 6f992134a2
fix: startup load time by reusing storageDisks (#9210) 2020-03-27 14:48:30 -07:00
Krishna Srinivas 45b1c66195
fix: implement splunk specific listObjects when delimiter=guidSplunk (#9186) 2020-03-22 19:23:47 -07:00
Klaus Post 8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Anis Elleuch 0af62d35a0
xl: Implement posix.DeletePrefixes to enhance delete perf (#9100)
Bulk delete API was using cleanupObjectsBulk() which calls posix
listing and delete API to remove objects internal files in the
backend (xl.json and parts) one by one.

Add DeletePrefixes in the storage API to remove the content
of a directory in a single call.

Also use a remove goroutine for each disk to accelerate removal.
2020-03-11 08:56:36 -07:00
Harshavardhana 23a8411732
Add a generic Walk()'er to list a bucket, optinally prefix (#9026)
This generic Walk() is used by likes of Lifecyle, or
KMS to rotate keys or any other functionality which
relies on this functionality.
2020-02-25 21:22:28 +05:30
Harshavardhana 7ce63b3078
fix: multi-delete API write quorum failures (#8926)
multi-delete API failed with write quorum errors
under following situations

- list of files requested for delete doesn't exist
  anymore can lead to quorum errors and failure
- due to usage of query param for paths, for really
  long paths MinIO server rejects these requests as
  malformed as unexpected.

This was reproduced with warp
2020-02-01 18:11:29 -08:00
Harshavardhana 0879a4f743 rest/storage: Remove racy LastError usage (#8817)
instead perform a liveness check call to
verify if server is online and print relevant
errors.

Also introduce a StorageErr string error type
instead of errors.New() deprecate usage of
VerifyFileError, DeleteFileError for gob,
change in datastructure also requires bump in
storage REST version to v13.

Fixes #8811
2020-01-14 18:45:17 -08:00
Klaus Post 37b32199e3 Validate XL sets on format (#8779)
When formatting a set validate if a host failure will likely lead to data loss.

While we don't know what config will be set in the future 
evaluate to our best knowledge, assuming default settings.
2020-01-13 13:09:10 -08:00
Harshavardhana f68a7005c0 Improve disk formatting stage for large disk sets (#8690) 2019-12-23 16:31:03 -08:00
Anis Elleuch 555969ee42 Add data usage collect with its new admin API (#8553)
Admin data usage info API returns the following

(Only FS & XL, for now)

- Number of buckets
- Number of objects
- The total size of objects
- Objects histogram
- Bucket sizes
2019-12-12 06:02:37 -08:00
Harshavardhana e9b2bf00ad Support MinIO to be deployed on more than 32 nodes (#8492)
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.

In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
2019-11-13 12:17:45 -08:00
Harshavardhana 4e63e0e372 Return appropriate errors API versions changes across REST APIs (#8480)
This PR adds code to appropriately handle versioning issues
that come up quite constantly across our API changes. Currently
we were also routing our requests wrong which sort of made it
harder to write a consistent error handling code to appropriately
reject or honor requests.

This PR potentially fixes issues

 - old mc is used against new minio release which is incompatible
   returns an appropriate for client action.
 - any older servers talking to each other, report appropriate error
 - incompatible peer servers should report error and reject the calls
   with appropriate error
2019-11-04 09:30:59 -08:00
Krishna Srinivas 980bf78b4d Detect underlying disk mount/unmount (#8408) 2019-10-25 10:37:53 -07:00
Harshavardhana 6a4ef2e48e Initialize configs correctly, move notification config (#8367)
This PR also removes deprecated tests, adds checks
to avoid races reproduced on CI/CD.
2019-10-09 11:41:15 +05:30
Harshavardhana ff5bf51952 admin/heal: Fix deep healing to heal objects under more conditions (#8321)
- Heal if the part.1 is truncated from its original size
- Heal if the part.1 fails while being verified in between
- Heal if the part.1 fails while being at a certain offset

Other cleanups include make sure to flush the HTTP responses
properly from storage-rest-server, avoid using 'defer' to
improve call latency. 'defer' incurs latency avoid them
in our hot-paths such as storage-rest handlers.

Fixes #8319
2019-10-02 01:42:15 +05:30
Harshavardhana f45977d371
Fix error handling in DeleteFileBulk storage handler (#8327)
errors.errorString() cannot be marshalled by gob
encoder, so using a slice of []error would fail
to be encoded. This leads to no errors being
generated instead gob.Decoder on the storage-client
would see an io.EOF

To avoid such bugs introduce a typed error for
handling such translations and register this type
for gob encoding support.
2019-09-30 19:01:28 -07:00
Klaus Post ff726969aa Switch to Snappy -> S2 compression (#8189) 2019-09-25 23:08:24 -07:00
Praveen raj Mani 8700945cdf Handle connection failures on webhook/url pings (#8204)
Properly handle connection failures while replaying events

Fixes #8194
2019-09-12 16:44:51 -07:00
Anis Elleuch e7b3f39064 xl: Fix verifying non streaming highway algo with a dist setup (#8230)
VerifyFile in the distributed setup does not work with
the non streaming highway hash. The reason is that the
internode mux router did not expect `storageRESTBitrotHash`
parameter.
2019-09-12 13:08:02 -07:00
Anis Elleuch 3f258062d8 bitrot: Verify file size inside storage interface (#7932) 2019-09-12 02:19:53 +05:30
Harshavardhana 6e7962bf35
Return if paths are empty in DeleteFileBulk (#8085)
This avoids a network call, also fixes an issue
when empty paths are passed the underlying call
fails with "405 Method Not Allowed".

This is reproducible when you are deleting a
non-existent object.

Fixes #8083
2019-08-15 13:15:49 -07:00
Harshavardhana 54eded2e6f Do not assume all HTTP errors as Network errors (#7983)
In situations such as when client uploading data,
prematurely disconnects from server such as pressing
ctrl-c before uploading all the data. Under this
situation in distributed setup we prematurely
disconnect disks causing a reconnect loop. This has
an adverse affect we end up leaving a lot of files
in temporary location which ideally should have been
cleaned up when Put() prematurely fails.

This is also a regression which got introduced in #7610
2019-07-29 14:48:18 -07:00
Anis Elleuch 000a60f238 xl: Heal empty parts (#7860)
posix.VerifyFile() doesn't know how to check if a file
is corrupted if that file is empty. We do have the part
size in xl.json so we pass it to VerifyFile to return
an error so healing empty parts can work properly.
2019-07-13 00:29:44 +01:00
Krishna Srinivas 58d90ed73c Avoid network transfer for bitrot verification during healing (#7375) 2019-07-08 13:51:18 -07:00
Krishna Srinivas 74e2fe0879 Return "SlowDown" to S3 clients for network related errors (#7610)
Consider errors returned by httpClient.Do() as network errors. This is because
the http clients returns different types of errors and it is hard to catch
all the error types.
2019-05-29 10:21:47 -07:00
Harshavardhana 39b3e4f9b3 Avoid using io.ReadFull() for WriteAll and CreateFile (#7676)
With these changes we are now able to peak performances
for all Write() operations across disks HDD and NVMe.

Also adds readahead for disk reads, which also increases
performance for reads by 3x.
2019-05-22 13:47:15 -07:00
Harshavardhana b3f22eac56 Offload listing to posix layer (#7611)
This PR adds one API WalkCh which
sorts and sends list over the network

Each disk walks independently in a sorted manner.
2019-05-14 13:49:10 -07:00
Anis Elleuch 9c90a28546 Implement bulk delete (#7607)
Bulk delete at storage level in Multiple Delete Objects API

In order to accelerate bulk delete in Multiple Delete objects API,
a new bulk delete is introduced in storage layer, which will accept
a list of objects to delete rather than only one. Consequently,
a new API is also need to be added to Object API.
2019-05-13 12:25:49 -07:00
poornas cf2a436bc8 Show SlowDown error message if backend is busy (#7521)
or if there are too many open file descriptors.
2019-05-02 07:09:57 -07:00
Harshavardhana f767a2538a
Optimize listing with leaf check offloaded to posix (#7541)
Other listing optimizations include

- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
2019-04-23 14:54:28 -07:00
kannappanr d2f42d830f
Lock: Use REST API instead of RPC (#7469)
In distributed mode, use REST API to acquire and manage locks instead
of RPC.

RPC has been completely removed from MinIO source.

Since we are moving from RPC to REST, we cannot use rolling upgrades as the
nodes that have not yet been upgraded cannot talk to the ones that have
been upgraded.

We expect all minio processes on all nodes to be stopped and then the
upgrade process to be completed.

Also force http1.1 for inter-node communication
2019-04-17 23:16:27 -07:00
kannappanr 5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Anis Elleuch 53011606a5 Show 401 unauthorized msg when nodes are started with different creds (#7433)
Before this commit, nodes wait indefinitely without showing any
indicate error message when a node is started with different access
and secret keys.

This PR will show '401 Unauthorized' in this case.
2019-04-02 12:25:34 -07:00
Harshavardhana a9032b52b8 Change storageRESTTimeout to 1minute (#7398) 2019-03-20 13:20:09 -07:00
Harshavardhana 396d78352d Support HTTP/2.0 (#7204)
Fixes #6704
2019-02-14 17:53:46 -08:00
Krishna Srinivas 90213ff1b2 Detect peer reboots to invalidate current storage REST clients (#7227) 2019-02-13 15:29:46 -08:00
Anis Elleuch f9fecf0e76 storage: Increase the timeout of storage REST requests (#7218)
This commit increases storage REST requests to 5 minutes, this includes
the opening TCP connection, and sending/receiving data. This will reduce
clients receiving errors when the server is under high load.
2019-02-12 23:27:33 -08:00
Harshavardhana 817269475f Make sure to drain body upon an error (#7197)
Also cleanup redundant code and use it at a common place
2019-02-06 12:07:03 -08:00