Commit Graph

146 Commits

Author SHA1 Message Date
Harshavardhana
60b0f2324e
storage write call path optimizations (#11805)
- write in o_dsync instead of o_direct for smaller
  objects to avoid unaligned double Write() situations
  that may arise for smaller objects < 128KiB
- avoid fallocate() as its not useful since we do not
  use Append() semantics anymore, fallocate is not useful
  for streaming I/O we can save on a syscall
- createFile() doesn't need to validate `bucket` name
  with a Lstat() call since createFile() is only used
  to write at `minioTmpBucket`
- use io.Copy() when writing unAligned writes to allow
  usage of ReadFrom() from *os.File providing zero
  buffer writes().
2021-03-17 09:38:38 -07:00
Harshavardhana
4d80de899a
fix: mips 32bit compilation issue (#11775)
fixes #11768
2021-03-15 06:02:09 -07:00
Harshavardhana
9ccc483df6
[feat]: change erasure coding default block size from 10MiB to 1MiB (#11721)
major performance improvements in range GETs to avoid large
read amplification when ranges are tiny and random

```
-------------------
Operation: GET
Operations: 142014 -> 339421
Duration: 4m50s -> 4m56s
* Average: +139.41% (+1177.3 MiB/s) throughput, +139.11% (+658.4) obj/s
* Fastest: +125.24% (+1207.4 MiB/s) throughput, +132.32% (+612.9) obj/s
* 50% Median: +139.06% (+1175.7 MiB/s) throughput, +133.46% (+660.9) obj/s
* Slowest: +203.40% (+1267.9 MiB/s) throughput, +198.59% (+753.5) obj/s
```

TTFB from 10MiB BlockSize
```
* First Access TTFB: Avg: 81ms, Median: 61ms, Best: 20ms, Worst: 2.056s
```

TTFB from 1MiB BlockSize
```
* First Access TTFB: Avg: 22ms, Median: 21ms, Best: 8ms, Worst: 91ms
```

Full object reads however do see a slight change which won't be
noticeable in real world, so not doing any comparisons

TTFB still had improvements with full object reads with 1MiB

```
* First Access TTFB: Avg: 68ms, Median: 35ms, Best: 11ms, Worst: 1.16s
```

v/s

TTFB with 10MiB
```
* First Access TTFB: Avg: 388ms, Median: 98ms, Best: 20ms, Worst: 4.156s
```

This change should affect all new uploads, previous uploads should
continue to work with business as usual. But dramatic improvements can
be seen with these changes.
2021-03-06 14:09:34 -08:00
Harshavardhana
ec547c0fa8
enable race detector CI for macos-latest (#11715) 2021-03-05 14:16:23 -08:00
Harshavardhana
0b9c17443e
update gopsutil to use the v3 API (#11638) 2021-03-01 00:15:46 -08:00
Harshavardhana
76e2713ffe
fix: use buffers only when necessary for io.Copy() (#11229)
Use separate sync.Pool for writes/reads

Avoid passing buffers for io.CopyBuffer()
if the writer or reader implement io.WriteTo or io.ReadFrom
respectively then its useless for sync.Pool to allocate
buffers on its own since that will be completely ignored
by the io.CopyBuffer Go implementation.

Improve this wherever we see this to be optimal.

This allows us to be more efficient on memory usage.
```
   385  // copyBuffer is the actual implementation of Copy and CopyBuffer.
   386  // if buf is nil, one is allocated.
   387  func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
   388  	// If the reader has a WriteTo method, use it to do the copy.
   389  	// Avoids an allocation and a copy.
   390  	if wt, ok := src.(WriterTo); ok {
   391  		return wt.WriteTo(dst)
   392  	}
   393  	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
   394  	if rt, ok := dst.(ReaderFrom); ok {
   395  		return rt.ReadFrom(src)
   396  	}
```

From readahead package
```
// WriteTo writes data to w until there's no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
func (a *reader) WriteTo(w io.Writer) (n int64, err error) {
	if a.err != nil {
		return 0, a.err
	}
	n = 0
	for {
		err = a.fill()
		if err != nil {
			return n, err
		}
		n2, err := w.Write(a.cur.buffer())
		a.cur.inc(n2)
		n += int64(n2)
		if err != nil {
			return n, err
		}
```
2021-01-06 09:36:55 -08:00
Harshavardhana
cb0eaeaad8
feat: migrate to ROOT_USER/PASSWORD from ACCESS/SECRET_KEY (#11185) 2021-01-05 10:22:57 -08:00
Harshavardhana
4550ac6fff
fix: refactor locks to apply them uniquely per node (#11052)
This refactor is done for few reasons below

- to avoid deadlocks in scenarios when number
  of nodes are smaller < actual erasure stripe
  count where in N participating local lockers
  can lead to deadlocks across systems.

- avoids expiry routines to run 1000 of separate
  network operations and routes per disk where
  as each of them are still accessing one single
  local entity.

- it is ideal to have since globalLockServer
  per instance.

- In a 32node deployment however, each server
  group is still concentrated towards the
  same set of lockers that partipicate during
  the write/read phase, unlike previous minio/dsync
  implementation - this potentially avoids send
  32 requests instead we will still send at max
  requests of unique nodes participating in a
  write/read phase.

- reduces overall chattiness on smaller setups.
2020-12-10 07:28:37 -08:00
Anis Elleuch
6b7ced80fe
make: Add hotfix target to generate hotfix binaries (#11053)
hotfix target will fetch the release tag prior to the latest commit and create a binary
with the same release tag plus '.hotfix' suffix

e.g.   RELEASE.2020-12-03T05-49-24Z.hotfix
2020-12-08 08:12:13 -08:00
Harshavardhana
9c53cc1b83
fix: heal multiple buckets in bulk (#11029)
makes server startup, orders of magnitude
faster with large number of buckets
2020-12-05 13:00:44 -08:00
Harshavardhana
23e8390997
fix: Allow Walk to honor load balanced drives (#10610) 2020-10-01 20:24:34 -07:00
Harshavardhana
48919de301
fix: for defer'ed deleteObject use internal context (#10463) 2020-09-11 06:39:19 -07:00
Harshavardhana
ad8b53e6d4
add mips64 support for cross compilation (#10106) 2020-07-21 23:56:14 -07:00
Harshavardhana
174f428571
add additional fdatasync before close() on writes (#9947) 2020-07-01 10:57:23 -07:00
Harshavardhana
a38ce29137
fix: simplify background heal and trigger heal items early (#9928)
Bonus fix during versioning merge one of the PR was missing
the offline/online disk count fix from #9801 port it correctly
over to the master branch from release.

Additionally, add versionID support for MRF

Fixes #9910
Fixes #9931
2020-06-29 13:07:26 -07:00
Harshavardhana
4c9de098b0
heal buckets during init and make sure to wait on quorum (#9526)
heal buckets properly during expansion, and make sure
to wait for the quorum properly such that healing can
be retried.
2020-05-06 14:25:05 -07:00
Harshavardhana
6817c5ea58
migrate mint tests to latest versions (#9424) 2020-04-22 16:06:58 -07:00
Harshavardhana
ac07df2985
start watcher after all creds have been loaded (#9301)
start watcher after all creds have been loaded
to avoid any conflicting locks that might get
deadlocked.

Deprecate unused peer calls for LoadUsers()
2020-04-08 19:00:39 -07:00
Harshavardhana
43a3778b45
fix: support object-remaining-retention-days policy condition (#9259)
This PR also tries to simplify the approach taken in
object-locking implementation by preferential treatment
given towards full validation.

This in-turn has fixed couple of bugs related to
how policy should have been honored when ByPassGovernance
is provided.

Simplifies code a bit, but also duplicates code intentionally
for clarity due to complex nature of object locking
implementation.
2020-04-06 13:44:16 -07:00
Harshavardhana
886ae15464
trimpaths when building minio binaries (#9246) 2020-04-01 10:45:11 -07:00
Harshavardhana
d8af244708
Add numeric/date policy conditions (#9233)
add new policy conditions

- NumericEquals
- NumericNotEquals
- NumericLessThan
- NumericLessThanEquals
- NumericGreaterThan
- NumericGreaterThanEquals
- DateEquals
- DateNotEquals
- DateLessThan
- DateLessThanEquals
- DateGreaterThan
- DateGreaterThanEquals
2020-04-01 00:04:25 -07:00
Harshavardhana
8fbf2b0b2a
enable compilation on Linux arm/386 (#9077) 2020-03-03 22:27:47 +03:00
Harshavardhana
ab7d3cd508
fix: Speed up multi-object delete by taking bulk locks (#8974)
Change distributed locking to allow taking bulk locks
across objects, reduces usually 1000 calls to 1.

Also allows for situations where multiple clients sends
delete requests to objects with following names

```
{1,2,3,4,5}
```

```
{5,4,3,2,1}
```

will block and ensure that we do not fail the request
on each other.
2020-02-21 11:29:57 +05:30
Harshavardhana
02acff7fac
fix: cross platform builds update simdjson-go (#9005)
Fixes #9003
2020-02-16 08:37:27 -08:00
Harshavardhana
d1144c2c7e
reference format obtained doesn't need further validation (#8964)
we don't need to validateFormats again once we have obtained
reference format, because it is possible that at this stage
another server is doing a disk heal during startup, once
in a while due to delays we get false positives and our
server doesn't start.

Format in quorum as reference format can be assumed as valid
and we proceed further, until and unless HealFormat re-inits
the disks after a successful heal.

Also use separate port for healing tests to avoid any
conflicts with regular build testing.

Fixes #8884
2020-02-13 14:01:41 -08:00
Harshavardhana
78125ee853
enable minio-java mint tests (#8990) 2020-02-13 11:46:42 -08:00
Harshavardhana
c2c5b09bb1
Avoid object names with '//' to avoid hash inconsistencies (#8946)
This is to fix a situation where an object name incorrectly
is sent with '//' in its path heirarchy, we should reject
such object names because they may be hashed to a set where
the object might not originally belong because, this can
cause situations where once object is uploaded we cannot
delete it anymore.

Fixes #8873
2020-02-06 08:29:38 +05:30
Harshavardhana
4cb6ebcfa2 test: print more relevant info in healing failure (#8895) 2020-01-27 14:56:36 +05:30
Harshavardhana
ef1aa870c5 cleanup unneeded files, update credits (#8858)
additionally add code of conduct
2020-01-20 10:38:58 -08:00
Harshavardhana
64fde1ab95
xl/zones: return errNoHealRequired when no heal is required (#8821)
Zone abstraction of object layer was returning `nil`
incorrectly under situations where disk healing is
not required. Returning `nil` is considered as healing
successful, which leads to unexpected ReloadFormat()
peer notification calls during startup.

This PR fixes this behavior properly for zones.
2020-01-15 17:19:13 -08:00
Harshavardhana
442e1698cb
heal: Avoid spinning up object healing during startup (#8819)
auto-heal disks, metadata and buckets in background but
not objects, let the auto heal kick in for objects after
the cluster has been up for a while.
2020-01-15 01:08:39 -08:00
Harshavardhana
5aa5dcdc6d
lock: improve locker initialization at init (#8776)
Use reference format to initialize lockers
during startup, also handle `nil` for NetLocker
in dsync and remove *errorLocker* implementation

Add further tuning parameters such as

 - DialTimeout is now 15 seconds from 30 seconds
 - KeepAliveTimeout is not 20 seconds, 5 seconds
   more than default 15 seconds
 - ResponseHeaderTimeout to 10 seconds
 - ExpectContinueTimeout is reduced to 3 seconds
 - DualStack is enabled by default remove setting
   it to `true`
 - Reduce IdleConnTimeout to 30 seconds from
   1 minute to avoid idleConn build up

Fixes #8773
2020-01-10 02:35:06 -08:00
Harshavardhana
d4a390028a node 6.x is EOL'ed upgrade to latest stable (#8702) 2019-12-26 08:27:35 +05:30
Harshavardhana
99ad445260
Avoid double for loops in notification init (#8691) 2019-12-24 13:49:48 -08:00
Harshavardhana
54431b3953 Change replica set detection for localhost on single endpoint (#8692) 2019-12-24 11:31:32 -08:00
Harshavardhana
2ab8d5e47f Enable build verification with race (#8583) 2019-12-02 15:54:26 -08:00
Harshavardhana
5d65428b29
Handle localhost distributed setups properly (#8577)
Fixes an issue reported by @klauspost and @vadmeste

This PR also allows users to expand their clusters
from single node XL deployment to distributed mode.
2019-11-26 11:42:10 -08:00
Harshavardhana
720442b1a2
Add lock expiry handler to expire state locks (#8562) 2019-11-25 16:39:43 -08:00
Harshavardhana
4e9de58675 Avoid pointer based copy, instead use Clone() (#8547)
This PR adds functional test to test expanded
cluster syntax.
2019-11-21 17:54:51 +05:30
Harshavardhana
94e5cb7576
Migrate to go1.13 to avail all new features (#8203)
Read more https://blog.golang.org/go1.13
2019-09-08 16:44:15 -07:00
poornas
7bf1caa0fe Fix broken link to go install docs (#8090) 2019-08-15 16:00:50 -07:00
Harshavardhana
b83413b167 Use GOPROXY to speed up builds (#7984)
Read more here https://proxy.golang.org proposal 
for go1.13
2019-07-30 22:27:11 +05:30
kannappanr
f409f10d18 Fix SimpleCI to use different data directory than mint (#7520)
Currently, the backend minio server uses the same data directory
as the mint test itself, causing `s3 sync` to fail often.

Now `minio` backend will use a different data directory `/data`
instead of `/mint/data`
2019-04-12 12:51:36 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana
313a3a286a Migrate to go1.12 to simplify our cmd/http package (#7302)
Simplify the cmd/http package overall by removing
custom plain text v/s tls connection detection, by
migrating to go1.12 and choose minimum version
to be go1.12

Also remove all the vendored deps, since they
are not useful anymore.
2019-04-02 18:28:39 -07:00
Harshavardhana
719d21efd8 Generate coverage across all sub-dirs (#7416) 2019-03-25 11:54:14 -07:00
Sidhartha Mani
6bc0de2a75 add go modules file and start running go 1.11 style builds (#7354) 2019-03-19 13:50:58 -07:00
Sidhartha Mani
b983da957d run gateway mint test in full mode (#7296) 2019-02-27 10:03:23 -08:00
Sidhartha Mani
e9fdea05c6 Enable CI control from repository: Add Dockerfile.simpleci (#7122) 2019-02-01 12:04:28 -08:00
Harshavardhana
8e0910ab3e Fix build issues on BSDs in pkg/cpu (#7116)
Also add a cross compile script to test always cross
compilation for some well known platforms and architectures
, we support out of box compilation of these platforms even
if we don't make an official release build.

This script is to avoid regressions in this area when we
add platform dependent code.
2019-01-22 09:27:23 +05:30