Commit Graph

134 Commits

Author SHA1 Message Date
Harshavardhana
9c85928740
add formatting message for zones in ordinals (#9596)
Unlike the message
> Formatting 2 zone, 1 set(s), 6 drives per set.

It is more readable as ordinal
> Formatting 2nd zone, 1 set(s), 6 drives per set.
2020-05-13 20:25:29 -07:00
Harshavardhana
6ac48a65cb
fix: use unused cacheMetrics code in prometheus (#9588)
remove all other unusued/deadcode
2020-05-13 08:15:26 -07:00
Klaus Post
c4464e36c8
fix: limit HTTP transport tuables to affordable values (#9383)
Close connections pro-actively in transient calls
2020-04-17 11:20:56 -07:00
Harshavardhana
f44cfb2863
use GlobalContext whenever possible (#9280)
This change is throughout the codebase to
ensure that all codepaths honor GlobalContext
2020-04-09 09:30:02 -07:00
Harshavardhana
4714958e99
fix: possible connection leaks in sets init, heal (#9263) 2020-04-03 18:06:31 -07:00
Harshavardhana
6f992134a2
fix: startup load time by reusing storageDisks (#9210) 2020-03-27 14:48:30 -07:00
Harshavardhana
ff932ca2a0
fix: log only catastrophic errors in prepare storage (#9189) 2020-03-23 07:32:18 -07:00
Harshavardhana
6a00eb10bf
fix: allow set drive count of proper divisible values (#9101)
Currently the code assumed some orthogonal requirements
which led situations where when we have a setup where
we have let's say for example 168 drives, the final
set_drive_count chosen was 14. Indeed 168 drives are
divisible by 12 but this wasn't allowed due to an
unexpected requirement to have 12 to be a perfect modulo
of 14 which is not possible. This assumption was incorrect.

This PR fixes this old assumption properly, also adds
few tests and some negative tests as well. Improvements
are seen in error messages as well.
2020-03-08 13:30:25 -07:00
Harshavardhana
792ee48d2c
add additional logging during server formatting (#9102) 2020-03-08 12:12:07 -07:00
Harshavardhana
d1144c2c7e
reference format obtained doesn't need further validation (#8964)
we don't need to validateFormats again once we have obtained
reference format, because it is possible that at this stage
another server is doing a disk heal during startup, once
in a while due to delays we get false positives and our
server doesn't start.

Format in quorum as reference format can be assumed as valid
and we proceed further, until and unless HealFormat re-inits
the disks after a successful heal.

Also use separate port for healing tests to avoid any
conflicts with regular build testing.

Fixes #8884
2020-02-13 14:01:41 -08:00
Harshavardhana
e2b3c083aa
fix: close and drain the response body always (#8847) 2020-01-21 02:46:58 -08:00
Harshavardhana
23e46f9dba
log formatting only the first time (#8846) 2020-01-17 15:39:07 -08:00
Anis Elleuch
069876e262 xl: All nodes create meta volumes in its local disks (#8786)
Meta volumes directories, tmp/, background-ops/, etc..
undr .minio.sys are created when disks are formatted
but also when the cluster is started.

However using MakeVolBulk() is not appropriate in the
case of a user migrating from a version which does not
have .minio.sys/background-ops/. The reason is that
MakeVolBulk() exits early when an error is occured:
errVolumeExists in this case, which is expected since
some directories such as tmp/ already exist.

This commit will avoid use MakeVolBulk and use MakeVol
instead.

Also the PR will make each node creates meta volumes
in its local disks and stop relying on the first disk
since the first node could be offline.
2020-01-15 12:36:52 -08:00
Harshavardhana
0879a4f743 rest/storage: Remove racy LastError usage (#8817)
instead perform a liveness check call to
verify if server is online and print relevant
errors.

Also introduce a StorageErr string error type
instead of errors.New() deprecate usage of
VerifyFileError, DeleteFileError for gob,
change in datastructure also requires bump in
storage REST version to v13.

Fixes #8811
2020-01-14 18:45:17 -08:00
Klaus Post
37b32199e3 Validate XL sets on format (#8779)
When formatting a set validate if a host failure will likely lead to data loss.

While we don't know what config will be set in the future 
evaluate to our best knowledge, assuming default settings.
2020-01-13 13:09:10 -08:00
Klaus Post
3d318bae76 init: Use constant time retries (#8769)
Exponential backoff does not seem like a good fit for
this function since we can expect a few roundtrips on
initial startup.

This retry loop get slow pretty quickly with initial
wait being 1 second and each try being double the
wait until 30 seconds is reached.

Instead simply try 2 times per second.
2020-01-08 13:37:34 -08:00
Harshavardhana
f68a7005c0 Improve disk formatting stage for large disk sets (#8690) 2019-12-23 16:31:03 -08:00
Anis Elleuch
555969ee42 Add data usage collect with its new admin API (#8553)
Admin data usage info API returns the following

(Only FS & XL, for now)

- Number of buckets
- Number of objects
- The total size of objects
- Objects histogram
- Bucket sizes
2019-12-12 06:02:37 -08:00
Harshavardhana
5d3d57c12a
Start using error wrapping with fmt.Errorf (#8588)
Use fatih/errwrap to fix all the code to use
error wrapping with fmt.Errorf()
2019-12-02 09:28:01 -08:00
Harshavardhana
4e9de58675 Avoid pointer based copy, instead use Clone() (#8547)
This PR adds functional test to test expanded
cluster syntax.
2019-11-21 17:54:51 +05:30
Harshavardhana
8392d2f510 Preserve same deploymentID on all zones (#8542) 2019-11-20 15:39:30 +05:30
Harshavardhana
347b29d059 Implement bucket expansion (#8509) 2019-11-19 17:42:27 -08:00
Harshavardhana
e9b2bf00ad Support MinIO to be deployed on more than 32 nodes (#8492)
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.

In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
2019-11-13 12:17:45 -08:00
Harshavardhana
47b13cdb80 Add etcd part of config support, add noColor/json support (#8439)
- Add color/json mode support for get/help commands
- Support ENV help for all sub-systems
- Add support for etcd as part of config
2019-10-30 00:04:39 -07:00
Harshavardhana
d48fd6fde9
Remove unusued params and functions (#8399) 2019-10-15 18:35:41 -07:00
Harshavardhana
127641731a
Parallelize initialization of storageDisks (#8288) 2019-09-27 16:47:12 -07:00
Harshavardhana
843f481eb3 Allow "tmp" directory to be not available (#8021)
Also additionally add more context to the errors
generated by filesystem, to facilitate better
debugging.
2019-08-05 11:41:29 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Krishnan Parthasarathi
93a9078b23 Assign deploymentID for first minio server in distributed setup (#7427)
- Pass local endpoints to functions fixing formatXL during startup
2019-04-02 10:50:13 -07:00
Harshavardhana
526546d588 Remove '.minio.sys/tmp' files in background (#7124)
If it does happen that we have a lot files in '.minio.sys/tmp',
minio startup might block deleting this folder. Rename and
delete in background instead to allow Minio to start serving
requests.
2019-01-25 13:33:28 -08:00
Harshavardhana
6add646130 Fix logging of initialization errors in distributed mode (#6914) 2018-12-04 10:25:56 -08:00
Harshavardhana
bfb505aa8e Refactor logging in more Go idiomatic style (#6816)
This refactor brings a change which allows
targets to be added in a cleaner way and also
audit is now moved out.

This PR also simplifies logger dependency for auditing
2018-11-19 14:47:03 -08:00
Harshavardhana
a63bc9254d Add 'disk' tag to log output to enhance 'disk not found' errors (#6460) 2018-09-13 21:42:50 -07:00
Anis Elleuch
7571582000 Print storage errors during distributed initialization (#6441)
This commit will print connection failures to other disks in other nodes
after 5 retries. It is useful for users to understand why the
distribued cluster fails to boot up.
2018-09-10 16:21:59 -07:00
Harshavardhana
e0f8b767ba Fail for critical errors early on during prepare storage (#6404) 2018-09-05 10:20:54 -07:00
kannappanr
0286e61aee Log disk not found error just once (#6059)
Modified the LogIf function to log only if the error passed
is not on the ignored errors list.

Currently, only disk not found error is added to the list.
Added a new function in logger package called LogAlwaysIf, 
which will print on any error.

Fixes #5997
2018-08-14 13:58:48 -07:00
Harshavardhana
2dede2fdc2 Add reliable RemoveAll to handle racy situations (#6227) 2018-08-06 09:45:28 +05:30
kannappanr
43cc0096fa
Add support for deployment ID (#6144)
deployment ID helps in identifying a minio deployment in the case of remote
logging targets.
2018-07-18 20:17:35 -07:00
Harshavardhana
1f07545e2a
Improve init messages for distributed setup (#5786)
Fixes #5531
2018-04-12 15:43:38 -07:00
kannappanr
57a3d9c16c
Modify fatalIf, startup and update message logging code (#5780)
Use a common logging framework to log fatalIf, startup, Info and Update
messages.
2018-04-10 09:37:14 -07:00
kannappanr
cef992a395
Remove error package and cause functions (#5784) 2018-04-10 09:36:37 -07:00
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Harshavardhana
85a57d2021 Make sure to close the disk connections (#5752)
Since we do not re-use storageDisks after moving
the connections to object layer we should close them
appropriately otherwise we have a lot of connection
leaks and these can compound as the time goes by.

This PR also refactors the initialization code to
re-use storageDisks for given set of endpoints until
we have confirmed a valid reference format.
2018-04-04 10:28:48 +05:30
Harshavardhana
2938e332ba Fix format migration regression (#5668)
Migration regression got introduced in 9083bc152e
adding more unit tests to catch this scenario, we need to fix this by
re-writing the formats after the migration to 'V3'.

This bug only happens when a user is migrating directly from V1 to V3,
not from V1 to V2 and V2 to V3.

Added additional unit tests to cover these situations as well.

Fixes #5667
2018-03-19 21:43:00 +05:30
Krishna Srinivas
9083bc152e Flat multipart backend implementation for Erasure backend (#5447) 2018-03-15 13:55:23 -07:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Harshavardhana
1164fc60f3 Bring semantic versioning to provide for rolling upgrades (#5495)
This PR brings semver capabilities in our RPC layer to
ensure that we can upgrade the servers in rolling fashion
while keeping I/O in progress. This is only a framework change
the functionality remains the same as such and we do not
have any special API changes for now. But in future when
we bring in API changes we will be able to upgrade servers
without a downtime.

Additional change in this PR is to not abort when serverVersions
mismatch in a distributed cluster, instead wait for the quorum
treat the situation as if the server is down. This allows
for administrator to properly upgrade all the servers in the cluster.

Fixes #5393
2018-02-06 15:07:17 -08:00
ebozduman
24d9d7e5fa Removes logrus package and refactors logging messages (#5293)
This fix removes logrus package dependency and refactors the console
logging as the only logging mechanism by removing file logging support.
It rearranges the log message format and adds stack trace information
whenever trace information is not available in the error structure.
It also adds `--json` flag support for server logging.
When minio server is started with `--json` flag, all log messages are
displayed in json format, with no start-up and informational log
messages.
Fixes #5265 #5220 #5197
2018-01-17 07:24:46 -08:00
A. Elleuch
2244adff07 fix: Better printing of XL config init error (#5284) 2017-12-28 23:02:48 +05:30
Harshavardhana
819d1e80c6 Add more delays on distributed startup for slow network (#5240)
Refer #5237
2017-12-16 08:25:29 -08:00
Harshavardhana
8efa82126b
Convert errors tracer into a separate package (#5221) 2017-11-25 11:58:29 -08:00
Aditya Manthramurthy
d23ded0d83 Use retryableStorage after healing format.json (#5105)
- Previously networkStorage was being used and this lead to errors
  when listing with a down server/disk

Fixes #5089
2017-10-26 09:52:23 -07:00
Harshavardhana
b9fc4150f6 Fix preInit logic when mixed disk situations exist. (#4904)
When servers are started simultaneously across multiple
nodes or simulating a local setup, it can happen such
that one of the servers in setup reaches a following
situation where it observes

 - Some servers are formatted
 - Some servers are unformatted
 - Some servers are offline

Current state machine doesn't handle this correctly, to fix
this situation where we have unformatted, formatted and
disks offline we do not decisively know the course of
action. So we wait for the offline disks to change their state.

Once the offline disks change their state to either one of these
states we can decisively move forward.

  - nil (formatted disk)
  - errUnformattedDisk
  - Or any other error such as errCorruptedDisk.

Fixes #4903
2017-09-12 12:17:44 -07:00
Frank Wessels
98b62cbec8 Implement an offline mode for a distributed node (#4646)
Implement an offline mode for remote storage to cache the
offline status of a node in order to prevent network calls
that are bound to fail. After a time interval an attempt
will be made to restore the connection and mark the node
as online if successful.

Fixes #4183
2017-08-11 11:49:35 -07:00
Harshavardhana
e99244be02 xl: prepare storage should Abort properly. (#4542)
Current state-machine didn't honor a situation
which can arise when there is a combination of

 - formatted
 - unformatted
 - corrupted

disks - this combination invariably goes into a
mode where all servers are waiting perpetually
forever thinking we will get quorum in future.

At this point there is a distant possibility of
ever getting a quorum since we don't even have
quorum number of disks offline.

We should exit and print a proper message per disk
to indicate what went wrong and what was detected
by the server.

Refer #4477
2017-06-17 11:20:12 -07:00
Harshavardhana
f4dac979a2 server: Fix message when corrupted or unsupported format is found. (#4142)
Refer https://github.com/minio/minio/issues/4140

This is a fix to provide a little more elaborate message.
2017-04-18 10:35:17 -07:00
Bala FA
de204a0a52 Add extensive endpoints validation (#4019) 2017-04-11 15:44:27 -07:00
Aditya Manthramurthy
604417baf4 Allow cluster to start when only n/2 servers are up (#4066)
Fixes #3234.

Relaxes the quorum requirement to start the object layer, and skips
quick-healing at start-up (as no write quorum is present).
2017-04-09 00:28:27 -07:00
Bala FA
d3cb79a57c Refactor logger (#3924)
This patch fixes below

* Previously fatalIf() never writes log other than first logging target.
* quiet flag is not honored to show progress messages other than startup messages.
* Removes console package usage for progress messages.
2017-03-23 16:36:00 -07:00
Harshavardhana
310bf5bd36 auth/rpc: Make auth rpc client retry configurable. (#3695)
Currently the auth rpc client defaults to to a maximum
cap of 30seconds timeout. Make this to be configurable
by the caller of authRPCClient during initialization, if no
such config is provided then default to 30 seconds.
2017-02-07 02:16:29 -08:00
Harshavardhana
1c699d8d3f fs: Re-implement object layer to remember the fd (#3509)
This patch re-writes FS backend to support shared backend sharing locks for safe concurrent access across multiple servers.
2017-01-16 17:05:00 -08:00
Harshavardhana
8562b22823 Fix delays and iterim fix for the partial fix in #3502 (#3511)
This patch uses a technique where in a retryable storage
before object layer initialization has a higher delay
and waits for longer period upto 4 times with time unit
of seconds.

And uses another set of configuration after the disks
have been formatted, i.e use a lower retry backoff rate
and retrying only once per 5 millisecond.

Network IO error count is reduced to a lower value i.e 256
before we reject the disk completely. This is done so that
combination of retry logic and total error count roughly
come to around 2.5secs which is when we basically take the
disk offline completely.

NOTE: This patch doesn't fix the issue of what if the disk
is completely dead and comes back again after the initialization.
Such a mutating state requires a change in our startup sequence
which will be done subsequently. This is an interim fix to alleviate
users from these issues.
2016-12-30 17:08:02 -08:00
Harshavardhana
2062add05f fs/posix: On windows use helpers and init format.json properly. (#3434)
Fixes #3433
2016-12-12 15:43:41 -08:00
Harshavardhana
2d6f8153fa format: Check properly for disks in valid formats. (#3427)
There was an error in how we validated disk formats,
if one of the disk was formatted and was formatted with
FS would cause confusion and object layer would never
initialize essentially go into an infinite loop.

Validate pre-emptively and also check for FS format
properly.
2016-12-11 15:18:55 -08:00
Harshavardhana
b363709c11 caching: Optimize memory allocations. (#3405)
This change brings in changes at multiple places

 - Reuse buffers at almost all locations ranging
   from rpc, fs, xl, checksum etc.
 - Change caching behavior to disable itself
   under low memory conditions i.e < 8GB of RAM.
 - Only objects cached are of size 1/10th the size
   of the cache for example if 4GB is the cache size
   the maximum object size which will be cached
   is going to be 400MB. This change is an
   optimization to cache more objects rather
   than few larger objects.
 - If object cache is enabled default GC
   percent has been reduced to 20% in lieu
   with newly found behavior of GC. If the cache
   utilization reaches 75% of the maximum value
   GC percent is reduced to 10% to make GC
   more aggressive.
 - Do not use *bytes.Buffer* due to its growth
   requirements. For every allocation *bytes.Buffer*
   allocates an additional buffer for its internal
   purposes. This is undesirable for us, so
   implemented a new cappedWriter which is capped to a
   desired size, beyond this all writes rejected.

Possible fix for #3403.
2016-12-08 20:35:07 -08:00
Anis Elleuch
410b579e87 startup: Show elapsed time in disks format process (#3413) 2016-12-07 10:22:00 -08:00
Harshavardhana
6efee2072d objectLayer: Check for format.json in a wrapped disk. (#3311)
This is needed to validate if the `format.json` indeed exists
when a fresh node is brought online.

This wrapped implementation also connects to the remote node
by attempting a re-login. Subsequently after a successful
connect `format.json` is validated as well.

Fixes #3207
2016-11-23 15:48:10 -08:00
Harshavardhana
c91d3791f9 heal: Add healing support for bucket, bucket metadata files. (#3252)
This patch implements healing in general but it is only used
as part of quickHeal().

Fixes #3237
2016-11-16 16:42:23 -08:00
Harshavardhana
1b85302161 Fix spelling and golint errors. (#3266)
Fixes #3263
2016-11-15 18:14:23 -08:00
Harshavardhana
716316f711 Reduce number of envs and options from command line. (#3230)
Ref #3229

After review with @abperiasamy we decided to remove all the unnecessary options

- MINIO_BROWSER (Implemented as a security feature but now deemed obsolete
  since even if blocking access to MINIO_BROWSER, s3 API port is open)
- MINIO_CACHE_EXPIRY (Defaults to 72h)
- MINIO_MAXCONN (No one used this option and we don't test this)
- MINIO_ENABLE_FSMETA (Enable FSMETA all the time)

Remove --ignore-disks option - this option was implemented when XL layer
 would initialize the backend disks and heal them automatically to disallow
 XL accidentally using the root partition itself this option was introduced.

This behavior has been changed XL no longer automatically initializes
`format.json`  a HEAL is controlled activity, so ignore-disks is not
useful anymore. This change also addresses the problems of our documentation
going forward and keeps things simple. This patch brings in reduction of
options and defaulting them to a valid known inputs.  This patch also
serves as a guideline of limiting many ways to do the same thing.
2016-11-11 16:40:55 -08:00
Harshavardhana
3e67bfcc88 heal: Print heal command appropriately without export path. (#3208)
Fixes #3204
2016-11-09 10:50:14 -08:00
Anis Elleuch
e6965ca066 Quit initializing disks process when term signal is invoked (#3163) 2016-11-02 15:27:36 -07:00
Anis Elleuch
79601d27f2 Use endpoint url when printing disks status in distributed mode (#3151) 2016-11-02 08:51:06 -07:00
Harshavardhana
9e2d0ac50b Move to URL based syntax formatting. (#3092)
For command line arguments we are currently following

- <node-1>:/path ... <node-n>:/path

This patch changes this to

- http://<node-1>/path ... http://<node-n>/path
2016-10-27 03:30:52 -07:00
Anis Elleuch
8871eb8e1e Show offline nodes after a fixed number of init retry (#3107) 2016-10-26 16:09:06 -07:00
Krishna Srinivas
32c3a558e9 distributed-XL: Support to run one minio process per export even on the same machine. (#2999)
fixes #2983
2016-10-20 18:31:02 -07:00
Anis Elleuch
fa50312220 Avoid returning disk corrupted by servers in the middle of init all disks formats (#2964) 2016-10-17 08:39:55 -07:00
Harshavardhana
f1bc9343a1 prep: Initialization should wait instead of exit the servers. (#2872)
- Servers do not exit for invalid credentials instead they print and wait.
- Servers do not exit for version mismatch instead they print and wait.
- Servers do not exit for time differences between nodes they print and wait.
2016-10-07 11:15:55 -07:00
Harshavardhana
6494b77d41 server: Add more elaborate startup messages. (#2731)
These messages based on our prep stage during XL
and prints more informative message regarding
drive information.

This change also does a much needed refactoring.
2016-10-05 12:48:07 -07:00
Harshavardhana
03430d0db8 tests: Add ListBucketHandler tests. (#2701)
part-3 final fix for #2412
2016-09-14 23:53:42 -07:00
Krishnan Parthasarathi
66459a4ce0 Add unit-tests for formatting disks during initialization (#2635)
* Add unit-tests for formatting disks during initialization

- Fixed corresponding code at places where it was deviating from the
  tabular spec.

* Added more test cases and simplified algo

... based on feedback from ``go test -coverprofile``.
2016-09-13 21:18:30 -07:00
Harshavardhana
ba2ba328da server: Fixes for various conditions
- Fix distributed branch to be able to run FS version.
- Fix distributed branch to be able to run XL local disks.
- Ignore initialization failures of notification and bucket
  policies, the codepath should load whatever is possible.
2016-09-13 21:18:30 -07:00
Harshavardhana
ae64b7fac8 XL: Handle object layer initialization properly.
Initialization when disk was down the network disk
reported an incorrect error rather than errDiskNotFound.

This resulted in incorrect error handling during
prepInitStorage() stage.

Fixes #2577
2016-09-13 21:18:30 -07:00
Krishnan Parthasarathi
de67bca211 Move formatting of disks out of object layer initialization (#2572) 2016-09-13 21:18:30 -07:00