Fix docs.min.io linking issues (#7710)

Harshavardhana 2019-05-30 17:19:25 -07:00 committed by kannappanr
parent 900cc27b51
commit 0c16b1c9a7
2 changed files with 54 additions and 57 deletions


@ -2,107 +2,107 @@
## **1. Cloud-native Architecture** ## **1. Cloud-native Architecture**
![cloud-native](images/image1.png "cloud native architecture") ![cloud-native](https://github.com/minio/minio/blob/master/docs/bigdata/images/image1.png?raw=true "cloud native architecture")
Kubernetes manages stateless Spark and Hive containers elastically on the compute nodes. Spark has native scheduler integration with Kubernetes. Hive, for legacy reasons, uses YARN scheduler on top of Kubernetes.
Kubernetes manages stateless Spark and Hive containers elastically on the compute nodes. Spark has native scheduler integration with Kubernetes. Hive, for legacy reasons, uses YARN scheduler on top of Kubernetes. \
\
All access to MinIO object storage is via S3/SQL SELECT API. In addition to the compute nodes, MinIO containers are also managed by Kubernetes as stateful containers with local storage (JBOD/JBOF) mapped as persistent local volumes. This architecture enables multi-tenant MinIO, allowing isolation of data between customers. All access to MinIO object storage is via S3/SQL SELECT API. In addition to the compute nodes, MinIO containers are also managed by Kubernetes as stateful containers with local storage (JBOD/JBOF) mapped as persistent local volumes. This architecture enables multi-tenant MinIO, allowing isolation of data between customers.
MinIO also supports multi-cluster, multi-site federation similar to AWS regions and tiers. Using MinIO Information Lifecycle Management (ILM), you can configure data to be tiered between NVMe based hot storage, and HDD based warm storage. All data is encrypted with per-object key. Access Control and Identity Management between the tenants are managed by MinIO using OpenID Connect or Kerberos/LDAP/AD. MinIO also supports multi-cluster, multi-site federation similar to AWS regions and tiers. Using MinIO Information Lifecycle Management (ILM), you can configure data to be tiered between NVMe based hot storage, and HDD based warm storage. All data is encrypted with per-object key. Access Control and Identity Management between the tenants are managed by MinIO using OpenID Connect or Kerberos/LDAP/AD.
## **2. Prerequisites** ## **2. Prerequisites**
* Install Hortonworks Distribution using this [guide.](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/ch_Installing_Ambari.html) * Install Hortonworks Distribution using this [guide.](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/ch_Installing_Ambari.html)
* [Setup Ambari](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/set_up_the_ambari_server.html) which automatically sets up YARN * [Setup Ambari](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_ambari-installation/content/set_up_the_ambari_server.html) which automatically sets up YARN
* [Installing Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/installing-spark/content/installing_spark.html) * [Installing Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/installing-spark/content/installing_spark.html)
* Install MinIO Distributed Server using one of the guides below. * Install MinIO Distributed Server using one of the guides below.
* [Deployment based on Kubernetes](https://docs.min.io/docs/deploy-minio-on-kubernetes.html#minio-distributed-server-deployment) * [Deployment based on Kubernetes](https://docs.min.io/docs/deploy-minio-on-kubernetes.html#minio-distributed-server-deployment)
* [Deployment based on MinIO Helm Chart](https://github.com/helm/charts/tree/master/stable/minio) * [Deployment based on MinIO Helm Chart](https://github.com/helm/charts/tree/master/stable/minio)
## **3. Configure Hadoop, Spark, Hive to use MinIO** ## **3. Configure Hadoop, Spark, Hive to use MinIO**
After successful installation navigate to the Ambari UI http://<ambari-server>:8080/ and login using the default credentials: [**_username: admin, password: admin_**] After successful installation navigate to the Ambari UI `http://<ambari-server>:8080/` and login using the default credentials: [**_username: admin, password: admin_**]
![ambari-login](images/image3.png "ambari login") ![ambari-login](https://github.com/minio/minio/blob/master/docs/bigdata/images/image3.png?raw=true "ambari login")
### **3.1 Configure Hadoop** ### **3.1 Configure Hadoop**
Navigate to **Services** -> **HDFS** -> **CONFIGS** -> **ADVANCED** as shown below Navigate to **Services** -> **HDFS** -> **CONFIGS** -> **ADVANCED** as shown below
![hdfs-configs](images/image2.png "hdfs advanced configs") ![hdfs-configs](https://github.com/minio/minio/blob/master/docs/bigdata/images/image2.png?raw=true "hdfs advanced configs")
Navigate to **Custom core-site** to configure MinIO parameters for `_s3a_` connector Navigate to **Custom core-site** to configure MinIO parameters for `_s3a_` connector
![s3a-config](images/image5.png "custom core-site") ![s3a-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image5.png?raw=true "custom core-site")
Add the following optimal entries for _core-site.xml_ to configure _s3a_ with **MinIO**. Most important options here are Add the following optimal entries for _core-site.xml_ to configure _s3a_ with **MinIO**. Most important options here are
* _fs.s3a.access.key=minio_ (Access Key to access MinIO instance, this is obtained after the deployment on k8s) * _fs.s3a.access.key=minio_ (Access Key to access MinIO instance, this is obtained after the deployment on k8s)
* _fs.s3a.secret.key=minio123_ (Secret Key to access MinIO instance, this is obtained after the deployment on k8s) * _fs.s3a.secret.key=minio123_ (Secret Key to access MinIO instance, this is obtained after the deployment on k8s)
* _fs.s3a.endpoint=http://minio-address/_ * _fs.s3a.endpoint=`http://minio-address/`_
* _fs.s3a.path.style.access=true_ * _fs.s3a.path.style.access=true_
![s3a-config](images/image4.png "custom core-site s3a") ![s3a-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image4.png?raw=true "custom core-site s3a")
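Ambari stores these entries in _core-site.xml_ as `<property>` elements. The following is a minimal shell sketch of that shape, assuming the placeholder values from the list above; `render_property` is a hypothetical helper for illustration and is not part of Hadoop or Ambari.

```shell
# Hypothetical helper: print one Hadoop <property> element for a key/value pair.
render_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Placeholder values taken from this guide; replace them with your own.
render_property fs.s3a.access.key minio
render_property fs.s3a.secret.key minio123
render_property fs.s3a.endpoint http://minio-address/
render_property fs.s3a.path.style.access true
```

The output can be compared against what Ambari writes after the **Custom core-site** changes are saved.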
The rest of the other optimization options are discussed in the links below The rest of the other optimization options are discussed in the links below
* [https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) * [https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html)
* [https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html](https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html) * [https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html](https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html)
Once the config changes are applied, proceed to restart **Hadoop** services. Once the config changes are applied, proceed to restart **Hadoop** services.
![hdfs-services](images/image7.png "hdfs restart services") ![hdfs-services](https://github.com/minio/minio/blob/master/docs/bigdata/images/image7.png?raw=true "hdfs restart services")
### **3.2 Configure Spark2** ### **3.2 Configure Spark2**
Navigate to **Services** -> **Spark2** -> **CONFIGS** as shown below Navigate to **Services** -> **Spark2** -> **CONFIGS** as shown below
![spark-config](images/image6.png "spark config") ![spark-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image6.png?raw=true "spark config")
Navigate to “**Custom spark-defaults**” to configure MinIO parameters for `_s3a_` connector Navigate to “**Custom spark-defaults**” to configure MinIO parameters for `_s3a_` connector
![spark-config](images/image9.png "spark defaults") ![spark-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image9.png?raw=true "spark defaults")
Add the following optimal entries for _spark-defaults.conf_ to configure Spark with **MinIO**. Add the following optimal entries for _spark-defaults.conf_ to configure Spark with **MinIO**.
* _spark.hadoop.fs.s3a.committer.magic.enabled=true_ * _spark.hadoop.fs.s3a.committer.magic.enabled=true_
* _spark.hadoop.fs.s3a.committer.name=magic_ * _spark.hadoop.fs.s3a.committer.name=magic_
* _spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem_ * _spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem_
* _spark.hadoop.fs.s3a.path.style.access=true_ * _spark.hadoop.fs.s3a.path.style.access=true_
* _spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory_ * _spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory_
![spark-config](images/image8.png "spark custom configuration") ![spark-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image8.png?raw=true "spark custom configuration")
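The same properties can also be passed per job, since `spark-submit` accepts repeated `--conf key=value` options. The sketch below only builds and prints that option list; `conf_flags` is a hypothetical helper, and whether per-job settings suit your cluster is an assumption to verify.

```shell
# Hypothetical helper: turn "key=value" lines on stdin into
# one "--conf key=value" option per output line.
conf_flags() {
  while IFS= read -r line; do
    # Skip blank lines so the input can be formatted freely.
    if [ -n "$line" ]; then
      printf '%s %s\n' --conf "$line"
    fi
  done
}

# Properties copied from the list above.
conf_flags <<'EOF'
spark.hadoop.fs.s3a.committer.magic.enabled=true
spark.hadoop.fs.s3a.committer.name=magic
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
EOF
```

Setting the values in **Custom spark-defaults** remains the simpler choice when every job on the cluster should use MinIO.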
Once the config changes are applied, proceed to restart **Spark** services. Once the config changes are applied, proceed to restart **Spark** services.
![spark-config](images/image12.png "spark restart services") ![spark-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image12.png?raw=true "spark restart services")
### **3.3 Configure Hive** ### **3.3 Configure Hive**
Navigate to **Services** -> **Hive** -> **CONFIGS** -> **ADVANCED** as shown below Navigate to **Services** -> **Hive** -> **CONFIGS** -> **ADVANCED** as shown below
![hive-config](images/image10.png "hive advanced config") ![hive-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image10.png?raw=true "hive advanced config")
Navigate to “**Custom hive-site**” to configure MinIO parameters for `_s3a_` connector Navigate to “**Custom hive-site**” to configure MinIO parameters for `_s3a_` connector
![hive-config](images/image11.png "hive advanced config") ![hive-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image11.png?raw=true "hive advanced config")
Add the following optimal entries for `hive-site.xml` to configure Hive with **MinIO**. Add the following optimal entries for `hive-site.xml` to configure Hive with **MinIO**.
* _hive.blobstore.use.blobstore.as.scratchdir=true_ * _hive.blobstore.use.blobstore.as.scratchdir=true_
* _hive.exec.input.listing.max.threads=50_ * _hive.exec.input.listing.max.threads=50_
* _hive.load.dynamic.partitions.thread=25_ * _hive.load.dynamic.partitions.thread=25_
* _hive.metastore.fshandler.threads=50_ * _hive.metastore.fshandler.threads=50_
* _hive.mv.files.threads=40_ * _hive.mv.files.threads=40_
* _mapreduce.input.fileinputformat.list-status.num-threads=50_ * _mapreduce.input.fileinputformat.list-status.num-threads=50_
For more information about these options please visit [https://www.cloudera.com/documentation/enterprise/5-11-x/topics/admin_hive_on_s3_tuning.html](https://www.cloudera.com/documentation/enterprise/5-11-x/topics/admin_hive_on_s3_tuning.html) For more information about these options please visit [https://www.cloudera.com/documentation/enterprise/5-11-x/topics/admin_hive_on_s3_tuning.html](https://www.cloudera.com/documentation/enterprise/5-11-x/topics/admin_hive_on_s3_tuning.html)
![hive-config](images/image13.png "hive advanced custom config") ![hive-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image13.png?raw=true "hive advanced custom config")
Once the config changes are applied, proceed to restart all Hive services. Once the config changes are applied, proceed to restart all Hive services.
![hive-config](images/image14.png "restart hive services") ![hive-config](https://github.com/minio/minio/blob/master/docs/bigdata/images/image14.png?raw=true "restart hive services")
## **4. Run Sample Applications** ## **4. Run Sample Applications**
@ -114,16 +114,16 @@ Test the Spark installation by running the following compute intensive example,
Follow these steps to run the Spark Pi example: Follow these steps to run the Spark Pi example:
* Login as user **spark**. * Login as user **spark**.
* When the job runs, the library can now use **MinIO** during intermediate processing. * When the job runs, the library can now use **MinIO** during intermediate processing.
* Navigate to a node with the Spark client and access the spark2-client directory: * Navigate to a node with the Spark client and access the spark2-client directory:
``` ```
cd /usr/hdp/current/spark2-client cd /usr/hdp/current/spark2-client
su spark su spark
``` ```
* Run the Apache Spark Pi job in yarn-client mode, using code from **org.apache.spark**: * Run the Apache Spark Pi job in yarn-client mode, using code from **org.apache.spark**:
``` ```
./bin/spark-submit --class org.apache.spark.examples.SparkPi \ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
@ -150,9 +150,9 @@ WordCount is a simple program that counts how often a word occurs in a text file
The following example submits WordCount code to the Scala shell. Select an input file for the Spark WordCount example. We can use any text file as input. The following example submits WordCount code to the Scala shell. Select an input file for the Spark WordCount example. We can use any text file as input.
* Login as user **spark**. * Login as user **spark**.
* When the job runs, the library can now use **MinIO** during intermediate processing. * When the job runs, the library can now use **MinIO** during intermediate processing.
* Navigate to a node with Spark client and access the spark2-client directory: * Navigate to a node with Spark client and access the spark2-client directory:
``` ```
cd /usr/hdp/current/spark2-client cd /usr/hdp/current/spark2-client
@ -196,7 +196,7 @@ Type :help for more information.
scala> scala>
``` ```
* At the _scala>_ prompt, submit the job by typing the following commands. Replace node names, file name, and file location with your values: * At the _scala>_ prompt, submit the job by typing the following commands. Replace node names, file name, and file location with your values:
``` ```
scala> val file = sc.textFile("s3a://testbucket/testdata") scala> val file = sc.textFile("s3a://testbucket/testdata")
@ -231,4 +231,3 @@ Found 3 items
-rw-rw-rw- 1 spark spark 4956 2019-05-04 01:36 s3a://testbucket/wordcount/part-00000 -rw-rw-rw- 1 spark spark 4956 2019-05-04 01:36 s3a://testbucket/wordcount/part-00000
-rw-rw-rw- 1 spark spark 5616 2019-05-04 01:36 s3a://testbucket/wordcount/part-00001 -rw-rw-rw- 1 spark spark 5616 2019-05-04 01:36 s3a://testbucket/wordcount/part-00001
``` ```


@ -19,9 +19,8 @@ Copy the existing private key and public certificate to the `certs` directory. T
* **Linux:** `${HOME}/.minio/certs` * **Linux:** `${HOME}/.minio/certs`
* **Windows:** `%%USERPROFILE%%\.minio\certs` * **Windows:** `%%USERPROFILE%%\.minio\certs`
> NOTE: Location of custom certs directory can be specified using `--certs-dir` command line option. **Note:**
* Location of custom certs directory can be specified using `--certs-dir` command line option.
**Note:**
* Inside the `certs` directory, the private key must be named `private.key` and the public key must be named `public.crt`.
* A certificate signed by a CA contains information about the issued identity (e.g. name, expiry, public key) and any intermediate certificates. The root CA is not included. * A certificate signed by a CA contains information about the issued identity (e.g. name, expiry, public key) and any intermediate certificates. The root CA is not included.
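The naming rule above can be checked before starting the server. This is a minimal sketch, assuming the default Linux location given earlier; `check_certs` is a hypothetical helper and not a MinIO command.

```shell
# Hypothetical helper: succeed only if the directory holds both
# files that the notes above say MinIO expects.
check_certs() {
  dir="$1"
  for f in private.key public.crt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# Default Linux location from this guide; report the result without aborting.
check_certs "${HOME}/.minio/certs" || echo "certs directory is incomplete"
```

If a custom directory is passed with `--certs-dir`, give that path to the helper instead.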
@ -29,10 +28,10 @@ Copy the existing private key and public certificate to the `certs` directory. T
This section describes how to generate a self-signed certificate using various tools: This section describes how to generate a self-signed certificate using various tools:
3.1 [Use generate_cert.go to Generate a Certificate](#using-go) * 3.1 [Use generate_cert.go to Generate a Certificate](#using-go)
3.2 [Use OpenSSL to Generate a Certificate](#using-open-ssl) * 3.2 [Use OpenSSL to Generate a Certificate](#using-open-ssl)
3.3 [Use OpenSSL (with IP address) to Generate a Certificate](#using-open-ssl-with-ip) * 3.3 [Use OpenSSL (with IP address) to Generate a Certificate](#using-open-ssl-with-ip)
3.4 [Use GnuTLS (for Windows) to Generate a Certificate](#using-gnu-tls) * 3.4 [Use GnuTLS (for Windows) to Generate a Certificate](#using-gnu-tls)
**Note:** **Note:**
* MinIO only supports keys and certificates in PEM format on Linux and Windows. * MinIO only supports keys and certificates in PEM format on Linux and Windows.
@ -65,7 +64,6 @@ Use one of the following methods to generate a certificate using `openssl`:
* 3.2.2 [Generate a private key with RSA](#generate-private-key-with-rsa) * 3.2.2 [Generate a private key with RSA](#generate-private-key-with-rsa)
* 3.2.3 [Generate a self-signed certificate](#generate-a-self-signed-certificate) * 3.2.3 [Generate a self-signed certificate](#generate-a-self-signed-certificate)
#### 3.2.1 <a name="generate-private-key-with-ecdsa"></a>Generate a private key with ECDSA. #### 3.2.1 <a name="generate-private-key-with-ecdsa"></a>Generate a private key with ECDSA.
Use the following command to generate a private key with ECDSA: Use the following command to generate a private key with ECDSA: