Install a demo setup of GPF in a Google Cloud Kubernetes cluster¶
Setup gcloud cli access to Google Cloud¶
This guide uses the gcloud command line interface to Google Cloud. If you have already installed and initialized gcloud, you can skip the following two steps.
Install gcloud cli¶
Follow the official Google Cloud how-to on installing the gcloud CLI.
Initializing gcloud cli¶
Follow the official Google Cloud how-to on initializing the gcloud CLI.
Create a Google Cloud project¶
To create a Google Cloud Kubernetes cluster, you need a Google Cloud
project that the cluster will be associated with. If you already
have a Google Cloud project, you can use its project ID for PROJECT and
skip the creation of a new one.
$ PROJECT="<project-id>" # for example "gpf-deployment"
$ gcloud projects create "$PROJECT"
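The value passed to gcloud projects create is a project ID, and Google Cloud only accepts IDs of 6 to 30 characters made of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. A minimal sketch of a pre-check is shown here; the function name valid_project_id is hypothetical and not part of gcloud, and it does not check whether the ID is already taken.

```shell
# Hypothetical helper (not part of gcloud): succeed only if $1 has the
# shape of a valid Google Cloud project ID.
valid_project_id() {
    local LC_ALL=C                  # make [a-z] match ASCII lowercase only
    case "$1" in
        *[!a-z0-9-]*) return 1 ;;   # only lowercase letters, digits, hyphens
        [!a-z]*)      return 1 ;;   # must start with a lowercase letter
        *-)           return 1 ;;   # must not end with a hyphen
    esac
    # Length must be between 6 and 30 characters.
    [ "${#1}" -ge 6 ] && [ "${#1}" -le 30 ]
}
```

It could be used as a guard, for example: valid_project_id "$PROJECT" && gcloud projects create "$PROJECT"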
Create a kubernetes cluster¶
This guide uses an Autopilot Kubernetes cluster. If you already have an Autopilot cluster, you can use it instead of creating a new one.
$ CLUSTER_NAME="<cluster-name>" # for example "autopilot-cluster-11"
$ COMPUTE_REGION="<compute-region>" # for example "europe-west3"
$ gcloud beta container \
    --project "$PROJECT" \
    clusters create-auto "$CLUSTER_NAME" \
    --region "$COMPUTE_REGION" \
    --release-channel "regular" \
    --network "projects/$PROJECT/global/networks/default" \
    --subnetwork "projects/$PROJECT/regions/$COMPUTE_REGION/subnetworks/default" \
    --cluster-ipv4-cidr "/17" \
    --binauthz-evaluation-mode=DISABLED
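Cluster creation takes a while, so it can help to confirm that the cluster is running before continuing. The following is a sketch only: the function names cluster_status and cluster_is_running are hypothetical, and the arguments are assumed to be the CLUSTER_NAME, COMPUTE_REGION, and PROJECT values used above.

```shell
# Print the status of a cluster (for example RUNNING or PROVISIONING).
# Arguments: cluster name, compute region, project ID.
cluster_status() {
    gcloud container clusters describe "$1" \
        --region "$2" \
        --project "$3" \
        --format "value(status)"
}

# Succeed only when the cluster reports RUNNING.
cluster_is_running() {
    [ "$(cluster_status "$1" "$2" "$3")" = "RUNNING" ]
}
```

For example: cluster_is_running "$CLUSTER_NAME" "$COMPUTE_REGION" "$PROJECT" && echo "cluster is ready"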
Install kubectl and configure cluster access¶
To deploy the GPF workload on the cluster, you need to configure access to it via kubectl. Follow the official how-to on installing kubectl and configuring cluster access.
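Once kubectl is installed, cluster access is typically configured by fetching the cluster credentials with gcloud. A hedged sketch follows, assuming the same CLUSTER_NAME, COMPUTE_REGION, and PROJECT values as above; the wrapper name configure_cluster_access is hypothetical, and the official how-to remains the authoritative reference.

```shell
# Fetch cluster credentials into the local kubeconfig, then confirm that
# kubectl can reach the cluster control plane.
# Arguments: cluster name, compute region, project ID.
configure_cluster_access() {
    gcloud container clusters get-credentials "$1" \
        --region "$2" \
        --project "$3" &&
    kubectl cluster-info
}
```

For example: configure_cluster_access "$CLUSTER_NAME" "$COMPUTE_REGION" "$PROJECT"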
Provision and setup GPF¶
Provision GPF on the cluster¶
To provision GPF with all the required Kubernetes objects, copy and
paste the following block of code as a single command, without the
leading dollar sign ($).
Provisioning may take 10 to 20 minutes, depending on the current cluster state.
$ kubectl apply -f - <<'EOT'
---
apiVersion: v1
kind: Namespace
metadata:
  name: gpf
  namespace: gpf
---
apiVersion: v1
kind: Service
metadata:
  labels:
    org.seqpipe.service: gpf
  name: gpf
  namespace: gpf
spec:
  ports:
  - name: "80"
    port: 80
    targetPort: 80
  selector:
    org.seqpipe.service: gpf
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  labels:
    org.seqpipe.service: mysql
  name: mysql
  namespace: gpf
spec:
  ports:
  - name: "3306"
    port: 3306
    targetPort: 3306
  selector:
    org.seqpipe.service: mysql
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    org.seqpipe.service: pvc-gpf-data
  name: pvc-gpf-data
  namespace: gpf
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    org.seqpipe.service: mysql
  name: mysql
  namespace: gpf
spec:
  replicas: 1
  selector:
    matchLabels:
      org.seqpipe.service: mysql
  template:
    metadata:
      labels:
        org.seqpipe.service: mysql
    spec:
      containers:
      - args:
        - mysqld
        - --character-set-server=utf8
        - --collation-server=utf8_bin
        - --default-authentication-plugin=mysql_native_password
        env:
        - name: MYSQL_DATABASE
          value: gpf
        - name: MYSQL_PASSWORD
          value: secret
        - name: MYSQL_ROOT_PASSWORD
          value: secret
        - name: MYSQL_USER
          value: seqpipe
        image: mysql:8
        name: mysql
        ports:
        - containerPort: 3306
          protocol: TCP
      hostname: mysql
      restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    org.seqpipe.service: gpf
  name: gpf
  namespace: gpf
spec:
  replicas: 1
  selector:
    matchLabels:
      org.seqpipe.service: gpf
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        org.seqpipe.service: gpf
    spec:
      initContainers:
      - env:
        - name: DAE_DB_DIR
          value: /data
        - name: DAE_PHENODB_DIR
          value: /data-phenodb
        - name: GRR_DEFINITION_FILE
          value: /cache/grr_definition.yaml
        - name: WDAE_LOG_DIR
          value: /logs
        - name: GPF_PREFIX
          value: gpf
        - name: WDAE_ALLOWED_HOST
          value: '*'
        - name: WDAE_DEBUG
          value: "True"
        - name: WDAE_PREFIX
          value: gpf
        - name: WDAE_PUBLIC_HOSTNAME
          value: gpf
        - name: WDAE_SECRET_KEY
          value: '"123456789012345678901234567890123456789012345678901234567890"'
        - name: WDAE_DB_HOST
          value: mysql
        - name: WDAE_DB_NAME
          value: gpf
        - name: WDAE_DB_PASSWORD
          value: secret
        - name: WDAE_DB_PORT
          value: "3306"
        - name: WDAE_DB_USER
          value: seqpipe
        name: gpf-init
        image: iossifovlab/iossifovlab-gpf-full:2024.3.2
        command:
        - "/bin/bash"
        args:
        - "-c"
        - "while ! [ -e /data/DONE ]; do sleep 1; done"
        volumeMounts:
        - mountPath: /data
          name: pvc-gpf-data
        - mountPath: /data-phenodb
          name: pvc-gpf-data
        - mountPath: /cache
          name: pvc-gpf-data
      containers:
      - env:
        - name: DAE_DB_DIR
          value: /data
        - name: DAE_PHENODB_DIR
          value: /data-phenodb
        - name: GRR_DEFINITION_FILE
          value: /cache/grr_definition.yaml
        - name: WDAE_LOG_DIR
          value: /logs
        - name: GPF_PREFIX
          value: gpf
        - name: WDAE_ALLOWED_HOST
          value: '*'
        - name: WDAE_DEBUG
          value: "True"
        - name: WDAE_PREFIX
          value: gpf
        - name: WDAE_PUBLIC_HOSTNAME
          value: gpf
        - name: WDAE_SECRET_KEY
          value: '"123456789012345678901234567890123456789012345678901234567890"'
        - name: WDAE_DB_HOST
          value: mysql
        - name: WDAE_DB_NAME
          value: gpf
        - name: WDAE_DB_PASSWORD
          value: secret
        - name: WDAE_DB_PORT
          value: "3306"
        - name: WDAE_DB_USER
          value: seqpipe
        image: iossifovlab/iossifovlab-gpf-full:2024.3.2
        name: gpf
        resources:
          requests:
            memory: "16Gi"
          limits:
            memory: "20Gi"
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: pvc-gpf-data
        - mountPath: /data-phenodb
          name: pvc-gpf-data
        - mountPath: /cache
          name: pvc-gpf-data
      hostname: gpf
      restartPolicy: Always
      volumes:
      - name: pvc-gpf-data
        persistentVolumeClaim:
          claimName: pvc-gpf-data
EOT
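The gpf Deployment stays in its initialization phase on purpose: its gpf-init container loops until the file /data/DONE exists, which happens only in a later step. The mysql Deployment, however, should become ready on its own. As a sketch, its rollout can be awaited as follows; the function name wait_for_mysql is hypothetical, and the 20 minute default simply mirrors the provisioning time mentioned above.

```shell
# Block until the mysql Deployment is rolled out, or fail after the
# timeout given as $1 (default 20m).
wait_for_mysql() {
    kubectl rollout status deployment/mysql \
        --namespace=gpf \
        --timeout="${1:-20m}"
}
```

Running kubectl get pods --namespace=gpf afterwards shows the state of both pods.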
Note the gpf service External IP¶
To access the GPF web interface, note the External IP of the gpf service using the following command. The External IP is also used in a later step.
$ kubectl get services --namespace=gpf gpf
If the command reports that the object does not exist, or the value
in the External IP column is <pending>, the service is not yet fully
provisioned. Provisioning can take up to 20 minutes.
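Instead of re-running the command by hand, the External IP can be polled until it is assigned. This is a minimal sketch: the function names gpf_external_ip and wait_for_gpf_ip are hypothetical, and the defaults of 120 attempts at 10 second intervals are an assumption chosen to cover roughly 20 minutes.

```shell
# Print the external IP of the gpf service, or nothing while it is pending.
gpf_external_ip() {
    kubectl get service gpf --namespace=gpf \
        --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Poll until an IP is assigned, then print it.
# $1: maximum attempts (default 120), $2: seconds between attempts (default 10).
wait_for_gpf_ip() {
    local attempts="${1:-120}" delay="${2:-10}" ip
    while [ "$attempts" -gt 0 ]; do
        # Errors (for example, service not created yet) count as "pending".
        ip="$(gpf_external_ip 2>/dev/null)" || ip=""
        if [ -n "$ip" ]; then
            printf '%s\n' "$ip"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "gpf service has no external IP yet" >&2
    return 1
}
```

For example: ENDPOINT_IP="$(wait_for_gpf_ip)"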
Perform initial gpf configuration in the gpf-init container¶
Several GPF configuration steps must be executed in the shell of the
provisioned gpf-init initialization container. To enter the
shell use the following command:
$ kubectl exec --stdin --tty --namespace=gpf --container=gpf-init deployment.apps/gpf -- /bin/bash
If the command reports that the object or container does not exist, the initialization container is not yet fully provisioned. Provisioning can take up to 20 minutes.
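Whether the initialization container has started can also be checked from the pod status before attempting kubectl exec. The sketch below assumes the gpf pod is the only pod carrying the label org.seqpipe.service=gpf, as in the manifest above; the function name gpf_init_is_running is hypothetical.

```shell
# Succeed once the gpf-init container of the gpf pod is in the running state.
gpf_init_is_running() {
    local started
    # A missing pod or a container that has not started yields empty output.
    started="$(kubectl get pods --namespace=gpf \
        --selector=org.seqpipe.service=gpf \
        --output jsonpath='{.items[0].status.initContainerStatuses[0].state.running.startedAt}' \
        2>/dev/null)" || started=""
    [ -n "$started" ]
}
```

For example: until gpf_init_is_running; do sleep 10; done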
Fetch public gpf De Novo data¶
For demo purposes, the public GPF De Novo dataset is used. Run the following commands to fetch the data into the GPF data directory:
$ cd /data
$ apt-get update
$ apt-get install -y python3-pip git
$ pip install dvc-ssh
$ rmdir "lost+found"
$ git clone https://github.com/iossifovlab/data-hg38-public.git .
$ ssh-keygen -f ~/.ssh/id_rsa -N ''
$ ssh-copy-id -p 2020 seqpipe@nemo.seqpipe.org
$ dvc pull -r nemo
Configure genomic resources caching¶
To configure genomic resources caching, copy and paste the following
block of code as a single command, without the leading dollar sign
($).
$ cat > /cache/grr_definition.yaml <<'EOT'
type: group
children:
- id: "seqpipe"
  type: "url"
  url: "https://grr.seqpipe.org"
  cache_dir: "/cache"
- id: "default"
  type: "url"
  url: "https://www.iossifovlab.com/distribution/public/genomic-resources-repository"
EOT
Initialize MySQL database¶
To initialize the mysql database use the following command:
$ wdaemanage.py migrate
Configure gpf admin user¶
To configure the gpf administrator user use the following command:
$ wdaemanage.py user_create admin@iossifovlab.com -p secret -g any_dataset:any_user:admin
Configure the GPF frontend application¶
To configure the GPF frontend application you must use the previously
noted IP address as the ENDPOINT_IP.
$ ENDPOINT_IP="<the-noted-public-endpoint-ip>" # for example "35.53.23.11"
$ wdaemanage.py createapplication --user 1 \
    --redirect-uris "http://$ENDPOINT_IP/gpf/accounts/login" \
    --name "GPF Genotypes and Phenotypes in Families" --client-id gpfjs \
    public authorization-code --skip-authorization
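A common slip at this step is leaving the placeholder in ENDPOINT_IP, which registers an unusable redirect URI. As a sketch, the URI can be built by a small helper that first checks that the value has the shape of a dotted IPv4 address; the function name gpf_redirect_uri is hypothetical, and the check covers only the shape, not the range of each octet.

```shell
# Hypothetical helper: print the redirect URI for a dotted IPv4 address,
# or fail if the argument does not look like one.
gpf_redirect_uri() {
    if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        printf 'http://%s/gpf/accounts/login\n' "$1"
    else
        echo "not an IPv4 address: $1" >&2
        return 1
    fi
}
```

The result can then be passed as --redirect-uris "$(gpf_redirect_uri "$ENDPOINT_IP")".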
Mark initialization as done¶
Use the following command to signal that initialization is complete and GPF can be started. After the command is executed, the remote shell session to the initialization container is terminated.
$ touch /data/DONE
Go to the deployed gpf instance¶
Open a browser and go to http://<the-noted-public-endpoint-ip>/gpf/.
It may take a few minutes after the initialization is marked as done for
the GPF endpoint to appear.
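To avoid reloading the browser while waiting, the endpoint can be polled from your workstation until it responds. This is a sketch under assumptions: it requires curl locally, the function name wait_for_gpf is hypothetical, and the defaults of 30 attempts at 10 second intervals are arbitrary.

```shell
# Poll http://<ip>/gpf/ until it answers with a successful HTTP status.
# $1: external IP, $2: maximum attempts (default 30),
# $3: seconds between attempts (default 10).
wait_for_gpf() {
    local ip="$1" attempts="${2:-30}" delay="${3:-10}"
    while [ "$attempts" -gt 0 ]; do
        # --fail makes curl exit non-zero on HTTP errors (status >= 400).
        if curl --silent --fail --location --output /dev/null "http://$ip/gpf/"; then
            echo "GPF is up at http://$ip/gpf/"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "GPF did not respond at http://$ip/gpf/" >&2
    return 1
}
```

For example: wait_for_gpf "35.53.23.11"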