Install a demo setup of GPF in a Google Cloud Kubernetes cluster

Set up gcloud CLI access to Google Cloud

This guide uses the gcloud command-line interface (CLI) to Google Cloud. If you have already installed and initialized the gcloud CLI, you can skip the following two steps.

Install the gcloud CLI

Follow the official Google Cloud how-to on installing the gcloud CLI.

Initialize the gcloud CLI

Follow the official Google Cloud how-to on initializing the gcloud CLI.

Create a Google Cloud project

To create a Google Cloud Kubernetes cluster, you need a Google Cloud project to associate the cluster with. If you already have a Google Cloud project, you can use its project ID (not its display name) for PROJECT and skip creating a new one. Project IDs are globally unique, so pick one that is not already taken.

$ PROJECT="<project-id>" # for example "gpf-deployment"
$ gcloud projects create "$PROJECT"
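A freshly created project is not ready for cluster creation until the Kubernetes Engine API (container.googleapis.com) is enabled, and the project must also be linked to a billing account (not shown here). A minimal sketch of the remaining preparation, assuming billing is already set up; `prepare_gke_project` is a hypothetical helper name, not part of GPF or gcloud:

```shell
# Hypothetical helper: prepare a new project for GKE.
# Assumes a billing account is already linked to the project.
prepare_gke_project() {
  local project="$1"
  if [ -z "$project" ]; then
    echo "usage: prepare_gke_project <project-id>" >&2
    return 1
  fi
  # Make the project the default for subsequent gcloud commands.
  gcloud config set project "$project" || return 1
  # Cluster creation fails unless the Kubernetes Engine API is enabled.
  gcloud services enable container.googleapis.com --project "$project"
}
```

For example, run prepare_gke_project "$PROJECT" right after the gcloud projects create command.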

Create a Kubernetes cluster

This guide uses a GKE Autopilot cluster. If you already have an Autopilot cluster, you can use it instead of creating a new one.

$ CLUSTER_NAME="<cluster-name>" # for example "autopilot-cluster-11"
$ COMPUTE_REGION="<compute-region>" # for example "europe-west3"
$ gcloud beta container \
    --project "$PROJECT" \
    clusters create-auto "$CLUSTER_NAME" \
    --region "$COMPUTE_REGION" \
    --release-channel "regular" \
    --network "projects/$PROJECT/global/networks/default" \
    --subnetwork "projects/$PROJECT/regions/$COMPUTE_REGION/subnetworks/default" \
    --cluster-ipv4-cidr "/17" \
    --binauthz-evaluation-mode=DISABLED
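Creating an Autopilot cluster can take several minutes. The following sketch polls the cluster with gcloud container clusters describe until its status is RUNNING; `wait_for_cluster` and the optional WAIT_INTERVAL variable (seconds between polls, default 10) are hypothetical names introduced for illustration:

```shell
# Hypothetical helper: poll the cluster until GKE reports status RUNNING.
# Arguments: project ID, cluster name, region, optional number of attempts.
wait_for_cluster() {
  local project="$1" cluster="$2" region="$3" attempts="${4:-60}"
  local i=1 status=""
  while [ "$i" -le "$attempts" ]; do
    # Prints only the status field, e.g. PROVISIONING or RUNNING.
    status="$(gcloud container clusters describe "$cluster" \
      --project "$project" --region "$region" \
      --format='value(status)' 2>/dev/null || true)"
    if [ "$status" = "RUNNING" ]; then
      echo "cluster $cluster is RUNNING"
      return 0
    fi
    echo "cluster status: ${status:-unknown}, waiting..." >&2
    sleep "${WAIT_INTERVAL:-10}"
    i=$((i + 1))
  done
  return 1
}
```

Usage: wait_for_cluster "$PROJECT" "$CLUSTER_NAME" "$COMPUTE_REGION".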

Install kubectl and configure cluster access

To deploy the GPF workload on the cluster, you need kubectl access to it; follow the official GKE how-to on installing kubectl and configuring cluster access.
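For a GKE cluster, configuring kubectl access usually comes down to fetching the cluster credentials with gcloud; recent kubectl versions also require the gke-gcloud-auth-plugin gcloud component. A minimal sketch, assuming kubectl and the plugin are already installed; `configure_kubectl` is a hypothetical helper:

```shell
# Hypothetical helper: write the cluster credentials to ~/.kube/config
# and check that kubectl can talk to the cluster.
configure_kubectl() {
  local project="$1" cluster="$2" region="$3"
  gcloud container clusters get-credentials "$cluster" \
    --project "$project" --region "$region" || return 1
  kubectl get namespaces
}
```

Usage: configure_kubectl "$PROJECT" "$CLUSTER_NAME" "$COMPUTE_REGION".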

Provision and set up GPF

Provision GPF on the cluster

To provision GPF with all the required Kubernetes objects, copy and paste the following block as a single command, without the leading dollar sign ($).

Provisioning may take 10-20 minutes, depending on the current state of the cluster.

$ kubectl apply -f - <<'EOT'
---
apiVersion: v1
kind: Namespace
metadata:
  name: gpf

---
apiVersion: v1
kind: Service
metadata:
  labels:
    org.seqpipe.service: gpf
  name: gpf
  namespace: gpf
spec:
  ports:
    - name: "80"
      port: 80
      targetPort: 80
  selector:
    org.seqpipe.service: gpf
  type: LoadBalancer

---
apiVersion: v1
kind: Service
metadata:
  labels:
    org.seqpipe.service: mysql
  name: mysql
  namespace: gpf
spec:
  ports:
    - name: "3306"
      port: 3306
      targetPort: 3306
  selector:
    org.seqpipe.service: mysql

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    org.seqpipe.service: pvc-gpf-data
  name: pvc-gpf-data
  namespace: gpf
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 10Gi

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    org.seqpipe.service: mysql
  name: mysql
  namespace: gpf
spec:
  replicas: 1
  selector:
    matchLabels:
      org.seqpipe.service: mysql
  template:
    metadata:
      labels:
        org.seqpipe.service: mysql
    spec:
      containers:
        - args:
            - mysqld
            - --character-set-server=utf8
            - --collation-server=utf8_bin
            - --default-authentication-plugin=mysql_native_password
          env:
            - name: MYSQL_DATABASE
              value: gpf
            - name: MYSQL_PASSWORD
              value: secret
            - name: MYSQL_ROOT_PASSWORD
              value: secret
            - name: MYSQL_USER
              value: seqpipe
          image: mysql:8
          name: mysql
          ports:
            - containerPort: 3306
              protocol: TCP
      hostname: mysql
      restartPolicy: Always

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    org.seqpipe.service: gpf
  name: gpf
  namespace: gpf
spec:
  replicas: 1
  selector:
    matchLabels:
      org.seqpipe.service: gpf
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        org.seqpipe.service: gpf
    spec:
      initContainers:
        - env:
            - name: DAE_DB_DIR
              value: /data
            - name: DAE_PHENODB_DIR
              value: /data-phenodb
            - name: GRR_DEFINITION_FILE
              value: /cache/grr_definition.yaml
            - name: WDAE_LOG_DIR
              value: /logs
            - name: GPF_PREFIX
              value: gpf
            - name: WDAE_ALLOWED_HOST
              value: '*'
            - name: WDAE_DEBUG
              value: "True"
            - name: WDAE_PREFIX
              value: gpf
            - name: WDAE_PUBLIC_HOSTNAME
              value: gpf
            - name: WDAE_SECRET_KEY
              value: '"123456789012345678901234567890123456789012345678901234567890"'
            - name: WDAE_DB_HOST
              value: mysql
            - name: WDAE_DB_NAME
              value: gpf
            - name: WDAE_DB_PASSWORD
              value: secret
            - name: WDAE_DB_PORT
              value: "3306"
            - name: WDAE_DB_USER
              value: seqpipe
          name: gpf-init
          image: iossifovlab/iossifovlab-gpf-full:2024.3.2
          command:
            - "/bin/bash"
          args:
            - "-c"
            - "while ! [ -e /data/DONE ]; do sleep 1; done"
          volumeMounts:
            - mountPath: /data
              name: pvc-gpf-data
            - mountPath: /data-phenodb
              name: pvc-gpf-data
            - mountPath: /cache
              name: pvc-gpf-data

      containers:
        - env:
            - name: DAE_DB_DIR
              value: /data
            - name: DAE_PHENODB_DIR
              value: /data-phenodb
            - name: GRR_DEFINITION_FILE
              value: /cache/grr_definition.yaml
            - name: WDAE_LOG_DIR
              value: /logs
            - name: GPF_PREFIX
              value: gpf
            - name: WDAE_ALLOWED_HOST
              value: '*'
            - name: WDAE_DEBUG
              value: "True"
            - name: WDAE_PREFIX
              value: gpf
            - name: WDAE_PUBLIC_HOSTNAME
              value: gpf
            - name: WDAE_SECRET_KEY
              value: '"123456789012345678901234567890123456789012345678901234567890"'
            - name: WDAE_DB_HOST
              value: mysql
            - name: WDAE_DB_NAME
              value: gpf
            - name: WDAE_DB_PASSWORD
              value: secret
            - name: WDAE_DB_PORT
              value: "3306"
            - name: WDAE_DB_USER
              value: seqpipe
          image: iossifovlab/iossifovlab-gpf-full:2024.3.2
          name: gpf
          resources:
            requests:
              memory: "16Gi"
            limits:
              memory: "20Gi"
          ports:
            - containerPort: 80
              protocol: TCP
          volumeMounts:
            - mountPath: /data
              name: pvc-gpf-data
            - mountPath: /data-phenodb
              name: pvc-gpf-data
            - mountPath: /cache
              name: pvc-gpf-data
      hostname: gpf
      restartPolicy: Always
      volumes:
        - name: pvc-gpf-data
          persistentVolumeClaim:
            claimName: pvc-gpf-data
EOT
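Instead of checking back manually, you can wait for parts of the deployment with kubectl rollout status. A sketch, assuming the manifest above was applied unchanged; `wait_for_mysql` is a hypothetical helper name. Only the mysql deployment is waited for, because the gpf pod stays in an Init state until its init container is released in the steps below.

```shell
# Hypothetical helper: wait until the mysql deployment is available,
# then list the pods in the gpf namespace.
wait_for_mysql() {
  local timeout="${1:-20m}"
  kubectl rollout status deployment/mysql --namespace=gpf \
    --timeout="$timeout" || return 1
  kubectl get pods --namespace=gpf
}
```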

Note the External IP of the gpf service

To access the GPF web interface, you need the External IP of the gpf service. Get it with the following command and note it down; it is also needed in a later step.

$ kubectl get services --namespace=gpf gpf
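If you prefer to script this step, the External IP can be read directly from the service status with a jsonpath query. A sketch that polls until the load balancer has been assigned an address (by default 120 attempts 10 seconds apart, about 20 minutes); `get_gpf_external_ip` and WAIT_INTERVAL are hypothetical names:

```shell
# Hypothetical helper: print only the external IP of the gpf service,
# polling until the load balancer has been provisioned.
get_gpf_external_ip() {
  local attempts="${1:-120}" i=1 ip=""
  while [ "$i" -le "$attempts" ]; do
    # Empty (or an error) until the load balancer has an ingress IP.
    ip="$(kubectl get service gpf --namespace=gpf \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "${WAIT_INTERVAL:-10}"
    i=$((i + 1))
  done
  echo "no external IP assigned to the gpf service yet" >&2
  return 1
}
```

For example, ENDPOINT_IP="$(get_gpf_external_ip)" stores the address in your local shell. The shell inside the gpf-init container is a separate session, so the value still has to be entered there by hand.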

If the command reports that the object is not found, or the EXTERNAL-IP column shows <pending>, the service is not yet fully provisioned; provisioning can take up to 20 minutes.

Perform the initial GPF configuration in the gpf-init container

Some GPF configuration steps must be executed in a shell inside the provisioned gpf-init initialization container. To open that shell, use the following command:

$ kubectl exec --stdin --tty --namespace=gpf --container=gpf-init deployment.apps/gpf -- /bin/bash
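The interactive shell is convenient for the multi-step setup below, but single commands can also be run in the gpf-init container without a TTY. A sketch of a hypothetical wrapper `gpf_init_exec`, using the same kubectl exec target as the command above:

```shell
# Hypothetical helper: run one command in the gpf-init container,
# non-interactively, and return its output and exit status.
gpf_init_exec() {
  kubectl exec --namespace=gpf --container=gpf-init deployment.apps/gpf -- "$@"
}
```

For example, gpf_init_exec ls /data lists the data directory without opening a shell.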

If the command reports that the object or container is not found, the initialization container is not yet fully provisioned; provisioning can take up to 20 minutes.

Fetch the public GPF De Novo data

For demo purposes, this guide uses a public GPF dataset, De Novo. Run the following commands to fetch the data into the GPF data directory:

$ cd /data
$ apt-get update
$ apt-get install -y python3-pip git
$ pip install dvc-ssh
$ rmdir "lost+found"
$ git clone https://github.com/iossifovlab/data-hg38-public.git .
$ ssh-keygen -f ~/.ssh/id_rsa -N ''
$ ssh-copy-id -p 2020 seqpipe@nemo.seqpipe.org
$ dvc pull -r nemo
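The rmdir step is there because git clone into "." only works on an empty directory, and a freshly formatted persistent volume contains a lost+found directory. If you need to rerun these steps, a guard like the following sketch can check the directory first; `prepare_data_dir` is a hypothetical helper that defaults to /data:

```shell
# Hypothetical helper: make DIR ready for "git clone <url> ." by removing
# the empty lost+found directory of a fresh volume, and fail if anything
# else is present, since git refuses to clone into a non-empty directory.
prepare_data_dir() {
  local dir="${1:-/data}"
  if [ -d "$dir/lost+found" ]; then
    rmdir "$dir/lost+found" || return 1
  fi
  if [ -n "$(ls -A "$dir")" ]; then
    echo "$dir is not empty; refusing to clone into it" >&2
    return 1
  fi
}
```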

Configure genomic resources caching

To configure caching of genomic resources, copy and paste the following block as a single command, without the leading dollar sign ($).

$ cat > /cache/grr_definition.yaml <<'EOT'
type: group
children:
- id: "seqpipe"
  type: "url"
  url: "https://grr.seqpipe.org"
  cache_dir: "/cache"

- id: "default"
  type: "url"
  url: "https://www.iossifovlab.com/distribution/public/genomic-resources-repository"
EOT
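The manifests above point the GRR_DEFINITION_FILE environment variable at /cache/grr_definition.yaml, presumably so that GPF reads this file. A minimal sanity check, sketched as a hypothetical `check_grr_definition` helper, that the file exists, is non-empty and declares a cache_dir; it is a textual check only and does not validate the YAML:

```shell
# Hypothetical helper: check the file named by GRR_DEFINITION_FILE.
check_grr_definition() {
  local file="${GRR_DEFINITION_FILE:-/cache/grr_definition.yaml}"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # Textual check only: some repository entry declares a cache_dir.
  if ! grep -q '^[[:space:]]*cache_dir:' "$file"; then
    echo "no cache_dir entry in $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```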

Initialize the MySQL database

To initialize the MySQL database, use the following command:

$ wdaemanage.py migrate
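The migrate command connects to the mysql service, and right after provisioning MySQL may still be starting, in which case the command fails. Rather than rerunning it by hand, you can wrap it in a retry loop; the following `retry` function is a hypothetical helper, not part of GPF:

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts; returns the last exit status.
retry() {
  local attempts="$1" delay="$2" i=1 status=0
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    status=$?
    echo "attempt $i/$attempts failed (exit status $status)" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return "$status"
}
```

Usage: retry 30 10 wdaemanage.py migrate.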

Configure the GPF admin user

To create the GPF administrator user, use the following command:

$ wdaemanage.py user_create admin@iossifovlab.com -p secret -g any_dataset:any_user:admin
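The admin password secret is a demo value. Since the gpf service is exposed through a public LoadBalancer IP, consider a random password instead; a minimal sketch using only coreutils and /dev/urandom, with `random_password` as a hypothetical helper name:

```shell
# Hypothetical helper: print a random alphanumeric password (default 24 chars).
random_password() {
  local length="${1:-24}"
  # Keep only letters and digits from the random byte stream.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}
```

For example, set ADMIN_PASSWORD="$(random_password)", pass -p "$ADMIN_PASSWORD" to the user_create command above, and keep a note of the value.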

Configure the GPF frontend application

To configure the GPF frontend application, set ENDPOINT_IP to the External IP of the gpf service that you noted earlier.

$ ENDPOINT_IP="<the-noted-public-endpoint-ip>" # for example "35.53.23.11"
$ wdaemanage.py createapplication --user 1 \
    --redirect-uris "http://$ENDPOINT_IP/gpf/accounts/login" \
    --name "GPF Genotypes and Phenotypes in Families" --client-id gpfjs \
    public authorization-code --skip-authorization
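The redirect URI must contain the real External IP; leaving the <the-noted-public-endpoint-ip> placeholder in ENDPOINT_IP would register a redirect URI that the browser cannot reach. A sketch of a hypothetical `gpf_redirect_uri` helper that checks the value looks like a dotted IPv4 address (it does not check that each octet is at most 255) and prints the URI used above:

```shell
# Hypothetical helper: validate the IP format and print the login redirect URI.
gpf_redirect_uri() {
  local ip="$1"
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: '$ip'" >&2
    return 1
  fi
  echo "http://$ip/gpf/accounts/login"
}
```

For example, --redirect-uris "$(gpf_redirect_uri "$ENDPOINT_IP")" in the createapplication command fails loudly on a placeholder value instead of registering it.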

Mark initialization as done

Use the following command to signal that initialization is complete and GPF can be started. Once /data/DONE exists, the wait loop in the gpf-init container ends, the container exits, and your remote shell session to it is terminated.

$ touch /data/DONE
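Once the init container exits, Kubernetes starts the main gpf container. Back on your local machine, a sketch like the following waits for that rollout to finish; `wait_for_gpf` is a hypothetical helper and the 15-minute default timeout is an arbitrary choice:

```shell
# Hypothetical helper, run locally (not in the container): wait until the
# gpf deployment is available after the init container has exited.
wait_for_gpf() {
  local timeout="${1:-15m}"
  kubectl rollout status deployment/gpf --namespace=gpf --timeout="$timeout"
}
```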

Open the deployed GPF instance

Open a browser and go to http://<the-noted-public-endpoint-ip>/gpf/. After initialization is marked as done, it may take a few minutes for the GPF endpoint to become available.
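To check from a terminal instead of the browser, you can poll the URL with curl. A sketch, assuming curl is installed locally; `wait_for_gpf_http` and WAIT_INTERVAL are hypothetical names, and any 2xx or 3xx status is treated as up, since the exact response of /gpf/ is not specified here:

```shell
# Hypothetical helper: poll http://IP/gpf/ until it answers with 2xx or 3xx.
wait_for_gpf_http() {
  local ip="$1" attempts="${2:-60}" i=1 code=""
  while [ "$i" -le "$attempts" ]; do
    # curl prints 000 when the connection itself fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' "http://$ip/gpf/" || true)"
    case "$code" in
      2??|3??)
        echo "GPF is up at http://$ip/gpf/ (HTTP $code)"
        return 0
        ;;
    esac
    sleep "${WAIT_INTERVAL:-10}"
    i=$((i + 1))
  done
  echo "GPF did not respond with 2xx/3xx (last HTTP code: ${code:-none})" >&2
  return 1
}
```

Usage: wait_for_gpf_http "$ENDPOINT_IP".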