Set up the GPF development environment and production data¶
Install conda¶
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
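After the installer finishes, restart your shell (or source your shell rc file) so that conda is on your PATH, then verify the installation; the exact version numbers will differ:
conda --version
mamba --version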
Install Node.js via NVM¶
Install NVM:
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
Afterwards, restart your terminal and install the latest version of Node.js:
nvm install node
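To confirm the installation, check that node and npm resolve to the NVM-managed versions:
node --version
npm --version
nvm current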
Clone repositories¶
Clone the gpf and gpfjs repositories:
git clone git@github.com:iossifovlab/gpf.git
git clone git@github.com:iossifovlab/gpfjs.git
Install the dependencies for gpfjs:
cd gpfjs
npm install
Create GPF conda environment¶
From inside the gpf directory, run:
mamba env create --name gpf --file ./environment.yml
mamba env update --name gpf --file ./dev-environment.yml
conda activate gpf
for d in dae wdae; do (cd $d; pip install -e .); done
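As a quick sanity check (assuming the dae and wdae packages install top-level modules under those names), verify that both import from the activated environment:
python -c "import dae; import wdae; print('GPF packages importable')"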
Configure your default genomics resources repository¶
To operate, GPF needs various genomic resources such as a reference genome, gene models, and gene properties.
By default, GPF fetches these resources from the default genomic resources repository without caching them, which can be slow.
For development, it is recommended to use a caching repository. To this end,
create a file named .grr_definition.yaml in your home directory
with the following content:
type: group
children:
- id: "seqpipe"
  type: "url"
  url: "https://grr.seqpipe.org"
  cache_dir: "<path to cache dir>"
- id: "default"
  type: "url"
  url: "https://www.iossifovlab.com/distribution/public/genomic-resources-repository"
  cache_dir: "<path to cache dir>"
Replace <path to cache dir> with a directory on your local filesystem
that is suitable for caching the large genomic resource data.
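For example, assuming you keep the cache under ~/grr_cache (a hypothetical location; substitute any local directory with enough free space), you could create the directory and write the file in one step:
# ~/grr_cache is only an illustration; $HOME expands when the file is written
mkdir -p ~/grr_cache
cat > ~/.grr_definition.yaml <<EOF
type: group
children:
- id: "seqpipe"
  type: "url"
  url: "https://grr.seqpipe.org"
  cache_dir: "$HOME/grr_cache"
- id: "default"
  type: "url"
  url: "https://www.iossifovlab.com/distribution/public/genomic-resources-repository"
  cache_dir: "$HOME/grr_cache"
EOF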
Install dvc¶
You will need a separate virtual environment with dvc and dvc-ssh installed.
mamba create -n dvc -c conda-forge dvc dvc-ssh
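Activate the new environment and confirm the installation; dvc doctor prints the supported remote types, and ssh should be among them once dvc-ssh is installed:
conda activate dvc
dvc --version
dvc doctor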
Set up production data¶
Note
To use the data-hg38-production instance configuration you will need
access to Seqpipe’s internal network (either by working on an office computer or by using a VPN).
Clone the production data git repositories:
git clone git@github.com:iossifovlab/data-hg38-production.git
git clone git@github.com:seqpipe/data-phenodb-production.git
Pull data via dvc:
conda activate dvc
cd data-hg38-production
dvc pull -r nemo
cd ../data-phenodb-production
dvc pull -r nemo
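If dvc pull fails to reach the nemo remote, check that the remote is defined in the repository's DVC configuration and that you have SSH access to the host it points at:
dvc remote list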
Extract the phenotype data:
cd data-phenodb-production
./extract_phenodbs.sh
Set up a setenv.sh script with the following contents:
conda activate gpf
export DAE_DB_DIR=<path to data-hg38-production>
export DAE_PHENODB_DIR=<path to data-phenodb-production>
export GPF_PREFIX=gpfjs
You can place this script wherever you want. Afterwards, source it:
source setenv.sh
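To confirm the variables were exported, echo them back:
echo $DAE_DB_DIR
echo $DAE_PHENODB_DIR
echo $GPF_PREFIX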
Navigate to data-hg38-production and run the following script to adjust your instance configuration:
./scripts/adjust_seqpipe_minimal.sh
There are other adjustment scripts available inside scripts, each of which configures a different subset of the data.
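To see which adjustment scripts are available, list them from inside data-hg38-production:
ls scripts/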
Running the GPF instance¶
Run gpfjs:
cd gpfjs
ng serve
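ng serve uses Angular's default development address, http://localhost:4200 (an Angular CLI default, not a GPF-specific setting); once the build finishes you can open that URL in a browser or check it from the command line:
curl -I http://localhost:4200/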
In a new terminal, run gpf. First, source your environment file:
source setenv.sh
Run the following script to initialize everything needed to run the wdae server. This script only needs to be run once:
cd gpf/wdae/wdae
reset_dev.sh
Finally, run the server:
./wdaemanage.py runserver
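Django's runserver listens on http://127.0.0.1:8000 by default, so assuming the defaults you can probe the backend from another terminal:
curl -I http://localhost:8000/
To serve on a different interface or port, pass Django's standard address:port argument, e.g. ./wdaemanage.py runserver 0.0.0.0:8001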
Running dae unit tests in parallel¶
cd gpf/dae
unset DAE_DB_DIR
export PYTHONHASHSEED=0
pytest -n 10 dae tests
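The -n 10 flag comes from the pytest-xdist plugin and runs ten workers; setting PYTHONHASHSEED=0 keeps Python's hash ordering deterministic so that every worker collects the tests in the same order. If you would rather size the worker pool to the machine, xdist can choose it automatically:
pytest -n auto dae tests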