Welcome to gwas-norm

The gwas-norm repository and package aim to make it as simple as possible to standardise the format of GWAS summary statistics files, so that they can be used in downstream applications in a uniform way.

There is a massive amount of GWAS summary-level data in the public domain, covering many different phenotypes, from molecular traits such as eQTLs and pQTLs to disease traits. Unfortunately, there has never really been an agreed standard for sharing GWAS summary-level data, so the available datasets exhibit substantial heterogeneity and are of varying quality: some are directly usable, while others border on deliberate obfuscation of the data within them.

gwas-norm provides an interface where users can define the attributes of the dataset they want to normalise, and use that definition to perform the normalisation in a scalable way. With gwas-norm you can normalise a single dataset in an interactive shell, or you can scale it up to many thousands of datasets (for example an eQTL/pQTL study) and run in parallel with hpc-gwas-norm. It will handle some of the most common scenarios and produce a flat file with:

  1. A standardised column order

  2. Uniform genome assembly

  3. Variant IDs

  4. Basic functional annotations

  5. Independent variant assessment (not implemented yet).
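To make the first and third points concrete, the sketch below shows the general idea of mapping differently named input columns onto one fixed output order and deriving a variant ID. It is a hypothetical illustration only, not gwas-norm's implementation: the target column names, the `normalise_row` helper and the `chr_pos_allele1_allele2` ID scheme are all assumptions, and genome assembly liftover and annotation are not shown.

```python
# Hypothetical target column order; the real gwas-norm output columns are
# not listed on this page and will differ.
TARGET_COLUMNS = ["chr_name", "start_pos", "effect_allele", "other_allele",
                  "effect_size", "standard_error", "pvalue"]


def normalise_row(row: dict, column_map: dict) -> dict:
    """Rename source columns to target names, add an ID and fix the order.

    column_map maps each target column name to the source column name.
    """
    renamed = {target: row[source] for target, source in column_map.items()}
    # Upper-case alleles so the same variant always produces the same ID
    renamed["effect_allele"] = renamed["effect_allele"].upper()
    renamed["other_allele"] = renamed["other_allele"].upper()
    # Hypothetical ID scheme: chromosome_position_allele1_allele2 with the
    # alleles sorted, so swapping effect/other alleles gives the same ID.
    # The effect_allele column itself is kept, so effect direction is preserved.
    alleles = sorted([renamed["effect_allele"], renamed["other_allele"]])
    variant_id = "_".join([str(renamed["chr_name"]),
                           str(renamed["start_pos"])] + alleles)
    return {"variant_id": variant_id,
            **{col: renamed[col] for col in TARGET_COLUMNS}}


# Two files describing the same variant with different column names
row_a = {"CHR": "1", "BP": 12345, "A1": "g", "A2": "a",
         "BETA": 0.1, "SE": 0.02, "P": 1e-5}
map_a = {"chr_name": "CHR", "start_pos": "BP", "effect_allele": "A1",
         "other_allele": "A2", "effect_size": "BETA",
         "standard_error": "SE", "pvalue": "P"}

row_b = {"chromosome": "1", "position": 12345, "alt": "A", "ref": "G",
         "b": -0.1, "se": 0.02, "p_value": 1e-5}
map_b = {"chr_name": "chromosome", "start_pos": "position",
         "effect_allele": "alt", "other_allele": "ref", "effect_size": "b",
         "standard_error": "se", "pvalue": "p_value"}

out_a = normalise_row(row_a, map_a)
out_b = normalise_row(row_b, map_b)
```

Both rows end up with identical column names, the same column order and the same variant ID, while each keeps its own effect allele and effect size. Deciding how to handle allele orientation consistently across studies is one of the harder parts of real normalisation, which this sketch deliberately leaves out.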

What is gwas-norm?

When you install gwas-norm, you get access both to programs that you can run from the Unix command line and to a Python application programming interface (API) that you can use to interact with the generalisable components of the package, or to integrate GWAS summary statistics normalisation into your own pipelines.

Whilst it is anticipated that most users will only want to use the command-line scripts gwas-norm and hpc-gwas-norm (the code formatting of gwas-norm distinguishes the program from the package of the same name), both the command-line endpoints and the API are documented.

Next steps…

To get gwas-norm up and running you will want to:

  1. Install the package

  2. Set up your configuration file

  3. Build a mapping file

Contents

Programmer reference

Indices and tables