`gwas_norm.metadata` sub-package#

`gwas_norm.metadata.gwas_data`#

The root of the XML metadata.

class gwas_norm.metadata.gwas_data.GwasData(studies=None, metadata_file=None, file_check=True, root_source_dir=None, root_norm_dir=None)#

Bases: _XmlBase

The root class for describing GWAS metadata.

Parameters:

studies (list of gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile, optional, default: NoneType) – Any existing study objects that need to be added to the GwasData object during initialisation.
metadata_file (str or File, optional, default: NoneType) – The path to an existing metadata file or a previously opened file that will be added to the GwasData object.
file_check (bool, optional, default: True) – Perform checks on the presence of input files and calculate the MD5 of any input files. Please note that if an MD5 has not been added to any of the files then this will give an error.
root_source_dir (str, optional, default: NoneType) – The root directory where all source study data is located. If not set then the environment variable GWAS_SOURCE_DATA_ROOT is used. However, this is only used/required if the study_source_dir is a relative path.
root_norm_dir (list or str, optional, default: NoneType) – The root directory where all normalised study data is located. If not set then the environment variable GWAS_DEST_DATA_ROOT is used. However, this is only used/required if the study_norm_dir is a relative path.
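
The kind of check that file_check describes can be pictured with a small sketch. This is not the gwas_norm implementation: md5_of_file and check_file are hypothetical helper names, the exception types are assumptions (the text above only says an error is given), and only the Python standard library is used.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def check_file(path, expected_md5):
    """Hypothetical check: the file must exist and match its recorded MD5."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    if expected_md5 is None:
        # Assumption: a missing MD5 is treated as an error when checking
        raise ValueError(f"no MD5 recorded for {path}")
    observed = md5_of_file(path)
    if observed != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}")
    return observed


# Demonstration on a temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"chr\tpos\tpvalue\n")
    demo_path = tmp.name

demo_md5 = md5_of_file(demo_path)
checked_md5 = check_file(demo_path, demo_md5)
os.remove(demo_path)
```

Reading in chunks keeps memory use flat for large summary statistics files.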

Notes

Has functionality for adding study elements, reading and writing metadata description files (XML format). When initialising, both studies and an existing metadata file can be given and all will be added to the initialised gwas_norm.metadata.gwas_data.GwasData object.

ROOT_TAG = 'gwas_data'#: The name of the root XML element tag name (str)

property n_studies#: Return the number of studies (a synonym for len()) (int)

get_study_by_name(name)#

Return a study with the name.

Parameters:: name (str) – The study name to get.
Returns:: study – The study matching the name.
Return type:: gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile
Raises:: KeyError – If a study with that name does not exist.

get_analysis_by_name(name)#

Return an analysis with the name.

Parameters:: name (str) – The analysis name to get.
Returns:: analysis – The analyses matching the name.
Return type:: list of gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
Raises:: KeyError – If an analysis with that name does not exist.

property n_analyses#: Return the total number of analyses (int)

property root_source_dir#

Get the source root directory (str or NoneType).

Notes

This is a root directory where all the study directories to be normalised should be located. Must be an absolute directory. If it is not defined then the environment variable GWAS_SOURCE_DATA_ROOT is returned (if defined).
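
The fallback described in these notes could look roughly like the sketch below. The function name resolve_root_dir and the ValueError for a non-absolute root are assumptions made for illustration; only the environment variable name comes from the documentation.

```python
import os


def resolve_root_dir(explicit_root=None, env_var="GWAS_SOURCE_DATA_ROOT"):
    """Return the explicit root if given, else the environment variable
    value, else None."""
    if explicit_root is not None:
        root = explicit_root
    else:
        root = os.environ.get(env_var)
    if root is not None and not os.path.isabs(root):
        # Assumption: a non-absolute root is rejected
        raise ValueError(f"root directory must be absolute: {root}")
    return root


# Set the variable for the demonstration only
os.environ["GWAS_SOURCE_DATA_ROOT"] = os.path.abspath("source_data")

from_env = resolve_root_dir()
from_arg = resolve_root_dir(os.path.abspath("other_source"))
```

The same pattern would apply to the normalised root by passing env_var="GWAS_DEST_DATA_ROOT".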

property root_norm_dir#

Get the normalised root directory (str or NoneType).

Notes

This is a root directory where all the study directories that have been normalised should be located. Must be an absolute directory. If it is not defined then the environment variable GWAS_DEST_DATA_ROOT is returned (if defined).

property file_check#: Get the file checking status (bool).

property studies#: Return a list of all the study objects in the GwasData object. (list of gwas_norm.metadata.study.Study)

add_study(study, error=False)#

Add a study object to the GwasData object.

Parameters:

study (gwas_norm.metadata.study.Study) – A study to add, must be an instance of gwas_norm.metadata.study.Study or of one of its subclasses.
error (bool, optional, default: False) – Raise a KeyError if an existing GwasData study has the same name as the one being added. If set to False then the addition will silently fail.

Raises:

KeyError – If the study being added has the same name as an existing study in the GwasData object.

Notes

This instigates a binding process where the GwasData object is also bound as a parent of the study object.
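
A minimal sketch of the add_study semantics described above, using stand-in classes rather than the real gwas_norm objects. SketchGwasData and SketchStudy are hypothetical names, and keying the studies by name in a dict is an assumption about the internals.

```python
class SketchStudy:
    """Stand-in for a study: only a name and a parent reference."""

    def __init__(self, name):
        self.name = name
        self.parent = None


class SketchGwasData:
    """Stand-in for the container: studies are keyed by name."""

    def __init__(self):
        self._studies = {}

    def add_study(self, study, error=False):
        if study.name in self._studies:
            if error:
                raise KeyError(f"study already exists: {study.name}")
            return  # silent failure when error=False
        self._studies[study.name] = study
        study.parent = self  # the reciprocal bind

    @property
    def n_studies(self):
        return len(self._studies)


container = SketchGwasData()
first = SketchStudy("height_2018")
duplicate = SketchStudy("height_2018")

container.add_study(first)
container.add_study(duplicate)  # ignored: same name, error=False

try:
    container.add_study(duplicate, error=True)
    raised = False
except KeyError:
    raised = True
```

With error=False a caller cannot tell whether the study was added, so checking n_studies or passing error=True is the safer choice when duplicates matter.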

remove_study(study)#

Remove a study from the GwasData object.

Parameters:: study (gwas_norm.metadata.study.Study) – The study to remove, note that this does not have to be the exact same object. The removal is done based on the study name.
Returns:: study – The actual gwas_norm.metadata.study.Study object that has been removed from the GwasData object. If no matching Study object was found then NoneType is returned.
Return type:: gwas_norm.metadata.study.Study or NoneType

read(infile)#

Read an input metadata file.

Parameters:

infile (str) – An input metadata file name.

Raises:

KeyError – If a study of the same name already exists in the GwasData object or if no study elements were found in the file.
ValueError – If the file extension is unknown (i.e. not .xml)

write(outfile)#

Write all studies to an output XML file.

Parameters:: outfile (str) – The output file path to write to. Currently only XML is supported and the output file must have a .xml file extension.
Raises:: IndexError – If no Study objects were found in the GwasData object

to_xml()#

Generate an lxml.etree.Element object for all the attributes within the gwas data object.

Returns:: gwas_data_element – An element representing the gwas_data.
Return type:: lxml.etree.Element

classmethod from_xml(element, **kwargs)#

Load data from an already available XML element.

Parameters:: element (lxml.etree.Element) – The element should have the tag name gwas_data.
Returns:: gwas_data – A GwasData object built from the element.
Return type:: gwas_norm.metadata.gwas_data.GwasData
Raises:: KeyError – If the tag name of the element is not gwas_data.

Notes

In general, the user should use the object read method.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: parse_class – A class of type gwas_norm.metadata.gwas_data.GwasData
Return type:: class
Raises:: KeyError – If the appropriate class can’t be found for the tag.

`gwas_norm.metadata.study`#

Implementation of Study classes.

class gwas_norm.metadata.study.Study(*args, file_check=True, **kwargs)#

Bases: _BaseStudy

The study class, for use in studies where the analyses are contained in separate files (gwas_norm.metadata.analysis.AnalysisFile objects).

Parameters:

study_name (str) – The name of the study, this should be unique for each study. This will be converted to all lowercase and have spaces replaced with underscores.
study_source_dir (str) – The root directory of the study that contains the un-normalised source files. If it is a relative path then it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this is relative then an error will be raised if XML is output or the absolute path is requested.
study_norm_dir (str) – The root directory of the study that contains the normalised files. If it is a relative path then it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this is relative then an error will be raised if XML is output or the absolute path is requested.
source_genome_assembly (str) – The genome assembly of the study (and by extension, analyses within the study).
target_genome_assemblies – Any target genome assemblies. Only required if any liftovers need to be carried out.
pubmed_id (int or NoneType, optional, default: NoneType) – The pubmed identifier. If NoneType then a dummy pubmed ID of 00000000 is used instead.
consortium (str or NoneType, optional, default: NoneType) – Any consortium name for the study.
analyses (list of (gwas_norm.metadata.analysis.Analysis or gwas_norm.metadata.analysis.KeyAnalysis), optional, default: NoneType) – Analysis objects that are associated with the study.
url (str, optional, default: NoneType) – The info url to be associated with the study.
metafiles (list or str, optional, default: NoneType) – The metafile to be associated with the study.

Raises:

ValueError – If any of the study_name, source_root or source_genome_assembly are NoneType or ''.

CONSORTIUM_TAG = 'consortium'#: The XML tag for the consortium data (str).

METAFILE_TAG = 'metafile'#: The XML tag for the metafile data (str).

PRIVATE_ATTRIBUTE = 'private'#: The XML attribute for the private status (str).

PUBMED_ID_TAG = 'pubmed_id'#: The XML tag for the pubmed ID data (str).

SOURCE_GENOME_ASSEMLY_TAG = 'source_genome_assembly'#: The XML tag for the source genome assembly data (str).

STUDY_ID_TAG = 'study_id'#: The XML tag name for the study ID (str).

STUDY_NAME_TAG = 'study_name'#: The XML tag name for the study name (str).

STUDY_NORM_DIR_TAG = 'study_norm_dir'#: The XML tag for the study normalised directory (str).

STUDY_SOURCE_DIR_TAG = 'study_source_dir'#: The XML tag name for the study source directory (str).

URL_TAG = 'url'#: The XML tag for the study webpage (str).
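
To give a feel for how these tag constants might appear in a metadata file, here is a rough sketch built with the standard library xml.etree.ElementTree (the package itself is documented as using lxml.etree). The tag and attribute names are taken from the constants above and from the root tags documented for GwasData and Study. Which values are child elements, the attribute value format, the element order and the example values are all assumptions, so treat the layout as a guess rather than the real file format.

```python
import xml.etree.ElementTree as ET

# Root element and study element, named after the documented ROOT_TAG values
root = ET.Element("gwas_data")
study = ET.SubElement(root, "study")
study.set("private", "false")  # attribute value format is an assumption

# Child elements named after the documented *_TAG constants
for tag, text in [
    ("study_name", "my_study"),
    ("study_source_dir", "my_study/source"),
    ("study_norm_dir", "my_study/norm"),
    ("source_genome_assembly", "GRCh37"),  # example value, an assumption
    ("pubmed_id", "00000000"),
]:
    ET.SubElement(study, tag).text = text

xml_text = ET.tostring(root, encoding="unicode")

# Round trip: parse the text back and pull a value out by tag name
parsed = ET.fromstring(xml_text)
parsed_name = parsed.find("study/study_name").text
```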

add_metafile(metafile)#

Add a metafile to the study.

Parameters:: metafile (str) – The metafile path. It can be an absolute or a relative path. In the latter case, it is converted to an absolute path.
Raises:: ValueError – If a metadata file with the same path already exists.

property analyses#

Get all the analyses within the study (list of gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis).

Raises:: AttributeError – If no analyses have been defined.

bind(parent)#

Bind a study with a parent gwas_norm.metadata.gwas_data.GwasData object.

Parameters:: parent (gwas_norm.metadata.gwas_data.GwasData) – A parental GwasData object.
Raises:: KeyError – If the study is already bound to a different GwasData object.

Notes

The binding instigates a reciprocal adding of the study object to the GwasData studies, so binding is in both directions. A study can only be bound to a single GwasData parent.

check_old_analysis_ids()#

Perform a check of old analysis IDs and warn if there are duplicated IDs.

Parameters:: s (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) –

property chksum#: Get the BSD checksum based on the study_name (str).
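
For readers unfamiliar with it, the classic BSD checksum is a 16-bit rotate-and-add checksum. The sketch below shows that algorithm applied to a study name. Whether gwas_norm encodes the name as UTF-8, and how it formats the result as a str, is not stated here, so both are assumptions; bsd_checksum is a hypothetical helper name.

```python
def bsd_checksum(data: bytes) -> int:
    """Compute the 16-bit BSD checksum of a byte string."""
    checksum = 0
    for byte in data:
        # Rotate the 16-bit checksum right by one bit
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        # Add the next byte and keep the result within 16 bits
        checksum = (checksum + byte) & 0xFFFF
    return checksum


# Assumption: the study name is encoded as UTF-8 before checksumming
study_checksum = bsd_checksum("my_study".encode("utf-8"))
study_checksum_str = str(study_checksum)
```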

create_analysis_xml(study_e)#

Convert all of the analyses within the study to XML elements.

Parameters:: study_e (lxml.etree.Element) – The study element to add the converted analyses to.
Returns:: study – The same study element that was passed with the analyses added.
Return type:: lxml.etree.Element
Raises:: ValueError – If there are no analyses within the study.

get_analysis_by_name(name)#

Return an analysis with the name.

Parameters:: name (str) – The analysis name to get.
Returns:: analysis – The analysis matching the name.
Return type:: gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
Raises:: KeyError – If an analysis with that name does not exist.

classmethod get_class(element)#

Helper function that will determine the required study class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – An XML element, it is expected to have the tag name study or study_file.
Returns:: study_class – The relevant study class for the element.
Return type:: class of (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile)
Raises:: KeyError – If the element does not have the tag name study or study_file.
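
The tag-based dispatch described for get_class can be sketched as a lookup from root tag to class. TagStudy, TagStudyFile and get_study_class are hypothetical stand-ins, and the standard library xml.etree.ElementTree is used here in place of lxml.etree so the example has no third-party dependency.

```python
import xml.etree.ElementTree as ET


class TagStudy:
    ROOT_TAG = "study"


class TagStudyFile:
    ROOT_TAG = "study_file"


def get_study_class(element):
    """Return the stand-in class whose ROOT_TAG matches the element tag."""
    classes = {cls.ROOT_TAG: cls for cls in (TagStudy, TagStudyFile)}
    try:
        return classes[element.tag]
    except KeyError:
        raise KeyError(f"unknown study tag: {element.tag}") from None


study_cls = get_study_class(ET.Element("study"))
study_file_cls = get_study_class(ET.Element("study_file"))

try:
    get_study_class(ET.Element("analysis"))
    unknown_tag_error = False
except KeyError:
    unknown_tag_error = True
```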

property info#: Get the info object stored in the study (gwas_norm.metadata.info.Info).

invalidate()#: Invalidate the study.

invalidate_analyses()#: Invalidate all the analyses in the study. This will also invalidate the study.

property is_validated#: Determine if the study has been validated (bool).

property metafile_abspath#: Get the absolute paths of any metafiles associated with the study (list of str).

property metafiles#: Get any metafile paths associated with the study (list of str).

property name#: Get the study name. This is an alias for study_name that is designed to be a common property across studies and analyses (str).

property parent#

Get study parent (gwas_norm.metadata.gwas_data.GwasData).

Raises:: AttributeError – If no parent gwas data object has been defined.

property private#: Get the private status of the study (bool)

property pubmed_id#

Get the pubmed ID (str).

Notes

The pubmed_id is treated as a string but should be castable to an int. If the pubmed_id is not known, a dummy pubmed_id of 00000000 is used.
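
The handling described in this note might be sketched as follows. normalise_pubmed_id is a hypothetical helper, and raising ValueError for a value that cannot be cast to an int is an assumption.

```python
DUMMY_PUBMED_ID = "00000000"


def normalise_pubmed_id(pubmed_id=None):
    """Return the pubmed ID as a string, or the dummy ID if it is unknown."""
    if pubmed_id is None:
        return DUMMY_PUBMED_ID
    pubmed_id = str(pubmed_id).strip()
    int(pubmed_id)  # raises ValueError if the ID is not castable to an int
    return pubmed_id


unknown_id = normalise_pubmed_id()
known_id = normalise_pubmed_id(12345678)
```

Keeping the ID as a string preserves the leading zeros of the dummy value, which an int would drop.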

refresh_analysis_data()#

The study keeps an internal cache of analysis names/IDs. This will loop through all the analyses within the study and refresh these.

Notes

Typically, this will be called by an analysis when it has its name/ID changed.

remove_analysis(analysis_obj)#

Remove an analysis from the study.

Parameters:: analysis_obj (gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis) – The analysis being removed from the study.
Returns:: removed_analysis – The analysis removed or NoneType if the analysis does not exist in the study.
Return type:: gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis or NoneType

repr_attr_str()#

Generate a list of strings that can be used to print an object’s contents.

Returns:: attrs – key/value strings representing the contents of the object.
Return type:: list of str

property source_genome_assembly#: Get the source genome assembly (str).

property study_id#: Get the ID for the study, if not set then an ID will be generated (int)

property study_name#: Get the study name. Study names are made lowercase and have the spaces replaced with _ (underscores) (str).
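
The name normalisation described here amounts to a lowercase conversion plus a space-to-underscore replacement. A minimal sketch, where normalise_study_name is a hypothetical helper and the ValueError mirrors the documented behaviour for an undefined name:

```python
def normalise_study_name(study_name):
    """Lowercase a study name and replace spaces with underscores."""
    if study_name is None or study_name == "":
        raise ValueError("study_name must be defined")
    return study_name.lower().replace(" ", "_")


normalised = normalise_study_name("My Height Study")
```

Two names that differ only in case or in spaces versus underscores normalise to the same value, so they would clash when added to the same GwasData object.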

property study_norm_absolute_dir#

Return the absolute directory path for the normalised study, irrespective of whether it has been set via a relative path (str).

Raises:: FileNotFoundError – If the study_norm_dir is a relative path and no root path is available from the parent.
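
The resolution rule behind this property can be sketched as below. absolute_study_dir is a hypothetical helper, and parent_root stands in for whatever root directory the parent GwasData object would supply.

```python
import os


def absolute_study_dir(study_dir, parent_root=None):
    """Resolve a study directory against an optional parent root."""
    if os.path.isabs(study_dir):
        return study_dir  # absolute paths are returned unchanged
    if parent_root is None:
        raise FileNotFoundError(
            f"no root directory available for relative path: {study_dir}"
        )
    return os.path.join(parent_root, study_dir)


root = os.path.abspath("norm_root")
resolved = absolute_study_dir("my_study", parent_root=root)
unchanged = absolute_study_dir(os.path.abspath("elsewhere"))

try:
    absolute_study_dir("my_study")
    missing_root_error = False
except FileNotFoundError:
    missing_root_error = True
```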

property study_norm_dir#: Get the study normalised directory path (str).

property study_source_absolute_dir#

Return the absolute directory path for the study source directory, irrespective of whether it has been set via a relative path (str).

Raises:: FileNotFoundError – If the study_source_dir is a relative path and no root path is available in the parent.

property study_source_dir#: Get the study source root directory (str).

unbind()#

Unbind a study from its parent GwasData object. This also removes the study from the parent.

Returns:: parent – The parent object that has been unbound.
Return type:: gwas_norm.metadata.gwas_data.GwasData

ROOT_TAG = 'study'#: The name of the root XML element tag name (str)

n_file_holders()#: Get the number of file holder objects contained in the study (int)

property file_check#: Get the file checking status (bool).

validate()#: Validate the study. This respects the file checking parameter.

add_analysis(analysis_obj)#

Add an analysis to the study. This causes a reciprocal bind on the analysis.

Parameters:: analysis_obj (gwas_norm.metadata.analysis.KeyAnalysis) – The analysis being added to the study.
Raises:: KeyError – If the analysis already exists in the study.

to_xml()#

Convert the study and all of its attributes to an XML element.

Returns:: study – A study element built from the study object and its attributes.
Return type:: lxml.etree.Element

classmethod from_xml(element, **kwargs)#

Generate a gwas_norm.metadata.study.Study object from an lxml.etree.Element with the tag name study.

Parameters:: element (lxml.etree.Element) – The element should have the tag name study.
Returns:: study – A study object built from all the tags in the study element.
Return type:: gwas_norm.metadata.study.Study
Raises:: KeyError – If the name of the element is not expected. Also, if the source_genome_assembly attribute is not defined. Or if there are no analysis elements associated with the study.

class gwas_norm.metadata.study.StudyFile(study_name, study_source_dir, source_genome_assembly, analysis_type, effect_type, units=None, files=None, cohort=None, file_check=True, **kwargs)#

Bases: _BaseStudy, FileHolderMixin

A representation of a study object where all of the analyses are in a single file (gwas_norm.metadata.analysis.KeyAnalysis objects).

Parameters:

study_name (str) – The name of the study, this should be unique for each study. This will be converted to all lowercase and have spaces replaced with underscores.
study_source_dir (str) – The root directory of the study that contains the un-normalised source files. If it is a relative path then it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this is relative then an error will be raised if XML is output or the absolute path is requested.
study_norm_dir (str) – The root directory of the study that contains the normalised files. If it is a relative path then it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this is relative then an error will be raised if XML is output or the absolute path is requested.
source_genome_assembly (str) – The genome assembly of the study (and by extension, analyses within the study).
analysis_type (str) – The analysis type for the study. Will be applied to any analyses that do not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analyses that do not have an effect_type specified.
pubmed_id (int or NoneType, optional, default: NoneType) – The pubmed identifier. If NoneType then a dummy pubmed ID of 00000000 is used instead.
consortium (str or NoneType, optional, default: NoneType) – Any consortium name for the study.
analyses (list of (gwas_norm.metadata.analysis.Analysis or gwas_norm.metadata.analysis.KeyAnalysis), optional, default: NoneType) – Analysis objects that are associated with the study.
url (str, optional, default: NoneType) – The info url to be associated with the study.
metafiles (list or str, optional, default: NoneType) – The metafile to be associated with the study.
units (str, optional, default: NoneType) – The units of all the analysis within the StudyFile.
files (list of gwas_norm.metadata.file.GwasFile, optional, default: NoneType) – Study level files. Study level files are for data such as GTEX data where multiple gene analyses are in the same file.
cohort (gwas_norm.metadata.cohort_obj.Cohort or gwas_norm.metadata.cohort_obj.CaseControlCohort or gwas_norm.metadata.cohort_obj.SampleCohort, optional, default: NoneType) – The cohort that the study was performed in.

Raises:

ValueError – If any of the study_name, source_root or source_genome_assembly are NoneType or ''.

ANALYSIS_TYPE_TAG = 'analysis_type'#: The name of the analysis type tag/element in the XML file (str).

CONSORTIUM_TAG = 'consortium'#: The XML tag for the consortium data (str).

EFFECT_TYPE_TAG = 'effect_type'#: The name of the effect type tag/element in the XML file (str).

FILE_CLASS = None#: The file class that should be used for parsing XML, this should be overridden by the sub-class (NoneType)

METAFILE_TAG = 'metafile'#: The XML tag for the metafile data (str).

PRIVATE_ATTRIBUTE = 'private'#: The XML attribute for the private status (str).

PUBMED_ID_TAG = 'pubmed_id'#: The XML tag for the pubmed ID data (str).

SOURCE_GENOME_ASSEMLY_TAG = 'source_genome_assembly'#: The XML tag for the source genome assembly data (str).

STUDY_ID_TAG = 'study_id'#: The XML tag name for the study ID (str).

STUDY_NAME_TAG = 'study_name'#: The XML tag name for the study name (str).

STUDY_NORM_DIR_TAG = 'study_norm_dir'#: The XML tag for the study normalised directory (str).

STUDY_SOURCE_DIR_TAG = 'study_source_dir'#: The XML tag name for the study source directory (str).

UNITS_TAG = 'units'#: The name of the units tag/element in the XML file (str).

URL_TAG = 'url'#: The XML tag for the study webpage (str).

add_file(gwas_file, error=False)#

Add a GWAS file to the object. This results in a reciprocal bind of the file to the parent object.

Parameters:

gwas_file (gwas_norm.metadata.file.GwasFile) – The file being added.
error (bool, optional, default: False) – If the file exists in this object already, should an error be raised.

Raises:

TypeError – If the gwas_file is not the correct type.
KeyError – If the GWAS file is specified already and error is True.

add_metafile(metafile)#

Add a metafile to the study.

Parameters:: metafile (str) – The metafile path. It can be an absolute or a relative path. In the latter case, it is converted to an absolute path.
Raises:: ValueError – If a metadata file with the same path already exists.

property analyses#

Get all the analyses within the study (list of gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis).

Raises:: AttributeError – If no analyses have been defined.

property analysis_type#: Get the analysis type (str).

bind(parent)#

Bind a study with a parent gwas_norm.metadata.gwas_data.GwasData object.

Parameters:: parent (gwas_norm.metadata.gwas_data.GwasData) – A parental GwasData object.
Raises:: KeyError – If the study is already bound to a different GwasData object.

Notes

The binding instigates a reciprocal adding of the study object to the GwasData studies, so binding is in both directions. A study can only be bound to a single GwasData parent.

check_old_analysis_ids()#

Perform a check of old analysis IDs and warn if there are duplicated IDs.

Parameters:: s (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) –

property chksum#: Get the BSD checksum based on the study_name (str).

property cohort#

Get the cohort definition associated with the analysis (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType).

Notes

The cohort associated with the analysis or NoneType if no cohort has been set.

create_analysis_type_xml(element)#

Generate the analysis type XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_analysis_xml(study_e)#

Convert all of the analyses within the study to XML elements.

Parameters:: study_e (lxml.etree.Element) – The study element to add the converted analyses to.
Returns:: study – The same study element that was passed with the analyses added.
Return type:: lxml.etree.Element
Raises:: ValueError – If there are no analyses within the study.

create_cohort_xml(element)#

Create cohort specific XML elements in the parental element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the cohort specific elements to.
Returns:: element – The parent XML element with the cohort elements added.
Return type:: lxml.etree.Element

Notes

This is designed to add XML elements to a study/analysis element that can have file parameters.

create_effect_type_xml(element)#

Generate the XML element for the effect type.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_files_xml(element)#

Create file specific XML elements in the parental element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the file specific elements to.
Returns:: element – The parent XML element with the file elements added.
Return type:: lxml.etree.Element

Notes

This is designed to add XML elements to a study/analysis element that can have file parameters.

create_units_xml(element)#

Generate the units XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_xml(element)#

Generate all the XML elements relating to objects that hold files. This wraps all other create_* methods in the mixin.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the elements to.
Returns:: element – The parent XML element with the elements added.

property effect_type#: Get the effect type (str or NoneType).

property file_check#: Get the file checking status (bool).

file_repr_attr_str()#

Called by the __repr__ of host objects to supply a key=value string of the attributes and their values relating to the mixin.

Returns:: attr_str – Each string is an attribute and value for printing.
Return type:: list of str

property files#: Get all the associated files (list of gwas_norm.metadata.file.GwasFile).

get_analysis_by_name(name)#

Return an analysis with the name.

Parameters:: name (str) – The analysis name to get.
Returns:: analysis – The analysis matching the name.
Return type:: gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
Raises:: KeyError – If an analysis with that name does not exist.

classmethod get_class(element)#

Helper function that will determine the required study class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – An XML element, it is expected to have the tag name study or study_file.
Returns:: study_class – The relevant study class for the element.
Return type:: class of (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile)
Raises:: KeyError – If the element does not have the tag name study or study_file.

property info#: Get the info object stored in the study (gwas_norm.metadata.info.Info).

init_file_attr(analysis_type, effect_type, units=None, cohort=None, files=None, file_check=True)#

Initialise all the attributes that a file handling object needs.

Parameters:

analysis_type (str) – The analysis type for the study. Will be applied to any analyses that do not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analyses that do not have an effect_type specified.
units (str, optional, default: NoneType) – The units that the effect sizes within the file are measured in.
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort, optional, default: NoneType) – The cohort that applies to the file.
files (list of gwas_norm.metadata.file.GwasFile, optional, default: NoneType) – Source files. Data files can either be given at the study level or the analysis level but not both. Study level files are for data such as GTEX data where multiple gene analyses are in the same file.
file_check (bool, optional, default: True) – Toggle file checking.

Notes

This is usually called from the __init__ method of the host object and will initialise all the parameters relating to file handling from arguments that have been given to the host object.

invalidate()#: Invalidate the study.

invalidate_analyses()#: Invalidate all the analyses in the study. This will also invalidate the study.

property is_validated#: Determine if the study has been validated (bool).

property metafile_abspath#: Get the absolute paths of any metafiles associated with the study (list of str).

property metafiles#: Get any metafile paths associated with the study (list of str).

property n_files#: Get the number of associated files (int).

property name#: Get the study name. This is an alias for study_name that is designed to be a common property across studies and analyses (str).

property parent#

Get study parent (gwas_norm.metadata.gwas_data.GwasData).

Raises:: AttributeError – If no parent gwas data object has been defined.

classmethod parse_files(element, **kwargs)#

Parse any file elements out from the XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to parse file specific elements from.
Returns:: gwas_files – GWAS file objects.
Return type:: list of (gwas_norm.metadata.file.GwasFile)
Raises:: KeyError – If no file elements can be found in the parent element.

classmethod parse_xml(element, **kwargs)#

Parse the file associated data from the XML element.

Parameters:

element (lxml.etree.Element) – The parent XML element to parse the elements from.

Returns:

analysis_type (str) – The analysis type.
effect_type (str) – The effect type.
units (str or NoneType) – The units (if defined).
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType) – The cohort (if defined).
files (list of gwas_norm.metadata.file.GwasFile) – Files that have been parsed out of the XML.

Raises:

KeyError – If any of the required elements cannot be found in the parent element.

property private#: Get the private status of the study (bool)

property pubmed_id#

Get the pubmed ID (str).

Notes

The pubmed_id is treated as a string but should be castable to an int. If the pubmed_id is not known, a dummy pubmed_id of 00000000 is used.

refresh_analysis_data()#

The study keeps an internal cache of analysis names/IDs. This will loop through all the analyses within the study and refresh these.

Notes

Typically, this will be called by an analysis when it has its name/ID changed.

remove_files(gwas_files)#

Remove one or more GWAS files from this object.

Parameters:: gwas_files (list of gwas_norm.metadata.file.GwasFile) – The files being removed.

repr_attr_str()#

Generate a list of strings that can be used to print an object’s contents.

Returns:: attrs – key/value strings representing the contents of the object.
Return type:: list of str

property source_genome_assembly#: Get the source genome assembly (str).

property study_id#: Get the ID for the study, if not set then an ID will be generated (int)

property study_name#: Get the study name. Study names are made lowercase and have the spaces replaced with _ (underscores) (str).

property study_norm_absolute_dir#

Return the absolute directory path for the normalised study, irrespective of whether it has been set via a relative path (str).

Raises:: FileNotFoundError – If the study_norm_dir is a relative path and no root path is available from the parent.

property study_norm_dir#: Get the study normalised directory path (str).

property study_source_absolute_dir#

Return the absolute directory path for the study source directory, irrespective of whether it has been set via a relative path (str).

Raises:: FileNotFoundError – If the study_source_dir is a relative path and no root path is available in the parent.

property study_source_dir#: Get the study source root directory (str).

unbind()#

Unbind a study from its parent GwasData object. This also removes the study from the parent.

Returns:: parent – The parent object that has been unbound.
Return type:: gwas_norm.metadata.gwas_data.GwasData

property units#: Get the units (str or NoneType)

ROOT_TAG = 'study_file'#: The name of the root XML element tag name (str)

validate()#: Validate the study. This ensures all of the component analyses are validated.

n_file_holders()#: Get the number of file holder objects contained in the study (int)

on_file_added(file_obj, **kwargs)#

Callback for when a file has been added.

Parameters:: file_obj (gwas_norm.metadata.file.GwasFile) – The file object being added.

on_files_removed(file_obj, **kwargs)#

Callback for when a file has been removed at the study level.

This is an expensive operation as all the KeyAnalysis objects are checked for validity against remaining files.

Parameters:: file_objs (list of gwas_norm.metadata.file.GwasFile) – The file objects being removed.
Raises:: KeyError – If the remaining files

add_analysis(analysis_obj)#

Add an analysis to the study.

Parameters:: analysis_obj (gwas_norm.metadata.analysis.KeyAnalysis) – The analysis object being added.
Returns:: added – An indicator if the analysis was added to the study.
Return type:: bool

remove_analysis(analysis_obj)#

Remove an analysis from the study.

Parameters:: analysis_obj (gwas_norm.metadata.analysis.KeyAnalysis) – The analysis object being removed.
Returns:: removed – The analysis that was removed.
Return type:: gwas_norm.metadata.analysis.KeyAnalysis

to_xml()#

Convert the study file and all of its attributes to an XML element.

Returns:: study – A study element built from the study object and its attributes.
Return type:: lxml.etree.Element
Raises:: IndexError – If the number of files associated with the study is 0.

classmethod from_xml(element, **kwargs)#

Generate a gwas_norm.metadata.study.StudyFile object from an lxml.etree.Element with the tag name study_file.

Parameters:: element (lxml.etree.Element) – The element should have the tag name study_file.
Returns:: study – A study file object built from all the tags in the study_file element.
Return type:: gwas_norm.metadata.study.StudyFile
Raises:: KeyError – If the name of the element is not expected. Also, if the source_genome_assembly attribute is not defined. Or if there are no analysis elements associated with the study.

`gwas_norm.metadata.analysis`#

Classes representing analyses.

class gwas_norm.metadata.analysis.KeyAnalysis(analysis_name, keys=None, **kwargs)#

Bases: _BaseAnalysis

A representation of a ‘keyed’ analysis.

This is an analysis type that is not associated with any files and has one or more key values that flag the respective rows in a StudyFile.

Parameters:

analysis_name (str) – The name of the analysis. All analysis names are made lowercase and have spaces substituted for underscores.
keys (list of (gwas_norm.metadata.column.Column, str)) – One or more values that will uniquely ID rows belonging to the analysis within the parent study file. The first element of the nested tuple is the column and the second is the column value.
phenotype (gwas_norm.metadata.phenotype.Phenotype, optional, default: NoneType) – The phenotype description associated with the analysis.
caveat (gwas_norm.metadata.phenotype.Caveat, optional, default: NoneType) – The caveat description associated with the analysis.
tests (list of gwas_norm.metadata.test.Test, optional, default: NoneType) – One or more tests that should be applied to the analysis.
info (gwas_norm.metadata.info.Info, optional, default: NoneType) – Columns or definitions that represent the info data for the analysis.

ROOT_TAG = 'key_analysis'#: The name of the root XML element tag name (str)

KEY_TAG = 'key'#: The XML tag name for analysis key values (str).

ANALYSIS_ID_TAG = 'analysis_id'#: The XML tag name for the analysis ID (str)

ANALYSIS_NAME_TAG = 'analysis_name'#: The XML tag name for the analysis name (str)

add_test(test)#

Add a test element to the analysis.

Parameters:: test (gwas_norm.metadata.test.Test) – The test to add.

property analysis_id#: Get the ID for the analysis, if not set then an ID will be generated (int)

property analysis_name#: Get the analysis name (str).

bind(parent)#

Bind the analysis with a parent Study object.

Parameters:: parent (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) – A parental study object.
Raises:: KeyError – If the analysis is already bound to a different study object. Call Analysis.unbind() first.

Notes

The binding instigates a reciprocal adding of the analysis object to the study object, so binding is in both directions. An analysis can only be bound to a single study parent.
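
The reciprocal binding described in these notes can be illustrated with two toy classes. ToyStudy and ToyAnalysis are hypothetical stand-ins that only mimic the documented behaviour (one parent per analysis, KeyError on a second parent, AttributeError when no parent is set); the library's own bookkeeping may differ.

```python
class ToyStudy:
    """Hypothetical stand-in for a study that holds analyses."""

    def __init__(self, name):
        self.name = name
        self.analyses = []

    def add_analysis(self, analysis):
        if analysis in self.analyses:
            return False
        self.analyses.append(analysis)
        analysis.bind(self)  # reciprocal bind back to this study
        return True


class ToyAnalysis:
    """Hypothetical stand-in for an analysis with a single parent."""

    def __init__(self, name):
        self.name = name
        self._parent = None

    @property
    def parent(self):
        if self._parent is None:
            raise AttributeError("no parent study has been defined")
        return self._parent

    def bind(self, parent):
        if self._parent is parent:
            return  # already bound to this study, nothing to do
        if self._parent is not None:
            raise KeyError("bound to a different study, call unbind() first")
        self._parent = parent
        parent.add_analysis(self)  # reciprocal add to the study

    def unbind(self):
        parent = self.parent
        self._parent = None
        if self in parent.analyses:
            parent.analyses.remove(self)
        return parent


study = ToyStudy("study_a")
analysis = ToyAnalysis("analysis_a")
analysis.bind(study)
print(analysis.parent.name, len(study.analyses))
```

Either entry point (bind on the analysis or add_analysis on the study) leaves both objects pointing at each other, which is the property the notes describe.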

property caveat#: Get the caveat definition associated with the analysis (gwas_norm.metadata.phenotype.Caveat or NoneType).

property chksum#

Get the BSD checksum based on the analysis_name.

Returns:: chksum – A 5 character BSD checksum of the analysis name.
Return type:: str
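
For orientation, the classic BSD checksum (the algorithm behind the BSD sum utility) is a 16-bit rotate-and-add over the input bytes. The sketch below assumes UTF-8 encoding of the name and a zero-padded decimal rendering to get 5 characters; both are assumptions, since this reference only states that the result is a 5 character string.

```python
def bsd_checksum(text: str) -> str:
    """Return the 16-bit BSD checksum of text as a 5 character string.

    Assumptions (not confirmed by the reference): the text is encoded as
    UTF-8 and the value is rendered as zero-padded decimal.
    """
    checksum = 0
    for byte in text.encode("utf-8"):
        # Rotate the 16-bit checksum right by one bit, then add the byte.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return f"{checksum:05d}"


print(bsd_checksum("my_analysis"))
```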

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: parse_class – A class of type gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
Return type:: class
Raises:: KeyError – If the appropriate class can’t be found for the tag.
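
Tag-based dispatch of this kind is often a dictionary lookup keyed on each class's ROOT_TAG. The sketch below uses the standard library xml.etree.ElementTree in place of lxml so that it runs without extra dependencies. The two Toy classes are hypothetical placeholders that carry only the ROOT_TAG values listed in this reference.

```python
import xml.etree.ElementTree as ET


class ToyAnalysisFile:
    ROOT_TAG = "analysis"


class ToyKeyAnalysis:
    ROOT_TAG = "key_analysis"


# Map each root tag to the class that knows how to parse it.
PARSE_CLASSES = {cls.ROOT_TAG: cls for cls in (ToyAnalysisFile, ToyKeyAnalysis)}


def get_parse_class(element):
    """Return the parse class for an element, or raise KeyError."""
    try:
        return PARSE_CLASSES[element.tag]
    except KeyError:
        raise KeyError(f"no parse class for tag '{element.tag}'") from None


print(get_parse_class(ET.Element("key_analysis")).__name__)
```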

property has_caveat#: Has the analysis got a caveat associated with it (bool).

property has_phenotype#: Has the analysis got a phenotype associated with it (bool).

has_test(chr_name, start_pos)#

Determine if the analysis has any tests matching chr_name, start_pos.

Parameters:

chr_name (str) – The chromosome name.
start_pos (int) – The start position.

Returns:

test_present – True if tests exist for this chromosome/start position, False if not.

Return type:

bool

property info#: Get the analysis info data (gwas_norm.metadata.info.Info or NoneType).

property info_columns#: Get all the input file columns that will contribute towards the analysis info fields (list of gwas_norm.metadata.column.Column)

property info_defs#

Get all the definitions that will contribute towards the analysis info fields (list of gwas_norm.metadata.phenotype.Definition).

Notes

These may be defined within the info field or attributes of phenotypes or caveats.

invalidate()#: Invalidate the analysis.

property is_validated#: Get whether the analysis has been validated (bool).

property n_tests#: Get the number of tests associated with the analysis (int).

property name#: Get the analysis name, this is an alias of analysis_name (str).

property parent#

Get the parent study object. The parent is set with the bind method (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile)

Raises:: AttributeError – If no parent study has been defined.

classmethod parse_caveats(element)#

Will parse out XML documenting the caveats.

Parameters:: element (lxml.etree.Element) – The element should have sub-elements with the tag name caveat.
Returns:: caveat_data – The caveat object to add to the analysis.
Return type:: gwas_norm.metadata.phenotype.Caveat

classmethod parse_info_data(element)#

Will parse out XML documenting the info data.

Parameters:: element (lxml.etree.Element) – The element should have sub-elements with the tag name info.
Returns:: info_data – The parsed info object.
Return type:: gwas_norm.metadata.info.Info

classmethod parse_phenotypes(element)#

Will parse out XML documenting the phenotype data.

Parameters:: element (lxml.etree.Element) – The element should have sub-elements with the tag name phenotype.
Returns:: phenotype_data – The phenotype object to add to the analysis.
Return type:: gwas_norm.metadata.phenotype.Phenotype

classmethod parse_tests(element)#

Will parse out XML documenting the tests to perform on the analysis.

Parameters:: element (lxml.etree.Element) – The element should contain sub-elements with the tag name test.
Returns:: tests – A list of test objects to add to the analysis.
Return type:: list of gwas_norm.metadata.test.Test

property phenotype#: Get the phenotype definition associated with the analysis (gwas_norm.metadata.phenotype.Phenotype or NoneType).

property tests#

Get the tests associated with the analysis (list of gwas_norm.metadata.test.Test).

Notes

The list will be empty if there are no associated tests. The returned list is a copy of the list stored in the analysis (although the actual Test objects are not copies).
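
The copy semantics in these notes correspond to a shallow copy: the returned list is new, but its elements are the same objects. A minimal sketch, with ToyHolder as a hypothetical stand-in and plain dictionaries standing in for Test objects:

```python
class ToyHolder:
    """Hypothetical holder that returns a shallow copy of its tests."""

    def __init__(self, tests=None):
        self._tests = list(tests or [])

    @property
    def tests(self):
        # A new list each time; the elements themselves are not copied.
        return list(self._tests)


test_a = {"name": "test_a"}
holder = ToyHolder([test_a])

returned = holder.tests
returned.append({"name": "test_b"})  # does not affect the holder
returned[0]["name"] = "renamed"      # does affect the shared object

print(len(holder.tests), holder.tests[0]["name"])
```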

unbind()#

Remove a parent study from this analysis. This also removes the analysis from the study parent.

Returns:: parent – The parent object that has been unbound.
Return type:: gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile

VALUE_TAG = 'value'#: The XML tag name for key values (str).

repr_attr_str()#

Used to output a list of strings containing attribute=value for the attributes of this object.

This is used in various __repr__ methods.

Returns:: attr_str – String representation of the objects attributes and values.
Return type:: list of str

property keys#

Get key-values for the analysis.

Returns:: key_values – The first element of the nested tuple is the column and the second is the column value.
Return type:: list of (gwas_norm.metadata.column.Column, str)

Notes

These are values that will uniquely ID rows belonging to the analysis within the parent study file.

add_key(key_column, key_value)#

Add a key value to the analysis object.

Parameters:

key_column (gwas_norm.metadata.column.Column) – A key column in the input file where the key value should be located.
key_value (str) – A key value to be added to the analysis

Raises:

ValueError – If the value is an empty string '' or all spaces or NoneType.

Notes

These will be used to ID rows belonging to the analysis when parsing through a StudyFile. The key will be built in the order that the keys are added to the analysis.
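
How ordered keys could be used to pick out rows is sketched below. It assumes rows are dictionaries keyed by column name and uses plain strings in place of Column objects. ToyKeyedAnalysis, row_matches and the example column names are hypothetical, and the way the library actually builds and compares its key is not described in this reference.

```python
class ToyKeyedAnalysis:
    """Hypothetical analysis that collects (column, value) keys in order."""

    def __init__(self):
        self.keys = []

    def add_key(self, key_column, key_value):
        # Reject None, empty strings and strings of spaces only.
        if key_value is None or key_value.strip() == "":
            raise ValueError("key value must be a non-empty string")
        self.keys.append((key_column, key_value))

    def row_matches(self, row):
        # A row belongs to the analysis when every key column matches.
        return all(row.get(col) == val for col, val in self.keys)


analysis = ToyKeyedAnalysis()
analysis.add_key("gene_id", "ENSG00000000001")
analysis.add_key("tissue", "liver")

rows = [
    {"gene_id": "ENSG00000000001", "tissue": "liver", "beta": "0.12"},
    {"gene_id": "ENSG00000000001", "tissue": "lung", "beta": "0.03"},
]
print([analysis.row_matches(row) for row in rows])
```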

validate()#: Validate the analysis. This respects the file checking parameter.

to_xml()#

Generate an lxml.etree.Element object for all the attributes within the analysis.

Returns:: key_analysis_element – An element representing an analysis that can be used in a larger XML structure.
Return type:: lxml.etree.Element

classmethod from_xml(element, **kwargs)#

Generate a KeyAnalysis object from an lxml.etree.Element with the tag name key_analysis.

Parameters:: element (lxml.etree.Element) – The element should have the tag name key_analysis.
Returns:: key_analysis – An analysis object built from all the elements in the analysis object.
Return type:: gwas_norm.metadata.analysis.KeyAnalysis
Raises:: KeyError – If the name of the element is not key_analysis.

class gwas_norm.metadata.analysis.AnalysisFile(analysis_name, analysis_type, effect_type, units=None, files=None, cohort=None, file_check=True, **kwargs)#

Bases: _BaseAnalysis, FileHolderMixin

A representation of an AnalysisFile type. This is an analysis that is directly associated with one or more data files.

Parameters:

analysis_name (str) – The unique name of the analysis. The analysis name is converted to lowercase and spaces are replaced with underscores (_).
analysis_type (str) – The analysis type for the analysis. Will be applied to any analysis that does not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analysis that does not have an effect_type specified.
units (str, optional, default: NoneType) – The units for the analysis.
files (list of gwas_norm.metadata.file.GwasFile, optional, default: NoneType) – Analysis files. Analysis level files are for data such as full GWAS data for a disease, as opposed to study level files such as GTEX data where multiple gene analyses are in the same file.
cohort (gwas_norm.metadata.cohort.Cohort, optional, default: NoneType) – The cohort description.
phenotype (gwas_norm.metadata.phenotype.Phenotype, optional, default: NoneType) – The phenotype description associated with the analysis.
caveat (gwas_norm.metadata.phenotype.Caveat, optional, default: NoneType) – The caveat description associated with the analysis.
tests (list of gwas_norm.metadata.test.Test, optional, default: NoneType) – One or more tests that should be applied to the analysis. Tests are not implemented yet.

ANALYSIS_ID_TAG = 'analysis_id'#: The XML tag name for the analysis ID (str)

ANALYSIS_NAME_TAG = 'analysis_name'#: The XML tag name for the analysis name (str)

ANALYSIS_TYPE_TAG = 'analysis_type'#: The name of the analysis type tag/element in the XML file (str).

EFFECT_TYPE_TAG = 'effect_type'#: The name of the effect type tag/element in the XML file (str).

FILE_CLASS = None#: The file class that should be used for parsing XML, this should be overridden by the sub-class (NoneType)

UNITS_TAG = 'units'#: The name of the units tag/element in the XML file (str).

add_file(gwas_file, error=False)#

Add a GWAS file to the object. This results in a reciprocal bind of the file to the parent object.

Parameters:

gwas_file (gwas_norm.metadata.file.GwasFile) – The file being added.
error (bool, optional, default: False) – If the file exists in this object already, should an error be raised.

Raises:

TypeError – If the gwas_file is not the correct type.
KeyError – If the GWAS file is specified already and error is True.
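
The effect of the error flag can be sketched as follows. ToyFile and ToyFileHolder are hypothetical, the reciprocal bind is omitted, and duplicates are detected here by object identity, which is an assumption (the library may compare files differently).

```python
class ToyFile:
    """Hypothetical stand-in for a GWAS file object."""

    def __init__(self, path):
        self.path = path


class ToyFileHolder:
    """Hypothetical holder showing the documented add_file error handling."""

    def __init__(self):
        self.files = []

    def add_file(self, gwas_file, error=False):
        if not isinstance(gwas_file, ToyFile):
            raise TypeError("gwas_file must be a ToyFile")
        if any(f is gwas_file for f in self.files):
            if error:
                raise KeyError(f"file already added: {gwas_file.path}")
            return  # silently ignore the duplicate when error is False
        self.files.append(gwas_file)


holder = ToyFileHolder()
gwas_file = ToyFile("chr1.txt.gz")
holder.add_file(gwas_file)
holder.add_file(gwas_file)  # ignored, error defaults to False
print(len(holder.files))
```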

add_test(test)#

Add a test element to the analysis.

Parameters:: test (gwas_norm.metadata.test.Test) – The test to add.

property analysis_id#: Get the ID for the analysis, if not set then an ID will be generated (int)

property analysis_name#: Get the analysis name (str).

property analysis_type#: Get the analysis type (str).

bind(parent)#

Bind the analysis with a parent Study object.

Parameters:: parent (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) – A parental study object.
Raises:: KeyError – If the analysis is already bound to a different study object. Call Analysis.unbind() first.

Notes

The binding instigates a reciprocal adding of the analysis object to the study object, so binding is in both directions. An analysis can only be bound to a single study parent.

property caveat#: Get the caveat definition associated with the analysis (gwas_norm.metadata.phenotype.Caveat or NoneType).

property chksum#

Get the BSD checksum based on the analysis_name.

Returns:: chksum – A 5 character BSD checksum of the analysis name.
Return type:: str

property cohort#

Get the cohort definition associated with the analysis (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType).

Notes

The cohort associated with the analysis or NoneType if no cohort has been set.

create_analysis_type_xml(element)#

Generate the analysis type XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_cohort_xml(element)#

Create cohort-specific XML elements in the parent element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the cohort-specific elements to.
Returns:: element – The parent XML element with the cohort elements added.
Return type:: lxml.etree.Element

Notes

This is designed to add XML elements to a study/analysis element that can have file parameters.

create_effect_type_xml(element)#

Generate the XML element for the effect type.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_files_xml(element)#

Create file-specific XML elements in the parent element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the file-specific elements to.
Returns:: element – The parent XML element with the file elements added.
Return type:: lxml.etree.Element

Notes

This is designed to add XML elements to a study/analysis element that can have file parameters.

create_units_xml(element)#

Generate the units XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_xml(element)#

Generate all the XML elements relating to objects that hold files. This wraps all other create_* methods in the mixin.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the elements to.
Returns:: element – The parent XML element with the elements added.

property effect_type#: Get the effect type (str or NoneType).

property file_check#: Get the file checking status (bool).

file_repr_attr_str()#

Called by the __repr__ of host objects to supply a key=value string of the attributes and their values relating to the mixin.

Returns:: attr_str – Each string is an attribute and value for printing.
Return type:: list of str

property files#: Get all the associated files (list of gwas_norm.metadata.file.GwasFile).

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: parse_class – A class of type gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
Return type:: class
Raises:: KeyError – If the appropriate class can’t be found for the tag.

property has_caveat#: Has the analysis got a caveat associated with it (bool).

property has_phenotype#: Has the analysis got a phenotype associated with it (bool).

has_test(chr_name, start_pos)#

Determine if the analysis has any tests matching chr_name, start_pos.

Parameters:

chr_name (str) – The chromosome name.
start_pos (int) – The start position.

Returns:

test_present – True if tests exist for this chromosome/start position, False if not.

Return type:

bool

property info#: Get the analysis info data (gwas_norm.metadata.info.Info or NoneType).

property info_columns#: Get all the input file columns that will contribute towards the analysis info fields (list of gwas_norm.metadata.column.Column)

property info_defs#

Get all the definitions that will contribute towards the analysis info fields (list of gwas_norm.metadata.phenotype.Definition).

Notes

These may be defined within the info field or attributes of phenotypes or caveats.

init_file_attr(analysis_type, effect_type, units=None, cohort=None, files=None, file_check=True)#

Initialise all the attributes that a file handling object needs.

Parameters:

analysis_type (str) – The analysis type for the study. Will be applied to any analysis that does not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analysis that does not have an effect_type specified.
units (str, optional, default: NoneType) – The units that the effect sizes within the file are measured in.
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort, optional, default: NoneType) – The cohort that applies to the file.
files (list of (gwas_norm.metadata.file.GwasFile), optional, default: NoneType) – Source files. Data files can either be given at the study level or the analysis level but not both. Study level files are for data such as GTEX data where multiple gene analyses are in the same file.
file_check (bool, optional, default: True) – Toggle file checking.

Notes

This is usually called from the __init__ method of the host object and will initialise all the parameters relating to file handling from arguments that have been given to the host object.

invalidate()#: Invalidate the analysis.

property is_validated#: Get whether the analysis has been validated (bool).

property n_files#: Get the number of associated files (int).

property n_tests#: Get the number of tests associated with the analysis (int).

property name#: Get the analysis name, this is an alias of analysis_name (str).

property parent#

Get the parent study object. The parent is set with the bind method (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile)

Raises:: AttributeError – If no parent study has been defined.

classmethod parse_caveats(element)#

Will parse out XML documenting the caveats.

Parameters:: element (lxml.etree.Element) – The element should have sub-elements with the tag name caveat.
Returns:: caveat_data – The caveat object to add to the analysis.
Return type:: gwas_norm.metadata.phenotype.Caveat

classmethod parse_files(element, **kwargs)#

Parse any file elements out from the XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to parse file specific elements from.
Returns:: gwas_files – GWAS file objects.
Return type:: list of (gwas_norm.metadata.file.GwasFile)
Raises:: KeyError – If no file elements can be found in the parent element.

classmethod parse_info_data(element)#

Will parse out XML documenting the info data.

Parameters:: element (lxml.etree.Element) – The element should have sub-elements with the tag name info.
Returns:: info_data – The parsed info object.
Return type:: gwas_norm.metadata.info.Info

classmethod parse_phenotypes(element)#

Will parse out XML documenting the phenotype data.

Parameters:: element (lxml.etree.Element) – The element should have sub-elements with the tag name phenotype.
Returns:: phenotype_data – The phenotype object to add to the analysis.
Return type:: gwas_norm.metadata.phenotype.Phenotype

classmethod parse_tests(element)#

Will parse out XML documenting the tests to perform on the analysis.

Parameters:: element (lxml.etree.Element) – The element should contain sub-elements with the tag name test.
Returns:: tests – A list of test objects to add to the analysis.
Return type:: list of gwas_norm.metadata.test.Test

classmethod parse_xml(element, **kwargs)#

Parse the file associated data from the XML element.

Parameters:

element (lxml.etree.Element) – The parent XML element to parse the elements from.

Returns:

analysis_type (str) – The analysis type.
effect_type (str) – The effect type.
units (str or NoneType) – The units (if defined).
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType) – The cohort (if defined).
files (list of gwas_norm.metadata.file.GwasFile) – Files that have been parsed out of the XML.

Raises:

KeyError – If any of the required elements cannot be found in the parent element.

property phenotype#: Get the phenotype definition associated with the analysis (gwas_norm.metadata.phenotype.Phenotype or NoneType).

remove_files(gwas_files)#

Remove one or more GWAS files from this object.

Parameters:: gwas_files (list of gwas_norm.metadata.file.GwasFile) – The files being removed.

property tests#

Get the tests associated with the analysis (list of gwas_norm.metadata.test.Test).

Notes

The list will be empty if there are no associated tests. The returned list is a copy of the list stored in the analysis (although the actual Test objects are not copies).

unbind()#

Remove a parent study from this analysis. This also removes the analysis from the study parent.

Returns:: parent – The parent object that has been unbound.
Return type:: gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile

property units#: Get the units (str or NoneType)

ROOT_TAG = 'analysis'#: The name of the root XML element tag name (str)

validate()#: Validate the analysis.

property study_source_absolute_dir#

Return the parent study source directory absolute path (str).

Notes

This provides a uniform interface for file_holder methods to access the root path without having to know the type of the class they are joined with.

repr_attr_str()#

Get a list of strings representing the core attributes of the object and their values.

Returns:: attr_str – The core attributes handled by the base class, and their values.
Return type:: list of str

Notes

This is used by the __repr__ of the base class and can be called by any subclasses in their __repr__ methods.

on_file_added(file_obj, **kwargs)#

Callback for when a file has been added.

Parameters:: file_obj (gwas_norm.metadata.file.GwasFile) – The file object being added.

to_xml()#

Generate an lxml.etree.Element object for all the attributes within the analysis. This will have the tag name analysis.

Returns:: analysis_element – An element representing an analysis that can be used in a larger XML structure.
Return type:: lxml.etree.Element

classmethod from_xml(element, **kwargs)#

Generate an AnalysisFile object from an lxml.etree.Element with the tag name analysis.

Parameters:: element (lxml.etree.Element) – The element should have the tag name analysis
Returns:: analysis_obj – An analysis object built from all the elements in the analysis object.
Return type:: gwas_norm.metadata.analysis.AnalysisFile
Raises:: ValueError – If no file elements are associated with the analysis element.

`gwas_norm.metadata.phenotype`#

Classes for building phenotype structures.

class gwas_norm.metadata.phenotype.Phenotype(definition, reference_string=None)#

Bases: _BasePhenotype

A representation of a phenotype.

Parameters:: definition (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or or gwas_norm.metadata.phenotype.And) – Either a single phenotype definition or a composite one.

ROOT_TAG = 'phenotype'#: The root XML tag for the class (str)

classmethod from_xml(element)#

Read the phenotype definitions from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the phenotype.
Returns:: phenotype_definition – The parsed phenotype definition.
Return type:: gwas_norm.metadata.phenotype.Phenotype

REFERENCE_STRING_TAG = 'reference_string'#: The XML tag name for a phenotype/caveat reference string (str).

property definition#: Get the phenotype/caveat definition (gwas_norm.metadata.phenotype.Phenotype or gwas_norm.metadata.phenotype.Caveat).

property flat_definition#: Get the phenotype/caveat definition, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).

classmethod get_class(element)#

Helper method that will determine the required class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The element to check; it is expected to have the tag name phenotype or caveat.
Returns:: class – The relevant class for the element.
Return type:: class of (gwas_norm.metadata.phenotype.Phenotype or gwas_norm.metadata.phenotype.Caveat)
Raises:: KeyError – If the element does not have the required tag name.

property info_defs#: Return any definitions that have been tagged as an info definition (list of gwas_norm.metadata.phenotype.Definition).

property reference_string#: Get the phenotype/caveat reference string (str).

to_xml()#

Write all the child elements out to a <phenotype>/<caveat> XML element.

Returns:: element – The XML element representing the <phenotype>/<caveat>.
Return type:: lxml.etree.Element

class gwas_norm.metadata.phenotype.Caveat(definition, reference_string=None)#

Bases: _BasePhenotype

A representation of a caveat.

A caveat is defined as anything that will alter the interpretation of the phenotype associations/effect sizes.

Parameters:: definition (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or or gwas_norm.metadata.phenotype.And) – Either a single caveat definition or a composite one.

ROOT_TAG = 'caveat'#: The root XML tag for the class (str)

classmethod from_xml(element)#

Read the caveat definitions from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the caveat.
Returns:: caveat_definition – The parsed caveat definition.
Return type:: gwas_norm.metadata.phenotype.Caveat

REFERENCE_STRING_TAG = 'reference_string'#: The XML tag name for a phenotype/caveat reference string (str).

property definition#: Get the phenotype/caveat definition (gwas_norm.metadata.phenotype.Phenotype or gwas_norm.metadata.phenotype.Caveat).

property flat_definition#: Get the phenotype/caveat definition, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).

classmethod get_class(element)#

Helper method that will determine the required class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The element to check; it is expected to have the tag name phenotype or caveat.
Returns:: class – The relevant class for the element.
Return type:: class of (gwas_norm.metadata.phenotype.Phenotype or gwas_norm.metadata.phenotype.Caveat)
Raises:: KeyError – If the element does not have the required tag name.

property info_defs#: Return any definitions that have been tagged as an info definition (list of gwas_norm.metadata.phenotype.Definition).

property reference_string#: Get the phenotype/caveat reference string (str).

to_xml()#

Write all the child elements out to a <phenotype>/<caveat> XML element.

Returns:: element – The XML element representing the <phenotype>/<caveat>.
Return type:: lxml.etree.Element

class gwas_norm.metadata.phenotype.Definition(name, info=False, map_to=None, dtype=None)#

Bases: _XmlBase, InfoHolderMixin

A definition (name and type) of a phenotype, caveat or synonym.

Parameters:

name (str) – The definition name.
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: text) – If info is True, map_to indicates that the definition should be known as the map_to value in the info field. Must contain only alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as an SC.

ROOT_TAG = 'definition'#: The root XML tag for the class (str)

UNDEF_TYPE = 'text'#: The name of a type attribute that has not been set (str)

property reference_string#: Get the definition reference string, this is the same as the name (str).

property flat_definition#: Get the definition as a list with a single definition object (list of gwas_norm.metadata.phenotype.Definition).

to_xml()#

Write the definition out to an XML element.

Returns:: definition_element – The XML element representing the definition. Has the tag name <definition>
Return type:: lxml.etree.Element

classmethod from_xml(element)#

Read the definition from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the definition. Has the tag name <definition>.
Returns:: definition – The definition object.
Return type:: gwas_norm.metadata.phenotype.Definition

classmethod get_class(element)#

Helper method that will determine the required class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The element to check; it is expected to have the tag name definition.
Returns:: class – The relevant class for the element.
Return type:: class of gwas_norm.metadata.phenotype.Definition
Raises:: KeyError – If the element does not have the required tag name.

DATA_TYPE_ATTRIBUTE = 'dtype'#: The name of the data type attribute of the column (str)

INFO_ATTRIBUTE = 'info'#: The name of the info attribute of the column (str)

MAP_TO_ATTRIBUTE = 'map_to'#: The name of the key attribute of the column (str)

property dstruct#: Get the data structure value. C is a scalar. A is an array, (str).

property dtype#: Get the dtype value. S is a string value. F is a float, I is an integer (str).
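
The datatype definition string combines a value type (S, F or I) with a structure (C for scalar, A for array). A minimal sketch of splitting such a string is shown below. The function split_dtype is hypothetical, and it assumes the string is always exactly two characters when given, which this reference does not confirm.

```python
VALUE_TYPES = {"S": str, "F": float, "I": int}
STRUCTURES = {"C": "scalar", "A": "array"}


def split_dtype(dtype=None):
    """Split a datatype definition string into (dtype, dstruct).

    None is treated as "SC" (a string scalar), as described in the
    parameter documentation. Anything else must be a two character code.
    """
    if dtype is None:
        dtype = "SC"
    if len(dtype) != 2 or dtype[0] not in VALUE_TYPES or dtype[1] not in STRUCTURES:
        raise ValueError(f"unrecognised datatype definition: {dtype!r}")
    return dtype[0], dtype[1]


value_type, structure = split_dtype("SA")
print(VALUE_TYPES[value_type].__name__, STRUCTURES[structure])
```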

equals(other)#: Determine equality against another InfoHolderMixin containing object. This is based on map_to, dtype and dstruct values matching.

classmethod get_attributes(element)#

Get the attributes from an XML element.

Parameters:

element (lxml.etree.Element) – The element potentially containing info, map_to, dtype attributes.

Returns:

info (bool) – Is the class acting as an info field.
map_to (str, optional, default: False) – If info is True, map_to indicates that the info value defined in the class/column should be known as the map_to value in the info field and not as the name.
dtype (str) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array.

property info#: Get the is info output value (bool).

init_info_values(info=False, map_to=None, dtype=None, allow_info_false=False)#

Initialise all of the info related values for the mixin.

Parameters:

info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true map_to indicates that the name defined in the class should be known as the map_to value in the info field and not as the name. Must only contain alphanumeric characters and underscores with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
allow_info_false (bool, optional, default: False) – If this is set to True, then map_to is still output to the XML when info is False and map_to is defined. This is for phenotype definitions where the map_to value has meaning even if not outputting to the info column.
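The dtype encoding described above can be illustrated with a minimal sketch. The helper `decode_dtype` and its lookup tables are hypothetical and are not part of gwas_norm. The sketch assumes the type letter comes first and the structure letter second, as the SA example suggests; whether the library accepts other layouts is not stated here.

```python
# Hypothetical helper (not part of gwas_norm): decode a two-letter dtype string.
TYPE_CODES = {"S": str, "F": float, "I": int}
STRUCT_CODES = {"C": "scalar", "A": "array"}


def decode_dtype(dtype=None):
    """Return (python_type, structure) for a dtype string such as 'SA'."""
    if dtype is None:
        dtype = "SC"  # the docs state that NoneType is interpreted as SC
    if len(dtype) != 2:
        # Assumption: only two-letter strings are handled in this sketch.
        raise ValueError(f"expected a two-letter dtype, got {dtype!r}")
    type_code, struct_code = dtype[0], dtype[1]
    try:
        return TYPE_CODES[type_code], STRUCT_CODES[struct_code]
    except KeyError:
        raise ValueError(f"unknown dtype code in {dtype!r}") from None
```

Under these assumptions, `decode_dtype("SA")` yields the string type paired with the array structure, and calling it with no argument falls back to a string scalar.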

property map_to#: Get the column name remapping value (str or NoneType).

set_attributes(element)#

Set the attributes into an XML element.

Parameters:: element (lxml.etree.Element) – The element to add the attributes.

class gwas_norm.metadata.phenotype.Synonym(*synonyms)#

Bases: _XmlBase

A container for phenotype definitions that are synonyms of the same thing.

Parameters:: *synonyms – One or more gwas_norm.metadata.phenotype.Definition objects.

ROOT_TAG = 'synonym'#: The root element for the class (str)

property reference_string#: Get the Synonym reference string, this is the first added synonym (str).

property flat_definition#: Get the synonym definitions, flattened to a list of definition objects (list of gwas_norm.metadata.phenotype.Definition).

property synonyms#: Get all the synonym definitions (list of gwas_norm.metadata.phenotype.Definition).

add(s)#

Add a phenotype definition to the synonym.

Parameters:

s (gwas_norm.metadata.phenotype.Definition) – The synonym definition being added.

Raises:

TypeError – If the definition is not an instance of gwas_norm.metadata.phenotype.Definition.
KeyError – If the definition has already been defined in the synonym.

to_xml()#

Write all the synonyms out to an XML element.

Returns:: synonym_element – The XML element representing the synonym. Has the tag name <synonym>.
Return type:: lxml.etree.Element
Raises:: IndexError – If there are fewer than 2 definitions in the synonym.
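As a rough illustration of the behaviour described for to_xml(), the sketch below builds a <synonym> element and refuses to do so with fewer than 2 definitions. It uses the standard library `xml.etree.ElementTree` rather than lxml so that it runs without extra dependencies. The function `synonym_to_xml`, the use of plain strings for definitions and the layout of the <definition> children are assumptions made for illustration, not the library's implementation.

```python
import xml.etree.ElementTree as ET


def synonym_to_xml(definitions):
    """Build a <synonym> element from a list of definition strings."""
    if len(definitions) < 2:
        # Mirrors the documented IndexError for fewer than 2 definitions.
        raise IndexError("a synonym needs at least 2 definitions")
    root = ET.Element("synonym")
    for text in definitions:
        # Assumption: each definition is a <definition> child holding text.
        child = ET.SubElement(root, "definition")
        child.text = text
    return root


element = synonym_to_xml(["heart attack", "myocardial infarction"])
print(ET.tostring(element, encoding="unicode"))
```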

classmethod from_xml(element)#

Read the definition from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the synonyms. Should have the tag name <synonym>.
Returns:: synonyms – The synonym object.
Return type:: gwas_norm.metadata.phenotype.Synonym

classmethod get_class(element)#

Helper method that will determine the required file class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The XML element to inspect. It is expected to have the tag name synonym.
Returns:: class – The relevant class for the element.
Return type:: class of gwas_norm.metadata.phenotype.Synonym
Raises:: KeyError – If the element does not have the required tag name.

class gwas_norm.metadata.phenotype.And(*contents)#

Bases: _AndOrBase

A phenotype <and> statement.

Parameters:: *contents – One or more instances of: gwas_norm.metadata.phenotype.Definition gwas_norm.metadata.phenotype.Synonym gwas_norm.metadata.phenotype.Or

property contents#

Get all the and/or definitions.

Returns:

contents_list (list of (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or)) – The contents of the And/Or statement.

property flat_definition#: Get the synonym definitions, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).

classmethod get_class(element)#

Helper method that will determine the required file class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The XML element to inspect. It is expected to have the tag name <and> or <or>.
Returns:: class – The relevant class for the element.
Return type:: class of (gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or)
Raises:: KeyError – If the element does not have the required tag name.
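The tag-based lookup that get_class describes can be sketched as a small registry keyed on ROOT_TAG. The classes `AndSketch` and `OrSketch` and the `_CLASSES` dictionary are hypothetical stand-ins, and the standard library `xml.etree.ElementTree` is used in place of lxml. How the library actually stores this mapping is not documented here.

```python
import xml.etree.ElementTree as ET


class AndSketch:
    ROOT_TAG = "and"


class OrSketch:
    ROOT_TAG = "or"


# Hypothetical registry: tag name -> class that parses that tag.
_CLASSES = {cls.ROOT_TAG: cls for cls in (AndSketch, OrSketch)}


def get_class(element):
    """Return the class registered for the element's tag name."""
    try:
        return _CLASSES[element.tag]
    except KeyError:
        # Mirrors the documented KeyError for an unexpected tag name.
        raise KeyError(f"unrecognised tag name: {element.tag!r}") from None
```

A caller would then pass the element to the returned class for parsing, so one entry point can handle either tag.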

classmethod parse_xml(element)#

Parse the contents of the <and>/<or> definition from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the <and>/<or> definition. Should have the tag name <and> or <or>.
Returns:: object – The And/Or object.
Return type:: gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or
Raises:: IndexError – If there are fewer than 2 definitions in the And/Or.

to_xml()#

Write all the components of the And/Or statement out to an XML element.

Returns:: element – The XML element representing the and/or statement. Has the tag name <and>/<or>.
Return type:: lxml.etree.Element
Raises:: IndexError – If there are fewer than 2 definitions in the And/Or.

ROOT_TAG = 'and'#: The root element for the class (str)

property reference_string#: Get the And reference string (str)

add(s)#

Add an element to the And statement.

Parameters:

s (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or) – The element to add.

Raises:

TypeError – If the element being added is not of the expected type.
IndexError – If an Or or Synonym element is being added then it must have > 1 definition within it.

classmethod from_xml(element)#

Read the <and> from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the and statement. Should have the tag name <and>.
Returns:: and_obj – The and object.
Return type:: gwas_norm.metadata.phenotype.And

class gwas_norm.metadata.phenotype.Or(*contents)#

Bases: _AndOrBase

A phenotype Or statement.

Parameters:: *contents – One or more instances of: gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.And

property contents#

Get all the and/or definitions.

Returns:

contents_list (list of (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or)) – The contents of the And/Or statement.

property flat_definition#: Get the synonym definitions, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).

classmethod get_class(element)#

Helper method that will determine the required file class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The XML element to inspect. It is expected to have the tag name <and> or <or>.
Returns:: class – The relevant class for the element.
Return type:: class of (gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or)
Raises:: KeyError – If the element does not have the required tag name.

classmethod parse_xml(element)#

Parse the contents of the <and>/<or> definition from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the <and>/<or> definition. Should have the tag name <and> or <or>.
Returns:: object – The And/Or object.
Return type:: gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or
Raises:: IndexError – If there are fewer than 2 definitions in the And/Or.

to_xml()#

Write all the components of the And/Or statement out to an XML element.

Returns:: element – The XML element representing the and/or statement. Has the tag name <and>/<or>.
Return type:: lxml.etree.Element
Raises:: IndexError – If there are fewer than 2 definitions in the And/Or.

ROOT_TAG = 'or'#: The root element for the class (str)

property reference_string#: Get the Or reference string (str)

add(s)#

Add an element to the Or statement.

Parameters:

s (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.And) – The element to add.

Raises:

TypeError – If the element being added is not of the expected type.
IndexError – If an And or Synonym element is being added then it must have > 1 definition within it.

classmethod from_xml(element)#

Read the <or> from an XML element.

Parameters:: element (lxml.etree.Element) – The XML element representing the or statement. Should have the tag name <or>.
Returns:: or_obj – The Or object.
Return type:: gwas_norm.metadata.phenotype.Or

`gwas_norm.metadata.cohort`#

Cohort XML elements.

class gwas_norm.metadata.cohort.SampleSizeMixin#

Bases: _BaseSample

A mix in to add sample size storage and methods.

Notes

The sample size can be expressed as a real integer value (e.g. 20000) or as a proportional value (e.g. 0.2). Whilst a proportional value does not make sense on its own, if it is in a population within a cohort then the other populations are expected to be expressed as proportions too.

TYPE = 'sample'#: The type of the population, i.e. case_control, sample or NoneType (str)

N_SAMPLES_TAG = 'n_samples'#: The name of the element containing the number of samples (str)

property n_samples#: Get the number of samples (int or float).

create_nsamples_xml(element)#

Write number of samples element to the given element.

Parameters:: element (lxml.etree.Element) – The element to write n_samples elements.

classmethod parse_xml(element)#

This determines if the element has any number of sample definitions and if it does it parses them.

Parameters:: element (lxml.etree.Element) – The element potentially containing a n_samples.
Returns:: nsamples – The number of samples. If no n_samples are found then this will be NoneType.
Return type:: int or float or NoneType

PROPORTION_TYPE = 'proportion'#: Constant indicating a proportional sample value (str)

REAL_TYPE = 'real'#: Constant indicating a real integer sample value (str)

reset_seen_values()#

Reset previously seen sample values given when setting the value_type.

Notes

This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
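The interaction between value_type and reset_seen_values() can be sketched as follows. The class `SampleValueTracker` is a hypothetical stand-in, and the rule used to tell a proportion from a real value (strictly between 0 and 1) is an assumption for illustration; the library's own classification rule is not documented here.

```python
REAL_TYPE = "real"
PROPORTION_TYPE = "proportion"


class SampleValueTracker:
    """Hypothetical stand-in for the value_type bookkeeping."""

    def __init__(self):
        self.value_type = None

    def check(self, value):
        """Classify a value and reject a mix of real and proportional."""
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("sample values must be int or float")
        # Assumption: values strictly between 0 and 1 are proportions.
        seen = PROPORTION_TYPE if 0 < value < 1 else REAL_TYPE
        if self.value_type is not None and seen != self.value_type:
            raise ValueError(f"expected a {self.value_type} value, got {value}")
        self.value_type = seen
        return seen

    def reset_seen_values(self):
        """Forget the previously seen type so a different one can be set."""
        self.value_type = None
```

In this sketch, passing 0.2 after 20000 raises a ValueError unless reset_seen_values() is called in between.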

property value_type#: Get the sample value type (str).

class gwas_norm.metadata.cohort.CaseControlMixin#

Bases: _BaseSample

A mix in to add case/control size storage and methods.

TYPE = 'case_control'#: The type of the class, i.e. case_control, sample or NoneType (str)

N_CASES_TAG = 'n_cases'#: The name of the XML tag containing the number of cases (str)

N_CONTROLS_TAG = 'n_controls'#: The name of the XML tag containing the number of controls (str)

property n_samples#: Get the number of samples set, this is the sum of cases+controls (int or float).

property n_cases#: Get the number of cases (int or float).

property n_controls#: Get the number of controls (int or float).

create_case_xml(element)#

Write case control elements to the given element.

Parameters:: element (lxml.etree.Element) – The element to write n_cases and n_controls elements.

classmethod parse_xml(element)#

This determines if the element has any case/control definitions and if it does it will error check and parse them.

Parameters:

element (lxml.etree.Element) – The element potentially containing a n_cases and n_controls elements.

Returns:

ncases (int or float or NoneType) – The number of cases. If no n_cases, n_controls elements are found then this will be NoneType.
ncontrols (int or float or NoneType) – The number of controls. If no n_cases, n_controls elements are found then this will be NoneType.
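A minimal sketch of this kind of parsing is shown below, using the standard library `xml.etree.ElementTree` in place of lxml. The function `parse_case_control` and the helper `_to_number` are hypothetical. Raising ValueError when only one of the two elements is present is an assumption based on the error listed for from_xml.

```python
import xml.etree.ElementTree as ET

N_CASES_TAG = "n_cases"
N_CONTROLS_TAG = "n_controls"


def _to_number(text):
    """Convert element text to int where possible, otherwise float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_case_control(element):
    """Return (ncases, ncontrols), or (None, None) if neither is present."""
    cases = element.find(N_CASES_TAG)
    controls = element.find(N_CONTROLS_TAG)
    if cases is None and controls is None:
        return None, None
    if cases is None or controls is None:
        # Assumption: a lone n_cases or n_controls element is an error.
        raise ValueError("n_cases and n_controls must be defined together")
    return _to_number(cases.text), _to_number(controls.text)
```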

PROPORTION_TYPE = 'proportion'#: Constant indicating a proportional sample value (str)

REAL_TYPE = 'real'#: Constant indicating a real integer sample value (str)

reset_seen_values()#

Reset previously seen sample values given when setting the value_type.

Notes

This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).

property value_type#: Get the sample value type (str).

class gwas_norm.metadata.cohort.LdReference(name, weight, pop_names=None)#

Bases: _PopReference

The LD reference population container.

Parameters:

name (str) – The name for the population reference group.
weight (float) – The weighting this population group should be given to the overall reference.
pop_names (list of str, optional, default: NoneType) – The population names that can be used interchangeably to represent this reference population. These are applied hierarchically with the topmost in the list being tried before the bottom of the list.

Notes

This is a representation of all the population groups that can be used interchangeably to represent a component of an LD reference group.
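The hierarchical use of the population names can be pictured with a short sketch. The function `first_available` and the example population labels are hypothetical; how and where the library resolves the names against a reference panel is not described here.

```python
def first_available(pop_names, available):
    """Return the first name in pop_names that is present in available."""
    for name in pop_names:  # topmost entries are tried first
        if name in available:
            return name
    return None  # nothing in the hierarchy matched


# Example: the first entry is missing from the panel, so the second is used.
chosen = first_available(["GBR", "EUR", "ALL"], {"EUR", "ALL"})
```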

ROOT_TAG = 'ld_ref'#: The name of the root XML element tag name, this should be overridden by sub-classes (str)

NAME_TAG = 'name'#: The tag name for a reference population name (str)

REF_POP_TAG = 'ref_pop'#: The tag name for a reference population tag (str)

WEIGHT_TAG = 'weight'#: The tag name for a reference population weight (str)

add_pop(pop)#

Add a population to the population list.

Parameters:: pop (str) – The population being added. It will only be added if it does not exist, if it does exist then this will fail silently.

classmethod from_xml(element)#

Generate a reference population object.

Parameters:: element (lxml.etree.Element) – The reference population element.
Returns:: ref_pop_obj – A reference population object.
Return type:: gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference
Raises:: KeyError – If the name of the element is not recognised.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: ref_pop_class – The relevant reference population class for the element.
Return type:: class of (gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference)
Raises:: KeyError – If the element does not have the required tag name.

property name#: Get the reference population name (str).

property pops#: Return the population names (list of str).

property refpops#: Return the population names (list of str).

remove_pop(pop)#

Remove a population from the population list.

Parameters:: pop (str) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

reset_pops()#: Reset the population list to empty.

to_xml()#

Generate an XML element for the reference population.

Returns:: element – The XML element representation of the reference population.
Return type:: lxml.etree.Element

property weight#: Get the reference population weight (str).

class gwas_norm.metadata.cohort.FreqReference(name, weight, pop_names=None)#

Bases: _PopReference

The base allele frequency reference population container.

Parameters:

name (str) – The name for the population reference group.
weight (float) – The weighting this population group should be given to the overall reference.
pop_names (list of str, optional, default: NoneType) – The population names that can be used interchangeably to represent this reference population. These are applied hierarchically with the topmost in the list being tried before the bottom of the list.

Notes

This is a representation of all the population groups that can be used interchangeably to represent a component of a frequency reference population group.

ROOT_TAG = 'allele_freq_ref'#: The name of the root XML element tag name, this should be overridden by sub-classes (str)

NAME_TAG = 'name'#: The tag name for a reference population name (str)

REF_POP_TAG = 'ref_pop'#: The tag name for a reference population tag (str)

WEIGHT_TAG = 'weight'#: The tag name for a reference population weight (str)

add_pop(pop)#

Add a population to the population list.

Parameters:: pop (str) – The population being added. It will only be added if it does not exist, if it does exist then this will fail silently.

classmethod from_xml(element)#

Generate a reference population object.

Parameters:: element (lxml.etree.Element) – The reference population element.
Returns:: ref_pop_obj – A reference population object.
Return type:: gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference
Raises:: KeyError – If the name of the element is not recognised.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: ref_pop_class – The relevant reference population class for the element.
Return type:: class of (gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference)
Raises:: KeyError – If the element does not have the required tag name.

property name#: Get the reference population name (str).

property pops#: Return the population names (list of str).

property refpops#: Return the population names (list of str).

remove_pop(pop)#

Remove a population from the population list.

Parameters:: pop (str) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

reset_pops()#: Reset the population list to empty.

to_xml()#

Generate an XML element for the reference population.

Returns:: element – The XML element representation of the reference population.
Return type:: lxml.etree.Element

property weight#: Get the reference population weight (str).

class gwas_norm.metadata.cohort.Population(name, freq_pops=None, ld_pops=None)#

Bases: _XmlBase

A representation of a population where the sample size is not known.

Parameters:

name (str) – A free text name for the population group.
freq_pops (list of gwas_norm.metadata.cohort.FreqReference, optional, default: NoneType) – A hierarchical list of reference populations that will be used to obtain allele frequency estimates if not provided by the study.
ld_pops (list of gwas_norm.metadata.cohort.LdReference, optional, default: NoneType) – A hierarchical list of reference populations that will be used to obtain LD estimates.

Notes

If more than one frequency or LD reference is provided, each will have a weighting attached that indicates its weight in the overall frequency or LD estimate. Within each reference the population names will be applied hierarchically.
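One way to picture the combination of weights and hierarchical names is the sketch below. The function `weighted_frequency`, the example population labels and the choice to skip references without data and renormalise by the remaining weights are all assumptions for illustration; the formula the library uses is not documented here.

```python
def weighted_frequency(references, freqs):
    """Combine per-reference allele frequencies using their weights.

    references: list of (weight, pop_names) pairs
    freqs: dict mapping a population name to an allele frequency
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for weight, pop_names in references:
        # Within a reference, take the first population that has data.
        freq = next((freqs[p] for p in pop_names if p in freqs), None)
        if freq is None:
            continue  # assumption: references without data are skipped
        weighted_sum += weight * freq
        total_weight += weight
    if total_weight == 0:
        return None  # no reference had usable data
    return weighted_sum / total_weight


# Two references: the first falls back from GBR to EUR.
refs = [(0.75, ["GBR", "EUR"]), (0.25, ["AFR"])]
estimate = weighted_frequency(refs, {"EUR": 0.2, "AFR": 0.4})
```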

ROOT_TAG = 'population'#: The name of the root XML element tag name (str)

TYPE = None#: The type of the population, i.e. case_control, sample or NoneType (NoneType)

NAME_TAG = 'name'#: The name of the XML tag containing the free text population name (str)

property freq_pops#: Return the allele frequency populations (list of gwas_norm.metadata.cohort.FreqReference).

property ld_pops#: Return the LD populations (list of gwas_norm.metadata.cohort.LdReference).

add_ld_pop(pop)#

Add a population to the LD population list.

Parameters:: pop (gwas_norm.metadata.cohort.LdReference) – The population being added. It will only be added if it does not exist; if it does exist then this will fail silently.

add_freq_pop(pop)#

Add a population to the allele frequency population list.

Parameters:: pop (gwas_norm.metadata.cohort.FreqReference) – The population being added. It will only be added if it does not exist; if it does exist then this will fail silently.

remove_ld_pop(pop)#

Remove a population from the LD population list.

Parameters:: pop (gwas_norm.metadata.cohort.LdReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

remove_freq_pop(pop)#

Remove a population from the allele frequency population list.

Parameters:: pop (gwas_norm.metadata.cohort.FreqReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

reset_ld_pops()#: Reset the LD populations to empty.

reset_freq_pops()#: Reset the allele frequency populations to empty.

to_xml()#

Generate an XML element for the population.

Returns:: element – The XML element representation of the population.
Return type:: lxml.etree.Element

classmethod from_xml(element)#

Generate a population object.

Parameters:

element (lxml.etree.Element) – The element should have the tag name population.

Returns:

population_obj – A population object built from all the elements in the population element. The exact class will depend on the elements within the population element.

Return type:

gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation

Raises:

KeyError – If the name of the element is not population.
ValueError – If n_cases and n_controls are not both defined.

Notes

The returned object will be of type Population, CaseControlPopulation or SamplePopulation depending on if the lxml.etree.Element has the tag name population, case_control_population or sample_population respectively.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: population_class – The relevant population class for the element.
Return type:: class of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)
Raises:: KeyError – If the element does not have the required tag name.

class gwas_norm.metadata.cohort.CaseControlPopulation(name, ncases, ncontrols, **kwargs)#

Bases: CaseControlMixin, Population

A representation of a population where the number of cases and controls are defined.

Parameters:

name (str) – A free text name for the population group.
ncases (int or float) – The number of cases.
ncontrols (int or float) – The number of controls.
freq_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain allele frequency estimates.
ld_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain LD estimates.

Notes

The hierarchy of the ld_pops and freq_pops refers to the order they are used. If the data is not available in the first population in the list, then the next one should be used until the hierarchy is exhausted.

ROOT_TAG = 'case_control_population'#: The name of the root XML element tag name (str)

property n_cases#: Get the number of cases (int or float).

property n_controls#: Get the number of controls (int or float).

to_xml()#

Generate an XML element for the case control population.

Returns:: element – The XML element representation of the case control population.
Return type:: lxml.etree.Element

NAME_TAG = 'name'#: The name of the XML tag containing the free text population name (str)

N_CASES_TAG = 'n_cases'#: The name of the XML tag containing the number of cases (str)

N_CONTROLS_TAG = 'n_controls'#: The name of the XML tag containing the number of controls (str)

PROPORTION_TYPE = 'proportion'#: Constant indicating a proportional sample value (str)

REAL_TYPE = 'real'#: Constant indicating a real integer sample value (str)

TYPE = 'case_control'#: The type of the class, i.e. case_control, sample or NoneType (str)

add_freq_pop(pop)#

Add a population to the allele frequency population list.

Parameters:: pop (gwas_norm.metadata.cohort.FreqReference) – The population being added. It will only be added if it does not exist; if it does exist then this will fail silently.

add_ld_pop(pop)#

Add a population to the LD population list.

Parameters:: pop (gwas_norm.metadata.cohort.LdReference) – The population being added. It will only be added if it does not exist; if it does exist then this will fail silently.

create_case_xml(element)#

Write case control elements to the given element.

Parameters:: element (lxml.etree.Element) – The element to write n_cases and n_controls elements.

property freq_pops#: Return the allele frequency populations (list of gwas_norm.metadata.cohort.FreqReference).

classmethod from_xml(element)#

Generate a population object.

Parameters:

element (lxml.etree.Element) – The element should have the tag name population.

Returns:

population_obj – A population object built from all the elements in the population element. The exact class will depend on the elements within the population element.

Return type:

gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation

Raises:

KeyError – If the name of the element is not population.
ValueError – If n_cases and n_controls are not both defined.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: population_class – The relevant population class for the element.
Return type:: class of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)
Raises:: KeyError – If the element does not have the required tag name.

property ld_pops#: Return the LD populations (list of gwas_norm.metadata.cohort.LdReference).

property n_samples#: Get the number of samples set, this is the sum of cases+controls (int or float).

classmethod parse_xml(element)#

This determines if the element has any case/control definitions and if it does it will error check and parse them.

Parameters:

element (lxml.etree.Element) – The element potentially containing a n_cases and n_controls elements.

Returns:

ncases (int or float or NoneType) – The number of cases. If no n_cases, n_controls elements are found then this will be NoneType.
ncontrols (int or float or NoneType) – The number of controls. If no n_cases, n_controls elements are found then this will be NoneType.

remove_freq_pop(pop)#

Remove a population from the allele frequency population list.

Parameters:: pop (gwas_norm.metadata.cohort.FreqReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

remove_ld_pop(pop)#

Remove a population from the LD population list.

Parameters:: pop (gwas_norm.metadata.cohort.LdReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

reset_freq_pops()#: Reset the allele frequency populations to empty.

reset_ld_pops()#: Reset the LD populations to empty.

reset_seen_values()#

Reset previously seen sample values given when setting the value_type.

Notes

This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).

property value_type#: Get the sample value type (str).

class gwas_norm.metadata.cohort.SamplePopulation(name, nsamples, **kwargs)#

Bases: SampleSizeMixin, Population

A representation of a population where the total sample size is defined.

Parameters:

name (str) – A free text name for the population group.
nsamples (int or float) – The number of samples.
freq_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain allele frequency estimates.
ld_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain LD estimates.

ROOT_TAG = 'sample_population'#: The name of the root XML element tag name (str)

property n_samples#: Get the number of samples (int or float).

to_xml()#

Generate an XML element for the sample population.

Returns:: element – The XML element representation of the sample population.
Return type:: lxml.etree.Element

NAME_TAG = 'name'#: The name of the XML tag containing the free text population name (str)

N_SAMPLES_TAG = 'n_samples'#: The name of the element containing the number of samples (str)

PROPORTION_TYPE = 'proportion'#: Constant indicating a proportional sample value (str)

REAL_TYPE = 'real'#: Constant indicating a real integer sample value (str)

TYPE = 'sample'#: The type of the population, i.e. case_control, sample or NoneType (str)

add_freq_pop(pop)#

Add a population to the allele frequency population list.

Parameters:: pop (gwas_norm.metadata.cohort.FreqReference) – The population being added. It will only be added if it does not exist; if it does exist then this will fail silently.

add_ld_pop(pop)#

Add a population to the LD population list.

Parameters:: pop (gwas_norm.metadata.cohort.LdReference) – The population being added. It will only be added if it does not exist; if it does exist then this will fail silently.

create_nsamples_xml(element)#

Write number of samples element to the given element.

Parameters:: element (lxml.etree.Element) – The element to write n_samples elements.

property freq_pops#: Return the allele frequency populations (list of gwas_norm.metadata.cohort.FreqReference).

classmethod from_xml(element)#

Generate a population object.

Parameters:

element (lxml.etree.Element) – The element should have the tag name population.

Returns:

population_obj – A population object built from all the elements in the population element. The exact class will depend on the elements within the population element.

Return type:

gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation

Raises:

KeyError – If the name of the element is not population.
ValueError – If n_cases and n_controls are not both defined.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: population_class – The relevant population class for the element.
Return type:: class of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)
Raises:: KeyError – If the element does not have the required tag name.

property ld_pops#: Return the LD populations (list of gwas_norm.metadata.cohort.LdReference).

classmethod parse_xml(element)#

This determines if the element has any number of sample definitions and if it does it parses them.

Parameters:: element (lxml.etree.Element) – The element potentially containing a n_samples.
Returns:: nsamples – The number of samples. If no n_samples are found then this will be NoneType.
Return type:: int or float or NoneType

remove_freq_pop(pop)#

Remove a population from the allele frequency population list.

Parameters:: pop (gwas_norm.metadata.cohort.FreqReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

remove_ld_pop(pop)#

Remove a population from the LD population list.

Parameters:: pop (gwas_norm.metadata.cohort.LdReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.

reset_freq_pops()#: Reset the allele frequency populations to empty.

reset_ld_pops()#: Reset the LD populations to empty.

reset_seen_values()#

Reset previously seen sample values given when setting the value_type.

Notes

This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).

property value_type#: Get the sample value type (str).

class gwas_norm.metadata.cohort.Cohort(populations=None, name=None)#

Bases: _XmlBase

A representation of a cohort, where the samples sizes or cases/controls are not defined in the population groups.

Parameters:

populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)) – One or more populations; the exact class will depend on the information available. If no sample size is known then gwas_norm.metadata.cohort.Population should be used; if sample number data is known then the other classes should be used as appropriate. gwas_norm.metadata.cohort.CaseControlPopulation can’t be mixed with gwas_norm.metadata.cohort.SamplePopulation, but both can be mixed with gwas_norm.metadata.cohort.Population.
name (str, optional, default: NoneType) – An overall free text name for the cohort.

ROOT_TAG = 'cohort'#: The name of the root XML element tag name (str)

NAME_TAG = 'name'#: The name of the tag describing the cohort name (str)

TYPE = None#: The type of the cohort, i.e. case_control, sample or NoneType (str)

property n_samples#: Return the number of samples in all populations (int).

property pops#: Return the populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)).

property name#: Return the cohort name (str or NoneType).

add_population(population)#

Add a population to the cohort.

Parameters:: population (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation) – The population to add. The exact class will depend on the information available. If no sample number data is known then gwas_norm.metadata.cohort.Population should be used; if any sample number data is known, the other classes should be used as appropriate. gwas_norm.metadata.cohort.CaseControlPopulation can’t be mixed with gwas_norm.metadata.cohort.SamplePopulation, but both can be mixed with gwas_norm.metadata.cohort.Population.

Notes

Populations will only be added if their name and class type are different from any existing populations in the cohort.
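The de-duplication rule in the note above can be sketched as follows. This is a minimal illustration and not the gwas_norm implementation: the classes and the `add_population` function are hypothetical stand-ins, and it assumes that a population counts as a duplicate only when both its class and its name match an existing entry.

```python
class Population:
    """Hypothetical stand-in for a population with no sample sizes."""

    def __init__(self, name):
        self.name = name


class SamplePopulation(Population):
    """Hypothetical stand-in for a population with a sample size."""


def add_population(populations, population):
    """Append the population unless one with the same class and name exists.

    Returns True if the population was added and False if it was skipped.
    """
    key = (type(population), population.name)
    if any((type(p), p.name) == key for p in populations):
        return False
    populations.append(population)
    return True


pops = []
added = [
    add_population(pops, Population("EUR")),
    add_population(pops, Population("EUR")),        # same class and name
    add_population(pops, SamplePopulation("EUR")),  # same name, other class
    add_population(pops, Population("AFR")),
]
print(added)
```

Keying on the (class, name) pair is one reading of the note; the real method may apply a stricter or looser comparison.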

to_xml()#

Generate an XML element for the cohort.

Returns:: element – The XML element representation of the cohort.
Return type:: lxml.etree.Element

classmethod from_xml(element)#

Generate a cohort object from an lxml.etree.Element.

Parameters:: element (lxml.etree.Element) – The element should have the tag name cohort, case_control_cohort or sample_cohort.
Returns:: cohort – A cohort object built from all the elements in the cohort element.
Return type:: gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort
Raises:: KeyError – If the tag name of the element is not a recognised cohort tag.

classmethod get_class(element)#

Helper method that will determine the required cohort class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – A cohort element; it is expected to have the tag name cohort, case_control_cohort or sample_cohort.
Returns:: cohort_class – The relevant cohort class for the element.
Return type:: class of (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort)
Raises:: KeyError – If the element does not have the required tag name.
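The tag-based dispatch described for get_class can be sketched as a dictionary lookup. This is only an illustration under stated assumptions: the three classes are hypothetical stand-ins that carry nothing but the documented ROOT_TAG values, `get_cohort_class` is an invented name, and the standard library `xml.etree.ElementTree` is used in place of lxml.

```python
import xml.etree.ElementTree as ET


# Hypothetical stand-ins for the three cohort classes; only ROOT_TAG matters here.
class Cohort:
    ROOT_TAG = "cohort"


class CaseControlCohort(Cohort):
    ROOT_TAG = "case_control_cohort"


class SampleCohort(Cohort):
    ROOT_TAG = "sample_cohort"


# Map each root tag to the class that should parse it.
COHORT_CLASSES = {c.ROOT_TAG: c for c in (Cohort, CaseControlCohort, SampleCohort)}


def get_cohort_class(element):
    """Return the class registered for the element's tag.

    An unrecognised tag raises KeyError, mirroring the documented behaviour.
    """
    return COHORT_CLASSES[element.tag]


element = ET.fromstring("<sample_cohort><n_samples>5000</n_samples></sample_cohort>")
cohort_class = get_cohort_class(element)
print(cohort_class.__name__)
```

A from_xml style entry point could then call the returned class to do the actual parsing, which keeps the tag names in one place.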

class gwas_norm.metadata.cohort.CaseControlCohort(ncases, ncontrols, populations=None, name=None)#

Bases: CaseControlMixin, Cohort

A representation of a cohort where the number of cases and controls are defined.

Parameters:

ncases (int or float) – The number of cases.
ncontrols (int or float) – The number of controls.
populations (list of gwas_norm.metadata.cohort.Population) – One or more populations. Note this must be populations that do not have any sample sizes defined.

Notes

A cohort where the actual sample size of individual population groups within the cohort is unknown and only an aggregate sample size is known.

NAME_TAG = 'name'#: The name of the tag describing the cohort name (str)

N_CASES_TAG = 'n_cases'#: The name of the XML tag containing the number of cases (str)

N_CONTROLS_TAG = 'n_controls'#: The name of the XML tag containing the number of controls (str)

PROPORTION_TYPE = 'proportion'#: Constant indicating a proportional sample value (str)

REAL_TYPE = 'real'#: Constant indicating a real integer sample value (str)

TYPE = 'case_control'#: The type of the class, i.e. case_control, sample or NoneType (str)

create_case_xml(element)#

Write case control elements to the given element.

Parameters:: element (lxml.etree.Element) – The element to write n_cases and n_controls elements.

classmethod from_xml(element)#

Generate a cohort object from an lxml.etree.Element.

Parameters:: element (lxml.etree.Element) – The element should have the tag name cohort, case_control_cohort or sample_cohort.
Returns:: cohort – A cohort object built from all the elements in the cohort element.
Return type:: gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort
Raises:: KeyError – If the tag name of the element is not a recognised cohort tag.

classmethod get_class(element)#

Helper method that will determine the required cohort class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – A cohort element; it is expected to have the tag name cohort, case_control_cohort or sample_cohort.
Returns:: cohort_class – The relevant cohort class for the element.
Return type:: class of (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort)
Raises:: KeyError – If the element does not have the required tag name.

property n_samples#: Get the number of samples set, this is the sum of cases+controls (int or float).

property name#: Return the cohort name (str or NoneType).

classmethod parse_xml(element)#

This determines if the element has any case/control definitions and if it does it will error check and parse them.

Parameters:

element (lxml.etree.Element) – The element potentially containing a n_cases and n_controls elements.

Returns:

ncases (int or float or NoneType) – The number of cases. If no n_cases, n_controls elements are found then this will be NoneType.
ncontrols (int or float or NoneType) – The number of controls. If no n_cases, n_controls elements are found then this will be NoneType.
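A minimal sketch of this kind of parse, using the documented n_cases and n_controls tag names. The function names are hypothetical, `xml.etree.ElementTree` stands in for lxml, and the specific error check (rejecting one count without the other) is an assumption about what "error check" might involve.

```python
import xml.etree.ElementTree as ET


def parse_number(text):
    """Parse text as an int where possible, otherwise as a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_case_control(element):
    """Return (ncases, ncontrols), or (None, None) if neither child is present."""
    cases = element.find("n_cases")
    controls = element.find("n_controls")
    if cases is None and controls is None:
        return None, None
    if cases is None or controls is None:
        # Assumed error check: one count without the other is treated as invalid.
        raise ValueError("n_cases and n_controls must be given together")
    return parse_number(cases.text), parse_number(controls.text)


xml = (
    "<case_control_cohort>"
    "<n_cases>1200</n_cases><n_controls>3400</n_controls>"
    "</case_control_cohort>"
)
ncases, ncontrols = parse_case_control(ET.fromstring(xml))
print(ncases, ncontrols)
```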

property pops#: Return the populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)).

reset_seen_values()#

Reset previously seen sample values given when setting the value_type.

Notes

This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).

property value_type#: Get the sample value type (str).

ROOT_TAG = 'case_control_cohort'#: The name of the root XML element tag name (str)

property n_cases#: Get the number of cases (int or float).

property n_controls#: Get the number of controls (int or float).

add_population(population)#

Add a population to the cohort.

Parameters:: population (gwas_norm.metadata.cohort.Population) – The population to add. Note this must be a population that does not have any sample sizes defined.

Notes

Populations will only be added if their name and class type is different from any existing populations in the cohort.

to_xml()#

Generate an XML element for the case control cohort.

Returns:: element – The XML element representation of the case control cohort.
Return type:: lxml.etree.Element

class gwas_norm.metadata.cohort.SampleCohort(nsamples, populations=None, name=None)#

Bases: SampleSizeMixin, Cohort

A representation of a cohort where the number of samples is defined.

Parameters:

nsamples (int or float) – The number of samples.
populations (list of gwas_norm.metadata.cohort.Population) – One or more populations. Note this must be populations that do not have any sample sizes defined.

Notes

A cohort where the actual sample size of individual population groups within the cohort is unknown and only an aggregate sample size is known.

NAME_TAG = 'name'#: The name of the tag describing the cohort name (str)

N_SAMPLES_TAG = 'n_samples'#: The name of the element containing the number of samples (str)

PROPORTION_TYPE = 'proportion'#: Constant indicating a proportional sample value (str)

REAL_TYPE = 'real'#: Constant indicating a real integer sample value (str)

TYPE = 'sample'#: The type of the population, i.e. case_control, sample or NoneType (str)

create_nsamples_xml(element)#

Write number of samples element to the given element.

Parameters:: element (lxml.etree.Element) – The element to write n_samples elements.

classmethod from_xml(element)#

Generate a cohort object from an lxml.etree.Element.

Parameters:: element (lxml.etree.Element) – The element should have the tag name cohort, case_control_cohort or sample_cohort.
Returns:: cohort – A cohort object built from all the elements in the cohort element.
Return type:: gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort
Raises:: KeyError – If the tag name of the element is not a recognised cohort tag.

classmethod get_class(element)#

Helper method that will determine the required cohort class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – A cohort element; it is expected to have the tag name cohort, case_control_cohort or sample_cohort.
Returns:: cohort_class – The relevant cohort class for the element.
Return type:: class of (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort)
Raises:: KeyError – If the element does not have the required tag name.

property name#: Return the cohort name (str or NoneType).

classmethod parse_xml(element)#

This determines if the element has any number of sample definitions and if it does it parses them.

Parameters:: element (lxml.etree.Element) – The element potentially containing a n_samples.
Returns:: nsamples – The number of samples. If no n_samples are found then this will be NoneType.
Return type:: int or float or NoneType

property pops#: Return the populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)).

reset_seen_values()#

Reset previously seen sample values given when setting the value_type.

Notes

This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).

property value_type#: Get the sample value type (str).

ROOT_TAG = 'sample_cohort'#: The name of the root XML element tag name (str)

property n_samples#: Get the number of samples (int or float).

add_population(population)#

Add a population to the cohort.

Parameters:: population (gwas_norm.metadata.cohort.Population) – The population to add. Note this must be a population that does not have any sample sizes defined.

Notes

Populations will only be added if their name and class type is different from any existing populations in the cohort.

to_xml()#

Generate an XML element for the sample cohort.

Returns:: element – The XML element representation of the sample cohort.
Return type:: lxml.etree.Element

`gwas_norm.metadata.file`#

The representation of GWAS summary statistics file metadata.

class gwas_norm.metadata.file.FileHolderMixin#

Bases: object

A mixin designed to implement the logic for objects handling files (currently this is the StudyFile and AnalysisFile objects).

Notes

Any class using this mixin requires the on_file_added/on_files_removed methods to be implemented; these callbacks allow object specific behaviour to occur when a file is being added/removed. They accept a single file and a list of files respectively.
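The callback contract described above can be sketched as follows. This is an illustration only: `FileHolderSketch` and `HostSketch` are hypothetical classes, plain strings stand in for file objects, and the real mixin does considerably more (type checks, reciprocal binding, XML handling).

```python
class FileHolderSketch:
    """Hypothetical mixin: stores files and notifies the host via callbacks."""

    def init_files(self):
        self._files = []

    def add_file(self, gwas_file, error=False):
        if gwas_file in self._files:
            if error:
                raise KeyError(f"file already added: {gwas_file!r}")
            return
        self._files.append(gwas_file)
        self.on_file_added(gwas_file)  # callback receives a single file

    def remove_files(self, gwas_files):
        removed = [f for f in gwas_files if f in self._files]
        for f in removed:
            self._files.remove(f)
        self.on_files_removed(removed)  # callback receives a list of files


class HostSketch(FileHolderSketch):
    """Hypothetical host that records what each callback was given."""

    def __init__(self):
        self.init_files()
        self.events = []

    def on_file_added(self, gwas_file):
        self.events.append(("added", gwas_file))

    def on_files_removed(self, gwas_files):
        self.events.append(("removed", list(gwas_files)))


host = HostSketch()
host.add_file("a.txt.gz")
host.add_file("b.txt.gz")
host.remove_files(["a.txt.gz"])
print(host.events)
```

Leaving the callbacks to the host keeps the shared bookkeeping in the mixin while letting study-level and analysis-level objects react differently.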

EFFECT_TYPE_TAG = 'effect_type'#: The name of the effect type tag/element in the XML file (str).

ANALYSIS_TYPE_TAG = 'analysis_type'#: The name of the analysis type tag/element in the XML file (str).

UNITS_TAG = 'units'#: The name of the units tag/element in the XML file (str).

FILE_CLASS = None#: The file class that should be used for parsing XML, this should be overridden by the sub-class (NoneType)

file_repr_attr_str()#

Called by the __repr__ of host objects to supply a key=value string of the attributes and their values relating to the mixin.

Returns:: attr_str – Each string is an attribute and value for printing.
Return type:: list of str

init_file_attr(analysis_type, effect_type, units=None, cohort=None, files=None, file_check=True)#

Initialise all the attributes that a file handling object needs.

Parameters:

analysis_type (str) – The analysis type for the study. Will be applied to any analysis that does not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analysis that does not have an effect_type specified.
units (str, optional, default: NoneType) – The units that the effect sizes within the file are measured in.
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort, optional, default: NoneType) – The cohort that applies to the file.
files (list of (gwas_norm.metadata.file.GwasFile), optional, default: NoneType) – Source files. Data files can either be given at the study level or the analysis level but not both. Study level files are for data such as GTEX data, where multiple gene analyses are in the same file.
file_check (bool, optional, default: True) – Toggle file checking.

Notes

This is usually called from the __init__ method of the host object and will initialise all the parameters relating to file handling from arguments that have been given to the host object.

property cohort#

Get the cohort definition associated with the analysis (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType).

Notes

The cohort associated with the analysis or NoneType if no cohort has been set.

property files#: Get all the associated files (list of gwas_norm.metadata.file.GwasFile).

property n_files#: Get the number of associated files (int).

property file_check#: Get the file checking status (bool).

property analysis_type#: Get the analysis type (str).

property effect_type#: Get the effect type (str or NoneType).

property units#: Get the units (str or NoneType)

add_file(gwas_file, error=False)#

Add a GWAS file to the object. This results in a reciprocal bind of the file to the parent object.

Parameters:

gwas_file (gwas_norm.metadata.file.GwasFile) – The file being added.
error (bool, optional, default: False) – If the file exists in this object already, should an error be raised.

Raises:

TypeError – If the gwas_file is not the correct type.
KeyError – If the GWAS file is specified already and error is True.

remove_files(gwas_files)#

Remove one or more GWAS files from this object.

Parameters:: gwas_files (list of gwas_norm.metadata.file.GwasFile) – The files being removed.

create_effect_type_xml(element)#

Generate the XML element for the effect type.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_analysis_type_xml(element)#

Generate the analysis type XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_units_xml(element)#

Generate the units XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the element to.
Returns:: element – The parent XML element with the element added.
Return type:: lxml.etree.Element

create_files_xml(element)#

Create file specific XML elements in the parent element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the file specific elements to.
Returns:: element – The parent XML element with the file elements added.
Return type:: lxml.etree.Element

Notes

This is designed to add XML elements to a study/analysis element that has file parameters.

create_cohort_xml(element)#

Create cohort specific XML elements in the parent element.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the cohort specific elements to.
Returns:: element – The parent XML element with the cohort elements added.
Return type:: lxml.etree.Element

Notes

This is designed to add XML elements to a study/analysis element that has file parameters.

create_xml(element)#

Generate all the XML elements relating to objects that hold files. This wraps all other create_* methods in the mixin.

Parameters:: element (lxml.etree.Element) – The parent XML element to add the elements to.
Returns:: element – The parent XML element with the elements added.

classmethod parse_xml(element, **kwargs)#

Parse the file associated data from the XML element.

Parameters:

element (lxml.etree.Element) – The parent XML element to parse the elements from.

Returns:

analysis_type (str) – The analysis type.
effect_type (str) – The effect type.
units (str or NoneType) – The units (if defined).
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType) – The cohort (if defined).
files (list of gwas_norm.metadata.file.GwasFile) – Files that have been parsed out of the XML.

Raises:

KeyError – If any of the required elements cannot be found in the parent element.

classmethod parse_files(element, **kwargs)#

Parse any file elements out from the XML element.

Parameters:: element (lxml.etree.Element) – The parent XML element to parse file specific elements from.
Returns:: gwas_files – GWAS file objects.
Return type:: list of (gwas_norm.metadata.file.GwasFile)
Raises:: KeyError – If no file elements can be found in the parent element.

class gwas_norm.metadata.file.GwasFile(relative_path, column_map, chrpos_spec=None, parent=None, comment_char=None, pvalue_logged=False, compression=None, skiplines=0, md5_chksum=None, encoding='utf-8', file_check=True, has_header=True, keys=None, info=None, **csv_kwargs)#

Bases: _XmlBase

The base class for a representation of an input GWAS file. Do not use directly.

Parameters:

relative_path (str) – The relative path to the GWAS file, relative to the study_source_dir in the study object.
column_map (dict) – The keys should be standard column names and the values should be GWAS file column names.
chrpos_spec (gwas_norm.normalise.ChrPosSpec, optional, default: NoneType) – The specification of columns in a combined chromosome position column. Whilst this is optional, an error will be raised if it is NoneType and a chrpos column is defined in the column_map.
parent (gwas_norm.metadata.study.StudyFile or gwas_norm.metadata.analysis.AnalysisFile, optional, default: NoneType) – The parent object that will hold the file. A reciprocal bind is initiated in the parent.
comment_char (str, optional, default: NoneType) – A character that is treated as a comment either at the start of a line or at the start of the file.
pvalue_logged (bool, optional, default: False) – Is the pvalue in the data file -log10 transformed.
compression (str, optional, default: NoneType) – The type of compression in the file.
skiplines (int, optional, default: 0) – A fixed number of rows to skip before looking for the header. Any comment rows are not included in this. i.e. skip lines from the start of the file then look for comment lines.
md5_chksum (str, optional, default: NoneType) – The 32 character checksum of the file. If not supplied then it is calculated upon XML writing or can be calculated with gwas_norm.metadata.file.GwasFile.check_file.
encoding (str, optional, default: 'utf-8') – The encoding of the file. If NoneType is given, this will mean utf-8.
file_check (bool, optional, default: True) – Should file checks be performed. If this is False and md5_chksum is not defined then an error is raised. If True this will calculate the MD5 hash. If md5_chksum is NoneType then the calculated md5_chksum is used when the XML file is written. If md5_chksum is supplied then the supplied md5_chksum is compared to the calculated one. File checks will also test for the presence of a header and check the delimiter and valid compression (where possible).
has_header (bool, optional, default: True) – Does the input file have a header?
keys (list of gwas_norm.metadata.column.Column, optional, default: NoneType) – Any key columns that have been defined in the file. Key columns are used to define row keys for a GWAS file; these must be set if the file is being added to a gwas_norm.metadata.study.StudyFile object. Key columns must be present in the header of a file.
info (gwas_norm.metadata.info.Info, optional, default: NoneType) – Any file-level info definitions/columns. Info columns must be present in the header of the file.
**csv_kwargs – Any arguments used in a csv dialect to read in a csv file. Currently only delimiter and lineterminator are supported for XML writing; the others, which are stored but not yet supported in XML writing, are doublequote, escapechar, quotechar, quoting, skipinitialspace, strict or dialect. Note, however, that unlike csv the delimiter defaults to a tab (\t).
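How skiplines, comment_char and the tab default might interact when reading a file can be sketched with the standard library csv module. `read_rows` is a hypothetical helper, not part of gwas_norm, and it assumes that comment lines are dropped wherever they occur after the fixed skip.

```python
import csv
import io


def read_rows(handle, skiplines=0, comment_char=None, **csv_kwargs):
    """Yield parsed rows after skipping fixed lines and then comment lines."""
    # Unlike the csv module itself, default to a tab delimiter.
    csv_kwargs.setdefault("delimiter", "\t")

    # First skip a fixed number of lines from the start of the file.
    for _ in range(skiplines):
        next(handle, None)

    def non_comment_lines():
        # Then drop any line that starts with the comment character.
        for line in handle:
            if comment_char is not None and line.startswith(comment_char):
                continue
            yield line

    yield from csv.reader(non_comment_lines(), **csv_kwargs)


text = "generated by tool X\n# a comment\nchr\tpos\tpvalue\n1\t12345\t0.01\n"
rows = list(read_rows(io.StringIO(text), skiplines=1, comment_char="#"))
print(rows)
```

With has_header left at its default, the first row yielded here would be treated as the header.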

ROOT_TAG = 'file'#: The name of the root XML element tag (str)

RELATIVE_PATH_TAG = 'relative_path'#: The name of the relative file path XML tag (str)

MD5_TAG = 'md5_chksum'#: The name of the MD5 chksum XML tag (str)

COMMENT_CHAR_TAG = 'comment_char'#: The name of the comment character XML tag (str)

SKIPLINES_TAG = 'skiplines'#: The name of the skiplines XML tag (str)

PVALUE_LOGGED_TAG = 'pvalue_logged'#: The name of the pvalue is log transformed XML tag (str)

COMPRESSION_TAG = 'compression'#: The name of the file compression XML tag (str)

ENCODING_TAG = 'encoding'#: The name of the file encoding tag XML (str)

CHR_POS_SPEC_TAG = 'chrpos_spec'#: The name of the chromosome position XML tag (str)

COLUMNS_TAG = 'columns'#: The XML tag for the GWAS file columns tag in the XML document (str)

HAS_HEADER_TAG = 'has_header'#: The XML tag for indicating of the GWAS file has a header row (str)

KEYS_TAG = 'keys'#: The XML tag for indicating key columns in the GWAS file (str)

INFO_ATTRIBUTE = 'info'#: The attribute name indicating a mapping column should be used in the info field (str)

MAP_TO_ATTRIBUTE = 'map_to'#: The attribute name indicating a mapping column should be used in the info field and be mapped to a different name (str)

CSV_DOUBLE_QUOTE_TAG = 'doublequote'#: The XML tag name of the csv argument element for double quotes (str)

CSV_ESCAPE_CHAR_TAG = 'escapechar'#: The XML tag name of the csv argument element for escape character (str)

CSV_QUOTE_CHAR_TAG = 'quotechar'#: The XML tag name of the csv argument element for quote character (str)

CSV_QUOTING_TAG = 'quoting'#: The XML tag name of the csv argument element for quoting (str)

CSV_SKIP_INIT_WHITESPACE_TAG = 'skipinitialspace'#: The XML tag name of the csv argument element for skipping initial whitespace (str)

CSV_STRICT_TAG = 'strict'#: The XML tag name of the csv argument element for strict (str)

CSV_LINE_TERMINATOR_TAG = 'lineterminator'#: The XML tag name of the csv argument element for the line terminator (str)

CSV_DELIMITER_TAG = 'delimiter'#: The XML tag name of the csv argument element for the delimiter (str)

property is_validated#: Get the validation status of the file (bool).

validate()#: Validate the file.

invalidate()#: Invalidate the file.

property csv_kwargs#: Get the CSV keyword arguments. If these have not been set then the dialect that was detected during the header extraction is used. If nothing has been set then an empty dictionary is returned (dict).

property has_header#: Should the input file be expected to have a header (bool).

property header_is_known#: Flag to indicate if the header is known to the file. This can be used to check if the header has been read from the file or defined by the user without having to call file.header, which may instigate a call to get the header from the file (if file_check is True) (bool).

property header#: Get the file header. If the file is not expected to have a header then this will be a list of column numbers the same length as the first row (list of (str or int)).

property normaliser#: Get the normaliser object (should be available after column_map is set) (gwas_norm.normalise.Normaliser or NoneType).

property compression#: Get the compression (str).

property skiplines#: Get the number of lines to skip (int).

property chrpos_spec#: Get the chrpos_spec (gwas_norm.normalise.ChrPosSpec or NoneType).

property column_map#: Get the whole column mapping dictionary. Note this will always return a copy of the column map dictionary (dict).

property md5_chksum#: Get the MD5 checksum for the file (str).

property is_checked#: Has the MD5 of the file been checked (bool).

property absolute_path#

Get the absolute path to the file (str).

Raises:: FileNotFoundError – If no parent object has been bound to the file object.

property basename#: Get the basename to the gwas file (str).

property relative_path#: Get the relative path to the gwas file (str).

property keys#: Get any defined key columns; this will be an empty list if none have been defined (list of gwas_norm.metadata.column.Column).

property info#: Get any other info definitions or columns. These are set as info elements in the XML and differ from info_columns, that are attributes set on the column mappings (gwas_norm.metadata.info.Info)

property info_columns#: Return any info columns that have been defined in the file column mappings, or any key columns. If none have been defined this will be an empty list. (list of (gwas_norm.metadata.column.MappingColumn or gwas_norm.metadata.column.Column)).

set_header(header)#

An explicit setter for the header.

Note it is not recommended to set the header directly but might be useful if you do not have access to the GWAS files directly when you are building the XML files. This will only work if file checking is disabled.

Parameters:

header (list of (str or int)) – The header for the file. If has_header is False this should be a list of ints.

Raises:

ValueError – If file_check is True.
TypeError – If the header is not all ints (has_header=False) or strings (has_header=True).

check_file()#

Perform a battery of checks on the file.

Raises:: ValueError – If the user has defined the MD5 and it does not match that which is calculated by the file checks.

Notes

This is designed to highlight any differences between the spec of the file as defined in the file object and those of the actual file itself. First the compression is tested to see if the file is GZIP. The MD5 of the file is then checked. If it is not defined in the file object then it is set. If it is defined then it is checked for consistency.

If any inconsistencies are detected then warnings are issued rather than errors, i.e. we will believe the user as header detection is approximate.
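The compression and MD5 steps described above can be sketched with the standard library. The helper names are hypothetical, and the sketch assumes that GZIP detection means looking for the two gzip magic bytes and that a checksum mismatch raises ValueError as documented.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the 32 character hexadecimal MD5 digest of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large summary statistics files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_gzip(path):
    """Return True if the file starts with the two gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def check_md5(path, expected=None):
    """Return the calculated MD5, raising ValueError if it differs from expected."""
    observed = md5_of_file(path)
    if expected is not None and expected.lower() != observed:
        raise ValueError(f"MD5 mismatch: expected {expected}, observed {observed}")
    return observed


# Demonstrate on a small temporary file.
data = b"chr\tpos\tpvalue\n1\t12345\t0.01\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    checksum = check_md5(tmp.name)
    gzipped = is_gzip(tmp.name)
finally:
    os.remove(tmp.name)
print(checksum, gzipped)
```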

bind(parent)#

Bind a gwas file object with a parent object.

Parameters:: parent (gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.study.StudyFile) – A parental object.
Raises:: KeyError – If the file is already bound to a different study/analysis object.

Notes

The binding instigates a reciprocal adding of the file object to the parent.

unbind()#

Unbind a gwas file from a parent object. This also removes the gwas_file from the parent.

Returns:: parent – The unbound parental object.
Return type:: gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.study.StudyFile

to_xml()#

Convert the GWAS file and all of its attributes to an XML element.

Returns:: gwas_file_element – A file element built from the GWAS file object and its attributes.
Return type:: lxml.etree.Element

classmethod from_xml(element, **kwargs)#

Parse the data from an XML element (parsed using lxml.etree).

Parameters:: element (lxml.etree.Element) – An lxml element that must have the root name file.
Returns:: file_element – A file object parsed from the XML.
Return type:: gwas_norm.metadata.file.GwasFile
Raises:: KeyError – If the tag name of the root element is not file or key_file.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: file_class – The relevant file class for the element.
Return type:: class of gwas_norm.metadata.file.GwasFile
Raises:: KeyError – If the element does not have the tag name file or key_file.

`gwas_norm.metadata.test`#

Enable test GWAS associations to be given in the XML. When encountered, after normalisation these are compared to the normalisation output to give a report on whether they match the expected values.

class gwas_norm.metadata.test.Test(chr_name, start_pos, effect_type, effect_size, effect_allele, other_allele=None, standard_error=None, pvalue=None, pvalue_logged=False, var_id=None)#

Bases: _XmlBase

A class representing Test objects and moving them to/from XML.

Parameters:

chr_name (str) – The chromosome name to be compared. Whilst this is tested by definition it will always pass as this is the way the tests are selected.
start_pos (int) – The start position to be compared. Whilst this is tested by definition it will always pass as this is the way the tests are selected.
effect_type (str) – The effect_type that will be used to transform the effect_size so it can be compared. Valid values are beta, log_or and or. Odds ratios are log transformed.
effect_size (float) – The effect size that will be compared with the data in the effect_size column.
effect_allele (str) – The effect allele that will be compared to the data in the effect_allele column.
other_allele (str, optional, default: NoneType) – The other allele that will be compared to the data in the other_allele column
standard_error (float, optional, default: NoneType) – The standard error that will be compared to the data in the standard_error column
pvalue (float or str, optional, default: NoneType) – The p-value that will be compared to the data in the pvalue column. This is expected to be a float but could be a string if precision is an issue. The pvalues are -log10 transformed internally using the decimal package.
pvalue_logged (bool, optional, default: False) – Is the pvalue -log10 transformed. Even if False, the pvalue test is carried out on -log10 transformed pvalues.
var_id (str, optional, default: NoneType) – The variant ID. Typically, this could be a variant rsID.

Notes

Input/Output XML looks like this:

<test>
    <id/>
    <effect_allele/>
    <effect_size/>
    <effect_type/>
    <standard_error/>
    <pvalue/>
</test>

This also has methods for performing tests against the expected values in the Test object.
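Building a test element of this kind can be sketched with the standard library. This is not the output of Test.to_xml: `build_test_element` is a hypothetical helper, `xml.etree.ElementTree` stands in for lxml, the child tag names are taken from the *_TAG constants documented for the class, and the values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_test_element(values):
    """Build a <test> element with one child per value that is not None."""
    root = ET.Element("test")
    for tag, value in values.items():
        if value is None:
            continue  # optional fields that were not supplied are left out
        ET.SubElement(root, tag).text = str(value)
    return root


# Illustrative values only.
element = build_test_element(
    {
        "chr_name": "1",
        "start_pos": 12345,
        "effect_type": "beta",
        "effect_size": 0.12,
        "effect_allele": "A",
        "other_allele": None,
        "standard_error": 0.03,
        "pvalue": "6.3e-05",
    }
)
print(ET.tostring(element, encoding="unicode"))
```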

ROOT_TAG = 'test'#: The name of the root XML element tag name (str)

CHR_NAME_TAG = 'chr_name'#: The tag name for the chromosome name in the XML (str).

START_POS_TAG = 'start_pos'#: The tag name for the start position in the XML (str).

EFFECT_TYPE_TAG = 'effect_type'#: The tag name for the effect type in the XML (str).

EFFECT_SIZE_TAG = 'effect_size'#: The tag name for the effect size in the XML (str).

EFFECT_ALLELE_TAG = 'effect_allele'#: The tag name for the effect allele in the XML (str).

VAR_ID_TAG = 'var_id'#: The tag name for the variant ID in the XML (str).

OTHER_ALLELE_TAG = 'other_allele'#: The tag name for the other allele in the XML (str).

STANDARD_ERROR_TAG = 'standard_error'#: The tag name for the standard error in the XML (str).

PVALUE_TAG = 'pvalue'#: The tag name for the pvalue in the XML (str).

PVALUE_LOGGED_TAG = 'pvalue_logged'#: The tag name for the pvalue is logged in the XML (str).

EFFECT_SIZE_DELTA = 0.0001#: Allowed difference between the expected effect size and the normalised one (float)

STANDARD_ERROR_DELTA = 0.0001#: Allowed difference between the expected standard error and the normalised one (float)

LOG10_PVALUE_DELTA = 0.05#: Allowed difference between the expected -log10(pvalue) and the normalised one (float)

TEST_PASS = 'PASS'#: Constant for a test pass (str)

TEST_FAIL = 'FAIL'#: Constant for a test fail (str)
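The p-value comparison on the -log10 scale can be sketched with the decimal package, which copes with p-values too small for a float. The function names are hypothetical, and treating a difference equal to the delta as a pass is an assumption.

```python
from decimal import Decimal

LOG10_PVALUE_DELTA = Decimal("0.05")  # same value as the documented constant


def neg_log10(pvalue, pvalue_logged=False):
    """Return -log10(pvalue) as a Decimal; logged values are passed through."""
    value = Decimal(str(pvalue))
    return value if pvalue_logged else -value.log10()


def compare_pvalues(expected, observed, expected_logged=False, observed_logged=False):
    """Return 'PASS' if the two -log10 p-values differ by no more than the delta."""
    difference = abs(
        neg_log10(expected, expected_logged) - neg_log10(observed, observed_logged)
    )
    return "PASS" if difference <= LOG10_PVALUE_DELTA else "FAIL"


# A p-value given as a string because it underflows a float.
tiny = compare_pvalues("1e-400", 400, observed_logged=True)
close = compare_pvalues(0.01, 0.0105)
far = compare_pvalues(0.01, 0.05)
print(tiny, close, far)
```

Passing very small p-values as strings avoids the underflow to zero that would occur if they were first converted to a float.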

class TestResult(test_id, test_type, expected_value, observed_value, delta, test_outcome)#

Bases: tuple

A container for the result of a test (namedtuple)

count(value, /)#: Return number of occurrences of value.

delta#: Alias for field number 4

expected_value#: Alias for field number 2

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

observed_value#: Alias for field number 3

test_id#: Alias for field number 0

test_outcome#: Alias for field number 5

test_type#: Alias for field number 1
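The field aliases above imply the positional order of the tuple. As a minimal sketch, an equivalent stand-alone namedtuple can be built as follows; it is a re-creation for illustration and not the class shipped with gwas_norm, and all field values are hypothetical.

```python
from collections import namedtuple

# Field order follows the documented aliases:
# test_id is field 0 through to test_outcome as field 5.
TestResult = namedtuple(
    "TestResult",
    [
        "test_id",
        "test_type",
        "expected_value",
        "observed_value",
        "delta",
        "test_outcome",
    ],
)

# Hypothetical values, for illustration only.
result = TestResult(
    test_id="test1",
    test_type="standard_error",
    expected_value=0.0123,
    observed_value=0.01231,
    delta=0.0001,
    test_outcome="PASS",
)

outcome_by_name = result.test_outcome      # access by field name
delta_by_position = result[4]              # access by position
outcome_position = result.index("PASS")    # inherited tuple method
```

Because the container is a tuple, results can be unpacked or written straight to a delimited report without any conversion.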

test_row(row)#

Apply the test to a row.

Parameters:: row (list of str or int) – The row to use for test lookup. This should have the elements represented in the constants.STANDARD_COLUMNS (in exactly the same order).
Returns:: test_results – A list of the results from all the comparisons.
Return type:: list of gwas_norm.metadata.test.Test.TestResult

test_chr_name(chr_name)#

Test the chromosome name against the expected value.

Parameters:: chr_name (str) – The chromosome name to test.
Returns:: test_result – The test result.
Return type:: gwas_norm.metadata.test.Test.TestResult

test_start_pos(start_pos)#

Test the start position against the expected value.

Parameters:: start_pos (int) – The start position to test.
Returns:: test_result – The test result.
Return type:: gwas_norm.metadata.test.Test.TestResult

test_effects(effect_allele, other_allele, effect_size)#

Test the effect allele/other allele and effect size against the expected values.

Parameters:

effect_allele (str) – The effect allele to test.
other_allele (str) – The other (non-effect) allele to test.
effect_size (float) – The effect size to test.

Returns:

test_results – The test results.

Return type:

list of gwas_norm.metadata.test.Test.TestResult

Notes

In addition to testing the difference in effect_size, a test of the effect sign is carried out. This may also swap the input effect/other alleles and flip the effect direction before testing, as there is no guarantee that the user's effect allele is the same as the expected one. As a result, the effect size test will fail if the alleles are wrong.
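The allele swap and tolerance check described in these notes can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, the comparison is assumed to be inclusive (`<=`), and negating the effect size on a swap assumes an additive scale such as a beta or a log odds ratio.

```python
EFFECT_SIZE_DELTA = 0.0001  # value documented for Test.EFFECT_SIZE_DELTA


def align_effect(effect_allele, other_allele, effect_size, expected_effect_allele):
    """Swap the alleles and negate the effect size if the row is reported
    relative to the opposite allele (hypothetical helper)."""
    if effect_allele != expected_effect_allele and other_allele == expected_effect_allele:
        return other_allele, effect_allele, -effect_size
    return effect_allele, other_allele, effect_size


def effect_size_matches(observed, expected, delta=EFFECT_SIZE_DELTA):
    """True if the absolute difference is within the allowed delta
    (inclusive comparison is an assumption)."""
    return abs(observed - expected) <= delta


def sign_matches(observed, expected):
    """True if both effect sizes point in the same direction."""
    return (observed >= 0) == (expected >= 0)


# The expected effect allele is A with an effect of 0.05; the row reports G.
aligned = align_effect("G", "A", -0.05, expected_effect_allele="A")
size_ok = effect_size_matches(aligned[2], 0.05)
sign_ok = sign_matches(aligned[2], 0.05)
```

If neither input allele matches the expected effect allele, the sketch leaves the row untouched, so the effect size and sign checks are run on the values as given.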

test_standard_error(standard_error)#

Test the standard error against the expected value.

Parameters:: standard_error (float) – The standard error to test.
Returns:: test_result – The test result.
Return type:: gwas_norm.metadata.test.Test.TestResult

test_pvalue(pvalue)#

Test the pvalue against the expected value.

Parameters:: pvalue (float) – The pvalue to test. Important: this should be -log10 transformed.
Returns:: test_result – The test result.
Return type:: gwas_norm.metadata.test.Test.TestResult
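Since the pvalue is compared on the -log10 scale, a raw pvalue has to be transformed before it is tested. The following is a rough sketch of that transformation and of a tolerance check using the documented LOG10_PVALUE_DELTA; the helper names are hypothetical and the inclusive comparison is an assumption.

```python
import math

LOG10_PVALUE_DELTA = 0.05  # value documented for Test.LOG10_PVALUE_DELTA


def neg_log10(pvalue):
    """Transform a raw pvalue in (0, 1] to the -log10 scale."""
    return -math.log10(pvalue)


def log10_pvalue_matches(observed, expected, delta=LOG10_PVALUE_DELTA):
    """Compare two -log10(pvalue) values within the allowed delta."""
    return abs(observed - expected) <= delta


expected = neg_log10(5e-8)

# A slightly different pvalue is still within tolerance on the log scale.
close_enough = log10_pvalue_matches(neg_log10(5.2e-8), expected)

# A five-fold difference is not.
too_far = log10_pvalue_matches(neg_log10(1e-8), expected)
```

Comparing on the log scale makes the tolerance relative: a fixed delta of 0.05 corresponds to roughly a 12% change in the raw pvalue, whatever its magnitude.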

test_var_id(var_id)#

Test the variant identifier against the expected value.

Parameters:: var_id (str) – The variant identifier to test.
Returns:: test_result – The test result.
Return type:: gwas_norm.metadata.test.Test.TestResult

Notes

Variant identifiers may fail frequently because they are re-mapped against the latest dbSNP IDs.

to_xml()#

Generate an XML element for the Test object.

Returns:: test – A test element built from the Test object and its attributes.
Return type:: lxml.etree.Element

classmethod from_xml(element)#

Generate a gwas_norm.metadata.test.Test object from the data in the XML element.

Parameters:: element (lxml.etree.Element) – The element should have the tag name test
Returns:: test – The Test object built from the element.
Return type:: gwas_norm.metadata.test.Test
Raises:: KeyError – If the name of the element is not test
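The tag check performed when parsing can be illustrated with a small stand-alone sketch. It uses the standard library `xml.etree.ElementTree` in place of lxml (the element interface used here is the same), and `parse_test_element` is a hypothetical helper that returns a plain dictionary rather than a Test object. The element values are made up.

```python
import xml.etree.ElementTree as ET

ROOT_TAG = "test"  # value documented for Test.ROOT_TAG

# Hypothetical test element using the tag names documented above.
xml_text = """
<test>
    <chr_name>1</chr_name>
    <start_pos>12345</start_pos>
    <effect_allele>A</effect_allele>
    <other_allele>G</other_allele>
    <effect_size>0.05</effect_size>
    <standard_error>0.01</standard_error>
</test>
"""


def parse_test_element(element):
    """Collect the child tag/text pairs of a <test> element.

    Raises KeyError if the element does not have the expected tag,
    mirroring the behaviour documented for from_xml.
    """
    if element.tag != ROOT_TAG:
        raise KeyError(f"expected a <{ROOT_TAG}> element, got <{element.tag}>")
    return {child.tag: child.text for child in element}


expected_values = parse_test_element(ET.fromstring(xml_text.strip()))
```

All values come back as strings, so numeric fields such as start_pos or effect_size would still need casting before being compared against a normalised row.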

classmethod get_class(element)#

Helper method that will determine the required class for parsing based on the root tag in the element.

Parameters:: element (lxml.etree.Element) – The element to check; it is expected to have the tag name test.
Returns:: class – The relevant test class for the element.
Return type:: class of gwas_norm.metadata.test.Test
Raises:: KeyError – If the element does not have the required tag name.

`gwas_norm.metadata.info`#

Info columns and definitions.

class gwas_norm.metadata.info.Info(columns=None, definitions=None)#

Bases: _XmlBase

A representation of the info elements within the analysis definition.

Parameters:

columns (list of gwas_norm.metadata.column.Column, optional, default: NoneType) – The columns that will be used in the info field.
definitions (list of gwas_norm.metadata.phenotype.Definition, optional, default: NoneType) – The static definitions that will be used in the info field.

Notes

The key and info attributes on the columns added to the Info object do not have any functionality within the Info object.

`gwas_norm.metadata.column`#

The generic column objects.

class gwas_norm.metadata.column.Column(col_name, info=False, map_to=None, dtype=None)#

Bases: _XmlBase, InfoHolderMixin

A representation of a column.

Parameters:

col_name (str) – The column name in a source (un-normalised) GWAS file.
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the column name should be known as the map_to value in the info field. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
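The dtype string combines a data type code with a data structure code. A minimal sketch of how such a string could be split is shown below; `split_dtype` is a hypothetical helper, and the assumption that a non-None value is always exactly two characters is not confirmed by the documentation.

```python
DTYPES = {"S": "string", "F": "float", "I": "integer"}
DSTRUCTS = {"C": "scalar", "A": "array"}


def split_dtype(dtype=None):
    """Split a dtype definition string into (dtype, dstruct) codes.

    None is treated as 'SC' (a scalar string), as documented.
    Two-character codes are assumed for everything else.
    """
    if dtype is None:
        return "S", "C"
    if len(dtype) != 2 or dtype[0] not in DTYPES or dtype[1] not in DSTRUCTS:
        raise ValueError(f"unrecognised dtype definition: {dtype!r}")
    return dtype[0], dtype[1]


string_array = split_dtype("SA")
float_scalar = split_dtype("FC")
default = split_dtype()
```

The two returned codes correspond to what the dtype and dstruct properties are documented to expose.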

ROOT_TAG = 'column'#: The name of the root XML element tag (str)

property name#: Get the column name; this is an alias for Column.col_name (str)

to_xml()#

Convert the Column to an XML element.

Returns:: element – A column element built from the Column object and its attributes.
Return type:: lxml.etree.Element

classmethod from_xml(element)#

Parse the data from an XML column element.

Parameters:: element (lxml.etree.Element) – An lxml element where the tag name is column.
Returns:: column – A column object that represents the XML element.
Return type:: gwas_norm.metadata.column.Column
Raises:: KeyError – If the element does not have the tag column.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: parse_class – A class inheriting from gwas_norm.metadata.column.Column.
Return type:: class
Raises:: KeyError – If the appropriate class can’t be found for the tag.

DATA_TYPE_ATTRIBUTE = 'dtype'#: The name of the data type attribute of the column (str)

INFO_ATTRIBUTE = 'info'#: The name of the info attribute of the column (str)

MAP_TO_ATTRIBUTE = 'map_to'#: The name of the map_to attribute of the column (str)

property dstruct#: Get the data structure value. C is a scalar. A is an array, (str).

property dtype#: Get the dtype value. S is a string value. F is a float, I is an integer (str).

equals(other)#: Determine equality against another InfoHolderMixin containing object. This is based on map_to, dtype and dstruct values matching

classmethod get_attributes(element)#

Get the attributes from an XML element.

Parameters:

element (lxml.etree.Element) – The element potentially containing info, map_to, dtype attributes.

Returns:

info (bool) – Is the class acting as an info field.
map_to (str, optional, default: False) – If info is true map_to indicates that the info value defined in the class/column should be known as the map_to value in the info field and not as the name.
dtype (str) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array.

property info#: Get the is info output value (bool).

init_info_values(info=False, map_to=None, dtype=None, allow_info_false=False)#

Initialise all of the info related values for the mixin.

Parameters:

info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the name defined in the class should be known as the map_to value in the info field and not as the name. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
allow_info_false (bool, optional, default: False) – If this is set to True, then map_to is still output to the XML when info is False and map_to is defined. This is for phenotype definitions where the map_to value has meaning even if not outputting to the info column.
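The constraint on map_to (letters, digits and underscores only, with no spaces) can be checked with a regular expression. The sketch below is an assumption about how such a check might look; `is_valid_map_to` is a hypothetical helper, and the library may validate differently or raise an error instead of returning a boolean.

```python
import re

# One or more ASCII letters, digits or underscores, and nothing else.
_MAP_TO_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_map_to(map_to):
    """Return True if map_to contains only alphanumerics and underscores."""
    return map_to is not None and _MAP_TO_PATTERN.fullmatch(map_to) is not None


valid = is_valid_map_to("effect_allele_freq")
has_space = is_valid_map_to("allele freq")
has_hyphen = is_valid_map_to("maf-eur")
```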

property map_to#: Get the column name remapping value (str or NoneType).

set_attributes(element)#

Set the attributes into an XML element.

Parameters:: element (lxml.etree.Element) – The element to add the attributes.

class gwas_norm.metadata.column.MappingColumn(col_name, map_to_name, **kwargs)#

Bases: Column

A representation of a mapping column.

Parameters:

col_name (str) – The gwas-norm column name that is being mapped to, this will be set to the element tag name.
map_to_name (str) – The column name in a source (un-normalised) GWAS file that is being mapped to the col_name.
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the column name should be known as the map_to value in the info field. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.

ROOT_TAG = 'mapping_column'#: The name of the root XML element tag (str)

to_xml()#

Convert the MappingColumn to an XML element.

Returns:: element – A column element built from the MappingColumn object and its attributes.
Return type:: lxml.etree.Element

DATA_TYPE_ATTRIBUTE = 'dtype'#: The name of the data type attribute of the column (str)

INFO_ATTRIBUTE = 'info'#: The name of the info attribute of the column (str)

MAP_TO_ATTRIBUTE = 'map_to'#: The name of the map_to attribute of the column (str)

property dstruct#: Get the data structure value. C is a scalar. A is an array, (str).

property dtype#: Get the dtype value. S is a string value. F is a float, I is an integer (str).

equals(other)#: Determine equality against another InfoHolderMixin containing object. This is based on map_to, dtype and dstruct values matching

classmethod from_xml(element)#

Parse the data from an XML mapping column element.

Parameters:: element (lxml.etree.Element) – An lxml element where the tag name is the name of a gwas-norm column.
Returns:: column – A mapping column object that represents the XML element.
Return type:: gwas_norm.metadata.column.MappingColumn

classmethod get_attributes(element)#

Get the attributes from an XML element.

Parameters:

element (lxml.etree.Element) – The element potentially containing info, map_to, dtype attributes.

Returns:

info (bool) – Is the class acting as an info field.
map_to (str, optional, default: False) – If info is true map_to indicates that the info value defined in the class/column should be known as the map_to value in the info field and not as the name.
dtype (str) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array.

property info#: Get the is info output value (bool).

init_info_values(info=False, map_to=None, dtype=None, allow_info_false=False)#

Initialise all of the info related values for the mixin.

Parameters:

info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the name defined in the class should be known as the map_to value in the info field and not as the name. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
allow_info_false (bool, optional, default: False) – If this is set to True, then map_to is still output to the XML when info is False and map_to is defined. This is for phenotype definitions where the map_to value has meaning even if not outputting to the info column.

property map_to#: Get the column name remapping value (str or NoneType).

property name#: Get the column name; this is an alias for Column.col_name (str)

set_attributes(element)#

Set the attributes into an XML element.

Parameters:: element (lxml.etree.Element) – The element to add the attributes.

classmethod get_class(element)#

Get the appropriate parse class for the XML element tag.

Parameters:: element (lxml.etree.Element) – The element to check against.
Returns:: parse_class – A class inheriting from gwas_norm.metadata.column.MappingColumn.
Return type:: class

Notes

This will always return gwas_norm.metadata.column.MappingColumn.

`gwas_norm.metadata.convert`#

gwas_norm.metadata.convert.convert_xml(old_xml, new_xml)#

Convert an old-style XML file to a new style XML file.

Parameters:

old_xml (str) – The old style XML file, can be compressed.
new_xml (str) – The new style output XML file, will be compressed if the extension is .gz.
verbose (bool, optional, default: False) – Log output
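Choosing compression from the file extension, as described for new_xml, can be sketched with the standard library. This is an illustration only: `open_text` is a hypothetical helper and says nothing about how convert_xml itself opens files.

```python
import gzip
import os
import tempfile


def open_text(path, mode="rt"):
    """Open a text file, using gzip when the path ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")


# Round trip through a temporary compressed file.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "metadata.xml.gz")

    with open_text(out_path, "wt") as handle:
        handle.write("<gwas_data/>")

    with open_text(out_path) as handle:
        round_trip = handle.read()

    # The first two bytes of a gzip file are the magic number 1f 8b.
    with open(out_path, "rb") as handle:
        magic = handle.read(2)
```

The same helper reads an uncompressed file when the path does not end in .gz, which matches the documented behaviour that the input can be compressed.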
