gwas_norm.metadata sub-package
gwas_norm.metadata.gwas_data
The root of the XML metadata.
- class gwas_norm.metadata.gwas_data.GwasData(studies=None, metadata_file=None, file_check=True, root_source_dir=None, root_norm_dir=None)
Bases: _XmlBase
The root class for describing GWAS metadata.
- Parameters:
studies (list of gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile, optional, default: NoneType) – Any existing study objects that need to be added to the GwasData object during initialisation.
metadata_file (str or File, optional, default: NoneType) – The path to an existing metadata file, or a previously opened file, that will be added to the GwasData object.
file_check (bool, optional, default: True) – Check that input files are present and calculate the MD5 of any input files. Please note that this will give an error if the MD5 has not been added to the files.
root_source_dir (str, optional, default: NoneType) – The root directory where all source study data is located. If not set, the environment variable GWAS_SOURCE_DATA_ROOT is used. However, this is only used/required if the study_source_dir is a relative path.
root_norm_dir (list or str, optional, default: NoneType) – The root directory where all normalised study data is located. If not set, the environment variable GWAS_DEST_DATA_ROOT is used. However, this is only used/required if the study_norm_dir is a relative path.
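The relative-path rule for the root directories can be sketched as below. This is a minimal illustration, not the gwas_norm implementation: `resolve_study_dir` is a hypothetical helper, and the precedence (an explicit root first, then the environment variable) is an assumption based on the parameter descriptions above.

```python
import os
from typing import Optional


def resolve_study_dir(study_dir: str, root_dir: Optional[str] = None,
                      env_var: str = "GWAS_SOURCE_DATA_ROOT") -> str:
    """Hypothetical helper: return an absolute path for a study directory."""
    # Absolute study paths need no root directory at all.
    if os.path.isabs(study_dir):
        return study_dir
    # Assumption: an explicitly set root overrides the environment variable.
    root = root_dir if root_dir is not None else os.environ.get(env_var)
    if root is None:
        raise FileNotFoundError(
            f"{study_dir!r} is relative but no root directory or ${env_var} is set"
        )
    return os.path.join(root, study_dir)


# Normalised directories would use env_var="GWAS_DEST_DATA_ROOT" instead.
print(resolve_study_dir("my_study", root_dir="/data/gwas/source"))
```

Raising `FileNotFoundError` here mirrors what the study properties `study_source_absolute_dir` and `study_norm_absolute_dir` document for a relative path with no available root.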
Notes
Has functionality for adding study elements and for reading and writing metadata description files (XML format). When initialising, both studies and an existing metadata file can be given, and all of them will be added to the initialised gwas_norm.metadata.gwas_data.GwasData object.
- ROOT_TAG = 'gwas_data'
The name of the root XML element tag name (str)
- property n_studies
Return the number of studies (a synonym for len()) (int)
- get_study_by_name(name)
Return the study with the given name.
- Parameters:
name (str) – The study name to get.
- Returns:
study – The study matching the name.
- Return type:
gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile
- Raises:
KeyError – If a study with that name does not exist.
- get_analysis_by_name(name)
Return the analyses with the given name.
- Parameters:
name (str) – The analysis name to get.
- Returns:
analysis – The analyses matching the name.
- Return type:
list of gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
- Raises:
KeyError – If an analysis with that name does not exist.
- property n_analyses
Return the total number of analyses (int).
- property root_source_dir
Get the source root directory (str or NoneType).
Notes
This is the root directory under which all the study directories to be normalised should be located. It must be an absolute path. If it is not defined, the value of the environment variable GWAS_SOURCE_DATA_ROOT is returned (if defined).
- property root_norm_dir
Get the normalised root directory (str or NoneType).
Notes
This is the root directory under which all the normalised study directories should be located. It must be an absolute path. If it is not defined, the value of the environment variable GWAS_DEST_DATA_ROOT is returned (if defined).
- property file_check
Get the file checking status (bool).
- property studies
Return a list of all the study objects in the GwasData object (list of gwas_norm.metadata.study.Study).
- add_study(study, error=False)
Add a study object to the GwasData object.
- Parameters:
study (gwas_norm.metadata.study.Study) – The study to add; it must be an instance of gwas_norm.metadata.study.Study or of a subclass.
error (bool, optional, default: False) – Raise a KeyError if an existing GwasData study has the same name as the one being added. If set to False, the addition will silently fail.
- Raises:
KeyError – If the study being added has the same name as an existing study in the GwasData object (only when error is True).
Notes
This instigates a binding process where the GwasData object is also bound as the parent of the study object.
- remove_study(study)
Remove a study from the GwasData object.
- Parameters:
study (gwas_norm.metadata.study.Study) – The study to remove. This does not have to be the exact same object, as the removal is based on the study name.
- Returns:
study – The actual gwas_norm.metadata.study.Study object that has been removed from the GwasData object. If no matching Study object was found then NoneType is returned.
- Return type:
gwas_norm.metadata.study.Study or NoneType
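The name-keyed behaviour of add_study and remove_study can be illustrated with a small stand-in. `MiniStudy` and `MiniGwasData` are hypothetical classes written for this sketch; they only model the documented semantics (duplicate names ignored unless error=True, removal by name returning the stored object or None) and omit the parent binding.

```python
from dataclasses import dataclass


@dataclass
class MiniStudy:
    """Hypothetical stand-in for a Study; only the name matters here."""
    name: str


class MiniGwasData:
    """Hypothetical stand-in showing name-keyed add/remove semantics."""

    def __init__(self):
        self._studies = {}

    @property
    def n_studies(self):
        return len(self._studies)

    def add_study(self, study, error=False):
        if study.name in self._studies:
            if error:
                raise KeyError(f"study already exists: {study.name}")
            return  # error=False: the duplicate is silently ignored
        self._studies[study.name] = study

    def remove_study(self, study):
        # Matching is by name, so a different object with the same name works.
        return self._studies.pop(study.name, None)


gwas = MiniGwasData()
original = MiniStudy("my_study")
gwas.add_study(original)
gwas.add_study(MiniStudy("my_study"))  # ignored: same name, error=False
removed = gwas.remove_study(MiniStudy("my_study"))
```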
- read(infile)
Read an input metadata file.
- Parameters:
infile (str) – An input metadata file name.
- Raises:
KeyError – If a study of the same name already exists in the GwasData object, or if no study elements were found in the file.
ValueError – If the file extension is unknown (i.e. not .xml).
- write(outfile)
Write all studies to an output XML file.
- Parameters:
outfile (str) – The output file path to write to. Currently only XML is supported, and the output file must have a .xml file extension.
- Raises:
IndexError – If no Study objects were found in the GwasData object.
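The checks documented for read and write can be sketched with the standard library. This is an illustration under assumptions, not the library code: gwas_norm itself works with lxml, whereas this sketch uses `xml.etree.ElementTree`, and `write_gwas_xml`/`read_gwas_xml` are hypothetical functions operating on raw elements rather than Study objects.

```python
import os
import xml.etree.ElementTree as ET


def _check_xml_ext(path):
    # read() and write() only support files with a .xml extension.
    if os.path.splitext(path)[1] != ".xml":
        raise ValueError(f"unknown file extension (expected .xml): {path}")


def write_gwas_xml(study_elements, outfile):
    """Hypothetical sketch: wrap study elements in a gwas_data root and save."""
    _check_xml_ext(outfile)
    if len(study_elements) == 0:
        raise IndexError("no study elements to write")
    root = ET.Element("gwas_data")
    root.extend(study_elements)
    ET.ElementTree(root).write(outfile, encoding="utf-8", xml_declaration=True)


def read_gwas_xml(infile):
    """Hypothetical sketch: return the study-level elements from a file."""
    _check_xml_ext(infile)
    root = ET.parse(infile).getroot()
    if root.tag != "gwas_data":
        raise KeyError(f"unexpected root tag: {root.tag}")
    studies = [e for e in root if e.tag in ("study", "study_file")]
    if not studies:
        raise KeyError("no study elements found")
    return studies
```

The extension check runs before anything touches the file system, so a bad path fails fast without creating a partial output file.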
- to_xml()
Generate an lxml.etree.Element object for all the attributes within the GwasData object.
- Returns:
gwas_data_element – An element representing the gwas_data.
- Return type:
lxml.etree.Element
- classmethod from_xml(element, **kwargs)
Load data from an already available XML element.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name gwas_data.
- Returns:
gwas_data – A GwasData object built from the element.
- Return type:
gwas_norm.metadata.gwas_data.GwasData
- Raises:
KeyError – If the tag name of the element is not gwas_data.
Notes
In general, users should call the object's read method instead.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
parse_class – A class of type gwas_norm.metadata.gwas_data.GwasData
- Return type:
class
- Raises:
KeyError – If the appropriate class can’t be found for the tag.
gwas_norm.metadata.study
Implementation of Study classes.
- class gwas_norm.metadata.study.Study(*args, file_check=True, **kwargs)
Bases: _BaseStudy
The study class, for use in studies where the analyses are contained in separate files (gwas_norm.metadata.analysis.AnalysisFile objects).
- Parameters:
study_name (str) – The name of the study; this should be unique for each study. It will be converted to all lowercase and have spaces replaced with underscores.
study_source_dir (str) – The root directory of the study that contains the un-normalised source files. If it is a relative path, it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this path is relative, an error will be raised if XML output or the absolute path is requested.
study_norm_dir (str) – The root directory of the study that contains the normalised files. If it is a relative path, it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this path is relative, an error will be raised if XML output or the absolute path is requested.
source_genome_assembly (str) – The genome assembly of the study (and, by extension, of the analyses within the study).
target_genome_assemblies – Any target genome assemblies. Only required if any liftovers need to be carried out.
pubmed_id (int or NoneType, optional, default: NoneType) – The pubmed identifier. If NoneType, a dummy pubmed ID of 00000000 is used instead.
consortium (str or NoneType, optional, default: NoneType) – Any consortium name for the study.
analyses (list of (gwas_norm.metadata.analysis.Analysis or gwas_norm.metadata.analysis.KeyAnalysis), optional, default: NoneType) – Analysis objects that are associated with the study.
url (str, optional, default: NoneType) – The info URL to be associated with the study.
metafiles (list or str, optional, default: NoneType) – The metafile(s) to be associated with the study.
- Raises:
ValueError – If any of the study_name, source_root or source_genome_assembly are NoneType or ''.
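The study_name and pubmed_id rules above can be sketched as two small helpers. Both functions are hypothetical, written only to illustrate the documented behaviour (lowercasing, replacing spaces with underscores, rejecting NoneType or '', and the 00000000 dummy ID); the library may apply these rules in a different place or order.

```python
def normalise_study_name(study_name):
    """Hypothetical helper mirroring the documented study_name rules."""
    if study_name is None or study_name == "":
        raise ValueError("study_name must not be NoneType or ''")
    # Lowercase, then replace spaces with underscores.
    return study_name.lower().replace(" ", "_")


def normalise_pubmed_id(pubmed_id=None):
    """Hypothetical helper: return a pubmed ID string castable to int."""
    if pubmed_id is None:
        return "00000000"  # documented dummy pubmed ID
    pubmed_id = str(pubmed_id)
    int(pubmed_id)  # raises ValueError if the ID is not castable to int
    return pubmed_id


print(normalise_study_name("My Big Study"), normalise_pubmed_id())
```

Keeping the pubmed ID as a string (rather than converting it to int) preserves the leading zeros of the dummy value.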
- CONSORTIUM_TAG = 'consortium'
The XML tag for the consortium data (str).
- METAFILE_TAG = 'metafile'
The XML tag for the metafile data (str).
- PRIVATE_ATTRIBUTE = 'private'
The XML attribute for the private status (str).
- PUBMED_ID_TAG = 'pubmed_id'
The XML tag for the pubmed ID data (str).
- SOURCE_GENOME_ASSEMLY_TAG = 'source_genome_assembly'
The XML tag for the source genome assembly data (str).
- STUDY_ID_TAG = 'study_id'
The XML tag name for the study ID (str).
- STUDY_NAME_TAG = 'study_name'
The XML tag name for the study name (str).
- STUDY_NORM_DIR_TAG = 'study_norm_dir'
The XML tag for the study normalised directory (str).
- STUDY_SOURCE_DIR_TAG = 'study_source_dir'
The XML tag name for the study source directory (str).
- URL_TAG = 'url'
The XML tag for the study webpage (str).
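To show how these tag constants might map onto an XML element, here is a sketch using the standard library's `xml.etree.ElementTree` (gwas_norm itself uses lxml). The tag names come from the constants above; the child order, the flat nesting and the `build_study_element` helper are assumptions for illustration, not the library's actual output format.

```python
import xml.etree.ElementTree as ET

# Tag names from the Study constants; order and flat nesting are assumed.
STUDY_CHILD_TAGS = (
    "study_name", "study_source_dir", "study_norm_dir",
    "source_genome_assembly", "pubmed_id", "consortium", "url",
)


def build_study_element(values):
    """Hypothetical sketch: build a <study> element from a dict of values."""
    study_e = ET.Element("study")
    for tag in STUDY_CHILD_TAGS:
        value = values.get(tag)
        if value is None:
            continue  # optional fields are simply omitted in this sketch
        ET.SubElement(study_e, tag).text = str(value)
    return study_e


study_e = build_study_element({
    "study_name": "my_study",
    "study_source_dir": "my_study/source",
    "study_norm_dir": "my_study/norm",
    "source_genome_assembly": "GRCh37",
    "pubmed_id": "00000000",
})
print(ET.tostring(study_e, encoding="unicode"))
```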
- add_metafile(metafile)
Add a metafile to the study.
- Parameters:
metafile (str) – The metafile path. It can be an absolute or a relative path; in the latter case, it is converted to an absolute path.
- Raises:
ValueError – If a metafile with the same path already exists.
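The metafile rules (convert relative paths to absolute ones, reject duplicates) can be sketched as follows. `MetafileHolder` is a hypothetical class, and resolving relative paths against the current working directory is an assumption: the documentation only says that relative paths are converted to absolute paths.

```python
import os


class MetafileHolder:
    """Hypothetical stand-in for the metafile handling on a study."""

    def __init__(self):
        self._metafiles = []

    @property
    def metafiles(self):
        return list(self._metafiles)

    def add_metafile(self, metafile):
        # Assumption: relative paths are resolved against the current working
        # directory; gwas_norm may resolve them differently.
        path = os.path.abspath(metafile)
        if path in self._metafiles:
            raise ValueError(f"metafile already exists: {path}")
        self._metafiles.append(path)


holder = MetafileHolder()
holder.add_metafile("notes.txt")
```

Comparing the absolute forms means that a relative path and its absolute equivalent are treated as the same metafile.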
- property analyses
Get all the analyses within the study (list of (gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis)).
- Raises:
AttributeError – If no analyses have been defined.
- bind(parent)
Bind a study with a parent gwas_norm.metadata.gwas_data.GwasData object.
- Parameters:
parent (gwas_norm.metadata.gwas_data_obj.GwasData) – A parental GwasData object.
- Raises:
KeyError – If the study is already bound to a different GwasData object.
Notes
The binding instigates a reciprocal adding of the study object to the GwasData studies, so binding is in both directions. A study can only be bound to a single GwasData parent.
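The reciprocal binding described above can be modelled with two small hypothetical classes. `SketchParent` and `SketchStudy` are not part of gwas_norm. They show one way to make bind and add_study call each other without infinite recursion: each side checks whether the link already exists before calling the other side. A study that is already bound to a different parent is rejected with a KeyError before anything is stored.

```python
class SketchParent:
    """Hypothetical stand-in for GwasData."""

    def __init__(self):
        self.studies = {}

    def add_study(self, study):
        if study.name in self.studies:
            return
        if study.parent is not self:
            study.bind(self)  # bind() calls back into add_study to store it
            return
        self.studies[study.name] = study


class SketchStudy:
    """Hypothetical stand-in for a study with bind/unbind."""

    def __init__(self, name):
        self.name = name
        self._parent = None

    @property
    def parent(self):
        return self._parent

    def bind(self, parent):
        if self._parent is parent:
            return  # already bound to this parent
        if self._parent is not None:
            raise KeyError(f"{self.name} is bound to a different parent")
        self._parent = parent
        parent.add_study(self)  # reciprocal add

    def unbind(self):
        parent, self._parent = self._parent, None
        if parent is not None:
            parent.studies.pop(self.name, None)
        return parent
```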
- check_old_analysis_ids()
Perform a check of old analysis IDs and warn if there are duplicated IDs.
- Parameters:
s (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) –
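A duplicate-ID warning of this kind can be sketched with `collections.Counter` and the `warnings` module. `warn_duplicate_ids` is a hypothetical function operating on a plain list of IDs. Skipping analyses without an old ID (recorded as None) is an assumption, not documented behaviour.

```python
import warnings
from collections import Counter


def warn_duplicate_ids(old_ids):
    """Hypothetical sketch: warn about IDs that occur more than once."""
    # Assumption: analyses without an old ID are recorded as None and skipped.
    counts = Counter(i for i in old_ids if i is not None)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    if duplicated:
        warnings.warn(f"duplicated old analysis IDs: {duplicated}")
    return duplicated
```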
- property chksum
Get the BSD checksum based on the study_name (str).
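For reference, the classic 16-bit BSD checksum (the algorithm used by `sum -r`) rotates the running total right by one bit and then adds each byte. The sketch below applies it to a study name. Encoding the name as UTF-8 is an assumption, and the property above is documented as returning a str, so gwas_norm may format the value differently from this integer.

```python
def bsd_checksum(text, encoding="utf-8"):
    """16-bit BSD checksum (the algorithm behind `sum -r`) of a string."""
    checksum = 0
    for byte in text.encode(encoding):
        # Rotate the 16-bit checksum right by one bit...
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        # ...then add the byte and keep the result within 16 bits.
        checksum = (checksum + byte) & 0xFFFF
    return checksum


print(bsd_checksum("my_study"))
```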
- create_analysis_xml(study_e)
Convert all of the analyses within the study to XML elements.
- Parameters:
study_e (lxml.etree.Element) – The study element to add the converted analyses to.
- Returns:
study – The same study element that was passed with the analyses added.
- Return type:
lxml.etree.Element
- Raises:
ValueError – If there are no analyses within the study.
- get_analysis_by_name(name)
Return the analysis with the given name.
- Parameters:
name (str) – The analysis name to get.
- Returns:
analysis – The analysis matching the name.
- Return type:
gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
- Raises:
KeyError – If an analysis with that name does not exist.
- classmethod get_class(element)
Helper function that determines the required study class for parsing, based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – An XML element; it is expected to have the tag name study or study_file.
- Returns:
study_class – The relevant study class for the element.
- Return type:
class of (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile)
- Raises:
KeyError – If the element does not have the tag name study or study_file.
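Dispatching on the element tag can be sketched as a dictionary lookup. The placeholder classes and `get_study_class` are hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree` rather than lxml. The keys are the documented ROOT_TAG values of Study ('study') and StudyFile ('study_file').

```python
import xml.etree.ElementTree as ET


class StudySketch:
    """Hypothetical placeholder for gwas_norm.metadata.study.Study."""


class StudyFileSketch:
    """Hypothetical placeholder for gwas_norm.metadata.study.StudyFile."""


# Keys match the documented ROOT_TAG values of Study and StudyFile.
_STUDY_CLASSES = {"study": StudySketch, "study_file": StudyFileSketch}


def get_study_class(element):
    """Return the parse class for an element, keyed on its tag name."""
    try:
        return _STUDY_CLASSES[element.tag]
    except KeyError:
        raise KeyError(f"unexpected study tag: {element.tag}") from None
```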
- property info
Get the info object stored in the study (gwas_norm.metadata.info.Info).
- invalidate()
Invalidate the study.
- invalidate_analyses()
Invalidate all the analyses in the study. This will also invalidate the study.
- property is_validated
Determine if the study has been validated (bool).
- property metafile_abspath
Get the absolute paths of any metafiles associated with the study (list of str).
- property metafiles
Get any metafile paths associated with the study (list of str).
- property name
Get the study name. This is an alias for study_name that is designed to be a common property across studies and analyses (str).
- property parent
Get the study parent (gwas_norm.metadata.gwas_data.GwasData).
- Raises:
AttributeError – If no parent GwasData object has been defined.
- property private
Get the private status of the study (bool).
- property pubmed_id
Get the pubmed ID (str).
Notes
The pubmed_id is treated as a string but should be castable to an int. If the pubmed_id is not known, a dummy pubmed_id of 00000000 is used.
- refresh_analysis_data()
The study keeps an internal cache of analysis names/IDs. This loops through all the analyses within the study and refreshes the cache.
Notes
Typically, this will be called by an analysis when its name/ID is changed.
- remove_analysis(analysis_obj)
Remove an analysis from the study.
- Parameters:
analysis_obj (gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis) – The analysis being removed from the study.
- Returns:
removed_analysis – The analysis removed, or NoneType if the analysis does not exist in the study.
- Return type:
gwas_norm.metadata.analysis_obj.AnalysisFile or gwas_norm.metadata.analysis_obj.KeyAnalysis or NoneType
- repr_attr_str()
Generate a list of strings that can be used to print an object's contents.
- Returns:
attrs – key/value strings representing the contents of the object.
- Return type:
list of str
- property source_genome_assembly
Get the source genome assembly (str).
- property study_id
Get the ID for the study; if not set, an ID will be generated (int).
- property study_name
Get the study name. Study names are made lowercase and have spaces replaced with _ (underscores) (str).
- property study_norm_absolute_dir
Return the absolute directory path for the normalised study, irrespective of whether it has been set via a relative path (str).
- Raises:
FileNotFoundError – If the study_norm_dir is a relative path and no root path is available from the parent.
- property study_norm_dir
Get the study normalised directory path (str).
- property study_source_absolute_dir
Return the absolute directory path for the study source directory, irrespective of whether it has been set via a relative path (str).
- Raises:
FileNotFoundError – If the study_source_dir is a relative path and no root path is available from the parent.
- property study_source_dir
Get the study source root directory (str).
- unbind()
Unbind a study from its parent GwasData object. This also removes the study from the parent.
- Returns:
parent – The parent object that has been unbound.
- Return type:
gwas_norm.metadata.gwas_data_obj.GwasData
- ROOT_TAG = 'study'
The name of the root XML element tag (str).
- n_file_holders()
Get the number of file holder objects contained in the study (int).
- property file_check
Get the file checking status (bool).
- validate()
Validate the study. This respects the file checking parameter.
- add_analysis(analysis_obj)
Add an analysis to the study. This causes a reciprocal bind on the analysis.
- Parameters:
analysis_obj (gwas_norm.metadata.analysis_obj.KeyAnalysis) – The analysis being added to the study.
- Raises:
KeyError – If the analysis already exists in the study.
- to_xml()
Convert the study and all of its attributes to an XML element.
- Returns:
study – A study element built from the study object and its attributes.
- Return type:
lxml.etree.Element
- classmethod from_xml(element, **kwargs)
Generate a gwas_norm.metadata.study.Study object from an lxml.etree.Element with the tag name study.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name study.
- Returns:
study – A study object built from all the tags in the study element.
- Return type:
gwas_norm.metadata.study_obj.Study
- Raises:
KeyError – If the name of the element is not expected. Also raised if the source_genome_assembly attribute is not defined, or if there are no analysis elements associated with the study.
- class gwas_norm.metadata.study.StudyFile(study_name, study_source_dir, source_genome_assembly, analysis_type, effect_type, units=None, files=None, cohort=None, file_check=True, **kwargs)
Bases: _BaseStudy, FileHolderMixin
A representation of a study object where all of the analyses are in a single file (gwas_norm.metadata.analysis.KeyAnalysis objects).
- Parameters:
study_name (str) – The name of the study; this should be unique for each study. It will be converted to all lowercase and have spaces replaced with underscores.
study_source_dir (str) – The root directory of the study that contains the un-normalised source files. If it is a relative path, it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this path is relative, an error will be raised if XML output or the absolute path is requested.
study_norm_dir (str) – The root directory of the study that contains the normalised files. If it is a relative path, it is assumed to be relative to the root directory in the parent gwas_norm.metadata.gwas_data.GwasData, set either via an environment variable or explicitly. If the parent has not been set and this path is relative, an error will be raised if XML output or the absolute path is requested.
source_genome_assembly (str) – The genome assembly of the study (and, by extension, of the analyses within the study).
analysis_type (str) – The analysis type for the study. This will be applied to any analyses that do not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. This will be applied to any analyses that do not have an effect_type specified.
pubmed_id (int or NoneType, optional, default: NoneType) – The pubmed identifier. If NoneType, a dummy pubmed ID of 00000000 is used instead.
consortium (str or NoneType, optional, default: NoneType) – Any consortium name for the study.
analyses (list of (gwas_norm.metadata.analysis.Analysis or gwas_norm.metadata.analysis.KeyAnalysis), optional, default: NoneType) – Analysis objects that are associated with the study.
url (str, optional, default: NoneType) – The info URL to be associated with the study.
metafiles (list or str, optional, default: NoneType) – The metafile(s) to be associated with the study.
units (str, optional, default: NoneType) – The units of all the analyses within the StudyFile.
files (list of gwas_norm.metadata.file.GwasFile, optional, default: NoneType) – Study level files. Study level files are for data such as GTEx data, where multiple gene analyses are in the same file.
cohort (gwas_norm.metadata.cohort_obj.Cohort or gwas_norm.metadata.cohort_obj.CaseControlCohort or gwas_norm.metadata.cohort_obj.SampleCohort, optional, default: NoneType) – The cohort that the study was performed in.
- Raises:
ValueError – If any of the study_name, source_root or source_genome_assembly are NoneType or ''.
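The study-level analysis_type and effect_type act as defaults for analyses that do not set their own values. The sketch below models that rule with a hypothetical `AnalysisSketch` dataclass and `apply_study_defaults` function. The type strings are placeholders rather than gwas_norm's real vocabulary, and it is an assumption whether the library fills these values in eagerly (as here) or looks them up lazily.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnalysisSketch:
    """Hypothetical stand-in for an analysis inside a StudyFile."""
    name: str
    analysis_type: Optional[str] = None
    effect_type: Optional[str] = None


def apply_study_defaults(analyses: List[AnalysisSketch], analysis_type: str,
                         effect_type: str) -> List[AnalysisSketch]:
    """Fill in study-level defaults only where an analysis sets no value."""
    for analysis in analyses:
        if analysis.analysis_type is None:
            analysis.analysis_type = analysis_type
        if analysis.effect_type is None:
            analysis.effect_type = effect_type
    return analyses


analyses = apply_study_defaults(
    [AnalysisSketch("a1"), AnalysisSketch("a2", effect_type="own_effect")],
    analysis_type="placeholder_type", effect_type="placeholder_effect",
)
```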
- ANALYSIS_TYPE_TAG = 'analysis_type'
The name of the analysis type tag/element in the XML file (str).
- CONSORTIUM_TAG = 'consortium'
The XML tag for the consortium data (str).
- EFFECT_TYPE_TAG = 'effect_type'
The name of the effect type tag/element in the XML file (str).
- FILE_CLASS = None
The file class that should be used for parsing XML; this should be overridden by the sub-class (NoneType).
- METAFILE_TAG = 'metafile'
The XML tag for the metafile data (str).
- PRIVATE_ATTRIBUTE = 'private'
The XML attribute for the private status (str).
- PUBMED_ID_TAG = 'pubmed_id'
The XML tag for the pubmed ID data (str).
- SOURCE_GENOME_ASSEMLY_TAG = 'source_genome_assembly'
The XML tag for the source genome assembly data (str).
- STUDY_ID_TAG = 'study_id'
The XML tag name for the study ID (str).
- STUDY_NAME_TAG = 'study_name'
The XML tag name for the study name (str).
- STUDY_NORM_DIR_TAG = 'study_norm_dir'
The XML tag for the study normalised directory (str).
- STUDY_SOURCE_DIR_TAG = 'study_source_dir'
The XML tag name for the study source directory (str).
- UNITS_TAG = 'units'
The name of the units tag/element in the XML file (str).
- URL_TAG = 'url'
The XML tag for the study webpage (str).
- add_file(gwas_file, error=False)
Add a GWAS file to the object. This results in a reciprocal bind of the file to the parent object.
- Parameters:
gwas_file (gwas_norm.metadata.file.GwasFile) – The file being added.
error (bool, optional, default: False) – Whether an error should be raised if the file already exists in this object.
- Raises:
- add_metafile(metafile)
Add a metafile to the study.
- Parameters:
metafile (str) – The metafile path. It can be an absolute or a relative path; in the latter case, it is converted to an absolute path.
- Raises:
ValueError – If a metafile with the same path already exists.
- property analyses
Get all the analyses within the study (list of (gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis)).
- Raises:
AttributeError – If no analyses have been defined.
- property analysis_type
Get the analysis type (str).
- bind(parent)
Bind a study with a parent gwas_norm.metadata.gwas_data.GwasData object.
- Parameters:
parent (gwas_norm.metadata.gwas_data_obj.GwasData) – A parental GwasData object.
- Raises:
KeyError – If the study is already bound to a different GwasData object.
Notes
The binding instigates a reciprocal adding of the study object to the GwasData studies, so binding is in both directions. A study can only be bound to a single GwasData parent.
- check_old_analysis_ids()
Perform a check of old analysis IDs and warn if there are duplicated IDs.
- Parameters:
s (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) –
- property chksum
Get the BSD checksum based on the study_name (str).
- property cohort
Get the cohort definition associated with the analysis (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType).
Notes
NoneType is returned if no cohort has been set.
- create_analysis_type_xml(element)
Generate the analysis type XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_analysis_xml(study_e)
Convert all of the analyses within the study to XML elements.
- Parameters:
study_e (lxml.etree.Element) – The study element to add the converted analyses to.
- Returns:
study – The same study element that was passed with the analyses added.
- Return type:
lxml.etree.Element
- Raises:
ValueError – If there are no analyses within the study.
- create_cohort_xml(element)
Create cohort-specific XML elements in the parental element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the cohort-specific elements to.
- Returns:
element – The parent XML element with the cohort elements added.
- Return type:
lxml.etree.Element
Notes
This is designed to add XML elements to a study/analysis element that has file parameters.
- create_effect_type_xml(element)
Generate the XML element for the effect type.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_files_xml(element)
Create file-specific XML elements in the parental element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the file-specific elements to.
- Returns:
element – The parent XML element with the file elements added.
- Return type:
lxml.etree.Element
Notes
This is designed to add XML elements to a study/analysis element that has file parameters.
- create_units_xml(element)
Generate the units XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_xml(element)
Generate all the XML elements relating to objects that hold files. This wraps all the other create_* methods in the mixin.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the elements to.
- Returns:
element – The parent XML element with the elements added.
- property effect_type
Get the effect type (str or NoneType).
- property file_check
Get the file checking status (bool).
- file_repr_attr_str()
Called by the __repr__ of host objects to supply key=value strings of the attributes and their values relating to the mixin.
- Returns:
attr_str – Each string is an attribute and value for printing.
- Return type:
list of str
- property files
Get all the associated files (list of gwas_norm.metadata.file.GwasFile).
- get_analysis_by_name(name)
Return the analysis with the given name.
- Parameters:
name (str) – The analysis name to get.
- Returns:
analysis – The analysis matching the name.
- Return type:
gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
- Raises:
KeyError – If an analysis with that name does not exist.
- classmethod get_class(element)
Helper function that determines the required study class for parsing, based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – An XML element; it is expected to have the tag name study or study_file.
- Returns:
study_class – The relevant study class for the element.
- Return type:
class of (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile)
- Raises:
KeyError – If the element does not have the tag name study or study_file.
- property info
Get the info object stored in the study (gwas_norm.metadata.info.Info).
- init_file_attr(analysis_type, effect_type, units=None, cohort=None, files=None, file_check=True)
Initialise all the attributes that a file handling object needs.
- Parameters:
analysis_type (str) – The analysis type for the study. This will be applied to any analyses that do not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. This will be applied to any analyses that do not have an effect_type specified.
units (str, optional, default: NoneType) – The units that the effect sizes within the file are measured in.
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort, optional, default: NoneType) – The cohort that applies to the file.
files (list of (gwas_norm.metadata.file.GwasFile), optional, default: NoneType) – Source files. Data files can be given at either the study level or the analysis level, but not both. Study level files are for data such as GTEx data, where multiple gene analyses are in the same file.
file_check (bool, optional, default: True) – Toggle file checking.
Notes
This is usually called from the __init__ method of the host object, and will initialise all the parameters relating to file handling from arguments that have been given to the host object.
- invalidate()
Invalidate the study.
- invalidate_analyses()
Invalidate all the analyses in the study. This will also invalidate the study.
- property is_validated
Determine if the study has been validated (bool).
- property metafile_abspath
Get the absolute paths of any metafiles associated with the study (list of str).
- property metafiles
Get any metafile paths associated with the study (list of str).
- property n_files
Get the number of associated files (int).
- property name
Get the study name. This is an alias for study_name that is designed to be a common property across studies and analyses (str).
- property parent
Get the study parent (gwas_norm.metadata.gwas_data.GwasData).
- Raises:
AttributeError – If no parent GwasData object has been defined.
- classmethod parse_files(element, **kwargs)
Parse any file elements out from the XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to parse file specific elements from.
- Returns:
gwas_files – GWAS file objects.
- Return type:
list of (gwas_norm.metadata.file.GwasFile)
- Raises:
KeyError – If no file elements can be found in the parent element.
- classmethod parse_xml(element, **kwargs)
Parse the file associated data from the XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to parse the elements from.
- Returns:
analysis_type (str) – The analysis type.
effect_type (str) – The effect type.
units (str or NoneType) – The units (if defined).
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType) – The cohort (if defined).
files (list of gwas_norm.metadata.file.GwasFile) – Files that have been parsed out of the XML.
- Raises:
KeyError – If any of the required elements cannot be found in the parent element.
- property private
Get the private status of the study (bool).
- property pubmed_id
Get the pubmed ID (str).
Notes
The pubmed_id is treated as a string but should be castable to an int. If the pubmed_id is not known, a dummy pubmed_id of 00000000 is used.
- refresh_analysis_data()
The study keeps an internal cache of analysis names/IDs. This loops through all the analyses within the study and refreshes the cache.
Notes
Typically, this will be called by an analysis when its name/ID is changed.
- remove_files(gwas_files)
Remove one or more GWAS files from this object.
- Parameters:
gwas_files (list of gwas_norm.metadata.file.GwasFile) – The files being removed.
- repr_attr_str()
Generate a list of strings that can be used to print an object's contents.
- Returns:
attrs – key/value strings representing the contents of the object.
- Return type:
list of str
- property source_genome_assembly
Get the source genome assembly (str).
- property study_id
Get the ID for the study; if not set, an ID will be generated (int).
- property study_name
Get the study name. Study names are made lowercase and have spaces replaced with _ (underscores) (str).
- property study_norm_absolute_dir
Return the absolute directory path for the normalised study, irrespective of whether it has been set via a relative path (str).
- Raises:
FileNotFoundError – If the study_norm_dir is a relative path and no root path is available from the parent.
- property study_norm_dir
Get the study normalised directory path (str).
- property study_source_absolute_dir
Return the absolute directory path for the study source directory, irrespective of whether it has been set via a relative path (str).
- Raises:
FileNotFoundError – If the study_source_dir is a relative path and no root path is available from the parent.
- property study_source_dir
Get the study source root directory (str).
- unbind()
Unbind a study from its parent GwasData object. This also removes the study from the parent.
- Returns:
parent – The parent object that has been unbound.
- Return type:
gwas_norm.metadata.gwas_data_obj.GwasData
- property units
Get the units (str or NoneType).
- ROOT_TAG = 'study_file'
The name of the root XML element tag (str).
- validate()
Validate the study. This ensures all of the component analyses are validated.
- n_file_holders()
Get the number of file holder objects contained in the study (int).
- on_file_added(file_obj, **kwargs)
Callback for when a file has been added.
- Parameters:
file_obj (gwas_norm.metadata.file.GwasFile) – The file object being added.
- on_files_removed(file_obj, **kwargs)
Callback for when files have been removed at the study level.
This is an expensive operation, as all the KeyAnalysis objects are checked for validity against the remaining files.
- Parameters:
file_objs (list of gwas_norm.metadata.file.GwasFile) – The file objects being removed.
- Raises:
KeyError – If the remaining files
- add_analysis(analysis_obj)#
Callback for when an analysis has been added.
- Parameters:
analysis_obj (gwas_norm.metadata.analysis.KeyAnalysis) – The file object being added.
- Returns:
added – An indicator if the analysis was added to the study.
- Return type:
bool
- remove_analysis(analysis_obj)#
Callback for when an analysis has been removed.
- Parameters:
analysis_obj (gwas_norm.metadata.analysis.KeyAnalysis) – The analysis object being removed.
- Returns:
removed – The analysis that was removed.
- Return type:
gwas_norm.metadata.analysis.KeyAnalysis
- to_xml()#
Convert the study file and all of its attributes to an XML element.
- Returns:
study – A study element built from the study object and its attributes.
- Return type:
lxml.etree.Element
- Raises:
IndexError – If the number of files associated with the study is 0.
- classmethod from_xml(element, **kwargs)#
Generate a gwas_norm.metadata.study.StudyFile object from an lxml.etree.Element with the tag name study_file.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name study_file.
- Returns:
study – A study file object built from all the tags in the study_file element.
- Return type:
gwas_norm.metadata.study.StudyFile
- Raises:
KeyError – If the name of the element is not expected. Also, if the source_genome_assembly attribute is not defined. Or if there are no analysis elements associated with the study.
gwas_norm.metadata.analysis
Classes representing analyses.
- class gwas_norm.metadata.analysis.KeyAnalysis(analysis_name, keys=None, **kwargs)#
Bases:
_BaseAnalysis
A representation of a ‘keyed’ analysis.
This is an analysis type that is not associated with any files and has one or more key values that flag the respective rows in a StudyFile.
- Parameters:
analysis_name (str) – The name of the analysis. All analysis names are made lowercase and have spaces replaced with underscores.
keys (list of (gwas_norm.metadata.column.Column, str)) – One or more values that will uniquely ID rows belonging to the analysis within the parent study file. The first element of the nested tuple is the column and the second is the column value.
phenotype (gwas_norm.metadata.phenotype.Phenotype, optional, default: NoneType) – The phenotype description associated with the analysis.
caveat (gwas_norm.metadata.phenotype.Caveat, optional, default: NoneType) – The caveat description associated with the analysis.
tests (list of gwas_norm.metadata.test.Test, optional, default: NoneType) – One or more tests that should be applied to the analysis.
info (gwas_norm.metadata.info.Info, optional, default: NoneType) – Columns or definitions that represent the info data for the analysis.
- ROOT_TAG = 'key_analysis'#
The name of the root XML element tag name (str)
- KEY_TAG = 'key'#
The XML tag name for analysis key values (str).
- ANALYSIS_ID_TAG = 'analysis_id'#
The XML tag name for the analysis ID (str)
- ANALYSIS_NAME_TAG = 'analysis_name'#
The XML tag name for the analysis name (str)
- add_test(test)#
Add a test element to the analysis.
- Parameters:
test (gwas_norm.metadata.test.Test) – The test to add.
- property analysis_id#
Get the ID for the analysis; if not set, an ID will be generated (int).
- property analysis_name#
Get the analysis name (str).
- bind(parent)#
Bind the analysis with a parent Study object.
- Parameters:
parent (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) – A parental study object.
- Raises:
KeyError – If the analysis is already bound to a different study object. Call Analysis.unbind() first.
Notes
The binding instigates a reciprocal adding of the analysis object to the study object, so binding is in both directions. An analysis can only be bound to a single study parent.
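The reciprocal bind/unbind behaviour in this note can be sketched with two plain classes. `Study` and `Analysis` below are hypothetical stand-ins, not the gwas_norm classes. They show the three documented rules: binding also registers the analysis with the study; binding to a second study raises KeyError; unbinding removes the analysis from the study and returns the former parent.

```python
class Study:
    """Hypothetical minimal parent: holds analyses keyed by name."""

    def __init__(self, name):
        self.name = name
        self.analyses = {}


class Analysis:
    """Hypothetical minimal child illustrating reciprocal bind/unbind."""

    def __init__(self, name):
        self.name = name
        self._parent = None

    @property
    def parent(self):
        if self._parent is None:
            raise AttributeError("no parent study has been defined")
        return self._parent

    def bind(self, parent):
        if self._parent is not None and self._parent is not parent:
            raise KeyError("already bound to a different study, call unbind() first")
        self._parent = parent
        # Reciprocal step: the study also records the analysis.
        parent.analyses[self.name] = self

    def unbind(self):
        parent = self.parent
        # Reciprocal step: remove the analysis from the study as well.
        del parent.analyses[self.name]
        self._parent = None
        return parent
```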
- property caveat#
Get the caveat definition associated with the analysis (gwas_norm.metadata.phenotype.Caveat or NoneType).
- property chksum#
Get the BSD checksum based on the analysis_name.
- Returns:
chksum – A 5 character BSD checksum of the analysis name.
- Return type:
str
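The documentation does not state the exact algorithm, byte encoding or output format. The sketch below assumes the classic 16-bit rotating BSD checksum (as computed by `sum -r`) over the UTF-8 bytes of the name, rendered as a zero-padded decimal. Because the maximum value is 65535, that always yields 5 characters. `bsd_checksum` is a hypothetical helper.

```python
def bsd_checksum(text: str) -> str:
    """Hypothetical helper: 16-bit BSD checksum of text, zero-padded to 5 digits."""
    checksum = 0
    for byte in text.encode("utf-8"):
        # Rotate right by one bit within 16 bits, then add the next byte.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return f"{checksum:05d}"


print(bsd_checksum("ab"))  # 32914
```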
- classmethod get_class(element)#
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
parse_class – A class of type gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
- Return type:
class
- Raises:
KeyError – If the appropriate class can’t be found for the tag.
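The tag-based dispatch that get_class performs can be sketched as a lookup keyed on each class's ROOT_TAG ('key_analysis' and 'analysis', as documented). The stub classes and the `_PARSE_CLASSES` table are hypothetical. The standard-library `xml.etree.ElementTree` is used in place of `lxml.etree`; both expose the tag name as `element.tag`.

```python
import xml.etree.ElementTree as ET


class KeyAnalysisStub:
    """Hypothetical stand-in for KeyAnalysis."""
    ROOT_TAG = "key_analysis"


class AnalysisFileStub:
    """Hypothetical stand-in for AnalysisFile."""
    ROOT_TAG = "analysis"


# Hypothetical lookup table keyed on each class's ROOT_TAG.
_PARSE_CLASSES = {cls.ROOT_TAG: cls for cls in (KeyAnalysisStub, AnalysisFileStub)}


def get_class(element):
    """Return the parse class for an element's tag, or raise KeyError."""
    try:
        return _PARSE_CLASSES[element.tag]
    except KeyError:
        raise KeyError(f"no parse class for tag '{element.tag}'") from None
```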
- property has_caveat#
Does the analysis have a caveat associated with it (bool).
- property has_phenotype#
Does the analysis have a phenotype associated with it (bool).
- has_test(chr_name, start_pos)#
Determine if the analysis has any tests matching chr_name and start_pos.
- Parameters:
chr_name (str) – The chromosome name.
start_pos (int) – The start position.
- Returns:
test_present – True if tests exist for this chromosome/start position, False if not.
- Return type:
bool
- property info#
Get the analysis info data (gwas_norm.metadata.info.Info or NoneType).
- property info_columns#
Get all the input file columns that will contribute towards the analysis info fields (list of gwas_norm.metadata.column.Column).
- property info_defs#
Get all the definitions that will contribute towards the analysis info fields (list of gwas_norm.metadata.phenotype.Definition).
Notes
These may be defined within the info field or attributes of phenotypes or caveats.
- invalidate()#
Invalidate the analysis.
- property is_validated#
Has the analysis been validated (bool).
- property n_tests#
Get the number of tests associated with the analysis (int).
- property name#
Get the analysis name; this is an alias of analysis_name (str).
- property parent#
Get the parent study object. The parent is set with the bind method (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile).
- Raises:
AttributeError – If no parent study has been defined.
- classmethod parse_caveats(element)#
Will parse out XML documenting the caveats.
- Parameters:
element (lxml.etree.Element) – The element should have sub-elements with the tag name caveat.
- Returns:
caveat_data – The caveat object to add to the analysis.
- Return type:
gwas_norm.metadata.phenotype.Caveat
- classmethod parse_info_data(element)#
Will parse out XML documenting the info data.
- Parameters:
element (lxml.etree.Element) – The element should have sub-elements with the tag name info.
- Returns:
info_data – The parsed info object.
- Return type:
gwas_norm.metadata.info.Info
- classmethod parse_phenotypes(element)#
Will parse out XML documenting the phenotype data.
- Parameters:
element (lxml.etree.Element) – The element should have sub-elements with the tag name phenotype.
- Returns:
phenotype_data – The phenotype object to add to the analysis.
- Return type:
gwas_norm.metadata.phenotype.Phenotype
- classmethod parse_tests(element)#
Will parse out XML documenting the tests to perform on the analysis.
- Parameters:
element (lxml.etree.Element) – The element should contain sub-elements with the tag name test.
- Returns:
tests – A list of test objects to add to the analysis.
- Return type:
list of gwas_norm.metadata.test.Test
- property phenotype#
Get the phenotype definition associated with the analysis (gwas_norm.metadata.phenotype.Phenotype or NoneType).
- property tests#
Get the tests associated with the analysis (list of gwas_norm.metadata.test.Test).
Notes
The list will be empty if there are no associated tests. The returned list is a copy of the list stored in the analysis (although the actual Test objects are not copies).
- unbind()#
Remove a parent study from this analysis. This also removes the analysis from the study parent.
- Returns:
parent – The parent object that has been unbound.
- Return type:
gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile
- VALUE_TAG = 'value'#
The XML tag name for key values (str).
- repr_attr_str()#
Used to output a list of strings containing attribute=value for the attributes of this object. This is used in various __repr__ methods.
- Returns:
attr_str – String representation of the object's attributes and values.
- Return type:
list of str
- property keys#
Get key-values for the analysis.
- Returns:
key_values – The first element of the nested tuple is the column and the second is the column value.
- Return type:
list of (gwas_norm.metadata.column.Column, str)
Notes
These are values that will uniquely ID rows belonging to the analysis within the parent study file.
- add_key(key_column, key_value)#
Add a key value to the analysis object.
- Parameters:
key_column (gwas_norm.metadata.column.Column) – A key column in the input file where the key value should be located.
key_value (str) – A key value to be added to the analysis.
- Raises:
ValueError – If the value is an empty string (''), all spaces, or NoneType.
Notes
These will be used to ID rows belonging to the analysis when parsing through a StudyFile. The key will be built in the order that the keys are added to the analysis.
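The key handling described above (reject empty, all-space or None values; build the key in insertion order; use it to pick out rows) can be sketched as below. `KeyedRows` is a hypothetical class. It uses plain column-name strings where gwas_norm uses Column objects, and represents a row as a dict of column name to value. Both are assumptions made for illustration.

```python
class KeyedRows:
    """Hypothetical sketch of ordered key-value matching for a keyed analysis."""

    def __init__(self):
        self.keys = []  # list of (column_name, value) in insertion order

    def add_key(self, key_column, key_value):
        # Reject None, empty strings and all-whitespace strings.
        if key_value is None or str(key_value).strip() == "":
            raise ValueError("key value must be a non-empty string")
        self.keys.append((key_column, key_value))

    @property
    def key(self):
        # The composite key follows the order in which keys were added.
        return tuple(value for _, value in self.keys)

    def matches(self, row):
        """Return True if a row (dict of column name -> value) belongs to this analysis."""
        return tuple(row.get(column) for column, _ in self.keys) == self.key
```

Usage: after `add_key("gene", "ENSG01")` and `add_key("tissue", "liver")`, the key is `("ENSG01", "liver")`. A row only matches when both columns carry those values.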
- validate()#
Validate the analysis. This respects the file checking parameter.
- to_xml()#
Generate an lxml.etree.Element object for all the attributes within the analysis.
- Returns:
key_analysis_element – An element representing an analysis that can be used in a larger XML structure.
- Return type:
lxml.etree.Element
- classmethod from_xml(element, **kwargs)#
Generate a KeyAnalysis object from an lxml.etree.Element with the tag name key_analysis.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name key_analysis.
- Returns:
key_analysis – An analysis object built from all the elements in the analysis object.
- Return type:
gwas_norm.metadata.analysis.KeyAnalysis
- Raises:
KeyError – If the name of the element is not key_analysis.
- class gwas_norm.metadata.analysis.AnalysisFile(analysis_name, analysis_type, effect_type, units=None, files=None, cohort=None, file_check=True, **kwargs)#
Bases:
_BaseAnalysis, FileHolderMixin
A representation of an AnalysisFile type. This is an analysis that is directly associated with one or more data files.
- Parameters:
analysis_name (str) – The unique name of the analysis. The analysis name will be converted to lowercase and spaces will be replaced with underscores (_).
analysis_type (str) – The analysis type. Will be applied to any analysis that does not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analysis that does not have an effect_type specified.
units (str, optional, default: NoneType) – The units for the analysis.
files (list of gwas_norm.metadata.file.GwasFile, optional, default: NoneType) – Analysis files. Analysis level files are for data such as full GWAS data for a disease, as opposed to study level files such as GTEX data where multiple gene analyses are in the same file.
cohort (gwas_norm.metadata.cohort.Cohort, optional, default: NoneType) – The cohort description.
phenotype (gwas_norm.metadata.phenotype.Phenotype, optional, default: NoneType) – The phenotype description associated with the analysis.
caveat (gwas_norm.metadata.phenotype.Caveat, optional, default: NoneType) – The caveat description associated with the analysis.
tests (list of gwas_norm.metadata.test.Test, optional, default: NoneType) – One or more tests that should be applied to the analysis. Tests are not implemented yet.
- ANALYSIS_ID_TAG = 'analysis_id'#
The XML tag name for the analysis ID (str)
- ANALYSIS_NAME_TAG = 'analysis_name'#
The XML tag name for the analysis name (str)
- ANALYSIS_TYPE_TAG = 'analysis_type'#
The name of the analysis type tag/element in the XML file (str).
- EFFECT_TYPE_TAG = 'effect_type'#
The name of the effect type tag/element in the XML file (str).
- FILE_CLASS = None#
The file class that should be used for parsing XML, this should be overridden by the sub-class (NoneType)
- UNITS_TAG = 'units'#
The name of the units tag/element in the XML file (str).
- add_file(gwas_file, error=False)#
Add a GWAS file to the object. This results in a reciprocal bind of the file to the parent object.
- Parameters:
gwas_file (gwas_norm.metadata.file.GwasFile) – The file being added.
error (bool, optional, default: False) – If the file exists in this object already, should an error be raised.
- Raises:
- add_test(test)#
Add a test element to the analysis.
- Parameters:
test (gwas_norm.metadata.test.Test) – The test to add.
- property analysis_id#
Get the ID for the analysis; if not set, an ID will be generated (int).
- property analysis_name#
Get the analysis name (str).
- property analysis_type#
Get the analysis type (str).
- bind(parent)#
Bind the analysis with a parent Study object.
- Parameters:
parent (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile) – A parental study object.
- Raises:
KeyError – If the analysis is already bound to a different study object. Call Analysis.unbind() first.
Notes
The binding instigates a reciprocal adding of the analysis object to the study object, so binding is in both directions. An analysis can only be bound to a single study parent.
- property caveat#
Get the caveat definition associated with the analysis (gwas_norm.metadata.phenotype.Caveat or NoneType).
- property chksum#
Get the BSD checksum based on the analysis_name.
- Returns:
chksum – A 5 character BSD checksum of the analysis name.
- Return type:
str
- property cohort#
Get the cohort definition associated with the analysis (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType).
Notes
The cohort associated with the analysis or NoneType if no cohort has been set.
- create_analysis_type_xml(element)#
Generate the analysis type XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_cohort_xml(element)#
Create cohort-specific XML elements in the parent element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the cohort-specific elements to.
- Returns:
element – The parent XML element with the cohort elements added.
- Return type:
lxml.etree.Element
Notes
This is designed to add XML elements to a study/analysis element that has file parameters.
- create_effect_type_xml(element)#
Generate the XML element for the effect type.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_files_xml(element)#
Create file-specific XML elements in the parent element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the file-specific elements to.
- Returns:
element – The parent XML element with the file elements added.
- Return type:
lxml.etree.Element
Notes
This is designed to add XML elements to a study/analysis element that has file parameters.
- create_units_xml(element)#
Generate the units XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_xml(element)#
Generate all the XML elements relating to objects that hold files. This wraps all other create_* methods in the mixin.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the elements to.
- Returns:
element – The parent XML element with the elements added.
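A rough sketch of what the analysis type, effect type and units parts of this XML generation could look like. It uses the documented tag names 'analysis_type', 'effect_type' and 'units'. The element order, the use of element text for values, and the example values ("disease", "log_or", "mmol/L") are all assumptions. `create_file_holder_xml` is a hypothetical helper, and the standard-library `xml.etree.ElementTree` replaces `lxml.etree`.

```python
import xml.etree.ElementTree as ET


def create_file_holder_xml(parent, analysis_type, effect_type, units=None):
    """Hypothetical sketch: add analysis_type, effect_type and optional units sub-elements."""
    ET.SubElement(parent, "analysis_type").text = analysis_type
    ET.SubElement(parent, "effect_type").text = effect_type
    if units is not None:
        # Units are optional, so the element is only written when set.
        ET.SubElement(parent, "units").text = units
    return parent


analysis = create_file_holder_xml(ET.Element("analysis"), "disease", "log_or")
print(ET.tostring(analysis, encoding="unicode"))
```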
- property effect_type#
Get the effect type (str or NoneType).
- property file_check#
Get the file checking status (bool).
- file_repr_attr_str()#
Called by the __repr__ of host objects to supply a key=value string of the attributes and their values relating to the mixin.
- Returns:
attr_str – Each string is an attribute and value for printing.
- Return type:
list of str
- property files#
Get all the associated files (list of gwas_norm.metadata.file.GwasFile).
- classmethod get_class(element)#
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
parse_class – A class of type gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.analysis.KeyAnalysis
- Return type:
class
- Raises:
KeyError – If the appropriate class can’t be found for the tag.
- property has_caveat#
Does the analysis have a caveat associated with it (bool).
- property has_phenotype#
Does the analysis have a phenotype associated with it (bool).
- has_test(chr_name, start_pos)#
Determine if the analysis has any tests matching chr_name and start_pos.
- Parameters:
chr_name (str) – The chromosome name.
start_pos (int) – The start position.
- Returns:
test_present – True if tests exist for this chromosome/start position, False if not.
- Return type:
bool
- property info#
Get the analysis info data (gwas_norm.metadata.info.Info or NoneType).
- property info_columns#
Get all the input file columns that will contribute towards the analysis info fields (list of gwas_norm.metadata.column.Column).
- property info_defs#
Get all the definitions that will contribute towards the analysis info fields (list of gwas_norm.metadata.phenotype.Definition).
Notes
These may be defined within the info field or attributes of phenotypes or caveats.
- init_file_attr(analysis_type, effect_type, units=None, cohort=None, files=None, file_check=True)#
Initialise all the attributes that a file handling object needs.
- Parameters:
analysis_type (str) – The analysis type for the study. Will be applied to any analysis that does not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. Will be applied to any analysis that does not have an effect_type specified.
units (str, optional, default: NoneType) – The units that the effect sizes within the file are measured in.
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort, optional, default: NoneType) – The cohort that applies to the file.
files (list of gwas_norm.metadata.file.GwasFile, optional, default: NoneType) – Source files. Data files can either be given at the study level or the analysis level but not both. Study level files are for data such as GTEX data where multiple gene analyses are in the same file.
file_check (bool, optional, default: True) – Toggle file checking.
Notes
This is usually called from the __init__ method of the host object and will initialise all the parameters relating to file handling from arguments that have been given to the host object.
- invalidate()#
Invalidate the analysis.
- property is_validated#
Has the analysis been validated (bool).
- property n_files#
Get the number of associated files (int).
- property n_tests#
Get the number of tests associated with the analysis (int).
- property name#
Get the analysis name; this is an alias of analysis_name (str).
- property parent#
Get the parent study object. The parent is set with the bind method (gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile).
- Raises:
AttributeError – If no parent study has been defined.
- classmethod parse_caveats(element)#
Will parse out XML documenting the caveats.
- Parameters:
element (lxml.etree.Element) – The element should have sub-elements with the tag name caveat.
- Returns:
caveat_data – The caveat object to add to the analysis.
- Return type:
gwas_norm.metadata.phenotype.Caveat
- classmethod parse_files(element, **kwargs)#
Parse any file elements out from the XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to parse file specific elements from.
- Returns:
gwas_files – GWAS file objects.
- Return type:
list of (gwas_norm.metadata.file.GwasFile)
- Raises:
KeyError – If no file elements can be found in the parent element.
- classmethod parse_info_data(element)#
Will parse out XML documenting the info data.
- Parameters:
element (lxml.etree.Element) – The element should have sub-elements with the tag name info.
- Returns:
info_data – The parsed info object.
- Return type:
gwas_norm.metadata.info.Info
- classmethod parse_phenotypes(element)#
Will parse out XML documenting the phenotype data.
- Parameters:
element (lxml.etree.Element) – The element should have sub-elements with the tag name phenotype.
- Returns:
phenotype_data – The phenotype object to add to the analysis.
- Return type:
gwas_norm.metadata.phenotype.Phenotype
- classmethod parse_tests(element)#
Will parse out XML documenting the tests to perform on the analysis.
- Parameters:
element (lxml.etree.Element) – The element should contain sub-elements with the tag name test.
- Returns:
tests – A list of test objects to add to the analysis.
- Return type:
list of gwas_norm.metadata.test.Test
- classmethod parse_xml(element, **kwargs)#
Parse the file associated data from the XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to parse the elements from.
- Returns:
analysis_type (str) – The analysis type.
effect_type (str) – The effect type.
units (str or NoneType) – The units (if defined).
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType) – The cohort (if defined).
files (list of gwas_norm.metadata.file.GwasFile) – Files that have been parsed out of the XML.
- Raises:
KeyError – If any of the required elements cannot be found in the parent element.
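The required/optional split in this parsing step can be sketched as follows. The analysis type and effect type are treated as required (KeyError if missing) and units as optional. The tag names are the documented ones. Storing the values as element text is an assumption, and cohort and file parsing are omitted. `parse_file_holder_xml` is hypothetical and uses `xml.etree.ElementTree` rather than `lxml.etree`.

```python
import xml.etree.ElementTree as ET


def parse_file_holder_xml(element):
    """Hypothetical sketch: read analysis_type, effect_type and optional units."""
    values = {}
    for tag in ("analysis_type", "effect_type"):
        child = element.find(tag)
        if child is None:
            # Required elements must be present.
            raise KeyError(f"required element '{tag}' not found")
        values[tag] = child.text
    units = element.find("units")
    values["units"] = units.text if units is not None else None
    return values["analysis_type"], values["effect_type"], values["units"]
```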
- property phenotype#
Get the phenotype definition associated with the analysis (gwas_norm.metadata.phenotype.Phenotype or NoneType).
- remove_files(gwas_files)#
Remove one or more GWAS files from this object.
- Parameters:
gwas_files (list of gwas_norm.metadata.file.GwasFile) – The files being removed.
- property tests#
Get the tests associated with the analysis (list of gwas_norm.metadata.test.Test).
Notes
The list will be empty if there are no associated tests. The returned list is a copy of the list stored in the analysis (although the actual Test objects are not copies).
- unbind()#
Remove a parent study from this analysis. This also removes the analysis from the study parent.
- Returns:
parent – The parent object that has been unbound.
- Return type:
gwas_norm.metadata.study.Study or gwas_norm.metadata.study.StudyFile
- property units#
Get the units (str or NoneType)
- ROOT_TAG = 'analysis'#
The name of the root XML element tag name (str)
- validate()#
Validate the analysis.
- property study_source_absolute_dir#
Return the parent study source directory absolute path (str).
Notes
This provides a uniform interface for file_holder methods to access the root path without having to know the type of the class they are joined with.
- repr_attr_str()#
Get a list of strings representing the core attributes of the object and their values.
- Returns:
attr_str – The core attributes handled by the base class, and their values.
- Return type:
list of str
Notes
This is used by the __repr__ of the base class and can be called by any subclasses in their __repr__ methods.
- on_file_added(file_obj, **kwargs)#
Callback for when a file has been added.
- Parameters:
file_obj (gwas_norm.metadata.file.GwasFile) – The file object being added.
- to_xml()#
Generate an lxml.etree.Element object for all the attributes within the analysis. This will have the tag name analysis.
- Returns:
analysis_element – An element representing an analysis that can be used in a larger XML structure.
- Return type:
lxml.etree.Element
- classmethod from_xml(element, **kwargs)#
Generate an AnalysisFile object from an lxml.etree.Element with the tag name analysis.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name analysis.
- Returns:
analysis_obj – An analysis object built from all the elements in the analysis object.
- Return type:
gwas_norm.metadata.analysis.AnalysisFile
- Raises:
ValueError – If no file elements are associated with the analysis element.
gwas_norm.metadata.phenotype
Classes for building phenotype structures.
- class gwas_norm.metadata.phenotype.Phenotype(definition, reference_string=None)#
Bases:
_BasePhenotype
A representation of a phenotype.
- Parameters:
definition (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or or gwas_norm.metadata.phenotype.And) – Either a single phenotype definition or a composite one.
- ROOT_TAG = 'phenotype'#
The root XML tag for the class (str)
- classmethod from_xml(element)#
Read the phenotype definitions from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the phenotype.
- Returns:
phenotype_definition – The parsed phenotype definition.
- Return type:
gwas_norm.metadata.phenotype.Phenotype
- REFERENCE_STRING_TAG = 'reference_string'#
The XML tag name for a phenotype/caveat reference string (str).
- property definition#
Get the phenotype/caveat definition (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or or gwas_norm.metadata.phenotype.And).
- property flat_definition#
Get the phenotype/caveat definition, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).
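Flattening a composite definition (Synonym/And/Or wrapping Definition objects, possibly nested) amounts to a recursive walk that collects the leaf definitions. The sketch assumes depth-first, left-to-right order. `Definition` and `Composite` here are hypothetical stand-ins, not the gwas_norm classes.

```python
class Definition:
    """Hypothetical leaf definition."""

    def __init__(self, name):
        self.name = name


class Composite:
    """Hypothetical stand-in for Synonym/And/Or: holds definitions or nested composites."""

    def __init__(self, *children):
        self.children = list(children)


def flatten(definition):
    """Depth-first flattening of a (possibly nested) definition into Definition objects."""
    if isinstance(definition, Definition):
        return [definition]
    flat = []
    for child in definition.children:
        flat.extend(flatten(child))
    return flat
```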
- classmethod get_class(element)#
Helper method that determines the required class for parsing based on the root tag of the element.
- Parameters:
element (lxml.etree.Element) – An element that is expected to have the tag name phenotype or caveat.
- Returns:
class – The relevant class for the element.
- Return type:
class of (gwas_norm.metadata.phenotype.Phenotype or gwas_norm.metadata.phenotype.Caveat)
- Raises:
KeyError – If the element does not have the required tag name.
- property info_defs#
Return any definitions that have been tagged as an info definition (list of gwas_norm.metadata.phenotype.Definition).
- property reference_string#
Get the phenotype/caveat reference string (str).
- to_xml()#
Write all the child elements out to a <phenotype>/<caveat> XML element.
- Returns:
element – The XML element representing the <phenotype>/<caveat>.
- Return type:
lxml.etree.Element
- class gwas_norm.metadata.phenotype.Caveat(definition, reference_string=None)#
Bases:
_BasePhenotype
A representation of a caveat.
A caveat is defined as anything that will alter the interpretation of the phenotype associations/effect sizes.
- Parameters:
definition (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or or gwas_norm.metadata.phenotype.And) – Either a single caveat definition or a composite one.
- ROOT_TAG = 'caveat'#
The root XML tag for the class (str)
- classmethod from_xml(element)#
Read the caveat definitions from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the caveat.
- Returns:
caveat_definition – The parsed caveat definition.
- Return type:
gwas_norm.metadata.phenotype.Caveat
- REFERENCE_STRING_TAG = 'reference_string'#
The XML tag name for a phenotype/caveat reference string (str).
- property definition#
Get the phenotype/caveat definition (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or or gwas_norm.metadata.phenotype.And).
- property flat_definition#
Get the phenotype/caveat definition, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).
- classmethod get_class(element)#
Helper method that determines the required class for parsing based on the root tag of the element.
- Parameters:
element (lxml.etree.Element) – An element that is expected to have the tag name phenotype or caveat.
- Returns:
class – The relevant class for the element.
- Return type:
class of (gwas_norm.metadata.phenotype.Phenotype or gwas_norm.metadata.phenotype.Caveat)
- Raises:
KeyError – If the element does not have the required tag name.
- property info_defs#
Return any definitions that have been tagged as an info definition (list of gwas_norm.metadata.phenotype.Definition).
- property reference_string#
Get the phenotype/caveat reference string (str).
- to_xml()#
Write all the child elements out to a <phenotype>/<caveat> XML element.
- Returns:
element – The XML element representing the <phenotype>/<caveat>.
- Return type:
lxml.etree.Element
- class gwas_norm.metadata.phenotype.Definition(name, info=False, map_to=None, dtype=None)#
Bases:
_XmlBase
,InfoHolderMixin
A definition (name and type) of a phenotype, caveat or synonym.
- Parameters:
name (str) – The definition name.
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: text) – If info is True, map_to indicates that the definition should be known as the map_to value in the info field. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
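The dtype and map_to rules above can be sketched as two small validators. The first splits a dtype string into its data type (S, F, I) and data structure (A, C), treating NoneType as SC. The second checks that map_to contains only alphanumeric characters and underscores. Both helpers are hypothetical. The sketch assumes "alphanumeric" means ASCII letters and digits, and that anything else raises ValueError. gwas_norm's own validation may differ.

```python
import re

# Assumed rule: ASCII letters, digits and underscores only, no spaces.
_MAP_TO_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def parse_dtype(dtype=None):
    """Hypothetical helper: split a dtype string into (data type, data structure)."""
    if dtype is None:
        dtype = "SC"  # NoneType is interpreted as a string scalar
    if len(dtype) != 2 or dtype[0] not in "SFI" or dtype[1] not in "AC":
        raise ValueError(f"unknown dtype: {dtype!r}")
    return dtype[0], dtype[1]


def check_map_to(map_to):
    """Hypothetical helper: reject map_to values with anything but alphanumerics/underscores."""
    if _MAP_TO_PATTERN.fullmatch(map_to) is None:
        raise ValueError(f"invalid map_to value: {map_to!r}")
    return map_to
```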
- ROOT_TAG = 'definition'#
The root XML tag for the class (str)
- UNDEF_TYPE = 'text'#
The name of a type attribute that has not been set (str)
- property reference_string#
Get the definition reference string, this is the same as the name (str).
- property flat_definition#
Get the definition as a list with a single definition object (list of gwas_norm.metadata.phenotype.Definition).
- to_xml()#
Write the definition out to an XML element.
- Returns:
definition_element – The XML element representing the definition. Has the tag name <definition>.
- Return type:
lxml.etree.Element
- classmethod from_xml(element)#
Read the definition from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the definition. Has the tag name <definition>.
- Returns:
definition – The definition object.
- Return type:
gwas_norm.metadata.phenotype.Definition
- classmethod get_class(element)#
Helper method that determines the required class for parsing based on the root tag of the element.
- Parameters:
element (lxml.etree.Element) – An element that is expected to have the tag name definition.
- Returns:
class – The relevant class for the element.
- Return type:
class of gwas_norm.metadata.phenotype.Definition
- Raises:
KeyError – If the element does not have the required tag name.
- DATA_TYPE_ATTRIBUTE = 'dtype'#
The name of the data type attribute of the column (str)
- INFO_ATTRIBUTE = 'info'#
The name of the info attribute of the column (str)
- MAP_TO_ATTRIBUTE = 'map_to'#
The name of the map_to attribute of the column (str)
- property dstruct#
Get the data structure value. C is a scalar and A is an array (str).
- property dtype#
Get the dtype value. S is a string value, F is a float and I is an integer (str).
- equals(other)#
Determine equality against another object that uses the InfoHolderMixin. This is based on the map_to, dtype and dstruct values matching.
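The equality rule above compares only the info-related attributes, so two definitions with different names can still be "equal" in this sense. The sketch below contrasts that with ordinary dataclass equality, which compares every field. `InfoAttrs` is a hypothetical container, not a gwas_norm class.

```python
from dataclasses import dataclass


@dataclass
class InfoAttrs:
    """Hypothetical container for the attributes compared by equals()."""
    name: str
    map_to: str
    dtype: str
    dstruct: str

    def equals(self, other):
        # Only map_to, dtype and dstruct are compared; the name is ignored.
        return (self.map_to, self.dtype, self.dstruct) == (
            other.map_to, other.dtype, other.dstruct)
```

Usage: `InfoAttrs("BMI", "bmi", "F", "C").equals(InfoAttrs("body_mass_index", "bmi", "F", "C"))` is True, while `==` on the same pair is False because the names differ.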
- classmethod get_attributes(element)#
Get the attributes from an XML element.
- Parameters:
element (lxml.etree.Element) – The element potentially containing info, map_to, dtype attributes.
- Returns:
info (bool) – Is the class acting as an info field.
map_to (str or NoneType) – If info is True, map_to indicates that the info value defined in the class/column should be known as the map_to value in the info field and not as the name.
dtype (str) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array.
- property info#
Get whether the value is output to the info field (bool).
- init_info_values(info=False, map_to=None, dtype=None, allow_info_false=False)#
Initialise all of the info related values for the mixin.
- Parameters:
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true map_to indicates that the name defined in the class should be known as the
map_to
value in the info field and not as the name. Must only contain alpha numeric characters and underscores with no spaces.dtype (str, optional, default: NoneType) – The datatype definition string.
S
is a string value.F
is a float,I
is an integer.A
represents an array andC
a scalar. soSA
would be a string array.NoneType
is interpreted as anSC
.all_info_false (bool, optional, default: False) – If this is set to
True
, then if info isFalse
and themap_to
is defined, thenmap_to
it is still output to the XML. This is for phenotype definitions where the map_to value has meaning even if not outputting to the info column.
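The dtype strings above combine a type code (S, F, I) with a structure code (C, A). The sketch below shows one way such a string could be split and validated. It assumes the type code always comes first and that the string is exactly two characters long; parse_dtype is a hypothetical helper, not part of gwas_norm.

```python
from typing import Optional, Tuple

# Type and structure codes as described in the documentation above.
TYPE_CODES = {"S": "string", "F": "float", "I": "integer"}
STRUCT_CODES = {"C": "scalar", "A": "array"}


def parse_dtype(dtype: Optional[str]) -> Tuple[str, str]:
    """Split a dtype string such as 'SA' into (type, structure) names.

    Hypothetical helper: assumes a two-character string with the type code
    first, and treats None as 'SC' as documented for init_info_values.
    """
    if dtype is None:
        dtype = "SC"
    if len(dtype) != 2:
        raise ValueError(f"expected a two-character dtype, got {dtype!r}")
    type_code, struct_code = dtype[0], dtype[1]
    if type_code not in TYPE_CODES or struct_code not in STRUCT_CODES:
        raise ValueError(f"unknown dtype: {dtype!r}")
    return TYPE_CODES[type_code], STRUCT_CODES[struct_code]


print(parse_dtype("SA"))  # ('string', 'array')
print(parse_dtype(None))  # ('string', 'scalar')
```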
- property map_to
Get the column name remapping value (str or NoneType).
- set_attributes(element)
Set the attributes into an XML element.
- Parameters:
element (lxml.etree.Element) – The element to add the attributes to.
- class gwas_norm.metadata.phenotype.Synonym(*synonyms)
Bases:
_XmlBase
A container for phenotype definitions that are synonyms of the same thing.
- Parameters:
*synonyms – One or more gwas_norm.metadata.phenotype.Definition objects.
- ROOT_TAG = 'synonym'
The root element for the class (str)
- property reference_string
Get the Synonym reference string; this is the first added synonym (str).
- property flat_definition
Get the synonym definitions, flattened to a list of definition objects (list of gwas_norm.metadata.phenotype.Definition).
- property synonyms
Get all the synonym definitions (list of gwas_norm.metadata.phenotype.Definition).
- add(s)
Add a phenotype definition to the synonym.
- to_xml()
Write all the synonyms out to an XML element.
- Returns:
synonym_element – The XML element representing the synonym. Has the tag name <synonym>.
- Return type:
lxml.etree.Element
- Raises:
IndexError – If there are fewer than 2 definitions in the synonym.
- classmethod from_xml(element)
Read the synonyms from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the synonyms. Should have the tag name <synonym>.
- Returns:
synonyms – The synonym object.
- Return type:
gwas_norm.metadata.phenotype.Synonym
- classmethod get_class(element)
Helper method that will determine the required class for parsing based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – The element to check; it is expected to have the tag name synonym.
- Returns:
class – The relevant class for the element.
- Return type:
class of gwas_norm.metadata.phenotype.Synonym
- Raises:
KeyError – If the element does not have the required tag name.
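The Synonym rules above (the reference string is the first synonym added, and writing XML needs at least two definitions) can be illustrated with a small stand-in class. This is a sketch under stated assumptions: definitions are plain strings rather than Definition objects, the standard library xml.etree.ElementTree is used instead of lxml, and the child tag name definition is hypothetical.

```python
import xml.etree.ElementTree as ET


class SynonymSketch:
    """Hypothetical stand-in for Synonym: definitions are plain strings here."""

    ROOT_TAG = "synonym"

    def __init__(self, *synonyms):
        self._synonyms = []
        for s in synonyms:
            self.add(s)

    def add(self, s):
        self._synonyms.append(s)

    @property
    def reference_string(self):
        # The reference string is the first synonym that was added.
        return self._synonyms[0]

    def to_xml(self):
        # A synonym only makes sense with at least two definitions.
        if len(self._synonyms) < 2:
            raise IndexError("a synonym needs at least 2 definitions")
        root = ET.Element(self.ROOT_TAG)
        for s in self._synonyms:
            # The child tag name 'definition' is an assumption of this sketch.
            ET.SubElement(root, "definition").text = s
        return root


syn = SynonymSketch("type 2 diabetes", "T2D")
print(syn.reference_string)  # type 2 diabetes
print(ET.tostring(syn.to_xml(), encoding="unicode"))
```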
- class gwas_norm.metadata.phenotype.And(*contents)
Bases:
_AndOrBase
A phenotype <and> statement.
- Parameters:
*contents – One or more instances of gwas_norm.metadata.phenotype.Definition, gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or.
- property contents
Get all the and/or definitions.
- Returns:
contents_list (list of (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or)) – The contents of the And/Or statement.
- property flat_definition
Get all the definitions, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).
- classmethod get_class(element)
Helper method that will determine the required class for parsing based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – The element to check; it is expected to have the tag name <and> or <or>.
- Returns:
class – The relevant class for the element.
- Return type:
class of (gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or)
- Raises:
KeyError – If the element does not have the required tag name.
- classmethod parse_xml(element)
Parse the contents of the <and>/<or> definition from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the <and>/<or> definition. Should have the tag name <and> or <or>.
- Returns:
object – The And/Or object.
- Return type:
gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or
- Raises:
IndexError – If there are fewer than 2 definitions in the And/Or.
- to_xml()
Write all the components of the And/Or statement out to an XML element.
- Returns:
element – The XML element representing the and/or statement. Has the tag name <and> or <or>.
- Return type:
lxml.etree.Element
- Raises:
IndexError – If there are fewer than 2 definitions in the And/Or.
- ROOT_TAG = 'and'
The root element for the class (str)
- property reference_string
Get the And reference string (str)
- add(s)
Add an element to the And statement.
- Parameters:
s (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or) – The element to add.
- Raises:
TypeError – If the element being added is not of the expected type.
IndexError – If an Or or Synonym element being added has fewer than 2 definitions within it.
- classmethod from_xml(element)
Read the <and> statement from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the and statement. Should have the tag name <and>.
- Returns:
and_obj – The And object.
- Return type:
gwas_norm.metadata.phenotype.And
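The And/Or documentation implies an alternating structure: an <and> may hold definitions or an <or> (and vice versa), any nested group must hold at least two items, and writing XML needs at least two items. A minimal sketch of those rules follows, assuming hypothetical stand-in classes, ElementTree in place of lxml and a hypothetical definition leaf tag; Synonym support is left out for brevity.

```python
import xml.etree.ElementTree as ET


class Definition:
    """Hypothetical leaf: a single phenotype term (the real class has more fields)."""

    def __init__(self, term):
        self.term = term

    def to_xml(self):
        # 'definition' as the leaf tag name is an assumption of this sketch.
        el = ET.Element("definition")
        el.text = self.term
        return el


class _AndOrSketch:
    ROOT_TAG = None
    ALLOWED = ()

    def __init__(self, *contents):
        self.contents = []
        for c in contents:
            self.add(c)

    def add(self, s):
        if not isinstance(s, self.ALLOWED):
            raise TypeError(f"{type(s).__name__} not allowed in <{self.ROOT_TAG}>")
        # Nested groups must themselves hold at least two items.
        if isinstance(s, _AndOrSketch) and len(s.contents) < 2:
            raise IndexError("nested statement needs at least 2 definitions")
        self.contents.append(s)

    def to_xml(self):
        if len(self.contents) < 2:
            raise IndexError(f"<{self.ROOT_TAG}> needs at least 2 definitions")
        root = ET.Element(self.ROOT_TAG)
        for c in self.contents:
            root.append(c.to_xml())
        return root


class AndSketch(_AndOrSketch):
    ROOT_TAG = "and"


class OrSketch(_AndOrSketch):
    ROOT_TAG = "or"


# An <and> may hold definitions or an <or>; an <or> may hold definitions or an <and>.
AndSketch.ALLOWED = (Definition, OrSketch)
OrSketch.ALLOWED = (Definition, AndSketch)

expr = AndSketch(Definition("asthma"), OrSketch(Definition("smoker"), Definition("ex-smoker")))
print(ET.tostring(expr.to_xml(), encoding="unicode"))
```

Nesting an And directly inside an And raises TypeError here, matching the documented add behaviour, since such a grouping could simply be flattened into its parent.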
- class gwas_norm.metadata.phenotype.Or(*contents)
Bases:
_AndOrBase
A phenotype <or> statement.
- Parameters:
*contents – One or more instances of gwas_norm.metadata.phenotype.Definition, gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.And.
- property contents
Get all the and/or definitions.
- Returns:
contents_list (list of (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.Or)) – The contents of the And/Or statement.
- property flat_definition
Get all the definitions, flattened to a list of Definition objects (list of gwas_norm.metadata.phenotype.Definition).
- classmethod get_class(element)
Helper method that will determine the required class for parsing based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – The element to check; it is expected to have the tag name <and> or <or>.
- Returns:
class – The relevant class for the element.
- Return type:
class of (gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or)
- Raises:
KeyError – If the element does not have the required tag name.
- classmethod parse_xml(element)
Parse the contents of the <and>/<or> definition from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the <and>/<or> definition. Should have the tag name <and> or <or>.
- Returns:
object – The And/Or object.
- Return type:
gwas_norm.metadata.phenotype.And or gwas_norm.metadata.phenotype.Or
- Raises:
IndexError – If there are fewer than 2 definitions in the And/Or.
- to_xml()
Write all the components of the And/Or statement out to an XML element.
- Returns:
element – The XML element representing the and/or statement. Has the tag name <and> or <or>.
- Return type:
lxml.etree.Element
- Raises:
IndexError – If there are fewer than 2 definitions in the And/Or.
- ROOT_TAG = 'or'
The root element for the class (str)
- property reference_string
Get the Or reference string (str)
- add(s)
Add an element to the Or statement.
- Parameters:
s (gwas_norm.metadata.phenotype.Definition or gwas_norm.metadata.phenotype.Synonym or gwas_norm.metadata.phenotype.And) – The element to add.
- Raises:
TypeError – If the element being added is not of the expected type.
IndexError – If an And or Synonym element being added has fewer than 2 definitions within it.
- classmethod from_xml(element)
Read the <or> statement from an XML element.
- Parameters:
element (lxml.etree.Element) – The XML element representing the or statement. Should have the tag name <or>.
- Returns:
or_obj – The Or object.
- Return type:
gwas_norm.metadata.phenotype.Or
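The flat_definition property on Synonym, And and Or reduces a nested statement to a plain list of definitions. The sketch below shows that flattening with hypothetical dataclass stand-ins, where a single Group class plays the role of Synonym, And or Or. It assumes a depth-first traversal in insertion order, which these docs do not specify for gwas_norm.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Definition:
    term: str


@dataclass
class Group:
    """Hypothetical stand-in for Synonym, And or Or; 'kind' is only a label."""
    kind: str
    contents: List[Union["Group", Definition]] = field(default_factory=list)


def flat_definition(node):
    """Return every Definition beneath node, depth first, in insertion order."""
    if isinstance(node, Definition):
        return [node]
    flat = []
    for child in node.contents:
        flat.extend(flat_definition(child))
    return flat


tree = Group("and", [
    Definition("asthma"),
    Group("or", [
        Group("synonym", [Definition("smoker"), Definition("current smoker")]),
        Definition("ex-smoker"),
    ]),
])
print([d.term for d in flat_definition(tree)])
# ['asthma', 'smoker', 'current smoker', 'ex-smoker']
```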
gwas_norm.metadata.cohort
Cohort XML elements.
- class gwas_norm.metadata.cohort.SampleSizeMixin
Bases:
_BaseSample
A mixin to add sample size storage and methods.
Notes
The sample size can be expressed as a real integer value (e.g. 20000) or as a proportional value (e.g. 0.2). Whilst a proportional value does not make sense on its own, if it is used for a population within a cohort then the other populations in that cohort are also expected to be expressed as proportions.
- TYPE = 'sample'
The type of the population, i.e. case_control, sample or NoneType (str)
- N_SAMPLES_TAG = 'n_samples'
The name of the element containing the number of samples (str)
- property n_samples
Get the number of samples (int or float).
- create_nsamples_xml(element)
Write the number of samples element to the given element.
- Parameters:
element (lxml.etree.Element) – The element to write the n_samples element to.
- classmethod parse_xml(element)
Determine if the element has any number of samples definitions and, if it does, parse them.
- Parameters:
element (lxml.etree.Element) – The element potentially containing an n_samples element.
- Returns:
nsamples – The number of samples. If no n_samples element is found then this will be NoneType.
- Return type:
int or float or NoneType
- PROPORTION_TYPE = 'proportion'
Constant indicating a proportional sample value (str)
- REAL_TYPE = 'real'
Constant indicating a real integer sample value (str)
- reset_seen_values()
Reset the previously seen sample values that were recorded when setting the value_type.
Notes
This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
- property value_type
Get the sample value type (str).
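The value_type and reset_seen_values descriptions suggest that the first sample value seen fixes the type (real or proportion), and that later values of the other type raise a ValueError until the seen values are reset. The sketch below illustrates that idea. It assumes that ints are real counts and that floats in (0, 1] are proportions, and it tracks the type per instance. All of these are assumptions; how gwas_norm classifies or shares seen values is not documented here.

```python
class SampleValueTracker:
    """Hypothetical sketch of the real/proportion consistency check."""

    REAL_TYPE = "real"
    PROPORTION_TYPE = "proportion"

    def __init__(self):
        self.value_type = None

    def classify(self, value):
        # Assumed rule: non-negative ints are counts, floats in (0, 1] are proportions.
        if isinstance(value, bool):
            raise TypeError("sample values must be numeric")
        if isinstance(value, int) and value >= 0:
            return self.REAL_TYPE
        if isinstance(value, float) and 0 < value <= 1:
            return self.PROPORTION_TYPE
        raise ValueError(f"invalid sample value: {value!r}")

    def check(self, value):
        vtype = self.classify(value)
        # The first value seen fixes the type; later values must agree.
        if self.value_type is None:
            self.value_type = vtype
        elif vtype != self.value_type:
            raise ValueError(f"mixing {self.value_type} and {vtype} sample values")
        return value

    def reset_seen_values(self):
        self.value_type = None


t = SampleValueTracker()
t.check(20000)
t.check(15000)
print(t.value_type)  # real
t.reset_seen_values()
t.check(0.2)         # allowed again after the reset
print(t.value_type)  # proportion
```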
- class gwas_norm.metadata.cohort.CaseControlMixin
Bases:
_BaseSample
A mixin to add case/control size storage and methods.
- TYPE = 'case_control'
The type of the class, i.e. case_control, sample or NoneType (str)
- N_CASES_TAG = 'n_cases'
The name of the XML tag containing the number of cases (str)
- N_CONTROLS_TAG = 'n_controls'
The name of the XML tag containing the number of controls (str)
- property n_samples
Get the number of samples set; this is the sum of cases + controls (int or float).
- property n_cases
Get the number of cases (int or float).
- property n_controls
Get the number of controls (int or float).
- create_case_xml(element)
Write the case/control elements to the given element.
- Parameters:
element (lxml.etree.Element) – The element to write the n_cases and n_controls elements to.
- classmethod parse_xml(element)
Determine if the element has any case/control definitions and, if it does, error check and parse them.
- Parameters:
element (lxml.etree.Element) – The element potentially containing n_cases and n_controls elements.
- Returns:
ncases (int or float or NoneType) – The number of cases. If no n_cases or n_controls elements are found then this will be NoneType.
ncontrols (int or float or NoneType) – The number of controls. If no n_cases or n_controls elements are found then this will be NoneType.
- PROPORTION_TYPE = 'proportion'
Constant indicating a proportional sample value (str)
- REAL_TYPE = 'real'
Constant indicating a real integer sample value (str)
- reset_seen_values()
Reset the previously seen sample values that were recorded when setting the value_type.
Notes
This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
- property value_type
Get the sample value type (str).
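The parse_xml behaviour described above (both counts read from n_cases and n_controls, or NoneType when neither is present) can be sketched with the standard library ElementTree. Two assumptions apply: rejecting one count without the other mirrors the ValueError listed for Population.from_xml, and only integer counts are parsed here, whereas the real method also accepts floats. parse_case_control is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_case_control(element):
    """Hypothetical equivalent of CaseControlMixin.parse_xml.

    Returns (ncases, ncontrols), or (None, None) when neither tag is present.
    Only integer counts are handled; the real method also allows floats.
    """
    cases = element.find("n_cases")
    controls = element.find("n_controls")
    if cases is None and controls is None:
        return None, None
    # Assumed error check: one count without the other is rejected.
    if cases is None or controls is None:
        raise ValueError("n_cases and n_controls must be given together")
    return int(cases.text), int(controls.text)


pop = ET.fromstring(
    "<case_control_population><name>discovery</name>"
    "<n_cases>1200</n_cases><n_controls>8800</n_controls>"
    "</case_control_population>"
)
ncases, ncontrols = parse_case_control(pop)
print(ncases, ncontrols, ncases + ncontrols)  # 1200 8800 10000
```

The final sum corresponds to the n_samples property, which is documented as cases + controls.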
- class gwas_norm.metadata.cohort.LdReference(name, weight, pop_names=None)
Bases:
_PopReference
The LD reference population container.
- Parameters:
name (str) – The name for the population reference group.
weight (float) – The weighting this population group should be given in the overall reference.
pop_names (list of str, optional, default: NoneType) – The population names that can be used interchangeably to represent this reference population. These are applied hierarchically, with the topmost in the list being tried before those below it.
Notes
This is a representation of all the population groups that can be used interchangeably to represent a component of an LD reference group.
- ROOT_TAG = 'ld_ref'
The name of the root XML element tag name; this should be overridden by sub-classes (str)
- NAME_TAG = 'name'
The tag name for a reference population name (str)
- REF_POP_TAG = 'ref_pop'
The tag name for a reference population tag (str)
- WEIGHT_TAG = 'weight'
The tag name for a reference population weight (str)
- add_pop(pop)
Add a population to the population list.
- Parameters:
pop (str) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- classmethod from_xml(element)
Generate a reference population object.
- Parameters:
element (lxml.etree.Element) – The reference population element.
- Returns:
ref_pop_obj – A reference population object.
- Return type:
gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference
- Raises:
KeyError – If the name of the element is not recognised.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
ref_pop_class – The relevant reference population class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference)
- Raises:
KeyError – If the element does not have the required tag name.
- property name
Get the reference population name (str).
- property pops
Return the population names (list of str).
- property refpops
Return the population names (list of str).
- remove_pop(pop)
Remove a population from the population list.
- Parameters:
pop (str) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- reset_pops()
Reset the population list to empty.
- to_xml()
Generate an XML element for the reference population.
- Returns:
element – The XML element representation of the reference population.
- Return type:
lxml.etree.Element
- property weight
Get the reference population weight (str).
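The hierarchical population list works as an ordered fallback: duplicates are ignored on add, removing an absent name is a no-op, and the first population with available data wins. The class below is a minimal sketch of that behaviour. RefPopSketch and its resolve method are hypothetical and not part of gwas_norm, and the population codes are example values only.

```python
class RefPopSketch:
    """Hypothetical stand-in for the LdReference/FreqReference population list."""

    def __init__(self, name, weight, pop_names=None):
        self.name = name
        self.weight = weight
        self.pops = []
        for pop in pop_names or []:
            self.add_pop(pop)

    def add_pop(self, pop):
        # Duplicates are ignored silently, as documented for add_pop.
        if pop not in self.pops:
            self.pops.append(pop)

    def remove_pop(self, pop):
        # Removing an absent population is a silent no-op.
        if pop in self.pops:
            self.pops.remove(pop)

    def resolve(self, available):
        """Return the first population in the hierarchy that has data (hypothetical)."""
        for pop in self.pops:
            if pop in available:
                return pop
        return None


ref = RefPopSketch("european", 1.0, ["GBR", "CEU", "EUR"])
ref.add_pop("CEU")     # already present, ignored
ref.remove_pop("FIN")  # not present, ignored
print(ref.pops)                     # ['GBR', 'CEU', 'EUR']
print(ref.resolve({"EUR", "CEU"}))  # CEU
```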
- class gwas_norm.metadata.cohort.FreqReference(name, weight, pop_names=None)
Bases:
_PopReference
The base allele frequency reference population container.
- Parameters:
name (str) – The name for the population reference group.
weight (float) – The weighting this population group should be given in the overall reference.
pop_names (list of str, optional, default: NoneType) – The population names that can be used interchangeably to represent this reference population. These are applied hierarchically, with the topmost in the list being tried before those below it.
Notes
This is a representation of all the population groups that can be used interchangeably to represent a component of a frequency reference population group.
- ROOT_TAG = 'allele_freq_ref'
The name of the root XML element tag name; this should be overridden by sub-classes (str)
- NAME_TAG = 'name'
The tag name for a reference population name (str)
- REF_POP_TAG = 'ref_pop'
The tag name for a reference population tag (str)
- WEIGHT_TAG = 'weight'
The tag name for a reference population weight (str)
- add_pop(pop)
Add a population to the population list.
- Parameters:
pop (str) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- classmethod from_xml(element)
Generate a reference population object.
- Parameters:
element (lxml.etree.Element) – The reference population element.
- Returns:
ref_pop_obj – A reference population object.
- Return type:
gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference
- Raises:
KeyError – If the name of the element is not recognised.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
ref_pop_class – The relevant reference population class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.LdReference or gwas_norm.metadata.cohort.FreqReference)
- Raises:
KeyError – If the element does not have the required tag name.
- property name
Get the reference population name (str).
- property pops
Return the population names (list of str).
- property refpops
Return the population names (list of str).
- remove_pop(pop)
Remove a population from the population list.
- Parameters:
pop (str) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- reset_pops()
Reset the population list to empty.
- to_xml()
Generate an XML element for the reference population.
- Returns:
element – The XML element representation of the reference population.
- Return type:
lxml.etree.Element
- property weight
Get the reference population weight (str).
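The tag constants above (allele_freq_ref, name, weight, ref_pop) suggest how a reference population could be serialised. The sketch below builds and reads back such an element using the standard library ElementTree. The flat layout (name, weight, then one ref_pop child per population in hierarchical order) is an assumption, not the confirmed gwas_norm schema, and ref_to_xml and ref_from_xml are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

ROOT_TAG = "allele_freq_ref"
NAME_TAG, WEIGHT_TAG, REF_POP_TAG = "name", "weight", "ref_pop"


def ref_to_xml(name, weight, pops):
    """Build an element from the documented tag names; the flat layout is assumed."""
    root = ET.Element(ROOT_TAG)
    ET.SubElement(root, NAME_TAG).text = name
    ET.SubElement(root, WEIGHT_TAG).text = str(weight)
    for pop in pops:  # one ref_pop per population, kept in hierarchical order
        ET.SubElement(root, REF_POP_TAG).text = pop
    return root


def ref_from_xml(element):
    """Read back (name, weight, pops), rejecting unexpected root tags."""
    if element.tag != ROOT_TAG:
        raise KeyError(f"unrecognised tag: {element.tag}")
    name = element.findtext(NAME_TAG)
    weight = float(element.findtext(WEIGHT_TAG))
    pops = [e.text for e in element.findall(REF_POP_TAG)]
    return name, weight, pops


el = ref_to_xml("european", 0.8, ["GBR", "EUR"])
print(ET.tostring(el, encoding="unicode"))
print(ref_from_xml(el))  # ('european', 0.8, ['GBR', 'EUR'])
```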
- class gwas_norm.metadata.cohort.Population(name, freq_pops=None, ld_pops=None)
Bases:
_XmlBase
A representation of a population where the sample size is not known.
- Parameters:
name (str) – A free text name for the population group.
freq_pops (list of gwas_norm.metadata.cohort.FreqReference, optional, default: NoneType) – A hierarchical list of reference populations that will be used to obtain allele frequency estimates if they are not provided by the study.
ld_pops (list of gwas_norm.metadata.cohort.LdReference, optional, default: NoneType) – A hierarchical list of reference populations that will be used to obtain LD estimates.
Notes
If more than one frequency or LD reference is provided, each will have a weight attached that indicates its contribution to the overall frequency or LD estimate. Within each reference, the population names are applied hierarchically.
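To make the weighting note concrete, the function below combines allele frequencies from several weighted references. Each reference is first resolved hierarchically to the first population that has a value. The function name, the input shapes and the choice to renormalise over only the references that resolved are all assumptions for illustration, not documented gwas_norm behaviour.

```python
def weighted_frequency(references, freqs):
    """Combine allele frequencies from several weighted references (hypothetical).

    references: list of (weight, [population names in hierarchical order]).
    freqs: mapping of population name -> allele frequency for one variant.
    Renormalising over the references that resolved is an assumption.
    """
    total, weight_sum = 0.0, 0.0
    for weight, pops in references:
        # Take the first population in the hierarchy that has a frequency.
        freq = next((freqs[p] for p in pops if p in freqs), None)
        if freq is not None:
            total += weight * freq
            weight_sum += weight
    return total / weight_sum if weight_sum else None


refs = [(0.75, ["GBR", "EUR"]), (0.25, ["YRI", "AFR"])]
print(weighted_frequency(refs, {"EUR": 0.5, "AFR": 0.25}))  # 0.4375
```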
- ROOT_TAG = 'population'
The name of the root XML element tag name (str)
- TYPE = None
The type of the population, i.e. case_control, sample or NoneType (NoneType)
- NAME_TAG = 'name'
The name of the XML tag containing the free text population name (str)
- property freq_pops
Return the allele frequency populations (list of gwas_norm.metadata.cohort.FreqReference).
- property ld_pops
Return the LD populations (list of gwas_norm.metadata.cohort.LdReference).
- add_ld_pop(pop)
Add a population to the LD population list.
- Parameters:
pop (gwas_norm.metadata.cohort.LdReference) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- add_freq_pop(pop)
Add a population to the allele frequency population list.
- Parameters:
pop (gwas_norm.metadata.cohort.FreqReference) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- remove_ld_pop(pop)
Remove a population from the LD population list.
- Parameters:
pop (gwas_norm.metadata.cohort.LdReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- remove_freq_pop(pop)
Remove a population from the allele frequency population list.
- Parameters:
pop (gwas_norm.metadata.cohort.FreqReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- reset_ld_pops()
Reset the LD populations to empty.
- reset_freq_pops()
Reset the allele frequency populations to empty.
- to_xml()
Generate an XML element for the population.
- Returns:
element – The XML element representation of the population.
- Return type:
lxml.etree.Element
- classmethod from_xml(element)
Generate a population object.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name population, case_control_population or sample_population.
- Returns:
population_obj – A population object built from all the elements in the population element. The exact class will depend on the tag name of the element.
- Return type:
gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation
- Raises:
KeyError – If the tag name of the element is not a recognised population tag.
ValueError – If one of n_cases or n_controls is defined without the other.
Notes
The returned object will be of type Population, CaseControlPopulation or SamplePopulation depending on whether the lxml.etree.Element has the tag name population, case_control_population or sample_population respectively.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
population_class – The relevant population class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)
- Raises:
KeyError – If the element does not have the required tag name.
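The get_class and from_xml notes describe a dispatch on the element tag: population, case_control_population or sample_population, with any other tag giving a KeyError. A minimal sketch of that dispatch follows, using empty stand-in classes and the standard library ElementTree; it is an illustration only, not the gwas_norm implementation.

```python
import xml.etree.ElementTree as ET


# Hypothetical empty stand-ins for the documented classes.
class Population:
    ROOT_TAG = "population"


class CaseControlPopulation(Population):
    ROOT_TAG = "case_control_population"


class SamplePopulation(Population):
    ROOT_TAG = "sample_population"


# Map each documented root tag to the class that parses it.
_CLASS_BY_TAG = {
    cls.ROOT_TAG: cls for cls in (Population, CaseControlPopulation, SamplePopulation)
}


def get_class(element):
    """Return the population class for the element's tag, else raise KeyError."""
    try:
        return _CLASS_BY_TAG[element.tag]
    except KeyError:
        raise KeyError(f"unrecognised population tag: {element.tag}") from None


print(get_class(ET.fromstring("<sample_population/>")).__name__)  # SamplePopulation
```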
- class gwas_norm.metadata.cohort.CaseControlPopulation(name, ncases, ncontrols, **kwargs)
Bases:
CaseControlMixin, Population
A representation of a population where the numbers of cases and controls are defined.
- Parameters:
name (str) – A free text name for the population group.
ncases (int or float) – The number of cases.
ncontrols (int or float) – The number of controls.
freq_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain allele frequency estimates.
ld_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain LD estimates.
Notes
The hierarchy of the ld_pops and freq_pops refers to the order in which they are used. If the data is not available in the first population in the list, then the next one should be used until the hierarchy is exhausted.
- ROOT_TAG = 'case_control_population'
The name of the root XML element tag name (str)
- property n_cases
Get the number of cases (int or float).
- property n_controls
Get the number of controls (int or float).
- to_xml()
Generate an XML element for the case control population.
- Returns:
element – The XML element representation of the case control population.
- Return type:
lxml.etree.Element
- NAME_TAG = 'name'
The name of the XML tag containing the free text population name (str)
- N_CASES_TAG = 'n_cases'
The name of the XML tag containing the number of cases (str)
- N_CONTROLS_TAG = 'n_controls'
The name of the XML tag containing the number of controls (str)
- PROPORTION_TYPE = 'proportion'
Constant indicating a proportional sample value (str)
- REAL_TYPE = 'real'
Constant indicating a real integer sample value (str)
- TYPE = 'case_control'
The type of the class, i.e. case_control, sample or NoneType (str)
- add_freq_pop(pop)
Add a population to the allele frequency population list.
- Parameters:
pop (gwas_norm.metadata.cohort.FreqReference) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- add_ld_pop(pop)
Add a population to the LD population list.
- Parameters:
pop (gwas_norm.metadata.cohort.LdReference) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- create_case_xml(element)
Write the case/control elements to the given element.
- Parameters:
element (lxml.etree.Element) – The element to write the n_cases and n_controls elements to.
- property freq_pops
Return the allele frequency populations (list of gwas_norm.metadata.cohort.FreqReference).
- classmethod from_xml(element)
Generate a population object.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name population, case_control_population or sample_population.
- Returns:
population_obj – A population object built from all the elements in the population element. The exact class will depend on the tag name of the element.
- Return type:
gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation
- Raises:
KeyError – If the tag name of the element is not a recognised population tag.
ValueError – If one of n_cases or n_controls is defined without the other.
Notes
The returned object will be of type Population, CaseControlPopulation or SamplePopulation depending on whether the lxml.etree.Element has the tag name population, case_control_population or sample_population respectively.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
population_class – The relevant population class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)
- Raises:
KeyError – If the element does not have the required tag name.
- property ld_pops
Return the LD populations (list of gwas_norm.metadata.cohort.LdReference).
- property n_samples
Get the number of samples set; this is the sum of cases + controls (int or float).
- classmethod parse_xml(element)
Determine if the element has any case/control definitions and, if it does, error check and parse them.
- Parameters:
element (lxml.etree.Element) – The element potentially containing n_cases and n_controls elements.
- Returns:
ncases (int or float or NoneType) – The number of cases. If no n_cases or n_controls elements are found then this will be NoneType.
ncontrols (int or float or NoneType) – The number of controls. If no n_cases or n_controls elements are found then this will be NoneType.
- remove_freq_pop(pop)
Remove a population from the allele frequency population list.
- Parameters:
pop (gwas_norm.metadata.cohort.FreqReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- remove_ld_pop(pop)
Remove a population from the LD population list.
- Parameters:
pop (gwas_norm.metadata.cohort.LdReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- reset_freq_pops()
Reset the allele frequency populations to empty.
- reset_ld_pops()
Reset the LD populations to empty.
- reset_seen_values()
Reset the previously seen sample values that were recorded when setting the value_type.
Notes
This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
- property value_type
Get the sample value type (str).
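The sketch below shows the kind of element that CaseControlPopulation.to_xml might produce, using the documented name, n_cases and n_controls tags with the standard library ElementTree. The child order and the omission of any LD or frequency reference children are assumptions made to keep the example short, and case_control_to_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def case_control_to_xml(name, ncases, ncontrols):
    """Sketch of a <case_control_population> element using the documented tags.

    Child order and the absence of LD/frequency reference children are assumed.
    """
    root = ET.Element("case_control_population")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "n_cases").text = str(ncases)
    ET.SubElement(root, "n_controls").text = str(ncontrols)
    return root


el = case_control_to_xml("discovery", 1200, 8800)
print(ET.tostring(el, encoding="unicode"))
# n_samples is documented as the sum of cases and controls.
print(int(el.findtext("n_cases")) + int(el.findtext("n_controls")))  # 10000
```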
- class gwas_norm.metadata.cohort.SamplePopulation(name, nsamples, **kwargs)
Bases:
SampleSizeMixin, Population
A representation of a population where the total sample size is defined.
- Parameters:
name (str) – A free text name for the population group.
nsamples (int or float) – The number of samples.
freq_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain allele frequency estimates.
ld_pops (list of str, optional, default: NoneType) – A hierarchical list of population names that will be used to obtain LD estimates.
Notes
The hierarchy of the ld_pops and freq_pops refers to the order in which they are used. If the data is not available in the first population in the list, then the next one should be used until the hierarchy is exhausted.
- ROOT_TAG = 'sample_population'
The name of the root XML element tag name (str)
- property n_samples
Get the number of samples (int or float).
- to_xml()
Generate an XML element for the sample population.
- Returns:
element – The XML element representation of the sample population.
- Return type:
lxml.etree.Element
- NAME_TAG = 'name'
The name of the XML tag containing the free text population name (str)
- N_SAMPLES_TAG = 'n_samples'
The name of the element containing the number of samples (str)
- PROPORTION_TYPE = 'proportion'
Constant indicating a proportional sample value (str)
- REAL_TYPE = 'real'
Constant indicating a real integer sample value (str)
- TYPE = 'sample'
The type of the population, i.e. case_control, sample or NoneType (str)
- add_freq_pop(pop)
Add a population to the allele frequency population list.
- Parameters:
pop (gwas_norm.metadata.cohort.FreqReference) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- add_ld_pop(pop)
Add a population to the LD population list.
- Parameters:
pop (gwas_norm.metadata.cohort.LdReference) – The population being added. It will only be added if it does not already exist; if it does exist then this will fail silently.
- create_nsamples_xml(element)
Write the number of samples element to the given element.
- Parameters:
element (lxml.etree.Element) – The element to write the n_samples element to.
- property freq_pops
Return the allele frequency populations (list of gwas_norm.metadata.cohort.FreqReference).
- classmethod from_xml(element)
Generate a population object.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name population, case_control_population or sample_population.
- Returns:
population_obj – A population object built from all the elements in the population element. The exact class will depend on the tag name of the element.
- Return type:
gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation
- Raises:
KeyError – If the tag name of the element is not a recognised population tag.
ValueError – If one of n_cases or n_controls is defined without the other.
Notes
The returned object will be of type Population, CaseControlPopulation or SamplePopulation depending on whether the lxml.etree.Element has the tag name population, case_control_population or sample_population respectively.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
population_class – The relevant population class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)
- Raises:
KeyError – If the element does not have the required tag name.
- property ld_pops
Return the LD populations (list of gwas_norm.metadata.cohort.LdReference).
- classmethod parse_xml(element)
Determine if the element has any number of samples definitions and, if it does, parse them.
- Parameters:
element (lxml.etree.Element) – The element potentially containing an n_samples element.
- Returns:
nsamples – The number of samples. If no n_samples element is found then this will be NoneType.
- Return type:
int or float or NoneType
- remove_freq_pop(pop)
Remove a population from the allele frequency population list.
- Parameters:
pop (gwas_norm.metadata.cohort.FreqReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- remove_ld_pop(pop)
Remove a population from the LD population list.
- Parameters:
pop (gwas_norm.metadata.cohort.LdReference) – The population being removed. It will only be removed if it exists; if it does not exist then this will fail silently.
- reset_freq_pops()
Reset the allele frequency populations to empty.
- reset_ld_pops()
Reset the LD populations to empty.
- reset_seen_values()
Reset the previously seen sample values that were recorded when setting the value_type.
Notes
This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
- property value_type
Get the sample value type (str).
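SamplePopulation.parse_xml is documented as returning an int, a float or NoneType. One plausible way to get that behaviour from the n_samples text is sketched below. It assumes that whole numbers parse as int and everything else as float, and it uses the standard library ElementTree. parse_nsamples is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_nsamples(element):
    """Hypothetical equivalent of SampleSizeMixin.parse_xml.

    Returns an int for whole numbers, a float otherwise, and None when no
    n_samples child exists. The int/float rule is an assumption.
    """
    child = element.find("n_samples")
    if child is None:
        return None
    text = child.text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


real = ET.fromstring("<sample_population><n_samples>20000</n_samples></sample_population>")
prop = ET.fromstring("<sample_population><n_samples>0.2</n_samples></sample_population>")
print(parse_nsamples(real))                          # 20000
print(parse_nsamples(prop))                          # 0.2
print(parse_nsamples(ET.fromstring("<population/>")))  # None
```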
- class gwas_norm.metadata.cohort.Cohort(populations=None, name=None)#
Bases:
_XmlBase
A representation of a cohort, where the samples sizes or cases/controls are not defined in the population groups.
- Parameters:
population (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)) – One or more populations, the exact class will depend in the information available. If no population is known then
gwas_norm.metadata.cohort.Population
should be used, if any other sample number data is known the other classes should be used as appropriate. However,gwas_norm.metadata.cohort.CaseControlPopulation
can’t be mixed withgwas_norm.metadata.cohort.SamplePopulation
. However, both can be mixed withgwas_norm.metadata.cohort_obj.Population
.name (str, optional, default: NoneType) – An overall free text name for the cohort.
- ROOT_TAG = 'cohort'#
The name of the root XML element tag name (str)
- NAME_TAG = 'name'#
The name of the tag describing the cohort name (str)
- TYPE = None#
The type of the cohort, i.e. case_control, sample or NoneType (str)
- property n_samples#
Return the number of samples in all populations (int).
- property pops#
Return the populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)).
- property name#
Return the cohort name (str or NoneType).
- add_population(population)#
Add a population to the cohort.
- Parameters:
population (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation) – The population to add. The exact class will depend on the information available. If no sample number data are known then gwas_norm.metadata.cohort.Population should be used; if any sample number data are known then the other classes should be used as appropriate. However, gwas_norm.metadata.cohort.CaseControlPopulation can't be mixed with gwas_norm.metadata.cohort.SamplePopulation, although both can be mixed with gwas_norm.metadata.cohort.Population.
Notes
Populations will only be added if their name and class type are different from those of any existing population in the cohort.
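The duplicate rule above can be illustrated with a short sketch. It reads the rule as "add the population unless one with the same name and the same class is already present", which is an interpretation; Population and SamplePopulation here are minimal hypothetical stand-ins, and add_population is a free function rather than the real method.

```python
class Population:
    """Hypothetical stand-in for a population with only a name."""

    def __init__(self, name):
        self.name = name


class SamplePopulation(Population):
    """Hypothetical stand-in for a population that also has a sample size."""

    def __init__(self, name, n_samples):
        super().__init__(name)
        self.n_samples = n_samples


def add_population(pops, population):
    """Append population unless one with the same name and class exists."""
    key = (population.name, type(population))
    if all((p.name, type(p)) != key for p in pops):
        pops.append(population)
    return pops
```

Under this reading, a Population and a SamplePopulation that share the name "EUR" can both be held, but a second plain Population named "EUR" is ignored.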
- to_xml()#
Generate an XML element for the cohort.
- Returns:
element – The XML element representation of the cohort.
- Return type:
lxml.etree.Element
- classmethod from_xml(element)#
Generate a cohort object from an lxml.etree.Element.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name cohort, case_control_cohort or sample_cohort.
- Returns:
cohort – A cohort object built from the child elements of the cohort element.
- Return type:
gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort
- Raises:
KeyError – If the tag name of the element is not a recognised cohort tag.
- classmethod get_class(element)#
Helper method that determines the required cohort class for parsing based on the root tag of the element.
- Parameters:
element (lxml.etree.Element) – The element to check; it is expected to have the tag name cohort, case_control_cohort or sample_cohort.
- Returns:
cohort_class – The relevant cohort class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort)
- Raises:
KeyError – If the element does not have the required tag name.
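The tag-based dispatch performed by get_class can be sketched as a lookup table keyed on each class's ROOT_TAG. This sketch uses the standard library's xml.etree.ElementTree rather than lxml, and the three classes are empty stand-ins that only carry the ROOT_TAG values documented here; get_cohort_class is a hypothetical name.

```python
import xml.etree.ElementTree as ET


# Minimal stand-ins carrying only the documented ROOT_TAG values.
class Cohort:
    ROOT_TAG = "cohort"


class CaseControlCohort(Cohort):
    ROOT_TAG = "case_control_cohort"


class SampleCohort(Cohort):
    ROOT_TAG = "sample_cohort"


# Map each root tag to the class that should parse it.
_COHORT_CLASSES = {
    cls.ROOT_TAG: cls for cls in (Cohort, CaseControlCohort, SampleCohort)
}


def get_cohort_class(element):
    """Return the cohort class for an element, or raise KeyError."""
    try:
        return _COHORT_CLASSES[element.tag]
    except KeyError:
        raise KeyError(f"not a recognised cohort tag: {element.tag}") from None
```

For example, get_cohort_class(ET.fromstring("<sample_cohort/>")) returns SampleCohort, while a <file/> element raises a KeyError.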
- class gwas_norm.metadata.cohort.CaseControlCohort(ncases, ncontrols, populations=None, name=None)#
Bases:
CaseControlMixin, Cohort
A representation of a cohort where the number of cases and controls are defined.
- Parameters:
ncases (int or float) – The number of cases.
ncontrols (int or float) – The number of controls.
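populations (list of gwas_norm.metadata.cohort.Population, optional, default: NoneType) – One or more populations. Note these must be populations that do not have any sample sizes defined.
name (str, optional, default: NoneType) – An overall free text name for the cohort.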
Notes
A cohort where the actual sample size of individual population groups within the cohort is unknown and only an aggregate sample size is known.
- NAME_TAG = 'name'#
The name of the tag describing the cohort name (str)
- N_CASES_TAG = 'n_cases'#
The name of the XML tag containing the number of cases (str)
- N_CONTROLS_TAG = 'n_controls'#
The name of the XML tag containing the number of controls (str)
- PROPORTION_TYPE = 'proportion'#
Constant indicating a proportional sample value (str)
- REAL_TYPE = 'real'#
Constant indicating a real integer sample value (str)
- TYPE = 'case_control'#
The type of the cohort, i.e. case_control, sample or NoneType (str)
- create_case_xml(element)#
Write case control elements to the given element.
- Parameters:
element (lxml.etree.Element) – The element to write the n_cases and n_controls elements to.
- classmethod from_xml(element)#
Generate a cohort object from an lxml.etree.Element.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name cohort, case_control_cohort or sample_cohort.
- Returns:
cohort – A cohort object built from the child elements of the cohort element.
- Return type:
gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort
- Raises:
KeyError – If the tag name of the element is not a recognised cohort tag.
- classmethod get_class(element)#
Helper method that determines the required cohort class for parsing based on the root tag of the element.
- Parameters:
element (lxml.etree.Element) – The element to check; it is expected to have the tag name cohort, case_control_cohort or sample_cohort.
- Returns:
cohort_class – The relevant cohort class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort)
- Raises:
KeyError – If the element does not have the required tag name.
- property n_samples#
Get the number of samples; this is the sum of the cases and controls (int or float).
- property name#
Return the cohort name (str or NoneType).
- classmethod parse_xml(element)#
This determines if the element has any case/control definitions and, if it does, error checks and parses them.
- Parameters:
element (lxml.etree.Element) – The element potentially containing n_cases and n_controls elements.
- Returns:
ncases (int or float or NoneType) – The number of cases. If no n_cases or n_controls elements are found then this will be NoneType.
ncontrols (int or float or NoneType) – The number of controls. If no n_cases or n_controls elements are found then this will be NoneType.
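A round trip through create_case_xml and parse_xml might look like the sketch below. It uses the standard library's xml.etree.ElementTree instead of lxml. The free functions create_case_xml and parse_case_xml are hypothetical, as is the specific error check (raising a KeyError when only one of n_cases/n_controls is present); the docs above only say that the values are "error checked".

```python
import xml.etree.ElementTree as ET


def create_case_xml(element, n_cases, n_controls):
    """Add <n_cases> and <n_controls> children to the given element."""
    ET.SubElement(element, "n_cases").text = str(n_cases)
    ET.SubElement(element, "n_controls").text = str(n_controls)
    return element


def _to_number(text):
    # Keep whole numbers as int; fall back to float (e.g. proportions).
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_case_xml(element):
    """Return (ncases, ncontrols), or (None, None) if neither is defined."""
    cases = element.find("n_cases")
    controls = element.find("n_controls")
    if cases is None and controls is None:
        return None, None
    # Assumed error check: both values must be given together.
    if cases is None or controls is None:
        raise KeyError("n_cases and n_controls must be given together")
    return _to_number(cases.text), _to_number(controls.text)
```

Writing 1200 cases and 3400 controls and parsing them back gives (1200, 3400); an element without either child gives (None, None).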
- property pops#
Return the populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)).
- reset_seen_values()#
Reset previously seen sample values given when setting the value_type.
Notes
This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
- property value_type#
Get the sample value type (str).
- ROOT_TAG = 'case_control_cohort'#
The name of the root XML element tag name (str)
- property n_cases#
Get the number of cases (int or float).
- property n_controls#
Get the number of controls (int or float).
- add_population(population)#
Add a population to the cohort.
- Parameters:
population (gwas_norm.metadata.cohort.Population) – The population to add. Note this must be a population that does not have any sample sizes defined.
Notes
Populations will only be added if their name and class type are different from those of any existing population in the cohort.
- to_xml()#
Generate an XML element for the case control cohort.
- Returns:
element – The XML element representation of the case control cohort.
- Return type:
lxml.etree.Element
- class gwas_norm.metadata.cohort.SampleCohort(nsamples, populations=None, name=None)#
Bases:
SampleSizeMixin, Cohort
A representation of a cohort where the number of samples is defined.
- Parameters:
nsamples (int or float) – The number of samples.
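populations (list of gwas_norm.metadata.cohort.Population, optional, default: NoneType) – One or more populations. Note these must be populations that do not have any sample sizes defined.
name (str, optional, default: NoneType) – An overall free text name for the cohort.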
Notes
A cohort where the actual sample size of individual population groups within the cohort is unknown and only an aggregate sample size is known.
- NAME_TAG = 'name'#
The name of the tag describing the cohort name (str)
- N_SAMPLES_TAG = 'n_samples'#
The name of the element containing the number of samples (str)
- PROPORTION_TYPE = 'proportion'#
Constant indicating a proportional sample value (str)
- REAL_TYPE = 'real'#
Constant indicating a real integer sample value (str)
- TYPE = 'sample'#
The type of the cohort, i.e. case_control, sample or NoneType (str)
- create_nsamples_xml(element)#
Write number of samples element to the given element.
- Parameters:
element (lxml.etree.Element) – The element to write the n_samples element to.
- classmethod from_xml(element)#
Generate a cohort object from an lxml.etree.Element.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name cohort, case_control_cohort or sample_cohort.
- Returns:
cohort – A cohort object built from the child elements of the cohort element.
- Return type:
gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort
- Raises:
KeyError – If the tag name of the element is not a recognised cohort tag.
- classmethod get_class(element)#
Helper method that determines the required cohort class for parsing based on the root tag of the element.
- Parameters:
element (lxml.etree.Element) – The element to check; it is expected to have the tag name cohort, case_control_cohort or sample_cohort.
- Returns:
cohort_class – The relevant cohort class for the element.
- Return type:
class of (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort)
- Raises:
KeyError – If the element does not have the required tag name.
- property name#
Return the cohort name (str or NoneType).
- classmethod parse_xml(element)#
This determines if the element has a number of samples definition and, if it does, parses it.
- Parameters:
element (lxml.etree.Element) – The element potentially containing an n_samples element.
- Returns:
nsamples – The number of samples. If no n_samples element is found then this will be NoneType.
- Return type:
int or float or NoneType
- property pops#
Return the populations (list of (gwas_norm.metadata.cohort.Population or gwas_norm.metadata.cohort.CaseControlPopulation or gwas_norm.metadata.cohort.SamplePopulation)).
- reset_seen_values()#
Reset previously seen sample values given when setting the value_type.
Notes
This can be used if the user wants to pass a sample_value of a different type (which will normally raise a ValueError).
- property value_type#
Get the sample value type (str).
- ROOT_TAG = 'sample_cohort'#
The name of the root XML element tag name (str)
- property n_samples#
Get the number of samples (int or float).
- add_population(population)#
Add a population to the cohort.
- Parameters:
population (gwas_norm.metadata.cohort.Population) – The population to add. Note this must be a population that does not have any sample sizes defined.
Notes
Populations will only be added if their name and class type are different from those of any existing population in the cohort.
- to_xml()#
Generate an XML element for the sample cohort.
- Returns:
element – The XML element representation of the sample cohort.
- Return type:
lxml.etree.Element
gwas_norm.metadata.file
The representation of GWAS summary statistics file metadata.
- class gwas_norm.metadata.file.FileHolderMixin#
Bases:
object
A mixin designed to implement the logic for objects handling files (currently these are the StudyFile and AnalysisFile objects).
Notes
Any class using this mixin must implement the on_file_added/on_files_removed methods. These callbacks allow object-specific behaviour to occur when files are added/removed; they accept a single file and a list of files respectively.
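The callback contract described above can be illustrated with a simplified, hypothetical mixin. The method names init_files and files and the DemoStudy host class are invented for this sketch; only the on_file_added/on_files_removed hook names and their single-file/list-of-files arguments come from the description above. The real mixin also handles binding, XML and file checks, which are omitted here.

```python
class FileHolderMixin:
    """Hypothetical, simplified version of the mixin's file-handling logic."""

    def init_files(self):
        self._files = []

    @property
    def files(self):
        return list(self._files)

    def add_file(self, gwas_file):
        if gwas_file not in self._files:
            self._files.append(gwas_file)
            # Host-specific hook that receives the single added file.
            self.on_file_added(gwas_file)

    def remove_files(self, gwas_files):
        removed = [f for f in gwas_files if f in self._files]
        self._files = [f for f in self._files if f not in removed]
        # Host-specific hook that receives the list of removed files.
        self.on_files_removed(removed)

    def on_file_added(self, gwas_file):
        raise NotImplementedError("host classes must implement this")

    def on_files_removed(self, gwas_files):
        raise NotImplementedError("host classes must implement this")


class DemoStudy(FileHolderMixin):
    """A host object that just records the callbacks it receives."""

    def __init__(self):
        self.init_files()
        self.events = []

    def on_file_added(self, gwas_file):
        self.events.append(("added", gwas_file))

    def on_files_removed(self, gwas_files):
        self.events.append(("removed", list(gwas_files)))
```

Keeping the hooks abstract lets each host (a study or an analysis) react to file changes without the mixin needing to know about the host's other attributes.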
- EFFECT_TYPE_TAG = 'effect_type'#
The name of the effect type tag/element in the XML file (str).
- ANALYSIS_TYPE_TAG = 'analysis_type'#
The name of the analysis type tag/element in the XML file (str).
- UNITS_TAG = 'units'#
The name of the units tag/element in the XML file (str).
- FILE_CLASS = None#
The file class that should be used for parsing XML, this should be overridden by the sub-class (NoneType)
- file_repr_attr_str()#
Called by the __repr__ of host objects to supply key=value strings of the attributes and their values relating to the mixin.
- Returns:
attr_str – Each string is an attribute and value for printing.
- Return type:
list of str
- init_file_attr(analysis_type, effect_type, units=None, cohort=None, files=None, file_check=True)#
Initialise all the attributes that a file handling object needs.
- Parameters:
analysis_type (str) – The analysis type for the study. This will be applied to any analysis that does not have an analysis_type specified.
effect_type (str) – The default effect_type of the study. This will be applied to any analysis that does not have an effect_type specified.
units (str, optional, default: NoneType) – The units that the effect sizes within the file are measured in.
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort, optional, default: NoneType) – The cohort that applies to the file.
files (list of (gwas_norm.metadata.file.GwasFile), optional, default: NoneType) – Source files. Data files can be given at either the study level or the analysis level but not both. Study-level files are for data such as GTEx data, where multiple gene analyses are in the same file.
file_check (bool, optional, default: True) – Toggle file checking.
Notes
This is usually called from the __init__ method of the host object and initialises all the parameters relating to file handling from arguments that have been given to the host object.
- property cohort#
Get the cohort definition associated with the analysis (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType).
Notes
The cohort associated with the analysis, or NoneType if no cohort has been set.
- property files#
Get all the associated files (list of gwas_norm.metadata.file.GwasFile).
- property n_files#
Get the number of associated files (int).
- property file_check#
Get the file checking status (bool).
- property analysis_type#
Get the analysis type (str).
- property effect_type#
Get the effect type (str or NoneType).
- property units#
Get the units (str or NoneType)
- add_file(gwas_file, error=False)#
Add a GWAS file to the object. This results in a reciprocal bind of the file to the parent object.
- Parameters:
gwas_file (gwas_norm.metadata.file.GwasFile) – The file being added.
error (bool, optional, default: False) – If the file exists in this object already, should an error be raised.
- Raises:
- remove_files(gwas_files)#
Remove one or more GWAS files from this object.
- Parameters:
gwas_files (list of gwas_norm.metadata.file.GwasFile) – The files being removed.
- create_effect_type_xml(element)#
Generate the XML element for the effect type.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_analysis_type_xml(element)#
Generate the analysis type XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_units_xml(element)#
Generate the units XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the element to.
- Returns:
element – The parent XML element with the element added.
- Return type:
lxml.etree.Element
- create_files_xml(element)#
Create file-specific XML elements in the parent element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the file specific elements to.
- Returns:
element – The parent XML element with the file elements added.
- Return type:
lxml.etree.Element
Notes
This is designed to add XML elements to a study/analysis element that can have file parameters.
- create_cohort_xml(element)#
Create cohort-specific XML elements in the parent element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the cohort-specific elements to.
- Returns:
element – The parent XML element with the cohort elements added.
- Return type:
lxml.etree.Element
Notes
This is designed to add XML elements to a study/analysis element that can have file parameters.
- create_xml(element)#
Generate all the XML elements relating to objects that hold files. This wraps all the other create_* methods in the mixin.
- Parameters:
element (lxml.etree.Element) – The parent XML element to add the elements to.
- Returns:
element – The parent XML element with the elements added.
- classmethod parse_xml(element, **kwargs)#
Parse the file associated data from the XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to parse the elements from.
- Returns:
analysis_type (str) – The analysis type.
effect_type (str) – The effect type.
units (str or NoneType) – The units (if defined).
cohort (gwas_norm.metadata.cohort.Cohort or gwas_norm.metadata.cohort.CaseControlCohort or gwas_norm.metadata.cohort.SampleCohort or NoneType) – The cohort (if defined).
files (list of gwas_norm.metadata.file.GwasFile) – Files that have been parsed out of the XML.
- Raises:
KeyError – If any of the required elements cannot be found in the parent element.
- classmethod parse_files(element, **kwargs)#
Parse any file elements out from the XML element.
- Parameters:
element (lxml.etree.Element) – The parent XML element to parse file specific elements from.
- Returns:
gwas_files – GWAS file objects.
- Return type:
list of (gwas_norm.metadata.file.GwasFile)
- Raises:
KeyError – If no file elements can be found in the parent element.
- class gwas_norm.metadata.file.GwasFile(relative_path, column_map, chrpos_spec=None, parent=None, comment_char=None, pvalue_logged=False, compression=None, skiplines=0, md5_chksum=None, encoding='utf-8', file_check=True, has_header=True, keys=None, info=None, **csv_kwargs)#
Bases:
_XmlBase
The base class for a representation of an input GWAS file. Do not use directly.
- Parameters:
relative_path (str) – The relative path to the GWAS file, relative to the study_source_dir in the study object.
column_map (dict) – The keys should be standard column names and the values should be GWAS file column names.
chrpos_spec (gwas_norm.normalise.ChrPosSpec, optional, default: NoneType) – The specification of columns in a combined chromosome position column. Whilst this is optional, an error will be raised if it is NoneType and a chrpos column is defined in the column_map.
parent (gwas_norm.metadata.study.StudyFile or gwas_norm.metadata.analysis.AnalysisFile, optional, default: NoneType) – The parent object that will hold the file. A reciprocal bind is initiated in the parent.
comment_char (str, optional, default: NoneType) – A character that is treated as a comment either at the start of a line or at the start of the file.
pvalue_logged (bool, optional, default: False) – Is the p-value in the data file -log10 transformed?
compression (str, optional, default: NoneType) – The type of compression in the file.
skiplines (int, optional, default: 0) – A fixed number of rows to skip before looking for the header. Comment rows are not counted in this, i.e. lines are skipped from the start of the file and then comment lines are looked for.
md5_chksum (str, optional, default: NoneType) – The 32 character checksum of the file. If not supplied then it is calculated upon XML writing or can be calculated with gwas_norm.metadata.file.GwasFile.check_file.
encoding (str, optional, default: utf-8) – The encoding of the file. If this is NoneType then utf-8 is used.
file_check (bool, optional, default: True) – Should file checks be performed? If this is False and md5_chksum is not defined then an error is raised. If True, the MD5 hash is calculated: if md5_chksum is NoneType then the calculated value is used when the XML file is written; if md5_chksum is supplied then it is compared to the calculated value. File checks will also test for the presence of a header and check the delimiter and valid compression (where possible).
has_header (bool, optional, default: True) – Does the input file have a header?
keys (list of gwas_norm.metadata.column.Column, optional, default: NoneType) – Any key columns that have been defined in the file. Key columns are used to define row keys for a GWAS file; these must be set if the file is being added to a gwas_norm.metadata.study.StudyFile object. Key columns must be present in the header of a file.
info (gwas_norm.metadata.info.Info, optional, default: NoneType) – Any file-level info definitions/columns. Info columns must be present in the header of the file.
**csv_kwargs – Any arguments used in a csv dialect to read in a csv file. Currently only delimiter and lineterminator are supported for XML writing; the others, which are stored but not yet supported in XML writing, are doublequote, escapechar, quotechar, quoting, skipinitialspace, strict or dialect. Note, however, that unlike csv, the delimiter defaults to a tab (\t).
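The header-location rules given by skiplines, comment_char and the tab-delimiter default can be sketched with the standard csv module. The read_header function is hypothetical and is not part of gwas_norm; it only illustrates the documented order (fixed lines first, then comment lines) and the tab default.

```python
import csv
import io


def read_header(handle, skiplines=0, comment_char=None, **csv_kwargs):
    """Return the header row after skipping fixed lines and comment lines."""
    # Unlike csv's comma default, fall back to a tab delimiter.
    csv_kwargs.setdefault("delimiter", "\t")

    # 1. Skip a fixed number of lines from the start of the file.
    for _ in range(skiplines):
        handle.readline()

    # 2. Then skip any comment lines that follow (stopping at end of file).
    line = handle.readline()
    while comment_char and line and line.startswith(comment_char):
        line = handle.readline()

    # 3. Parse the first remaining line as the header.
    return next(csv.reader([line], **csv_kwargs))
```

For a file starting with one free-text line, two "#" comment lines and then a tab-separated header, read_header(handle, skiplines=1, comment_char="#") returns the header fields.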
- ROOT_TAG = 'file'#
The name of the root XML element tag (str)
- RELATIVE_PATH_TAG = 'relative_path'#
The name of the relative file path XML tag (str)
- MD5_TAG = 'md5_chksum'#
The name of the MD5 chksum XML tag (str)
- COMMENT_CHAR_TAG = 'comment_char'#
The name of the comment character XML tag (str)
- SKIPLINES_TAG = 'skiplines'#
The name of the skiplines XML tag (str)
- PVALUE_LOGGED_TAG = 'pvalue_logged'#
The name of the XML tag indicating if the p-value is log transformed (str)
- COMPRESSION_TAG = 'compression'#
The name of the file compression XML tag (str)
- ENCODING_TAG = 'encoding'#
The name of the file encoding XML tag (str)
- CHR_POS_SPEC_TAG = 'chrpos_spec'#
The name of the chromosome position XML tag (str)
- COLUMNS_TAG = 'columns'#
The XML tag for the GWAS file columns tag in the XML document (str)
- HAS_HEADER_TAG = 'has_header'#
The XML tag for indicating if the GWAS file has a header row (str)
- KEYS_TAG = 'keys'#
The XML tag for indicating key columns in the GWAS file (str)
- INFO_ATTRIBUTE = 'info'#
The attribute name indicating a mapping column should be used in the info field (str)
- MAP_TO_ATTRIBUTE = 'map_to'#
The attribute name indicating a mapping column should be used in the info field and be mapped to a different name (str)
- CSV_DOUBLE_QUOTE_TAG = 'doublequote'#
The XML tag name of the csv argument element for double quotes (str)
- CSV_ESCAPE_CHAR_TAG = 'escapechar'#
The XML tag name of the csv argument element for the escape character (str)
- CSV_QUOTE_CHAR_TAG = 'quotechar'#
The XML tag name of the csv argument element for the quote character (str)
- CSV_QUOTING_TAG = 'quoting'#
The XML tag name of the csv argument element for quoting (str)
- CSV_SKIP_INIT_WHITESPACE_TAG = 'skipinitialspace'#
The XML tag name of the csv argument element for skipping initial whitespace (str)
- CSV_STRICT_TAG = 'strict'#
The XML tag name of the csv argument element for strict (str)
- CSV_LINE_TERMINATOR_TAG = 'lineterminator'#
The XML tag name of the csv argument element for the line terminator (str)
- CSV_DELIMITER_TAG = 'delimiter'#
The XML tag name of the csv argument element for the delimiter (str)
- property is_validated#
Get the validation status of the file (bool).
- validate()#
Validate the file.
- invalidate()#
Invalidate the file.
- property csv_kwargs#
Get the CSV keyword arguments. If these have not been set then the dialect that was detected during header extraction is used; if that is not available either, an empty dictionary is returned (dict).
- property has_header#
Should the input file be expected to have a header (bool).
- property header_is_known#
Flag to indicate if the header is known to the file object. This can be used to check if the header has been read from the file or defined by the user, without having to call file.header, which may instigate reading the header from the file (if file_check is True) (bool).
- property header#
Get the file header. If the file is not expected to have a header then this will be a list of column numbers of the same length as the first row (list of (str or int)).
- property normaliser#
Get the normaliser object (should be available after column_map is set) (gwas_norm.normalise.Normaliser or NoneType).
- property compression#
Get the compression (str).
- property skiplines#
Get the number of lines to skip (int).
- property chrpos_spec#
Get the chrpos_spec (gwas_norm.normalise.ChrPosSpec or NoneType).
- property column_map#
Get the whole column mapping dictionary. Note this will always return a copy of the column map dictionary (dict).
- property md5_chksum#
Get the MD5 checksum for the file (str).
- property is_checked#
Has the MD5 of the file been checked (bool).
- property absolute_path#
Get the absolute path to the file (str).
- Raises:
FileNotFoundError – If no parent object has been bound to the file object.
- property basename#
Get the basename of the GWAS file (str).
- property relative_path#
Get the relative path to the GWAS file (str).
- property keys#
Get any defined key columns; this will be an empty list if none have been defined (list of gwas_norm.metadata.column.Column)
- property info#
Get any other info definitions or columns. These are set as info elements in the XML and differ from info_columns, which are attributes set on the column mappings (gwas_norm.metadata.info.Info)
- property info_columns#
Return any info columns that have been defined in the file column mappings, or any key columns. If none have been defined this will be an empty list. (list of (gwas_norm.metadata.column.MappingColumn or gwas_norm.metadata.column.Column)).
- set_header(header)#
An explicit setter for the header.
Note that setting the header directly is not recommended, but it might be useful if you do not have access to the GWAS files when you are building the XML files. This will only work if file checking is disabled.
- Parameters:
header (list of (str or int)) – The header for the file. If has_header is False this should be a list of ints.
- Raises:
ValueError – If file_check is True.
TypeError – If the header is not all ints (has_header=False) or strings (has_header=True).
- check_file()#
Perform a battery of checks on the file.
- Raises:
ValueError – If the user has defined the MD5 and it does not match that which is calculated by the file checks.
Notes
This is designed to highlight any differences between the spec of the file as defined in the file object and that of the actual file itself. First, the compression is tested to see if the file is GZIP. The MD5 of the file is then checked: if it is not defined in the file object then it is set; if it is defined then it is checked for consistency.
If any inconsistencies are detected then warnings are issued rather than errors, i.e. we will believe the user as header detection is approximate.
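The checks described above (GZIP detection, chunked MD5 calculation, warnings for spec inconsistencies but an error for an MD5 mismatch) can be sketched with hashlib and warnings. This is a hypothetical, standalone illustration rather than GwasFile.check_file itself. In particular, representing compression as the string "gzip" or None is an assumption, and header/delimiter detection is left out.

```python
import hashlib
import warnings

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def md5_file(path, chunk_size=1 << 20):
    """Calculate the MD5 hex digest without reading the whole file at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, md5_chksum=None, compression=None):
    """Hypothetical check: warn on compression mismatch, error on MD5 mismatch."""
    detected = "gzip" if is_gzip(path) else None
    if compression != detected:
        # Spec inconsistencies only produce warnings...
        warnings.warn(f"compression given as {compression!r}, detected {detected!r}")
    calculated = md5_file(path)
    if md5_chksum is not None and md5_chksum != calculated:
        # ...but a user-supplied MD5 that does not match is an error.
        raise ValueError(f"MD5 mismatch: expected {md5_chksum}, got {calculated}")
    # If no MD5 was supplied, the calculated value is the one to store.
    return calculated
```

Reading in fixed-size chunks keeps memory use constant, which matters for multi-gigabyte summary statistics files.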
- bind(parent)#
Bind a gwas file object with a parent object.
- Parameters:
parent (gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.study.StudyFile) – A parental object.
- Raises:
KeyError – If the file is already bound to a different study/analysis object.
Notes
The binding instigates a reciprocal adding of the file object to the parent.
- unbind()#
Unbind a gwas file from a parent object. This also removes the gwas_file from the parent.
- Returns:
parent – The unbound parental object.
- Return type:
gwas_norm.metadata.analysis.AnalysisFile or gwas_norm.metadata.study.StudyFile
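The reciprocal bind/unbind described above can be sketched with two small hypothetical classes, Parent and File. They are not the gwas_norm classes. The key design point is that each side checks whether the link already exists before calling the other side, which stops the reciprocal calls from recursing forever.

```python
class Parent:
    """Hypothetical stand-in for a StudyFile/AnalysisFile-like holder."""

    def __init__(self):
        self.files = []

    def add_file(self, gwas_file):
        if gwas_file not in self.files:
            self.files.append(gwas_file)
            gwas_file.bind(self)  # reciprocal bind

    def remove_file(self, gwas_file):
        if gwas_file in self.files:
            self.files.remove(gwas_file)
            gwas_file.unbind()  # reciprocal unbind


class File:
    """Hypothetical stand-in for a GwasFile."""

    def __init__(self, name):
        self.name = name
        self.parent = None

    def bind(self, parent):
        if self.parent is parent:
            return  # already bound; ends the reciprocal calls
        if self.parent is not None:
            raise KeyError(f"{self.name} is bound to a different parent")
        self.parent = parent
        parent.add_file(self)

    def unbind(self):
        # Clear the link first so the parent's reciprocal call is a no-op.
        parent, self.parent = self.parent, None
        if parent is not None:
            parent.remove_file(self)
        return parent
```

Binding from either side leaves both sides consistent: File.bind(parent) and Parent.add_file(file) produce the same state.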
- to_xml()#
Convert the GWAS file and all of its attributes to an XML element.
- Returns:
gwas_file_element – A file element built from the GWAS file object and its attributes.
- Return type:
lxml.etree.Element
- classmethod from_xml(element, **kwargs)#
Parse the data from an XML element (parsed using lxml.etree).
- Parameters:
element (lxml.Element) – An lxml element that must have the tag name file or key_file.
- Returns:
file_element – A file object parsed from the XML.
- Return type:
gwas_norm.metadata.file.GwasFile.
- Raises:
KeyError – If the tag name of the root element is not file or key_file.
- classmethod get_class(element)#
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
file_class – The relevant file class for the element.
- Return type:
class of gwas_norm.metadata.file.GwasFile
- Raises:
KeyError – If the element does not have the tag name file or key_file.
gwas_norm.metadata.test
Enable test GWAS associations to be given in the XML. After normalisation, these are compared with the normalised output to report whether they match the expected values.
- class gwas_norm.metadata.test.Test(chr_name, start_pos, effect_type, effect_size, effect_allele, other_allele=None, standard_error=None, pvalue=None, pvalue_logged=False, var_id=None)#
Bases:
_XmlBase
A class representing Test objects and moving them to/from XML.
- Parameters:
chr_name (str) – The chromosome name to be compared. Whilst this is tested by definition it will always pass as this is the way the tests are selected.
start_pos (int) – The start position to be compared. Whilst this is tested by definition it will always pass as this is the way the tests are selected.
effect_type (str) – The effect_type that will be used to transform the effect_size so it can be compared; valid values are beta, log_or and or. Odds ratios are log transformed.
effect_size (float) – The effect size that will be compared with the data in the effect_size column.
effect_allele (str) – The effect allele that will be compared to the data in the effect_allele column.
other_allele (str, optional, default: NoneType) – The other allele that will be compared to the data in the other_allele column.
standard_error (float, optional, default: NoneType) – The standard error that will be compared to the data in the standard_error column.
pvalue (float or str, optional, default: NoneType) – The p-value that will be compared to the data in the pvalue column. This is expected to be a float but could be a string if precision is an issue. The p-values are -log10 transformed internally using the decimal package.
pvalue_logged (bool, optional, default: False) – Is the p-value -log10 transformed? Even if False, the p-value test is carried out on -log10 transformed p-values.
var_id (str, optional, default: NoneType) – The variant ID. Typically, this could be a variant rsID.
Notes
Input/Output XML looks like this:
<test>
    <chr_name/>
    <start_pos/>
    <var_id/>
    <effect_allele/>
    <other_allele/>
    <effect_size/>
    <effect_type/>
    <standard_error/>
    <pvalue/>
    <pvalue_logged/>
</test>
This also has methods for performing tests against the expected values in the Test object.
- ROOT_TAG = 'test'#
The name of the root XML element tag name (str)
- CHR_NAME_TAG = 'chr_name'#
The tag name for the chromosome name in the XML (str).
- START_POS_TAG = 'start_pos'#
The tag name for the start position in the XML (str).
- EFFECT_TYPE_TAG = 'effect_type'#
The tag name for the effect type in the XML (str).
- EFFECT_SIZE_TAG = 'effect_size'#
The tag name for the effect size in the XML (str).
- EFFECT_ALLELE_TAG = 'effect_allele'#
The tag name for the effect allele in the XML (str).
- VAR_ID_TAG = 'var_id'#
The tag name for the variant ID in the XML (str).
- OTHER_ALLELE_TAG = 'other_allele'#
The tag name for the other allele in the XML (str).
- STANDARD_ERROR_TAG = 'standard_error'#
The tag name for the standard error in the XML (str).
- PVALUE_TAG = 'pvalue'#
The tag name for the pvalue in the XML (str).
- PVALUE_LOGGED_TAG = 'pvalue_logged'#
The tag name for the pvalue is logged in the XML (str).
- EFFECT_SIZE_DELTA = 0.0001#
Allowed difference between the expected effect size and the normalised one (float)
- STANDARD_ERROR_DELTA = 0.0001#
Allowed difference between the expected standard error and the normalised one (float)
- LOG10_PVALUE_DELTA = 0.05#
Allowed difference between the expected -log10(pvalue) and the normalised one (float)
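The p-value comparison described for the pvalue parameter and LOG10_PVALUE_DELTA can be sketched with the standard decimal module: both values are moved to the -log10 scale (unless already logged) and the absolute difference is compared with the allowed delta. The function names minus_log10 and pvalues_match are hypothetical, and this is not the Test class's own code.

```python
from decimal import Decimal

LOG10_PVALUE_DELTA = 0.05


def minus_log10(pvalue, pvalue_logged=False):
    """Return -log10(pvalue) as a Decimal; logged values are returned as-is."""
    # Going via str() keeps string inputs such as "1e-400" exact; as a
    # Python float that value would underflow to 0.0.
    value = Decimal(str(pvalue))
    return value if pvalue_logged else -value.log10()


def pvalues_match(expected, observed, expected_logged=False,
                  observed_logged=False, delta=LOG10_PVALUE_DELTA):
    """Compare two p-values on the -log10 scale; return (passed, difference)."""
    diff = abs(minus_log10(expected, expected_logged)
               - minus_log10(observed, observed_logged))
    return diff <= Decimal(str(delta)), diff
```

Comparing on the -log10 scale makes the tolerance relative, so 1e-300 against 2e-300 fails (a difference of about 0.3) even though the raw values differ by a tiny absolute amount.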
- TEST_PASS = 'PASS'#
Constant for a test pass (str)
- TEST_FAIL = 'FAIL'#
Constant for a test fail (str)
- class TestResult(test_id, test_type, expected_value, observed_value, delta, test_outcome)
Bases:
tuple
A container for the result of a test (namedtuple)
- count(value, /)
Return number of occurrences of value.
- delta
Alias for field number 4
- expected_value
Alias for field number 2
- index(value, start=0, stop=9223372036854775807, /)
Return first index of value.
Raises ValueError if the value is not present.
- observed_value
Alias for field number 3
- test_id
Alias for field number 0
- test_outcome
Alias for field number 5
- test_type
Alias for field number 1
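The delta constants above imply a simple tolerance check. A minimal sketch follows, assuming the recorded delta is the absolute difference and that a difference equal to the allowed delta still passes; ResultTuple and delta_check are hypothetical names that only mirror the documented TestResult field order.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field order as Test.TestResult.
ResultTuple = namedtuple(
    "ResultTuple",
    ["test_id", "test_type", "expected_value", "observed_value", "delta", "test_outcome"],
)

TEST_PASS = "PASS"
TEST_FAIL = "FAIL"
EFFECT_SIZE_DELTA = 0.0001


def delta_check(test_id, test_type, expected, observed, allowed_delta):
    """Compare an observed value with an expected one within an allowed delta."""
    # Assumption: the delta is the absolute difference, and a difference equal
    # to the allowed delta still counts as a pass.
    delta = abs(expected - observed)
    outcome = TEST_PASS if delta <= allowed_delta else TEST_FAIL
    return ResultTuple(test_id, test_type, expected, observed, delta, outcome)


result = delta_check("test_1", "effect_size", 0.25, 0.25004, EFFECT_SIZE_DELTA)
# Field number 5 is test_outcome, so both of these give the same value.
print(result.test_outcome, result[5])
```

Because the result is a namedtuple, fields can be read either by name or by the positional index listed in the aliases above.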
- test_row(row)
Apply the test to a row.
- Parameters:
row (list of str or int) – The row to use for test lookup. This should contain the elements listed in constants.STANDARD_COLUMNS, in exactly the same order.
- Returns:
test_results – A list of the results from all the comparisons.
- Return type:
list of gwas_norm.metadata.test.Test.TestResult
- test_chr_name(chr_name)
Test the chromosome name against the expected value.
- Parameters:
chr_name (str) – The chromosome name to test.
- Returns:
test_result – The test result.
- Return type:
gwas_norm.metadata.test.Test.TestResult
- test_start_pos(start_pos)
Test the start position against the expected value.
- Parameters:
start_pos (int) – The start position to test.
- Returns:
test_result – The test result.
- Return type:
gwas_norm.metadata.test.Test.TestResult
- test_effects(effect_allele, other_allele, effect_size)
Test the effect allele/other allele and effect size against the expected values.
- Parameters:
effect_allele (str) – The effect allele to test.
other_allele (str) – The other (non-effect) allele to test.
effect_size (float) – The effect size to test.
- Returns:
test_results – The test results.
- Return type:
list of gwas_norm.metadata.test.Test.TestResult
Notes
In addition to testing the difference in effect_size, a separate test of the effect sign is carried out. The input effect/other alleles and effect direction may also be swapped before testing, because there is no guarantee that the user's effect allele is the same as the expected one; in this way, the effect size test will fail if the alleles are wrong.
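The allele alignment described in these notes can be sketched as below. This assumes the expected and observed effects are plain (effect_allele, other_allele, effect_size) tuples; check_effects is a hypothetical helper that returns two booleans rather than TestResult objects.

```python
import math

EFFECT_SIZE_DELTA = 0.0001


def check_effects(expected, observed, delta=EFFECT_SIZE_DELTA):
    """Return (effect_size_ok, effect_sign_ok) for an observed effect.

    Both arguments are (effect_allele, other_allele, effect_size) tuples.
    """
    exp_ea, exp_oa, exp_es = expected
    obs_ea, obs_oa, obs_es = observed
    if (obs_ea, obs_oa) == (exp_oa, exp_ea):
        # The user's effect allele is the expected other allele, so swap the
        # alleles and flip the direction of effect before comparing.
        obs_ea, obs_oa, obs_es = obs_oa, obs_ea, -obs_es
    if (obs_ea, obs_oa) != (exp_ea, exp_oa):
        # The alleles cannot be aligned, so both comparisons fail.
        return False, False
    size_ok = abs(exp_es - obs_es) <= delta
    # Separate sign test; copysign treats 0.0 as positive and -0.0 as negative.
    sign_ok = math.copysign(1, exp_es) == math.copysign(1, obs_es)
    return size_ok, sign_ok


print(check_effects(("A", "G", 0.12), ("G", "A", -0.12)))
```

Here the observed alleles are the reverse of the expected ones, so the sketch flips the effect to 0.12 before comparing.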
- test_standard_error(standard_error)
Test the standard error against the expected value.
- Parameters:
standard_error (float) – The standard error to test.
- Returns:
test_result – The test result.
- Return type:
gwas_norm.metadata.test.Test.TestResult
- test_pvalue(pvalue)
Test the pvalue against the expected value.
- Parameters:
pvalue (float) – The pvalue to test. Important: this should be -log10 transformed.
- Returns:
test_result – The test result.
- Return type:
gwas_norm.metadata.test.Test.TestResult
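Comparing on the -log10 scale keeps the tolerance meaningful across very small pvalues, where raw differences are tiny. A minimal sketch, assuming for illustration that the expected pvalue is held untransformed while the observed value has already been -log10 transformed, as test_pvalue requires; check_pvalue is a hypothetical helper.

```python
import math

LOG10_PVALUE_DELTA = 0.05


def check_pvalue(expected_pvalue, observed_mlog10, delta=LOG10_PVALUE_DELTA):
    """Compare a raw expected pvalue with an observed -log10(pvalue)."""
    # Work on the -log10 scale so that the tolerance relates to the order of
    # magnitude rather than to the (tiny) raw pvalue.
    expected_mlog10 = -math.log10(expected_pvalue)
    difference = abs(expected_mlog10 - observed_mlog10)
    return difference <= delta, difference


# The caller log-transforms the observed pvalue before testing.
observed = -math.log10(5.1e-8)
print(check_pvalue(5e-8, observed))
```

A change from 5e-8 to 1e-7 moves the -log10 value by about 0.3, well outside the 0.05 tolerance, whereas 5e-8 against 5.1e-8 differs by less than 0.01.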
- test_var_id(var_id)
Test the variant identifier against the expected value.
- Parameters:
var_id (str) – The variant identifier to test.
- Returns:
test_result – The test result.
- Return type:
gwas_norm.metadata.test.Test.TestResult
Notes
Variant identifier tests may fail frequently, as variant identifiers are re-mapped against the latest dbSNP IDs.
- to_xml()
Generate an XML element for the Test object.
- Returns:
test – A test element built from the Test object and its attributes.
- Return type:
lxml.etree.Element
- classmethod from_xml(element)
Generate a gwas_norm.metadata.test.Test object from the data in the XML element.
- Parameters:
element (lxml.etree.Element) – The element should have the tag name test.
- Returns:
test – The Test object built from the element.
- Return type:
gwas_norm.metadata.test.Test
- Raises:
KeyError – If the name of the element is not test.
- classmethod get_class(element)
Helper method that will determine the required class for parsing based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – An element that is expected to have the tag name test.
- Returns:
class – The relevant test class for the element.
- Return type:
class of gwas_norm.metadata.test.Test
- Raises:
KeyError – If the element does not have the required tag name.
gwas_norm.metadata.info
Info columns and definitions.
- class gwas_norm.metadata.info.Info(columns=None, definitions=None)
Bases:
_XmlBase
A representation of the info elements within the analysis definition.
- Parameters:
columns (list of gwas_norm.metadata.column.Column, optional, default: NoneType) – The columns that will be used in the info field.
definitions (list of gwas_norm.metadata.phenotype.Definition, optional, default: NoneType) – The static definitions that will be used in the info field.
Notes
The key and info attributes on the columns added to the Info object do not have any functionality within the Info object.
- ROOT_TAG = 'info'
The root tag for the info element (str)
- property columns
Get the info columns (list of gwas_norm.metadata.column.Column).
- property definitions
Get the info definitions (list of gwas_norm.metadata.phenotype.Definition).
- add_column(col)
Add an info column.
- Parameters:
col (gwas_norm.metadata.column.Column) – The column object to add.
- Raises:
TypeError – If col is not a sub-class of gwas_norm.metadata.column.Column.
- remove_column(col)
Remove an info column.
- Returns:
col – The removed column.
- Return type:
gwas_norm.metadata.column.Column
- Raises:
ValueError – If the column is not an info column.
- add_definition(definition)
Add an info definition.
- Parameters:
definition (gwas_norm.metadata.phenotype.Definition) – The definition object to add.
- Raises:
TypeError – If the definition is not a sub-class of gwas_norm.metadata.phenotype.Definition.
- remove_definition(definition)
Remove an info definition.
- Returns:
definition – The removed definition.
- Return type:
gwas_norm.metadata.phenotype.Definition
- Raises:
ValueError – If the definition is not an info definition.
- to_xml()
Convert the info object to an XML element.
- Returns:
info_element – An XML element built from the info object and its attributes.
- Return type:
lxml.etree.Element
- Raises:
ValueError – If there are no info columns/definitions defined.
- classmethod from_xml(element)
Parse the data from an info XML element (parsed using lxml.etree).
- Parameters:
element (lxml.etree.Element) – An lxml element that must have the tag name info.
- Returns:
info_obj – The info object representing the XML element.
- Return type:
gwas_norm.metadata.info.Info
- Raises:
KeyError – If the tag name of the root element is not info.
ValueError – If the info element does not contain any columns or definitions.
- classmethod get_class(element)
Helper method that will determine the required class for parsing based on the root tag in the element.
- Parameters:
element (lxml.etree.Element) – An element that is expected to have the tag name info.
- Returns:
info_class – The info class for the tag.
- Return type:
class of gwas_norm.metadata.info.Info
- Raises:
KeyError – If the element does not have the required tag name.
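The checks documented for Info (a type check on add, a ValueError when removing an unknown column, and a ValueError when serialising an empty object) can be sketched as follows. ColumnStub and InfoSketch are hypothetical stand-ins, definitions are omitted for brevity, the standard-library xml.etree.ElementTree replaces lxml, and the child element layout is an assumption.

```python
import xml.etree.ElementTree as ET


class ColumnStub:
    """Hypothetical minimal stand-in for gwas_norm.metadata.column.Column."""

    def __init__(self, col_name):
        self.col_name = col_name


class InfoSketch:
    """Sketch of the add/remove/to_xml checks documented for Info (columns only)."""

    ROOT_TAG = "info"

    def __init__(self):
        self._columns = []

    def add_column(self, col):
        # isinstance also accepts sub-classes, matching the documented check.
        if not isinstance(col, ColumnStub):
            raise TypeError(f"not a column: {col!r}")
        self._columns.append(col)

    def remove_column(self, col):
        if col not in self._columns:
            raise ValueError(f"not an info column: {col!r}")
        self._columns.remove(col)
        return col

    def to_xml(self):
        if not self._columns:
            raise ValueError("no info columns/definitions defined")
        root = ET.Element(self.ROOT_TAG)
        for col in self._columns:
            # Assumption: this child layout is illustrative only.
            ET.SubElement(root, "column").text = col.col_name
        return root


info = InfoSketch()
beta = ColumnStub("beta_2")
info.add_column(beta)
print(ET.tostring(info.to_xml(), encoding="unicode"))
```

Raising on an empty object keeps to_xml from writing an info element with no content, which from_xml would reject anyway.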
gwas_norm.metadata.column
The generic column objects.
- class gwas_norm.metadata.column.Column(col_name, info=False, map_to=None, dtype=None)
Bases:
_XmlBase, InfoHolderMixin
A representation of a column.
- Parameters:
col_name (str) – The column name in a source (un-normalised) GWAS file.
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the column name should be known as the map_to value in the info field. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
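The dtype string combines a type code with a structure code. A sketch of how such a string might be parsed and applied is below; the two-character requirement, the comma separator for arrays and the helper names parse_dtype and cast_value are assumptions.

```python
DTYPES = {"S": str, "F": float, "I": int}
DSTRUCTS = {"C": "scalar", "A": "array"}


def parse_dtype(dtype=None):
    """Split a dtype string such as 'SA' into (dtype, dstruct)."""
    if dtype is None:
        # Documented behaviour: NoneType is interpreted as SC.
        dtype = "SC"
    # Assumption: the string is always exactly two characters.
    if len(dtype) != 2 or dtype[0] not in DTYPES or dtype[1] not in DSTRUCTS:
        raise ValueError(f"invalid dtype string: {dtype!r}")
    return dtype[0], dtype[1]


def cast_value(value, dtype=None, sep=","):
    """Cast a raw string using the dtype; arrays are split on sep (assumed)."""
    kind, struct = parse_dtype(dtype)
    cast = DTYPES[kind]
    if struct == "A":
        return [cast(v) for v in value.split(sep)]
    return cast(value)


print(parse_dtype("SA"), cast_value("1,2,3", "IA"), cast_value("0.5", "FC"))
```

The first character selects the element type and the second selects scalar or array, so IA yields a list of integers.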
- ROOT_TAG = 'column'
The name of the root XML element tag (str)
- property name
Get the column name; this is an alias for Column.col_name (str)
- to_xml()
Convert the Column to an XML element.
- Returns:
element – A column element built from the Column object and its attributes.
- Return type:
lxml.etree.Element
- classmethod from_xml(element)
Parse the data from an XML column element.
- Parameters:
element (lxml.etree.Element) – An lxml element where the tag name is column.
- Returns:
column – A column object that represents the XML element.
- Return type:
gwas_norm.metadata.column.Column
- Raises:
KeyError – If the element does not have the tag column.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
parse_class – A class inheriting from gwas_norm.metadata.column.Column.
- Return type:
class
- Raises:
KeyError – If the appropriate class can’t be found for the tag.
- DATA_TYPE_ATTRIBUTE = 'dtype'
The name of the data type attribute of the column (str)
- INFO_ATTRIBUTE = 'info'
The name of the info attribute of the column (str)
- MAP_TO_ATTRIBUTE = 'map_to'
The name of the map_to attribute of the column (str)
- property dstruct
Get the data structure value: C is a scalar and A is an array (str).
- property dtype
Get the dtype value: S is a string value, F is a float and I is an integer (str).
- equals(other)
Determine equality against another InfoHolderMixin-containing object. This is based on the map_to, dtype and dstruct values matching.
- classmethod get_attributes(element)
Get the attributes from an XML element.
- Parameters:
element (lxml.etree.Element) – The element potentially containing info, map_to, dtype attributes.
- Returns:
info (bool) – Is the class acting as an info field.
map_to (str, optional, default: False) – If info is true, map_to indicates that the info value defined in the class/column should be known as the map_to value in the info field and not as the name.
dtype (str) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array.
- property info
Get the info output flag, i.e. whether the value is output to the info field (bool).
- init_info_values(info=False, map_to=None, dtype=None, allow_info_false=False)
Initialise all of the info related values for the mixin.
- Parameters:
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the name defined in the class should be known as the map_to value in the info field and not as the name. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
allow_info_false (bool, optional, default: False) – If this is set to True, then when info is False and map_to is defined, map_to is still output to the XML. This is for phenotype definitions, where the map_to value has meaning even when it is not output to the info column.
- property map_to
Get the column name remapping value (str or NoneType).
- set_attributes(element)
Set the attributes on an XML element.
- Parameters:
element (lxml.etree.Element) – The element to add the attributes to.
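The info, map_to and dtype attributes, together with the allow_info_false rule from init_info_values, can be sketched on a standard-library ElementTree element as follows. Serialising info as "true"/"false", restricting map_to to ASCII characters and the helper names set_info_attributes and get_info_attributes are assumptions.

```python
import re
import xml.etree.ElementTree as ET

INFO_ATTRIBUTE = "info"
MAP_TO_ATTRIBUTE = "map_to"
DATA_TYPE_ATTRIBUTE = "dtype"
# Alphanumeric characters and underscores only, no spaces (ASCII assumed).
MAP_TO_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def set_info_attributes(element, info=False, map_to=None, dtype=None,
                        allow_info_false=False):
    """Write the info/map_to/dtype attributes onto an element."""
    if map_to is not None and not MAP_TO_PATTERN.fullmatch(map_to):
        raise ValueError(f"invalid map_to value: {map_to!r}")
    # Assumption: the boolean is serialised as "true"/"false".
    element.set(INFO_ATTRIBUTE, "true" if info else "false")
    # map_to is only written when info is set, unless allow_info_false is True.
    if map_to is not None and (info or allow_info_false):
        element.set(MAP_TO_ATTRIBUTE, map_to)
    if dtype is not None:
        element.set(DATA_TYPE_ATTRIBUTE, dtype)


def get_info_attributes(element):
    """Read (info, map_to, dtype) back from an element."""
    info = element.get(INFO_ATTRIBUTE, "false") == "true"
    return info, element.get(MAP_TO_ATTRIBUTE), element.get(DATA_TYPE_ATTRIBUTE)


col = ET.Element("column")
set_info_attributes(col, info=True, map_to="beta_2", dtype="FC")
print(get_info_attributes(col))
```

Missing attributes come back as None, so a plain column element with no info settings reads as (False, None, None).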
- class gwas_norm.metadata.column.MappingColumn(col_name, map_to_name, **kwargs)
Bases:
Column
A representation of a mapping column.
- Parameters:
col_name (str) – The gwas-norm column name that is being mapped to; this will be set as the element tag name.
map_to_name (str) – The column name in a source (un-normalised) GWAS file that is being mapped to the col_name.
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the column name should be known as the map_to value in the info field. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
- ROOT_TAG = 'mapping_column'
The name of the root XML element tag (str)
- to_xml()
Convert the MappingColumn to an XML element.
- Returns:
element – A column element built from the MappingColumn object and its attributes.
- Return type:
lxml.etree.Element
- DATA_TYPE_ATTRIBUTE = 'dtype'
The name of the data type attribute of the column (str)
- INFO_ATTRIBUTE = 'info'
The name of the info attribute of the column (str)
- MAP_TO_ATTRIBUTE = 'map_to'
The name of the map_to attribute of the column (str)
- property dstruct
Get the data structure value: C is a scalar and A is an array (str).
- property dtype
Get the dtype value: S is a string value, F is a float and I is an integer (str).
- equals(other)
Determine equality against another InfoHolderMixin-containing object. This is based on the map_to, dtype and dstruct values matching.
- classmethod from_xml(element)
Parse the data from an XML mapping column element.
- Parameters:
element (lxml.etree.Element) – An lxml element where the tag name is the name of a gwas-norm column.
- Returns:
column – A mapping column object that represents the XML element.
- Return type:
gwas_norm.metadata.column.MappingColumn
- classmethod get_attributes(element)
Get the attributes from an XML element.
- Parameters:
element (lxml.etree.Element) – The element potentially containing info, map_to, dtype attributes.
- Returns:
info (bool) – Is the class acting as an info field.
map_to (str, optional, default: False) – If info is true, map_to indicates that the info value defined in the class/column should be known as the map_to value in the info field and not as the name.
dtype (str) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array.
- property info
Get the info output flag, i.e. whether the value is output to the info field (bool).
- init_info_values(info=False, map_to=None, dtype=None, allow_info_false=False)
Initialise all of the info related values for the mixin.
- Parameters:
info (bool, optional, default: False) – Is the class acting as an info field.
map_to (str, optional, default: NoneType) – If info is true, map_to indicates that the name defined in the class should be known as the map_to value in the info field and not as the name. Must only contain alphanumeric characters and underscores, with no spaces.
dtype (str, optional, default: NoneType) – The datatype definition string. S is a string value, F is a float and I is an integer. A represents an array and C a scalar, so SA would be a string array. NoneType is interpreted as SC.
allow_info_false (bool, optional, default: False) – If this is set to True, then when info is False and map_to is defined, map_to is still output to the XML. This is for phenotype definitions, where the map_to value has meaning even when it is not output to the info column.
- property map_to
Get the column name remapping value (str or NoneType).
- property name
Get the column name; this is an alias for Column.col_name (str)
- set_attributes(element)
Set the attributes on an XML element.
- Parameters:
element (lxml.etree.Element) – The element to add the attributes to.
- classmethod get_class(element)
Get the appropriate parse class for the XML element tag.
- Parameters:
element (lxml.etree.Element) – The element to check against.
- Returns:
parse_class – A class inheriting from gwas_norm.metadata.column.MappingColumn.
- Return type:
class
Notes
This will always return gwas_norm.metadata.column.MappingColumn.
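For a MappingColumn, col_name becomes the element tag, so any gwas-norm column name can appear as a tag, which fits get_class always returning MappingColumn. A sketch using the standard-library ElementTree, assuming the source column name (map_to_name) is stored as the element text; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def mapping_column_to_xml(col_name, map_to_name):
    """Build an element whose tag is the gwas-norm column name."""
    element = ET.Element(col_name)
    # Assumption: the source column name is stored as the element text.
    element.text = map_to_name
    return element


def mapping_column_from_xml(element):
    """Return (col_name, map_to_name); any tag name is accepted."""
    return element.tag, element.text


element = mapping_column_to_xml("effect_size", "BETA")
print(ET.tostring(element, encoding="unicode"))
print(mapping_column_from_xml(element))
```

Here a source column called BETA is mapped onto the gwas-norm effect_size column.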
gwas_norm.metadata.convert
- gwas_norm.metadata.convert.convert_xml(old_xml, new_xml)
Convert an old-style XML file to a new-style XML file.
- Parameters:
old_xml (str) – The old-style XML file; this can be compressed.
new_xml (str) – The new-style output XML file; this will be compressed if the extension is .gz.
verbose (bool, optional, default: False) – Log output.
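convert_xml accepts a possibly compressed input and compresses its output when the file name ends in .gz. One way to handle both cases is sketched below with the standard-library gzip module; detecting compressed input by its magic bytes (rather than by extension) is an assumption, as are the helper names open_input and open_output.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def open_input(path):
    """Open a possibly gzip-compressed XML file for reading in text mode."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def open_output(path):
    """Open an output XML file, gzip-compressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "wt", encoding="utf-8")
    return open(path, "wt", encoding="utf-8")
```

Checking the magic bytes lets a compressed input be read even if it lacks a .gz extension, while the output side follows the documented extension rule.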