pyaxis

Pcaxis Parser module

This module builds a pandas DataFrame of tabular data from a PC-Axis file or URL. It reads the data and metadata of a PC-Axis [1] source into a DataFrame and a dictionary, and returns a dictionary containing both structures.

Example

from pyaxis import pyaxis

px = pyaxis.parse(self.base_path + 'px/2184.px', encoding='ISO-8859-2')

[1] https://www.scb.se/en/services/statistical-programs-for-px-files/

Todo:

meta_split: the "NOTE" attribute can occur multiple times, but only the last occurrence is added to the dictionary.

pyaxis.build_dataframe(dimension_names, dimension_members, data_values)

Builds a dataframe by adding the cartesian product of dimension members, plus the series of data.

Parameters:
  • dimension_names (list of string)
  • dimension_members (list of string)
  • data_values (list of string)
Returns:

data (pandas dataframe)
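
The Cartesian-product step can be illustrated with a minimal sketch that uses only the standard library. The helper name `cartesian_rows`, the `DATA` column label, the sample values, and the assumption that `dimension_members` holds one list of members per dimension are all illustrative and not taken from pyaxis; the documented function returns a pandas DataFrame rather than a list of tuples.

```python
from itertools import product


def cartesian_rows(dimension_names, dimension_members, data_values):
    """Pair every combination of dimension members with one data value.

    Hypothetical helper: assumes dimension_members holds one list of
    members per dimension and that data_values is ordered with the last
    dimension varying fastest.
    """
    combinations = list(product(*dimension_members))
    if len(combinations) != len(data_values):
        raise ValueError("data values do not match the number of combinations")
    # 'DATA' is an assumed label for the value column.
    header = tuple(dimension_names) + ("DATA",)
    rows = [combo + (value,) for combo, value in zip(combinations, data_values)]
    return header, rows


header, rows = cartesian_rows(
    ["sex", "year"],
    [["male", "female"], ["2019", "2020"]],
    ["10", "11", "20", "21"],
)
```

If pandas is available, `pandas.DataFrame(rows, columns=list(header))` would turn the result into a DataFrame.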

pyaxis.get_dimensions(metadata)

Reads STUB and HEADING values from metadata dictionary.

Parameters: metadata – dictionary of metadata
Returns: dimension_names (list), dimension_members (list)
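
As a rough illustration of reading STUB and HEADING, the sketch below walks a metadata dictionary and collects names and members. It assumes that `STUB` and `HEADING` map to lists of dimension names and that each dimension's members are stored under a `VALUES(<name>)` key; that key layout and the name `sketch_get_dimensions` are assumptions for this example, not confirmed pyaxis internals.

```python
def sketch_get_dimensions(metadata):
    """Collect dimension names and members from STUB and HEADING.

    Assumption: the members of each dimension live under a
    'VALUES(<name>)' key of the metadata dictionary.
    """
    dimension_names = []
    dimension_members = []
    for key in ("STUB", "HEADING"):
        for name in metadata.get(key, []):
            dimension_names.append(name)
            dimension_members.append(metadata.get("VALUES(" + name + ")", []))
    return dimension_names, dimension_members


# Hypothetical metadata, made up for the example.
sample_metadata = {
    "STUB": ["region"],
    "HEADING": ["year"],
    "VALUES(region)": ["North", "South"],
    "VALUES(year)": ["2019", "2020"],
}
names, members = sketch_get_dimensions(sample_metadata)
```
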
pyaxis.metadata_extract(pc_axis)

Extracts metadata and data from pc-axis file contents.

Parameters: pc_axis (str) – pc_axis file contents.
Returns:
  • metadata_attributes (list of string) – each item conforms to an ATTRIBUTE=VALUES pattern
  • data (string) – data values
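
One way to picture this split is the sketch below. It assumes the data section starts at the `DATA=` keyword and that metadata statements are terminated by semicolons, as is usual in PC-Axis files; the function name and sample content are hypothetical, and the approach is deliberately naive (a semicolon inside a quoted value would be split incorrectly).

```python
def sketch_metadata_extract(pc_axis):
    """Split raw PC-Axis text into ATTRIBUTE=VALUES items and a data string.

    Assumptions: the data section begins at 'DATA=' and every metadata
    statement ends with ';'. Quoted semicolons are not handled.
    """
    metadata, _, data = pc_axis.partition("DATA=")
    metadata_attributes = [
        element.strip() for element in metadata.split(";") if element.strip()
    ]
    # Drop surrounding whitespace and the terminating semicolon.
    data = data.strip().rstrip(";").strip()
    return metadata_attributes, data


# Hypothetical file contents, made up for the example.
sample = 'CHARSET="ANSI";\nSTUB="region";\nHEADING="year";\nDATA=\n1 2\n3 4;\n'
metadata_attributes, data = sketch_metadata_extract(sample)
```
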
pyaxis.metadata_split_to_dict(metadata_elements)

Splits the list of metadata elements into a dictionary of multi-valued keys.

Parameters: metadata_elements (list of string) – ATTRIBUTE=VALUES pairs
Returns: {'attribute1': ['value1', 'value2', … ], …}
Return type: metadata (dictionary)
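
The mapping from ATTRIBUTE=VALUES pairs to a dictionary of value lists might look like the sketch below. It assumes values are double-quoted and comma-separated and that quotes are dropped from attribute names; the function name and sample elements are hypothetical, not the library's code.

```python
def sketch_split_to_dict(metadata_elements):
    """Turn 'ATTRIBUTE=VALUES' strings into {attribute: [values]}.

    Assumptions: values are double-quoted and separated by '","';
    quotes inside attribute names are removed.
    """
    metadata = {}
    for element in metadata_elements:
        name, _, values = element.partition("=")
        key = name.strip().replace('"', "")
        # A repeated attribute overwrites the earlier entry.
        metadata[key] = [
            value.strip().strip('"') for value in values.split('","')
        ]
    return metadata


# Hypothetical elements, made up for the example.
result = sketch_split_to_dict(
    ['STUB="region"', 'VALUES("region")="North","South"', "DECIMALS=0"]
)
```

Because each assignment replaces the previous one, an attribute that appears more than once keeps only its last value in this sketch.
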
pyaxis.parse(uri, encoding, timeout=10)

Extracts metadata and data sections from pc-axis.

Parameters:
  • uri (str) – file name or URL
  • encoding (str) – charset encoding
  • timeout (int) – request timeout in seconds; optional
Returns:

Dictionary of metadata and pandas DataFrame:
  • METADATA – dictionary of metadata
  • DATA – pandas DataFrame

Return type:

pc_axis_dict (dictionary)

pyaxis.read(uri, encoding, timeout=10)

Reads a text file from file system or URL.

Parameters:
  • uri (str) – file name or URL
  • encoding (str) – charset encoding
  • timeout (int) – request timeout in seconds; optional
Returns:

file contents.

Return type:

raw_pcaxis (str)
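
A minimal sketch of reading text from either source, using only the standard library, is shown below. `urllib.request.urlopen` is a stand-in chosen for this example and is not claimed to be what pyaxis uses; the function name, the temporary file, and its contents are hypothetical.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def sketch_read(uri, encoding, timeout=10):
    """Return the decoded text behind a file name or an HTTP(S) URL."""
    if urlparse(uri).scheme in ("http", "https"):
        # Assumption: urlopen stands in for whatever HTTP client is used.
        with urlopen(uri, timeout=timeout) as response:
            return response.read().decode(encoding)
    with open(uri, encoding=encoding) as handle:
        return handle.read()


# Write a small ISO-8859-2 file and read it back with the same encoding.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.px")
    with open(path, "w", encoding="ISO-8859-2") as handle:
        handle.write('TITLE="Ludność";\n')
    raw_pcaxis = sketch_read(path, encoding="ISO-8859-2")
```

Passing the encoding explicitly matters for PC-Axis files, which are often not UTF-8; reading the sample above with the wrong charset would garble the accented characters.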

pyaxis.uri_type(uri)

Determines the type of URI.

Parameters: uri (str) – pc-axis file name or URL
Returns: 'URL' | 'FILE'
Return type: uri_type (str)
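
One plausible way to make this distinction is to inspect the URI scheme, as in the sketch below. The function name, the set of schemes treated as URLs, and the example.com address are assumptions for illustration; the library may classify URIs differently.

```python
from urllib.parse import urlparse


def sketch_uri_type(uri):
    """Classify a string as 'URL' or 'FILE' by its scheme.

    Assumption: only http, https and ftp count as URLs, so a Windows
    path such as 'C:\\data\\2184.px' is still treated as a file.
    """
    scheme = urlparse(uri).scheme
    return "URL" if scheme in ("http", "https", "ftp") else "FILE"


remote = sketch_uri_type("https://example.com/px/2184.px")
local = sketch_uri_type("px/2184.px")
```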