xtab.py is a Python module and command-line program that rearranges data from a normalized format to a crosstabulated format. It takes data in this form:
Station | Date | Value |
---|---|---|
WQ-01 | 2006-05-23 | 4.5 |
WQ-02 | 2006-05-23 | 3.7 |
WQ-03 | 2006-05-23 | 6.8 |
WQ-01 | 2006-06-15 | 9.7 |
WQ-02 | 2006-05-15 | 5.1 |
WQ-03 | 2006-06-15 | 7.2 |
WQ-01 | 2006-07-19 | 10 |
WQ-02 | 2006-07-19 | 6.1 |
WQ-03 | 2006-07-19 | 8.8 |
and rearranges it into this form:
Station | 2006-05-23 | 2006-06-15 | 2006-07-19 |
---|---|---|---|
WQ-01 | 4.5 | 3.7 | 6.8 |
WQ-02 | 9.7 | 5.1 | 7.2 |
WQ-03 | 10 | 6.1 | 8.8 |
Input and output are both text (CSV) files.
Capabilities
You can use the xtab program to:
- Rearrange data exported from a database to better suit its subsequent usage in statistical, modeling, graphics, or other software, or for easier browsing.
- Convert a single file of data to a SQLite database.
- Check for multiple rows of data in a text file with the same key values.
Required and Optional Arguments
Required Arguments
Optional Arguments
Notes
- The xtab program does not carry out any summarization or calculation on the data values, and therefore there should be no more than one data value to be placed in each cell of the output table. More than one value per cell is regarded as an error, and in such cases only one of the multiple values will be put in the cell.
- Multiple data values can be crosstabbed, as specified by multiple column names supplied with the -c argument.
- Column names should be specified in the same case as they appear in the input file.
- The -f option creates a temporary file in the same directory as the output file. This file has the same name as the input file, but an extension of '.sqlite'.
- There are no inherent limits to the number of rows or columns in the input or output files. (So the output may exceed the limits of some other software.)
- Missing required arguments will result in an exception rather than an error message, whatever the error logging option. If no error logging option is specified, then if there are multiple values to be put in a cell (the most likely data error), a single message will be printed on the console. If an error logging option is specified, then the SQL for all individual cases where there are multiple values per cell will be logged. The occurrence of multiple values for a cell can also be seen in the output from the -q option.
- The -d option produces output in a format that is intended to facilitate visual examination, rather than automated processing by subsequent software.
Copyright and License
Copyright (c) 2008, R.Dreas Nielsen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.