Data File Format

Appendix D Data File Format

PSPP necessarily uses the same format for system files as the products with which it is compatible. This appendix describes that format.

Three data types are used in system files: 32-bit integers, 64-bit floating-point numbers, and 1-byte characters. In this document they are referred to simply as int32, flt64, and char, the names used in the PSPP source code. Every field of type int32 or flt64 is aligned on a 32-bit boundary.
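As a rough illustration of these types and the alignment rule, the following C sketch maps int32 and flt64 onto fixed-width host types and rounds a byte offset up to the next 32-bit boundary. The helper name align32 is hypothetical and is not taken from the PSPP source; the sketch also assumes that the host's double is 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Names from this document, mapped onto host types (an assumption
   about the host, not a statement about the PSPP source). */
typedef int32_t int32;
typedef double flt64;

/* The format calls for a 64-bit floating-point type. */
_Static_assert(sizeof(flt64) == 8, "flt64 must be 64 bits wide");

/* Hypothetical helper: round a byte offset up to the next multiple
   of 4, i.e. the next 32-bit boundary. */
size_t align32(size_t offset)
{
    return (offset + 3u) & ~(size_t)3u;
}
```

For example, align32(5) yields 8, while an offset that is already a multiple of 4 is returned unchanged.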

The endianness of data in PSPP system files is not specified. A system file written on a computer of a particular endianness has the endianness of that computer. However, PSPP can read files of either endianness, regardless of the endianness of its host computer. PSPP translates endianness for both integers and floating-point numbers.
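One way to read fields independently of the host's byte order is to assemble each value from individual bytes. The following is a minimal sketch, not PSPP's actual reader: the function names and the big_endian flag are hypothetical, the caller is assumed to know the file's byte order already, and the host is assumed to store double in IEEE 754 binary64 format with the same byte order it uses for integers.

```c
#include <stdint.h>
#include <string.h>

/* Decode an int32 field from 4 raw file bytes.  `big_endian` is
   nonzero if the file stores the most significant byte first. */
int32_t get_int32(const unsigned char raw[4], int big_endian)
{
    uint32_t u = 0;
    int32_t value;
    for (int i = 0; i < 4; i++)
        u = (u << 8) | raw[big_endian ? i : 3 - i];
    memcpy(&value, &u, sizeof value);   /* reinterpret as signed */
    return value;
}

/* Decode a flt64 field from 8 raw file bytes in the same way.
   Assumes the host double is IEEE 754 binary64. */
double get_flt64(const unsigned char raw[8], int big_endian)
{
    uint64_t u = 0;
    double value;
    for (int i = 0; i < 8; i++)
        u = (u << 8) | raw[big_endian ? i : 7 - i];
    memcpy(&value, &u, sizeof value);   /* reinterpret the bit pattern */
    return value;
}
```

Because the bytes are combined arithmetically, the same code works on hosts of either endianness; only the flag changes with the file being read.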

The floating-point format is likewise not specified, and PSPP does not translate between floating-point formats. This is unlikely to be a problem in practice, since virtually all modern computer architectures use the IEEE 754 format for floating-point representation.

The PSPP system-missing value is represented by the most negative finite number in the floating-point format, that is, the negative number of largest magnitude; in C, this is most likely -DBL_MAX. Two other special values are used in specifying missing values: HIGHEST and LOWEST. HIGHEST is represented by the largest possible positive number (probably DBL_MAX), and LOWEST by the second most negative number, the value immediately above the system-missing value. The latter must be determined in a system-dependent manner; in IEEE 754 format its bit pattern is 0xffeffffffffffffe.
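The three special values can be sketched in C as follows. The function names are hypothetical and are not taken from the PSPP source. LOWEST is built directly from the IEEE 754 bit pattern given above, so this sketch assumes that the host's double is IEEE 754 binary64.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* System-missing value: the most negative finite double. */
double sysmis_value(void)
{
    return -DBL_MAX;
}

/* HIGHEST: the largest positive finite double. */
double highest_value(void)
{
    return DBL_MAX;
}

/* LOWEST: the second most negative finite double, built from the
   IEEE 754 bit pattern quoted in the text. */
double lowest_value(void)
{
    const uint64_t bits = UINT64_C(0xffeffffffffffffe);
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Return the raw 64-bit pattern of a double, for inspection. */
uint64_t flt64_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

On an IEEE 754 host, flt64_bits(sysmis_value()) is 0xffefffffffffffff, exactly one below the LOWEST pattern when the two are compared as unsigned integers. For negative numbers a larger bit pattern means a larger magnitude, so LOWEST compares greater than the system-missing value.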

System files are divided into records. Each record begins with an int32 giving a numeric record type. Individual record types are described below: