Phoenix File System

Documentation Version: 1.32
---------------------------------------------------------------------------

Table Of Contents

   * Physical Media Abstraction Layer
   * Media Partitioning
        o PC-compatible Master Boot Record
        o PhoenixFS Segments
        o Cylinder Table
   * Volume Abstraction Layer
   * Volume Sector Allocation Layer
        o Free Space Descriptors
        o Free Space Descriptor Location Table
   * File Allocation Layer
        o Tree Node Descriptor
        o File Nodes
             + Directories
             + Symbolic Links
             + Rights List
             + Extended Attributes
        o Bad Sector List
   * Directory Structure
   * SuperBlock
   * Boot Record

---------------------------------------------------------------------------

Physical Media Abstraction Layer - This layer abstracts the physical format
of the media into equal-sized logical allocation units called Logical
Sectors using the following guidelines:

   * Logical Sectors are numbered consecutively beginning at 0.
   * No Logical Sector number can be greater than 2^64-1.
   * Logical Sector 0 always refers to the lowest accessible physical
     sector.
   * Lower Logical Sector numbers are considered to be toward the
     'beginning' of the device.
   * Logical Sector numbers are consistent as follows:
        o For SCSI and other LBA devices, LBA sector numbers are used.
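        o For devices which require Cylinder/Head/Sector addressing, the
          formula:
          LogicalSectorNumber = (CylinderNumber * HeadCount + HeadNumber) * SectorCount + SectorNumber - 1
          must be used, where SectorCount is the number of sectors per
          track and SectorNumber is the 1-based sector within the track.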
   * All Logical Sectors represent 512 bytes of physical storage each.
   * The seek latency between any two consecutive Logical Sectors is
     minimal.
   * Logical Sectors cannot refer to overlapping regions of the physical
     media.
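
The Cylinder/Head/Sector mapping in the guidelines above can be sketched as
follows. This is only an illustration: the function and parameter names are
hypothetical, and it assumes heads and cylinders are numbered from 0 while
sectors within a track are numbered from 1.

```python
def chs_to_logical_sector(cylinder, head, sector, head_count, sectors_per_track):
    """Map a Cylinder/Head/Sector address to a Logical Sector number.

    head_count and sectors_per_track describe the device geometry
    (HeadCount and SectorCount in the formula); sector is 1-based.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    if not 0 <= head < head_count:
        raise ValueError("head must be in 0..head_count-1")
    logical = (cylinder * head_count + head) * sectors_per_track + sector - 1
    # Logical Sector numbers are unsigned 64-bit values.
    if logical > 2**64 - 1:
        raise ValueError("Logical Sector number does not fit in 64 bits")
    return logical
```

With this mapping the first sector of the first track becomes Logical
Sector 0, and consecutive sectors on a track receive consecutive numbers.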

By definition, a Logical Sector is a single allocatable unit for the
physical media addressable via an unsigned 64-bit sector number. The number
of Logical Sectors is determined by the capacity of the media. The Physical
Media Abstraction Layer determines Logical Sector size, SectorSize, which
is the number of 8-bit bytes each Logical Sector represents of the physical
media; currently only a SectorSize of 512 bytes is supported. Logical
Sector sizes less than 512 bytes will never be supported. In addition, the
Physical Media Abstraction layer determines the number of Logical Sectors
used to represent the physical media, NumSectors; it also determines the
function which maps Logical Sectors to physical regions of the media given
the above criteria.

It is the sole responsibility of this layer to translate Logical Sectors to
physical regions of the media. This is the only layer that should have to
concern itself with the details of accessing the physical media. After
boot, higher layers have no knowledge of the workings of the hardware or of
the physical layout of the media; they perform all operations in terms of
Logical Sectors, assuming the above guidelines for Logical Sector
properties.

Additional requirements have to be made to accommodate the boot process.
When booting, the BIOS int 13h functions have to be called to load sectors
into memory. This requires that physical sectors be 512 bytes each; it also
requires that sectors be accessible using Cylinder/Head/Sector addressing
unless LBA mode is enabled and LBA BIOS extensions are present. For devices
which support both LBA and Cylinder/Head/Sector addressing (such as many
modern IDE drives and SCSI drives with IDE compatibility ROM), LBA sector
numbers must correlate with the Cylinder/Head/Sector sector numbers
perfectly. This is only truly a concern with boot devices, but should not
affect the performance of any device.

Media Partitioning - The device is scanned to see if it contains a
supported partition table format; if no supported partition table is found,
the entire device is considered a single partition. Currently, only a
PC-compatible master boot record and partition table is supported; it is
located in the first accessible sector, Logical Sector 0. This allows the
peaceful coexistence of the Phoenix File System on one partition along with
any other PC-compatible file system, including FAT, VFAT, NTFS, and HPFS,
on another partition of the same physical device. The format of the
PC-compatible master boot record and partition table is:

  PC-compatible Partition Record Entry:
  Offset   Size           Field Name     Description
    00h    BYTE           BootIndicator  (80h) Partition is the one the system is booted from
                                         (00h) Partition is not the boot partition
    01h    BYTE           StartHead      Partition Start Head
    02h    BYTE
             bits   0-5 : StartSector    Partition Start Sector
             bits   6-7 : StartCylinderH Partition Start Cylinder bits 8-9
    03h    BYTE           StartCylinderL Partition Start Cylinder bits 0-7
    04h    BYTE           SystemID       FileSystem ID

    05h    BYTE           EndHead        Partition End Head
    06h    BYTE
             bits   0-5 : EndSector      Partition End Sector
             bits   6-7 : EndCylinderH   Partition End Cylinder bits 8-9
    07h    BYTE           EndCylinderL   Partition End Cylinder bits 0-7
    08h    DWORD          SectorsBefore  Sectors preceding partition
    0Ch    DWORD          Length         Length of partition in sectors
  ----------------
  Total: 16 bytes

  PC-compatible Master Boot Record format:
  Offset   Size           Field Name     Description
     0h    446 BYTES      Bootstrap      Master Bootstrap loader program
   1BEh    4 PREs         PartitionTable Array of 4 partition records
   1FEh    2 BYTES        Signature      Boot Record Signature (AA55h)
  ----------------
  Total: 512 bytes
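
As an illustration of the two layouts above, the following sketch decodes a
master boot record with Python's struct module. The function and key names
are hypothetical; multi-byte fields are read as little-endian, which is the
PC convention.

```python
import struct

# One 16-byte Partition Record Entry: eight BYTE fields, then two DWORDs.
PARTITION_RECORD = struct.Struct("<BBBBBBBBII")

def parse_partition_record(raw):
    (boot, start_head, start_sc, start_cyl_lo, system_id,
     end_head, end_sc, end_cyl_lo, sectors_before, length) = \
        PARTITION_RECORD.unpack(raw)
    return {
        "bootable": boot == 0x80,
        "system_id": system_id,
        "start_head": start_head,
        "start_sector": start_sc & 0x3F,                   # bits 0-5
        "start_cylinder": ((start_sc >> 6) << 8) | start_cyl_lo,
        "end_head": end_head,
        "end_sector": end_sc & 0x3F,
        "end_cylinder": ((end_sc >> 6) << 8) | end_cyl_lo,
        "sectors_before": sectors_before,
        "length": length,
    }

def parse_mbr(sector):
    """Return the 4 partition records, or None if the signature is absent."""
    if len(sector) < 512:
        raise ValueError("a master boot record is 512 bytes")
    (signature,) = struct.unpack_from("<H", sector, 0x1FE)
    if signature != 0xAA55:
        return None
    return [parse_partition_record(sector[0x1BE + 16 * i:0x1BE + 16 * (i + 1)])
            for i in range(4)]
```

A caller would treat a None result as "no supported partition table" and
consider the entire device a single partition.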

This document only discusses the structure of a partition dedicated to the
Phoenix File System. The Phoenix File System subdivides each partition into
one or more Segments; Segments can be grouped into Volumes. A single Volume
may contain Segments from several different partitions on several different
storage devices. Volumes are used to store files and associated data in the
Phoenix File System. Segments can also be used as raw storage space for
virtual memory.

Each logical sector in a given Segment can be referred to as a Segment
Sector; Segment Sector number 0 refers to the first logical sector in the
given Segment and the number of Segment Sectors is equal to the number of
logical sectors in the Segment. Once segments are grouped into Volumes, a
Volume Sector is used to make the entire volume look like a contiguous
series of equal-sized storage blocks. Each Volume Sector can be comprised
of 1 or more Segment Sectors; the number must be a power of 2 and all
Volume Sectors must be the same size. Volume Sectors are numbered
beginning with the first Segment Sector on the first segment in the volume.
The number of Volume Sectors is the sum of the number of Segment Sectors on
each segment that constitutes the volume divided by the number of Segment
Sectors per Volume Sector. This is discussed in greater detail as part of
the Volume Abstraction Layer.

The first Logical Sector of a Phoenix File System partition holds a short
bootstrap loader and a PFS Segment Table. The purpose of the PFS Segment
Table is to allow a Phoenix File System partition to be further subdivided
into logical Segments. Every Phoenix File System partition always contains at
least one Segment. Furthermore, one or more Segments compose a Volume. The
format of this initial sector is:

  Phoenix File System Segment Table entry:
  Offset   Size           Field Name     Description
     0h    1 QWORD        StartSector    Starting logical sector of this Segment
     8h    1 QWORD        Length         Number of logical sectors in this Segment
    10h    20 BYTES       NextDrive      DriveID of next Segment in Volume (or 0 if last)
    24h    1 BYTE         NextSegment    Segment number of next Segment in Volume
    25h    1 BYTE         NumSegments    Number of Segments belonging to this same Volume
    26h    1 BYTE         Sequence       Index number of Segment within Volume
    27h    1 BYTE         Flags
             bit      0 : BootVolume     Whether the Volume this Segment belongs to is bootable
             bit      1 : Interleaved    Whether the Segments in this same Volume are interleaved
                                         (also called data striping)
             bit      2 : InUse          Indicates the System Segment Table entry contains valid information
             bit      3 : FinalSegment   Indicates the given Segment is the last in its Volume
             bit      4 : Swap           Indicates the given Segment is dedicated as a swap partition
             bits   5-7 : (reserved)     must be 0
   ------------
   Total: 40 bytes

  Phoenix File System Segment Table and Bootstrap Loader format:
  Offset   Size           Field Name     Description
     0h    138 BYTES      BootstrapInit  Bootstrap loader initialization
    8Ah    1 BYTE         Flags          File system and device flags
             bits   0-7 : (reserved)     must be 0
    8Bh    1 BYTE         BootSectors    Number of logical sectors in the bootstrap loader
    8Ch    1 DWORD        MagicNumber    Special number to help discern a Segment Table from other
                                         data should the file system become corrupt, equal to
                                         31676573h ("seg1")
    90h    8 STEs         Segments       Segment Table
   1E0h    1 QWORD        CylinderTable  Logical sector number where Cylinder Table for device resides
                                         or 0 if no Cylinder Table is present
   1E8h    1 WORD         SectorSize     Logical Sector Size for this device
   1EAh    20 BYTES       DiskID         Serial number used to identify drive
   1FEh    1 WORD         Signature      Boot Record Signature (AA55h)
   ------------
   Total: 512 bytes*

* The Segment Table and Bootstrap Loader sector is always 512 bytes, even
if the device's logical sectors are larger than 512 bytes. Data stored in
the remainder of the Segment Table and Bootstrap Loader block is undefined
by the Phoenix File System and may be used however the operating system
wishes. The function of the Bootstrap loader initialization is to locate
the bootable PhoenixFS Segment, load the first sector of the bootstrap from
the Segment into memory, and then transfer control of the processor to the
bootstrap program. The BootSectors field should always be at least 1; if it
is greater than 1, then the Bootstrap loader initialization should load
each sector from the storage device consecutively into memory. This allows
a device with 512-byte sectors and 2 BootSectors to be practically
indistinguishable during the boot process from a device with 1024-byte
sectors, if one were supported, and a value of 1 in the BootSectors field.
Again, the format of the data in these extended boot sectors is undefined
and implementation-dependent.
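
A reader for this sector might look like the sketch below. It uses the
offsets exactly as listed in the two tables above; the function and key
names are hypothetical, and little-endian byte order is assumed (it is the
order in which the MagicNumber 31676573h appears on disk as "seg1").

```python
import struct

# Segment Table entry: 2 QWORDs, 20-byte NextDrive, 4 single bytes = 40 bytes.
STE = struct.Struct("<QQ20sBBBB")
SEG_MAGIC = 0x31676573          # stored on disk as the bytes "seg1"

def parse_segment_table_entry(raw):
    start, length, next_drive, next_segment, num_segments, sequence, flags = \
        STE.unpack(raw)
    return {
        "start_sector": start,
        "length": length,
        "next_drive": next_drive,
        "next_segment": next_segment,
        "num_segments": num_segments,
        "sequence": sequence,
        "boot_volume": bool(flags & 0x01),
        "interleaved": bool(flags & 0x02),
        "in_use": bool(flags & 0x04),
        "final_segment": bool(flags & 0x08),
        "swap": bool(flags & 0x10),
    }

def parse_segment_table_sector(sector):
    """Return the decoded sector, or None if it is not a Segment Table."""
    flags, boot_sectors, magic = struct.unpack_from("<BBI", sector, 0x8A)
    (signature,) = struct.unpack_from("<H", sector, 0x1FE)
    if magic != SEG_MAGIC or signature != 0xAA55:
        return None
    cylinder_table, sector_size = struct.unpack_from("<QH", sector, 0x1E0)
    return {
        "flags": flags,
        "boot_sectors": boot_sectors,
        "segments": [parse_segment_table_entry(sector[0x90 + 40 * i:
                                                      0x90 + 40 * (i + 1)])
                     for i in range(8)],
        "cylinder_table": cylinder_table,   # 0 means no Cylinder Table
        "sector_size": sector_size,
        "disk_id": bytes(sector[0x1EA:0x1FE]),
    }
```

Only the first 512 bytes are examined, in keeping with the rule that this
sector is always 512 bytes regardless of the device's logical sector size.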

The Swap flag is used to indicate that the given Segment is reserved to be
used for virtual memory. This can be much more efficient than maintaining a
swap file on a PFS volume. Segments reserved for virtual memory are not
formatted as PFS volumes and the data in the Segment is considered garbage
and may be overwritten in an operating-system specific manner. This
accounts for the fact that different operating systems may have different
methods of providing virtual memory and that data stored in virtual memory
is never consistent between system reboots. Segments reserved for virtual
memory are always local to the machine and may not be shared in any
fashion. The NextDrive, NextSegment, Sequence, BootVolume, and Interleaved
fields should all be set to 0 and the NumSegments, InUse, and FinalSegment
fields should always be 1 whenever the Swap flag is set.

If a Segment Table Entry is unused, then all fields should be set to 0. The
InUse flag would then indicate that Segment Table Entry was empty and this
condition could be verified by examining the other fields and discovering
them all to be 0 also.
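
The rules for swap Segments and unused entries lend themselves to a simple
consistency check. The sketch below is one possible formulation; the
function name, argument order, and the wording of the messages are
hypothetical.

```python
FLAG_BOOT_VOLUME = 0x01
FLAG_INTERLEAVED = 0x02
FLAG_IN_USE = 0x04
FLAG_FINAL_SEGMENT = 0x08
FLAG_SWAP = 0x10

def check_segment_entry(next_drive, next_segment, num_segments, sequence,
                        flags, start_sector=0, length=0):
    """Return a list of rule violations (empty if the entry is consistent).

    next_drive is the 20-byte NextDrive field; the rest are integers.
    """
    problems = []
    if not flags & FLAG_IN_USE:
        # An unused entry must have every field set to 0.
        if (any(next_drive) or next_segment or num_segments or sequence
                or flags or start_sector or length):
            problems.append("unused entry has non-zero fields")
        return problems
    if flags & FLAG_SWAP:
        if any(next_drive) or next_segment or sequence:
            problems.append("swap Segment must not be chained")
        if flags & (FLAG_BOOT_VOLUME | FLAG_INTERLEAVED):
            problems.append("swap Segment has BootVolume or Interleaved set")
        if num_segments != 1:
            problems.append("swap Segment must have NumSegments of 1")
        if not flags & FLAG_FINAL_SEGMENT:
            problems.append("swap Segment must have FinalSegment set")
    return problems
```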

A bootstrap loader is always present even if the partition is not flagged
as bootable in the partition table. If the partition is not bootable,
however, it is suggested that the bootstrap loader simply display some sort
of error message should it ever be executed.

It is also important to point out that if a device contains no system
partition table supported by PhoenixFS then the entire device is treated as
a single partition. As such, if the initial sector(s) correspond to a
Phoenix File System Segment Table and Bootstrap Loader, then the single
partition is considered to be PhoenixFS. This can be utilized in formatting
removable diskettes with PhoenixFS or otherwise dedicating entire mass
storage devices to PhoenixFS.

  Diagram of system partitions and PhoenixFS Segments:

                         PhoenixFS Segment Table
                         (First sector in partition)
  Master Boot Record     .-----------.
  (Logical sector 0)    /| bootstrap |      Segment
  .-------------.      / | loader    |     .------------.
  |    boot     |     /  |-----------|    /|            |
  |    code     |    /   | Segment 0 |   / |            |
  |     .       |   /    |-----------|  /  |            |
  |     .       |  /     | Segment 1 | /   |            |
  |     .       | /      |-----------|/    |            |
  |-------------|/       | Segment 2 |     .            .
  | partition 0 |        |-----------|\    .            .
  |             |        | Segment 3 | \   .            .
  |-------------|\       |-----------|  \  |            |
  | partition 1 | \      | Segment 4 |   \ |            |
  |             |  \     |-----------|    \|            |
  |-------------|   \    | Segment 5 |     `------------'
  | partition 2 |    \   |-----------|
  |             |     \  | Segment 6 |
  |-------------|      \ |-----------|
  | partition 3 |       \| Segment 7 |
  |             |        `-----------'
  `-------------'

The CylinderTable field holds the location of an optional Cylinder Table,
which can be used to increase file system performance by noting the pairs
of consecutive logical sectors between which seek latency is greater than
usual. No specific knowledge of the storage device is required; rather,
such a table is constructed by empirically observing seek latencies between
adjacent sectors and is stored for use between file system sessions. The
name originates from the fact that most storage devices store information
logically in cylinders broken into sectors, and that seeking from a sector
in one cylinder to a sector in another cylinder takes longer than seeking
between two sectors in the same cylinder. The Cylinder Table would then end
up storing the first sector in each cylinder, since the seek to it from the
last sector of the previous cylinder takes longer than a seek between any
two sectors in the same cylinder. However, the Cylinder Table does not have
to be restricted to describing the locations of the beginnings of
cylinders; it can be used to indicate any high-latency seek between
logically consecutive sectors. For example, many SCSI drives have the
ability to logically remap good sectors over sectors that go bad during
normal operation; any seek to or from such a sector has much higher latency
than would be expected if the device were not performing the translation.

The file system can then use this Cylinder Table to determine how best to
allocate logical sectors to a single file. For example, the file system may
be discouraged from spreading a file across cylinder groups in order to
minimize seek latency when accessing the file.

The Cylinder Table itself is simply a series of Logical Sector Numbers
indicating the destination sector of a sector-to-sector seek between two
consecutive logical sectors. The series is terminated by a Logical Sector
Number of 0. The table may span as many sectors as is required to store the
entire table, but the sectors must be contiguous.
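
Reading the table and consulting it during allocation could be sketched as
follows. The names are hypothetical, and the entries are assumed to be
stored as little-endian 64-bit values; the data passed in is the
concatenated contents of the contiguous table sectors.

```python
import struct
from bisect import bisect_right

def parse_cylinder_table(data):
    """Return the destination sectors listed before the 0 terminator."""
    boundaries = []
    for (lsn,) in struct.iter_unpack("<Q", data):
        if lsn == 0:
            break
        boundaries.append(lsn)
    return boundaries

def crosses_high_latency_seek(boundaries, first, count):
    """True if reading `count` consecutive sectors starting at `first`
    includes a sector-to-sector seek recorded in the table.

    A recorded destination b costs extra only when both b-1 and b fall
    inside the run, i.e. when first < b <= first + count - 1.
    """
    ordered = sorted(boundaries)
    i = bisect_right(ordered, first)
    return i < len(ordered) and ordered[i] <= first + count - 1
```

An allocator could use the second function to prefer candidate runs for
which it returns False.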

Volume Abstraction Layer - This layer manages the grouping of one or more
Segments into logical Volumes. Furthermore, this layer makes a collection
of Segment Sectors in distinct Segments appear as a single array of Volume
Sectors; each Volume Sector represents a portion of the underlying physical
media. This is the final layer of sector-based abstraction present in the
Phoenix File System.

As mentioned before, each Volume Sector can be comprised of 1 or more
Segment Sectors; the number must be a power of 2 and all Volume Sectors
must be the same size, the VolumeSectorSize. Since the only supported size
of a Segment Sector is 512 bytes, the minimum size of a Volume Sector is
also 512 bytes. Volume Sectors are numbered beginning at 0, which
corresponds to the first Segment Sector on the first segment in the
volume. The number of Volume Sectors, NumVolumeSectors, is the sum of the
number of Segment Sectors on each segment that constitutes the volume
divided by the number of Segment Sectors per Volume Sector. The boot
Volume, the Volume from which the operating system is loaded, may not span
multiple Segments.

The grouping of Segments into Volumes is indicated by the Segment Table
Entry for each Segment. Each Segment Table Entry specifies the next drive
and next segment, NextDrive and NextSegment respectively, belonging to the
same volume. The NextDrive field holds the serial number of the device on
which resides the next Segment of this Volume or 0 if the given Segment is
the last Segment in the Volume (this can also be verified by examining the
FinalSegment flag). Serial numbers should be unique among all storage
devices on the same system. The NextSegment field holds the Segment number
of the Segment which is next in this Volume; the 5 high bits of a Segment
number determine the system partition on the device and the low 3 bits
determine the Segment within the Phoenix File System partition. An error
occurs when a partition which does not exist is specified, the partition or
Segment specified is unused, or the partition specified has a SystemID
other than PhoenixFS (B9h). Two Segments belonging to the same Volume
should not reside on the same physical device, although if two Segments
belonging to a single Volume are found to be on the same physical device an
error does not occur.
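
The packing of a Segment number into a single byte can be illustrated with
the short sketch below; the function names are hypothetical.

```python
def split_segment_number(segment_number):
    """Return (partition, segment): high 5 bits and low 3 bits of the byte."""
    if not 0 <= segment_number <= 0xFF:
        raise ValueError("a Segment number is a single byte")
    return segment_number >> 3, segment_number & 0x07

def make_segment_number(partition, segment):
    """Inverse of split_segment_number."""
    if not 0 <= partition <= 0x1F or not 0 <= segment <= 0x07:
        raise ValueError("partition must fit in 5 bits, segment in 3 bits")
    return (partition << 3) | segment
```

For example, the value 13h (binary 00010 011) refers to Segment 3 of
partition 2.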

In addition, each Segment Table Entry also indicates the number of Segments
which comprise the Volume that it belongs to, NumSegments. An error occurs
if any two Segments supposedly part of the same Volume have different
values for NumSegments. Each Segment Table Entry has a Sequence value which
gets incrementally higher from Segment to Segment as the chain of Segments in a
Volume is traced. A gap in Sequence values between two Segments of the same
volume indicates that a Segment is missing and is considered an error. The
Sequence value also determines which Volume Sectors reside on the given
Segment. The way in which Volume Sectors are arranged across Segments is
determined by the Interleaved flag:

     If the Interleaved flag is clear then each Segment is responsible
     for a consecutive number of Volume Sectors corresponding to the
     size of the Segment; the order of the Segments is determined by
     the Sequence value. For example, if two Segments belong to a
     single Volume, one Segment with 50,000 logical sectors and a
     Sequence value of 0, the other with 30,000 logical sectors and a
     Sequence value of 1, and each volume sector was composed of 2
     Segment Sectors, then the Volume would consist of 40,000 Volume
     Sectors. The first 25,000 Volume Sectors, numbered 0 through
     24,999, would be located on the first Segment and the last 15,000
     Volume Sectors, numbered 25,000 through 39,999, would be located
     on the second Segment.

     If the Interleaved flag is set then the Phoenix File System
     scatters data across all the Segments in the Volume in a uniform
     pattern such that individual read and write operations can be
     fulfilled cooperatively by all the underlying physical devices
     providing storage for that Volume. This requires that each
     Segment in the Volume represent exactly the same number of
     logical sectors. The Sequence value determines which volume
     sectors reside on a particular Segment: a volume sector resides
     on the Segment whose Sequence value equals the volume sector
     number modulo NumSegments. For example, if three Segments belong
     to a single Volume and each volume sector is composed of 2
     Segment Sectors, a write to volume sector 3401 would be written
     to the Segment with Sequence value 2 while a write to volume
     sector 3402 would be written to the Segment with Sequence value
     0. This is because the volume sector number 3401 modulo the
     number of Segments in the Volume, 3, yields 2 and the volume
     sector number 3402 modulo the number of Segments in the Volume,
     3, yields 0. Note that the entirety of the volume sector is
     written to the same Segment, even if the number of Segment
     Sectors per volume sector is greater than 1. It would be
     possible to further interleave the Segment Sectors which compose
     a volume sector if and only if the number of Segments per volume
     is a power of 2 and all Segments in the Volume have the same
     logical sector size. This additional interleaving is not
     currently supported by the Phoenix File System but may be
     implemented in a future
     version.
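
Both arrangements can be expressed as a mapping from a Volume Sector number
to a Segment and a Segment Sector. The sketch below is illustrative only:
the names are hypothetical, each Segment length is assumed to be a multiple
of the Volume Sector size, and for the interleaved case it assumes that the
Volume Sectors assigned to a Segment are stored consecutively from the
start of that Segment, which the text above does not specify.

```python
def locate_volume_sector(volume_sector, segment_lengths,
                         sectors_per_volume_sector, interleaved):
    """Return (sequence, first_segment_sector) for a Volume Sector.

    segment_lengths lists each Segment's length in Segment Sectors,
    ordered by Sequence value.
    """
    spv = sectors_per_volume_sector
    n = len(segment_lengths)
    if volume_sector < 0:
        raise IndexError("Volume Sector out of range")
    if interleaved:
        if len(set(segment_lengths)) != 1:
            raise ValueError("interleaved Segments must be the same size")
        sequence = volume_sector % n
        # ASSUMPTION: stripes are packed back to back within each Segment.
        index_in_segment = volume_sector // n
        if (index_in_segment + 1) * spv > segment_lengths[0]:
            raise IndexError("Volume Sector out of range")
        return sequence, index_in_segment * spv
    remaining = volume_sector
    for sequence, length in enumerate(segment_lengths):
        volume_sectors_here = length // spv
        if remaining < volume_sectors_here:
            return sequence, remaining * spv
        remaining -= volume_sectors_here
    raise IndexError("Volume Sector out of range")
```

Applied to the examples above, Volume Sector 25,000 of the 50,000 plus
30,000 sector Volume lands at the start of the second Segment, and Volume
Sectors 3401 and 3402 of the three-Segment interleaved Volume land on the
Segments with Sequence values 2 and 0.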

Volumes are the basic unit of all high-level file operations. This layer of
abstraction allows for Volumes to be independent of the underlying storage
devices.

Volume sector 0, the first sector in the Volume, holds the Volume
Descriptor for the Volume as well as a boot stub program. The boot stub is
the last phase of the boot process before turning control over to the
kernel loader; its function is to locate and load the kernel loader into
memory and then turn control over to the kernel loader to actually start
the operating system. The Volume Descriptor holds information about the
Volume and has the format:

  Format of a Volume Descriptor:
  Offset   Size           Field Name     Description
     0h    4 BYTES        BootStubStart  Short intrasegment jump to real start of boot stub
     4h    1 BYTE         StubSectors    Number of logical sectors in the boot stub program
     5h    1 BYTE         Flags
             bit      0 : Dirty          Set to 1 when the file system is initialized; set to 0 when
                                         file system is properly shut down.
             bits   1-7 : (reserved)     must be 0
     6h    1 BYTE
             bits   0-3 : ClusterSize    Log base 2 of size of a Volume Sector in 512-byte units,
                                         for example 0 indicates Volume Sectors are each 512 bytes
                                         while 15 indicates Volume Sectors are each 16,777,216 bytes.
             bits   4-7 : (reserved)     must be 0
     7h    1 BYTE         (reserved)     must be 0
     8h    1 QWORD        VolumeSize     Number of Volume Sectors in this Volume
    10h    1 QWORD        SuperBlock     Volume sector number where SuperBlock is located
    18h    1 DWORD        DateCreated    Date/Time Volume was created
    1Ch    1 DWORD        (reserved)     must be 0
    20h    64 BYTES       VolumeLabel    Short name associated with the volume (null terminated string)
    60h    x BYTES        BootStub       Minimum of 416 bytes of boot stub program
  ----------------
  Total: 512 bytes minimum

At the point in the boot process when the Volume Descriptor is loaded into
memory, it is read as a single Segment Sector, without knowledge of the
number of Segment Sectors per volume sector. The StubSectors field
determines the number of Segment Sectors the boot stub program occupies;
this field must always be at least 1 and must be a multiple of the number
of Segment Sectors per volume sector. The first task of the boot stub
should be to load any additional boot stub sectors into memory
consecutively after the first boot stub sector; this should allow for
linear program execution.

A volume sector is composed of one or more Segment Sectors. The number of
Segment Sectors per volume sector is determined by dividing the size of a
volume sector in bytes, as given by ClusterSize, by the size of a logical
sector. The size of a logical sector can vary among the Segments that
belong to the same Volume, the only requirement being that the size of a
volume sector be at least as large as a logical sector on any Segment that
composes the Volume. In that case, the number of Segment Sectors per
volume sector can also vary from Segment to Segment.

Each Volume in a system should have a distinct VolumeLabel; if not, once
loaded the Operating System is responsible for alerting the user and
correcting the conflict. The VolumeLabel is a 31 character long Unicode
string, null-terminated. When determining a label conflict, case is not
important.
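
The fixed part of the Volume Descriptor (offsets 0h through 5Fh) could be
decoded as in the sketch below. The function and key names are
hypothetical. Two assumptions are made that the text does not state: fields
are little-endian, and the Unicode label is stored as UTF-16LE (31
characters plus a terminator fill the 64-byte field exactly). The label
comparison uses Python's casefold as one possible reading of "case is not
important".

```python
import struct

# 4-byte jump, 4 single bytes, 2 QWORDs, 2 DWORDs, 64-byte label = 96 bytes.
VOLUME_DESCRIPTOR = struct.Struct("<4sBBBBQQII64s")

def parse_volume_descriptor(sector):
    (jump, stub_sectors, flags, cluster, _reserved0, volume_size,
     superblock, date_created, _reserved1, raw_label) = \
        VOLUME_DESCRIPTOR.unpack_from(sector, 0)
    # ASSUMPTION: the label is UTF-16LE, cut at the first null character.
    label = raw_label.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    return {
        "stub_sectors": stub_sectors,
        "dirty": bool(flags & 0x01),
        # ClusterSize is log2 of the Volume Sector size in 512-byte units.
        "volume_sector_bytes": 512 << (cluster & 0x0F),
        "volume_size": volume_size,
        "superblock": superblock,
        "date_created": date_created,
        "label": label,
    }

def labels_conflict(a, b):
    """Case-insensitive comparison of two VolumeLabels."""
    return a.casefold() == b.casefold()
```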

Volume Sector Allocation Layer - This layer exists to keep track of which
volume sectors have been allocated to hold data, which are available, and
which are unusable (bad).

Single volume sectors called Free Space Descriptors are used to determine
whether a number of sectors are in use or available for use. This is done
by using each bit in the Free Space Descriptor to represent whether a
single successive volume sector is in use or not (0 indicates the
respective volume sector is in use, 1 indicates it is not). The number of
volume sectors accounted for per Free Space Descriptor depends on the size
of a volume sector (since a Free Space Descriptor is exactly one volume
sector in size), but can be determined using the formula
     SectorsPerFSD = VolumeSectorSize * 8
Traditionally, PCs have used 512-byte sectors; using a VolumeSectorSize
equal to a single 512-byte sector implies that 512*8, or 4096, volume
sectors could be accounted for per Free Space Descriptor. In addition, it
needs to be noted that the volume sectors which are used to hold Free Space
Descriptors themselves have to be accounted for like any other volume
sector. The number of Free Space Descriptors varies based on the number of
volume sectors, NumVolumeSectors. Every volume sector MUST be accounted for
using a Free Space Descriptor, and it is not possible for the sectors
accounted for by Free Space Descriptors to overlap. Volume sectors used by
the Free Space Descriptor Location Table and by the Free Space Descriptors
are considered in use and therefore are not available for allocation.
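
Testing and updating bits in a Free Space Descriptor can be sketched as
follows. The names are hypothetical, and the order of bits within each byte
is an assumption (least significant bit first); the text only states that
successive bits describe successive volume sectors.

```python
def sectors_per_fsd(volume_sector_size):
    """Number of volume sectors one Free Space Descriptor accounts for."""
    return volume_sector_size * 8

def is_free(fsd, index):
    """True if the index-th sector covered by this FSD is available (bit 1)."""
    # ASSUMPTION: bit `index` is bit (index % 8) of byte (index // 8).
    return bool((fsd[index // 8] >> (index % 8)) & 1)

def mark_in_use(fsd, index):
    """Clear the bit for a sector; fsd must be a mutable bytearray."""
    fsd[index // 8] &= ~(1 << (index % 8)) & 0xFF

def mark_free(fsd, index):
    """Set the bit for a sector; fsd must be a mutable bytearray."""
    fsd[index // 8] |= 1 << (index % 8)
```

A freshly formatted descriptor would start with all bits set to 1 and then
have the sectors holding file system structures, including the descriptor
itself, marked as in use.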

The location of each of the Free Space Descriptors is stored as a series of
64-bit Volume Sector numbers in a consecutive series of sectors ideally
stored near the beginning of the storage media. This series of sectors is
called the Free Space Descriptor Location Table and each 64-bit entry is
the Volume Sector number of the Free Space Descriptor which accounts for a
successive series of volume sectors. For example, if VolumeSectorSize is
equal to 512 bytes, SectorsPerFSD would then be 4096, as such the first
entry in the Free Space Descriptor Location Table (FSDLT) would hold the
Volume Sector number of the Free Space Descriptor accountable for the first
4096 sectors (numbered 0 through 4095) and the next entry would hold the
Volume Sector number of the Free Space Descriptor accountable for the next
4096 sectors (numbered 4096 through 8191), etc.

      FSDLT         Free Space Descriptors (FSDs)
      .----.
    1 |  o--------->|100010010...100101|  Allocation map for sectors 0 - 4095
      |----|
    2 |  o--------->|000010101...110001|  Allocation map for sectors 4096 - 8191
      |----|
    3 |  o--------->|100100101...001000|  Allocation map for sectors 8192 - 12287
      |----|
      .    .
      .    .
      |----|
    N |  o--------->|001001000...001001|  Allocation map for sectors 4096*(N-1) - 4096*N-1
      `----'

Where N is the number of entries in the Free Space Descriptor Location Table
(equal to NumVolumeSectors / SectorsPerFSD rounded up, or in this example,
NumVolumeSectors /
4096).
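
Finding the allocation bit for a given volume sector is then a matter of
one division. In the sketch below the names are hypothetical, table entries
are assumed to be little-endian, and indices are counted from 0 (the
diagram above counts entries from 1).

```python
import struct

def fsd_entry_count(num_volume_sectors, volume_sector_size):
    """Number of FSDLT entries needed; a partial final region needs one too."""
    per_fsd = volume_sector_size * 8
    return -(-num_volume_sectors // per_fsd)    # ceiling division

def locate_allocation_bit(volume_sector, volume_sector_size):
    """Return (0-based FSDLT entry index, bit index within that FSD)."""
    per_fsd = volume_sector_size * 8
    return volume_sector // per_fsd, volume_sector % per_fsd

def fsd_location(fsdlt, volume_sector, volume_sector_size):
    """Look up the Volume Sector number of the FSD covering volume_sector."""
    index, _ = locate_allocation_bit(volume_sector, volume_sector_size)
    (location,) = struct.unpack_from("<Q", fsdlt, index * 8)
    return location
```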

The Free Space Descriptor Location Table may span as many sectors as it
needs in order to hold the location of all of the Free Space Descriptors,
but the sectors must be contiguous. As shown in the table below, the Free
Space Descriptor Location Table is very efficient at describing the media
and therefore the FSDLT will be relatively small (for example, only 4
kilobytes per gigabyte of capacity given a VolumeSectorSize of 512 bytes).
For this reason, it is suggested that any OS implementing the Phoenix File
System load and store a copy of the entire Free Space Descriptor Location
table in memory for quick reference.

VolumeSectorSize  SectorsPerFSD  Bytes per FSD   Entries per   Sectors per          Bytes per
         (bytes)      (sectors)        (bytes)  FSDLT sector  FSDLT sector       FSDLT sector
----------------  -------------  -------------  ------------  ------------  -----------------
             256          2,048        524,288            32        65,536         16,777,216
             512          4,096      2,097,152            64       262,144        134,217,728
           1,024          8,192      8,388,608           128     1,048,576      1,073,741,824
           2,048         16,384     33,554,432           256     4,194,304      8,589,934,592
           4,096         32,768    134,217,728           512    16,777,216     68,719,476,736
           8,192         65,536    536,870,912         1,024    67,108,864    549,755,813,888
          16,384        131,072  2,147,483,648         2,048   268,435,456  4,398,046,511,104
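
The figures in the table follow directly from the 8-byte entry size and the
one-bit-per-sector map, as the following sketch shows; the function names
are hypothetical.

```python
def fsdlt_row(volume_sector_size):
    """Return (SectorsPerFSD, bytes per FSD, entries per FSDLT sector,
    sectors per FSDLT sector, bytes per FSDLT sector)."""
    sectors_per_fsd = volume_sector_size * 8          # one bit per sector
    bytes_per_fsd = sectors_per_fsd * volume_sector_size
    entries_per_sector = volume_sector_size // 8      # 64-bit entries
    sectors_per_fsdlt_sector = entries_per_sector * sectors_per_fsd
    bytes_per_fsdlt_sector = sectors_per_fsdlt_sector * volume_sector_size
    return (sectors_per_fsd, bytes_per_fsd, entries_per_sector,
            sectors_per_fsdlt_sector, bytes_per_fsdlt_sector)

def fsdlt_bytes(capacity_bytes, volume_sector_size):
    """Bytes of FSDLT entries needed to describe a given capacity."""
    bytes_per_fsd = volume_sector_size * 8 * volume_sector_size
    entries = -(-capacity_bytes // bytes_per_fsd)     # ceiling division
    return entries * 8
```

For a VolumeSectorSize of 512 bytes, fsdlt_bytes(2**30, 512) gives 4096,
which is the 4 kilobytes per gigabyte quoted above.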

The suggested arrangement for the Free Space Descriptors in relation to the
volume sectors they describe is that for even indexed Free Space
Descriptors the Free Space Descriptor is located in the first sector it
describes and for odd indexed Free Space Descriptor the Free Space
Descriptor is located in the last sector it describes; this maximizes the
number of contiguous sectors for allocation and minimizes the distance, and
thus the seek time, between Free Space descriptors and the data they
describe. This is by no means the only way to arrange the Free Space
Descriptors; Free Space Descriptors need not even reside in the region they
describe, but whenever possible it would be considered desirable to place
Free Space Descriptors near the sectors they account for in order to
minimize access times.
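
The suggested arrangement can be written as a small placement function. The
sketch below is hypothetical and assumes Free Space Descriptors are indexed
from 1, as in the FSDLT diagram; under that reading the first descriptor
sits in the last sector of its region, which leaves volume sector 0 free
for the Volume Descriptor, and descriptors end up in adjacent pairs.

```python
def suggested_fsd_location(n, sectors_per_fsd, num_volume_sectors):
    """Suggested Volume Sector for the n-th (1-based) Free Space Descriptor."""
    first = (n - 1) * sectors_per_fsd
    if n < 1 or first >= num_volume_sectors:
        raise IndexError("no such Free Space Descriptor")
    # The final region may be shorter than sectors_per_fsd.
    last = min(first + sectors_per_fsd, num_volume_sectors) - 1
    return first if n % 2 == 0 else last
```

With 4096 sectors per descriptor this places descriptors 1 and 2 at sectors
4095 and 4096, and descriptors 3 and 4 at sectors 12287 and 12288, leaving
long contiguous runs between each pair.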

Bad Sector Recovery: when a given media is first formatted, the Free Space
Descriptors are located only in "good" sectors (i.e. not bad) and bad
sectors are marked as used and recorded in a special

File Allocation Layer - This layer exists to group volume sectors into
units called files; a file is a collection of information that is logically
related. In the Phoenix File System, files are described using a tree
structure where the tree leaves hold the actual file information. Each node
and leaf of the data tree is described using a Tree Node Descriptor:

  Structure of a Tree Node Descriptor:
  Offset   Size           Field Name     Description
    0h     QWORD
            bit       0 : Type           (1) Descriptor refers to SubTree block (internal node)
                                         (0) Descriptor refers to data block (leaf)
            bits   1-63 : Location       Volume Sector number for block location
    8h     QWORD          Length         If Type is 1, is total number of bytes described by SubTree
                                         If Type is 0, is number of bytes in data block
  ----------------
  Total: 16 bytes

Where each block of information is exactly 1 volume sector in size and can
hold information either about the file's contents (a tree leaf) or about
the location of additional blocks (a tree node). If a sector is a data
block, Length bytes of the sector are considered to be part of the file's
contents. If a sector is a SubTree block, it is simply a consecutive list
of Tree Node Descriptors and the Length field represents the total number
of bytes of the file's information that is described by all data blocks in
or under the SubTree.

SubTrees can be nested any number of levels deep, but whenever possible
should be balanced to minimize the amount of recursion.
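
As a minimal sketch of the descriptor layout, the two QWORDs can be packed
and unpacked as shown here. Little-endian byte order is an assumption
(suggested by the PC-compatible partitioning but not stated for this
structure), and the function names are hypothetical.

```python
import struct

# One Tree Node Descriptor: two QWORDs, little-endian (byte order assumed).
TND = struct.Struct("<QQ")


def pack_tnd(is_subtree, location, length):
    """Build the 16-byte descriptor: bit 0 = Type, bits 1-63 = Location."""
    if not 0 <= location < (1 << 63):
        raise ValueError("Location must fit in 63 bits")
    return TND.pack((location << 1) | int(is_subtree), length)


def unpack_tnd(raw):
    """Return (is_subtree, location, length) from a 16-byte descriptor."""
    word, length = TND.unpack(raw)
    return bool(word & 1), word >> 1, length


# A SubTree block at volume sector 1234 describing 5000 bytes of file data.
raw = pack_tnd(True, 1234, 5000)
```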

Sparse files: a region of a file is considered sparse if it contains no
data. This can occur, for example, if a new file is created, then a seek is
performed to 1000 bytes into the file, and then a single byte is written
and the file is closed. The total length of the file would be 1001 bytes,
even though only 1 byte of information is actually stored in the file; the
first 1000 bytes are considered sparse. The Phoenix File System is
efficient in storing files with sparse data by making note of the condition
using a Tree Node Descriptor. The Type indicates the region of the file is
a data block since it actually refers to the file's contents. The Length is
the number of bytes in the sparse region, and the Location has the special
reserved value of 0. (Location value 0 is considered reserved in the File
Allocation Layer because volume sector 0 will always hold the Volume
Descriptor for the volume.) There cannot exist a sparse subtree as it would
be illogical. Any information read from a sparse file region will always be
0. A sparse region 0 bytes in length is used to indicate a Tree Node
Descriptor is not in use (entire Tree Node Descriptor is all zeros).
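
The following sketch walks a data tree and expands sparse regions into
zeros, using the 1001-byte file described above as its sample. It assumes
little-endian QWORDs and models the volume as a hypothetical dictionary
mapping volume sector numbers to sector contents; neither detail comes
from the specification.

```python
import struct

TND = struct.Struct("<QQ")   # byte order is an assumption
SECTOR = 512                 # volume sector size used for this example


def read_tree(sectors, descriptor):
    """Return the file bytes described by one 16-byte Tree Node Descriptor."""
    word, length = TND.unpack(descriptor)
    is_subtree, location = word & 1, word >> 1
    if not is_subtree:
        if location == 0:
            return bytes(length)          # sparse (or unused when length is 0)
        return sectors[location][:length]
    block = sectors[location]             # SubTree block: a list of descriptors
    out = bytearray()
    for offset in range(0, len(block), TND.size):
        out += read_tree(sectors, block[offset:offset + TND.size])
    return bytes(out)


# 1000 sparse bytes followed by one stored byte, as in the example above.
sparse = TND.pack(0, 1000)                       # Type 0, Location 0
data = TND.pack(5 << 1, 1)                       # data block in sector 5
sectors = {
    5: b"X".ljust(SECTOR, b"\0"),
    9: (sparse + data).ljust(SECTOR, b"\0"),     # unused descriptors are zero
}
root = TND.pack((9 << 1) | 1, 1001)              # SubTree block in sector 9
contents = read_tree(sectors, root)
```

Only two sectors are consumed here: one for the SubTree block and one for
the single stored byte.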

The basic properties of a file are described using a File Node with the
following structure:

  Structure of a File Node:
  Offset   Size           Field Name     Description
   00h     DWORD          MagicNumber    Special number to help discern a File Node from other
                                         data should the file system become corrupt, equal to
                                         31534650h ("PFS1")
   04h     DWORD          HardLinks      Number of references made from Directories to this file
   08h     DWORD          Flags          Basic file attributes
             bit      0 : ArchiveFlag    Set whenever the LastModified field is updated
             bit      1 : SystemFlag     Indicates file is an operating-system related file
             bit      2 : HiddenFlag     Indicates file should not be listed in default file listings
             bit      3 : ReadOnlyFlag   Indicates file cannot be written to or deleted
             bits   4-7 : (reserved)     must be 0
             bit      8 : ImmediatePurge Indicates whether file should be immediately purged on delete
             bits  9-15 : (reserved)     must be 0
                                     --- end general flags, begin internal flags ---
             bits 16-18 : FileType       Type definition for file data
                                           000 = generic file
                                           001 = directory
                                           010 = symbolic link
             bit     19 : (reserved)     must be 0
             bit     20 : Compression    reserved for file data compression flag; must be 0
             bit     21 : Encryption     reserved for file data encryption flag; must be 0
             bit     22 : DeletedFlag    Indicates whether or not this file has been deleted
             bit     23 : PurgeFlag      Indicates whether or not this file is to be purged
             bits 24,25 : InternalFlag   Indicates what file information, if any, is stored
                                         internally in the File Node
                                           00 = no internal data
                                           01 = internal rights information
                                           10 = internal extended attributes
                                           11 = internal file data
             bits 26,27 : (reserved)     must be 0
             bit     28 : NodeSize       Indicates whether or not the File Node occupies the full
                                         volume sector
                                           0 = File Node is half the size of the volume sector
                                           1 = File Node occupies the entire volume sector
             bits 29-31 : (reserved)     must be 0
   0Ch     DWORD          Owner          Object ID of owner of this File Node
   10h     DWORD          Creator        Object ID which created this File Node
   14h     DWORD          Modifier       Object ID of user who last modified File Node
   18h     DWORD          Created        Date/Time File Node was created
   1Ch     DWORD          LastModified   Date/Time File Node was last modified
   20h     DWORD          DataAccessed   Date/Time file data last accessed
   24h     DWORD          DataModified   Date/Time file data last modified
   28h     QWORD          FileSize       Total length of file data
   30h     1 TND          Rights         Rights list data tree
   40h     2 TNDs         EAs            Extended Attributes data tree
   60h     7 TNDs         FileData       File contents data tree
   D0h     x BYTES        InternalData   minimum of 48 bytes of space specifically set aside for
                                         storing small amounts of data inside the File Node
                                         without using data trees. The InternalFlag determines
                                         which information, if any, is stored internally.
  ----------------
  Total: 256 bytes minimum
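
A minimal sketch of decoding the fixed fields at the start of a File Node
follows. Little-endian byte order is assumed (it is consistent with the
magic number 31534650h reading as "PFS1" in byte order), the two-bit and
three-bit fields are read with the higher-numbered bit as most
significant, and the function and key names are hypothetical.

```python
import struct

# Offsets 00h-2Fh: ten DWORDs followed by the QWORD FileSize.
HEADER = struct.Struct("<10IQ")
MAGIC = 0x31534650

FILE_TYPES = {0: "generic file", 1: "directory", 2: "symbolic link"}
INTERNAL = {0: None, 1: "rights", 2: "extended attributes", 3: "file data"}


def parse_file_node_header(node):
    """Decode MagicNumber, HardLinks, selected Flags bits and FileSize."""
    fields = HEADER.unpack_from(node, 0)
    magic, hard_links, flags = fields[0], fields[1], fields[2]
    if magic != MAGIC:
        raise ValueError("not a File Node (bad MagicNumber)")
    return {
        "hard_links": hard_links,
        "read_only": bool(flags & (1 << 3)),
        "file_type": FILE_TYPES.get((flags >> 16) & 0b111, "unknown"),
        "deleted": bool(flags & (1 << 22)),
        "internal": INTERNAL[(flags >> 24) & 0b11],
        "full_sector": bool(flags & (1 << 28)),
        "file_size": fields[10],
    }


# A read-only directory with internal file data occupying a full sector.
flags = (1 << 3) | (1 << 16) | (3 << 24) | (1 << 28)
node = HEADER.pack(MAGIC, 1, flags, 0, 0, 0, 0, 0, 0, 0, 12).ljust(512, b"\0")
info = parse_file_node_header(node)
```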

A File Node is always at most 1 volume sector in size and at least 256
bytes in size. As such, the amount of space reserved for internal data
within a File Node can vary from 48 bytes to VolumeSectorSize-208 bytes. A
File Node may occupy an entire sector or only half of a sector; the latter
is only valid for VolumeSectorSizes of 512 bytes or more (since one half of
512 bytes is 256 bytes, the minimum size of a File Node). Furthermore, each
File Node is identified using a File Node Number, of which bits 1-63
indicate the volume sector number the File Node resides in, and bit 0 is 0
if the File Node is in the first half of the sector and 1 if it is in the
second half. The following table summarizes the minimum and maximum sizes
of File Nodes and the amount of space reserved in each File Node for
internal data, based on the size of a volume sector.

                                             Minimum         Maximum
                  Minimum File Maximum File  Internal Data   Internal Data
VolumeSectorSize  Node Size    Node Size     Reserve         Reserve
        512 bytes    256 bytes   512 bytes        48 bytes        304 bytes
       1024 bytes    512 bytes  1024 bytes       304 bytes        816 bytes
       2048 bytes   1024 bytes  2048 bytes       816 bytes       1840 bytes
       4096 bytes   2048 bytes  4096 bytes      1840 bytes       3888 bytes
       8192 bytes   4096 bytes  8192 bytes      3888 bytes       7984 bytes
      16384 bytes   8192 bytes 16384 bytes      7984 bytes      16176 bytes
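
The File Node Number encoding and the reserve sizes in the table can be
reproduced with a few lines of arithmetic. The function names are
hypothetical; the constant 208 is the offset (D0h) of the InternalData
field.

```python
FIXED_FIELDS = 0xD0   # 208 bytes precede InternalData in every File Node


def file_node_number(sector, second_half=False):
    """Bits 1-63 hold the volume sector, bit 0 selects the half."""
    return (sector << 1) | int(second_half)


def split_file_node_number(number):
    """Return (volume_sector, in_second_half)."""
    return number >> 1, bool(number & 1)


def internal_reserve_range(volume_sector_size):
    """Return (minimum, maximum) internal data reserve in bytes."""
    smallest_node = max(256, volume_sector_size // 2)
    return smallest_node - FIXED_FIELDS, volume_sector_size - FIXED_FIELDS


number = file_node_number(1000, second_half=True)
reserve = internal_reserve_range(512)
```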

When data is stored internally, the field(s) that would ordinarily be used
to store Tree Node Descriptor(s) for the given data are instead overwritten
with a portion of the internal data. This can be safely done since the
InternalFlag identifies which data is stored internally, and if the data is
being stored internally, then the external data Tree Node Descriptor
field(s) are not used, thereby leaving them available to store additional
internal data. The space normally reserved for the Tree Node Descriptors is
first used in the storage of internal data before utilizing the space
specifically reserved for internal data. If the Rights List or Extended
Attributes are stored internally, the first two bytes of the internal data
are the length of the remaining internal data in bytes; the remainder of
the internal data is the actual information to be stored internally. In the
case of internal file data, the FileSize field can be consulted to
determine the amount of internal data, and as such two bytes are not
prepended to the internal data. The following table shows the maximum
amount of internal data that can be stored in a File Node utilizing this
optimization; it should be noted that this efficient use of File Node space
is not a feature that can be optionally implemented, but is a standard
component of the Phoenix File System.

                  Maximum        Maximum          Maximum Internal  Maximum
                  Internal Data  Internal         Extended          Internal
VolumeSectorSize  Reserve        Rights Info      Attributes        File Data

        512 bytes     304 bytes      318 bytes         334 bytes     416 bytes
                                   (320 total)       (336 total)

       1024 bytes     816 bytes      830 bytes         846 bytes     928 bytes
                                   (832 total)       (848 total)

       2048 bytes    1840 bytes     1854 bytes        1870 bytes    1952 bytes
                                  (1856 total)      (1872 total)

       4096 bytes    3888 bytes     3902 bytes        3918 bytes    4000 bytes
                                  (3904 total)      (3920 total)

       8192 bytes    7984 bytes     7998 bytes        8014 bytes    8096 bytes
                                  (8000 total)      (8016 total)

      16384 bytes   16176 bytes    16190 bytes       16206 bytes   16288 bytes
                                 (16192 total)     (16208 total)
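
These maxima follow directly from the File Node layout: the internal data
reserve of a full-sector File Node plus the space of the Tree Node
Descriptor field(s) being reused (1 for Rights, 2 for Extended Attributes,
7 for file data, at 16 bytes each), less the 2-byte length prefix where
one applies. A short sketch of the arithmetic, with hypothetical names:

```python
TND_SIZE = 16         # bytes per Tree Node Descriptor
FIXED_FIELDS = 208    # offset of InternalData (D0h)
LENGTH_PREFIX = 2     # prepended for Rights and Extended Attributes only


def max_internal(volume_sector_size):
    """Largest payload of each kind that fits in a full-sector File Node."""
    reserve = volume_sector_size - FIXED_FIELDS
    return {
        "rights": reserve + 1 * TND_SIZE - LENGTH_PREFIX,
        "extended_attributes": reserve + 2 * TND_SIZE - LENGTH_PREFIX,
        "file_data": reserve + 7 * TND_SIZE,   # FileSize gives the length
    }


limits = max_internal(512)
```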

A Rights list is maintained to determine which users or groups of users
have access to the information stored in the file data, and Extended
Attributes are maintained to give system add-ons a storage area for
additional file properties.

Directory files: Directories are special files which hold a list of files.
This is used to provide a hierarchical organization to the file system.
Every file must have an entry in at least 1 directory (otherwise the file
cannot be reached through the file system). A file's HardLinks field is the
number of directories in which the file is listed. A directory can hold any
type of file including symbolic links and other directories. Every entry in
the directory must have an associated file name, unique in the directory,
for the user to discern which files are which. File names are
case-sensitive, null-terminated Unicode strings. File names may contain any
defined Unicode character except:

   * a forward slash (/)
   * a backward slash (\)
   * a control character (first 32 characters of Unicode set)
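
A minimal sketch of a check for the three exclusions above. The function
name is hypothetical, rejecting an empty name is an assumption, and the
sketch does not test whether each character is a defined Unicode
character.

```python
def is_valid_file_name(name):
    """Reject names containing '/', '\\' or one of the first 32 characters."""
    if not name:
        return False          # assumption: an empty name is not allowed
    return not any(ch in "/\\" or ord(ch) < 32 for ch in name)


checks = {name: is_valid_file_name(name)
          for name in ["..", "notes.txt", "a/b", "a\\b", "tab\there"]}
```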

All directories must contain a "." and a ".." entry corresponding to the
directory file itself and the directory file's parent directory
respectively. The exact format of each directory entry is:

  Structure of a Directory entry:
  Offset   Size           Field Name     Description
   00h     QWORD          FileNode       File node of file associated with entry
   08h     WORD           NameLength     Length of file name in Unicode characters
   0Ah     x bytes        Name           Name associated with entry
  ----------------
  Total: 10+x bytes minimum

These entries are packed back-to-back in the directory file.

Symbolic links: Symbolic links are special files which simply hold the path
to another file and the method to be used to retrieve it. In this manner,
symbolic links can point to other files, including generic files,
directories, other symbolic links, or network resources. Unlike hard links,
symbolic links can refer to files located on volumes other than the one on
which they reside, as well as on remote systems. The file data of a
symbolic link holds the path to the other file or resource in URI format.
URI format includes the method (protocol) used to retrieve the data and the
location of the data to retrieve. The only protocol required to be
implemented is the file protocol. It simply indicates that the referenced
file is located on the local system; the location is the full path to the
file. Other protocols such as http and ftp may optionally be implemented by
the operating system. If no protocol is specified, file is assumed and
paths relative to the current working directory can be used rather than
full paths.

A symbolic link has its own set of flags and extended attributes, its own
owner, its own creator, etc. The Rights for a symbolic link determine
access to the link itself; a separate rights access check is performed on
the file the symbolic link refers to.
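
As an illustration of the defaulting rule, the sketch below splits a link
target into protocol and location using Python's standard URI parser. The
function name and the sample targets are made up for the example; how an
operating system actually dispatches on the protocol is outside the scope
of this sketch.

```python
from urllib.parse import urlsplit


def parse_link_target(target):
    """Return (protocol, location) for the file data of a symbolic link."""
    parts = urlsplit(target)
    protocol = parts.scheme or "file"    # no protocol given: file is assumed
    if protocol == "file":
        return protocol, parts.path      # full or relative local path
    return protocol, target              # leave other protocols untouched


local = parse_link_target("file:///system/kernel")
relative = parse_link_target("docs/readme")
remote = parse_link_target("http://example.org/index.html")
```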

Rights list format: The rights list consists of an array of Rights List
Entries, where each entry specifies the rights for the file node for one
user or group Object ID. The number of entries in the rights list is
determined by the length of the rights list as stored in the file node;
simply divide the length of the Rights by the size of a single
RightsListEntry to get the number of entries. Each Rights List Entry has
the format:

  Structure of a Rights List Entry:
  Offset   Size           Field Name     Description
   00h     DWORD          User           Object ID of user or group to whom the rights pertain
   04h     DWORD          Rights         Determines what rights the User has to this file node
             bit      0 : Find           user may read file node information
             bit      1 : Read           user may read file data
                                         For generic files  : may read the "contents" of the file
                                         For directories    : may scan the directory contents
                                         For symbolic links : may see where the link points to
             bit      2 : ReadRights     user may read the rights information of the file node
             bit      3 : ChangeRights   user may modify the rights information in the file node
             bit      4 : ChangeEAs      user may modify the extended attributes of the file node
             bit      5 : ChangeOwner    user may change the ownership of a file node
             bit      6 : ChangeFlags    user may modify the file node's general flags
             bit      7 : Unlink         user may remove a link to this file node
             bit      8 : Write          user may modify the file data of a generic file
             bit      9 : Redirect       user may change the file data of a symbolic link
             bit     10 : Create         user may add an entry to a directory (modify directory file data)
             bit     11 : Rename         user may rename an entry in a directory (modify directory file data)
             bit     12 : Remove         user may remove an entry from a directory (modify directory file data)
             bits 13-23 : (reserved)     must be 0
             bit     24 : Supervisor     user has all rights to the file node
             bits 25-30 : (reserved)     must be 0
             bit     31 : InheritMask    (1) rights list entry is an inherited rights mask
                                         (0) rights list entry contains access rights
  ----------------
  Total: 8 bytes
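
A minimal sketch of splitting rights list data into entries and naming the
bits that are set. Little-endian DWORDs are assumed, and the function name
and return shape are hypothetical.

```python
import struct

ENTRY = struct.Struct("<II")     # User, Rights (byte order assumed)
INHERIT_MASK = 1 << 31
RIGHT_NAMES = {
    0: "Find", 1: "Read", 2: "ReadRights", 3: "ChangeRights",
    4: "ChangeEAs", 5: "ChangeOwner", 6: "ChangeFlags", 7: "Unlink",
    8: "Write", 9: "Redirect", 10: "Create", 11: "Rename", 12: "Remove",
    24: "Supervisor",
}


def parse_rights_list(data):
    """Return a list of (user, is_inherit_mask, [right names])."""
    if len(data) % ENTRY.size:
        raise ValueError("rights list length is not a multiple of 8")
    entries = []
    for user, rights in ENTRY.iter_unpack(data):
        names = [name for bit, name in RIGHT_NAMES.items()
                 if rights & (1 << bit)]
        entries.append((user, bool(rights & INHERIT_MASK), names))
    return entries


data = ENTRY.pack(7, 0b11) + ENTRY.pack(9, INHERIT_MASK | (1 << 8))
entries = parse_rights_list(data)
```

The entry count is simply the data length divided by 8, as described
above.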

Rights for a given user are resolved using the fully qualified path of the
file in question. The rights list of the file is scanned for an entry
explicitly defining the rights of the user in question; if an entry is
found, it completely determines the user's rights. If not, the parent
directory's rights list is scanned for an entry explicitly defining the
rights of the user; if an entry is found, it determines the user's rights,
subject to modification by an inherited rights mask. If an entry is not
found, the process continues up the directory tree until the root is
reached. If the root is reached and no rights were ever defined, the entire
process is repeated for each group the user belongs to. If no rights
entries are found to be applicable to the user, the user is presumed to
have no rights for the given file node. Because the directories which
constitute the fully qualified path to the file can determine a file's
access rights, rights for a file may be inherited. Through inheritance, two
directory entries which are linked to the same file node may also have
different access rights: even though the rights list for the file node is
the same for both directory entries (since both entries actually refer to
the same file), the inherited rights can differ.

In addition to defining access rights for user and group objects, Rights
List Entries can be used to store Inherited Rights Masks limiting the
amount of access that can be inherited from directories composing the fully
qualified path to a file node. Usually Inherited Rights Masks are defined
for groups of users, however it is possible to also have Inherited Rights
Masks per user which override masks per group. The full Inherited Rights
Mask for a file node is determined in a similar manner as a user's access
rights for the file node. The effective Inherited Rights Mask is
initialized to all bits set, then the rights list of each directory up the
directory tree from the file node is scanned for Inherited Rights Mask
entries pertaining to the user and any group the user belongs to. For each
entry found the rights mask is logically AND'ed with the current value of
the effective Inherited Rights Mask to form the new value for the effective
Inherited Rights Mask. Note that only directories higher up the directory
tree than the file in question are scanned. When the root of the directory
tree is reached, or when the effective Inherited Rights Mask becomes 0, the
effective Inherited Rights Mask is complete. This mask is then logically
AND'ed with a user's access rights whenever there is not a rights list
entry in the file node explicitly defining the user's access to that file
node. For example, consider the following set of access rights and
Inherited Rights Masks for a given user (for simplicity, this example does
not include use of groups):

  File or Directory             Access Rights    Inherited Rights Mask   effective Inherited Rights Mask
  /example/of/rights                                                     11111111...11
  /example/of                                    11110111...11           11110111...11
  /example                      11011111...01    10011111...11           10010111...11
  /                             00001100...00                            10010111...11

  So the given user's access to the file node linked to /example/of/rights would be
                      Access Rights : 11011111...01
    effective Inherited Rights Mask : 10010111...11 AND
                                     --------------
     user's effective access rights : 10010111...01

One thing worth pointing out is that a user's access rights are determined
by a single rights list entry either located in the file node's rights list
or in the rights list of a directory which forms part of the fully
qualified path to the given file. On the other hand, a user's Inherited
Rights Mask is determined by AND'ing the appropriate Inherited Rights Mask
entries from each directory that forms the fully qualified path to the
given file. Furthermore, an effective Inherited Rights Mask is only applied
when the user inherited their rights to a file from a directory above the
file (i.e. the given file did not explicitly list access rights for the
given user or a group that the user belongs to).
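
The resolution procedure can be sketched as follows for a single user.
Like the worked example, this sketch leaves out groups. The data model (a
list of rights lists ordered from the file node up to the root, each a
list of (object_id, rights) pairs) and the function name are assumptions
made for illustration, and the 8-bit sample values only mirror the
leading bits shown in the example table.

```python
INHERIT_MASK = 1 << 31


def effective_rights(chain, user):
    """chain[0] is the file node's rights list, chain[-1] the root's."""
    # An explicit entry on the file node itself is final; no mask applies.
    for object_id, rights in chain[0]:
        if object_id == user and not rights & INHERIT_MASK:
            return rights
    mask = 0xFFFFFFFF
    inherited = None
    for rights_list in chain[1:]:            # only directories above the file
        for object_id, rights in rights_list:
            if object_id != user:
                continue
            if rights & INHERIT_MASK:
                mask &= rights               # accumulate every mask entry
            elif inherited is None:
                inherited = rights           # nearest access entry wins
    if inherited is None:
        return 0                             # no entry found: no rights
    return inherited & mask


USER = 42
chain = [
    [],                                              # /example/of/rights
    [(USER, INHERIT_MASK | 0xF7)],                   # /example/of
    [(USER, 0xDF), (USER, INHERIT_MASK | 0x9F)],     # /example
    [(USER, 0x0C)],                                  # /
]
result = effective_rights(chain, USER)
```

Because access entries always have bit 31 clear, AND'ing with the
accumulated mask never leaves the InheritMask bit set in the result.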

Extended Attributes format: Extended Attributes are stored individually in
packets describing the information stored by the attribute. These packets
are stored linearly and are aligned on DWORD boundaries. The format of each
Extended Attribute packet is:

  Structure of an Extended Attribute:
  Offset   Size           Field Name     Description
    00h    BYTE           NameLength     Length of attribute name (x)
    01h    BYTE           Type           Base data type stored in EA
                                           0 = Character (8 bits)
                                           1 = Short Integer Value (8 bits)
                                           2 = Integer Value (32 bits)
                                           3 = Integer Value (64 bits)
                                           8 = Floating Point Value (64 bits)
    02h    WORD           ValueLength    Length of attribute value data
    04h    x BYTES        Name           The name of the Extended Attribute
  x+04h    y BYTES        Value          The EA value; format is determined by the data type
  ----------------
  Total: x+y+4 bytes

Where x is equal to the length of the attribute name rounded up to a multiple of 4 bytes and
y is equal to the length of the actual attribute value data.

While a value of 0 in the ValueLength field is acceptable, the NameLength
may never be 0. Floating point numbers are stored as a double-precision
ANSI/IEEE Standard 754-1985 binary floating point value ("64-bit real").
Any number of values of the base data type may be stored in a single
extended attribute's value data (allowing "arrays" of data to be stored in a
single extended attribute). Each entry in the attribute's value can be
referenced given an index. The number of entries of the base data type that
are stored in the extended attribute's value can be determined by examining
the ValueLength field; the ValueLength must always be a multiple of the
size of the base type stored. For example, a 9 character null-terminated
string can be stored in an extended attribute by specifying a base Type of
character and a ValueLength of 10 (9 characters and a null character). All
strings should be stored null-terminated using the character data type. The
file system should provide API functions to read and write null-terminated
strings as extended attribute values. The file system can also perform
bounds checking on requests to read from an index into the value data that
is invalid by comparing the index with the number of entries actually
present.
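
A minimal sketch of reading one Extended Attribute packet and deriving the
number of entries in its value. It assumes little-endian fields, that
NameLength holds the unpadded name length, and that the next packet starts
at the next DWORD boundary; the function name and the sample attribute are
hypothetical.

```python
import struct

HEADER = struct.Struct("<BBH")                    # NameLength, Type, ValueLength
TYPE_SIZES = {0: 1, 1: 1, 2: 4, 3: 8, 8: 8}       # bytes per base type entry


def parse_ea(buf, offset=0):
    """Return (name, type, value, entry_count, next_offset) for one packet."""
    name_len, ea_type, value_len = HEADER.unpack_from(buf, offset)
    if name_len == 0:
        raise ValueError("NameLength may never be 0")
    size = TYPE_SIZES[ea_type]
    if value_len % size:
        raise ValueError("ValueLength is not a multiple of the base type size")
    name_start = offset + HEADER.size
    value_start = name_start + ((name_len + 3) & ~3)   # name padded to 4 bytes
    name = bytes(buf[name_start:name_start + name_len])
    value = bytes(buf[value_start:value_start + value_len])
    next_offset = (value_start + value_len + 3) & ~3   # DWORD alignment
    return name, ea_type, value, value_len // size, next_offset


# A 9-character null-terminated string stored as 10 Character entries.
packet = HEADER.pack(5, 0, 10) + b"title\0\0\0" + b"Phoenix!!\0"
name, ea_type, value, count, next_offset = parse_ea(packet)
```

The entry count is what an implementation would compare an index against
when bounds checking reads into the value data.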

Bad Sector List: One File Node is used to maintain a file which holds a
list of all known bad sectors on the storage medium. The File Node
indicates a generic file with Hidden and System attributes set; no
encryption or compression will ever be allowed on the Bad Sector List. At
present, Extended Attributes and Rights information are undefined for this
file. The format of the file is simply a series of QWORD values specifying
the Volume Sector Numbers of any bad sectors. Since the Bad Sector List is
stored as a file, the proper procedure to update the list is to first mark
the bad sector as being in-use by setting the appropriate bit in a Free
Space Descriptor before writing to the Bad Sector List. This ensures that
should the Bad Sector List require an additional sector of disk space in
order to hold the new value, it does not allocate the sector already found
bad. There is one exception to this policy: should the sector found bad be
a sector normally reserved for a Free Space Descriptor, the Free Space
Descriptor may simply be relocated to the next available sector. This
leaves a "hole" where the bad sector is not accounted for by any Free Space
Descriptor. That is acceptable because a sector not accounted for by a Free
Space Descriptor cannot be allocated, and so updating the Bad Sector List
cannot possibly cause the list to expand into the bad sector.
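
The update order can be sketched with an in-memory model. Here a Python
set stands in for the in-use bits of the Free Space Descriptors and a
bytearray stands in for the Bad Sector List file data; both, along with
the function names and the little-endian QWORD encoding, are assumptions
made for illustration.

```python
import struct


def record_bad_sector(in_use, bad_list, sector):
    """Mark the sector in-use first, then append it to the list data."""
    in_use.add(sector)                          # step 1: never allocatable again
    bad_list.extend(struct.pack("<Q", sector))  # step 2: list may now grow safely


def list_bad_sectors(bad_list):
    """Decode the series of QWORD Volume Sector Numbers."""
    return [value for (value,) in struct.iter_unpack("<Q", bad_list)]


in_use, bad_list = set(), bytearray()
record_bad_sector(in_use, bad_list, 42)
record_bad_sector(in_use, bad_list, 7)
known_bad = list_bad_sectors(bad_list)
```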

Ideally, and this is not required in the implementation of the Phoenix File
System, the Bad Sector List File Node should not be referenced from any
directory (the hard link count should always be 0) and special system calls
should be provided to add bad sectors to the bad sector list rather than
writing to the file using standard system calls. The location of the Bad
Sector List is always specified in the SuperBlock for the volume.

Should a sector which holds part of the Bad Sector List data become bad
itself, the sector should be marked as in-use in the appropriate Free Space
Descriptor, the remaining sectors (if any) should be reassembled as best as
possible into an incomplete Bad Sector List, and then a full scan of the
Volume should be performed to determine the remaining Bad Sector List
entries and to possibly find new ones.

Directory Structure: path naming conventions, location of the kernel, etc.
go here, i.e. how paths are formed.

The SuperBlock: I'm just going to make notes of which fields will need to
be in the superblock as we think of them; we can go back and define the
format later.

  Location of Free Space Descriptor Table
  Bad Block List File Node number
  Root Directory File Node number

  File System Version Number
  Number of bytes of reserved space in File Nodes for internal data
  A Features DWORD, now is reserved(0), but later on bits may indicate advanced features
  Date/Time created
  Volume sectors around which to locate directories (center of directory band(s)), up to 16
  Minimum acceptable percentage of volume sectors that may be free.
? Volume creator (who partitioned it and/or formatted it) ?

Backup copies of the SuperBlock are placed immediately after every 16th
Free Space Descriptor on the volume. Any writes to the SuperBlock do not
complete until all backup copies are also updated.
---------------------------------------------------------------------------
Phoenix File System Specification written and maintained by John Baldwin
and Kelly Yancey
