Large-file support
Large-file support is the term frequently applied to the ability to create files larger than either 2 or 4 GiB on 32-bit filesystems.
Details
Traditionally, many operating systems and their underlying file system implementations used 32-bit integers to represent file sizes and positions. Consequently, no file could be larger than 2^32 − 1 bytes (4 GiB − 1). In many implementations, the problem was exacerbated by treating the sizes as signed numbers, which further lowered the limit to 2^31 − 1 bytes (2 GiB − 1). Files that were too large for 32-bit operating systems to handle came to be known as large files.
While the limit was quite acceptable at a time when hard disks were smaller, the general increase in storage capacity combined with increased server and desktop file usage, especially for database and multimedia files, put intense pressure on OS vendors to overcome the limitation.
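The two limits mentioned above follow directly from the width of the integer used for file sizes. The following Python sketch works through the arithmetic; the helper name max_file_size is purely illustrative and does not correspond to any operating system interface.

```python
def max_file_size(bits: int, signed: bool) -> int:
    """Largest byte count representable in an integer of the given width.

    A signed integer spends one bit on the sign, which halves the range.
    """
    value_bits = bits - 1 if signed else bits
    return 2**value_bits - 1


print(max_file_size(32, signed=False))  # 4294967295 bytes, i.e. 4 GiB - 1
print(max_file_size(32, signed=True))   # 2147483647 bytes, i.e. 2 GiB - 1
print(max_file_size(64, signed=True))   # 9223372036854775807 bytes, i.e. 8 EiB - 1
```

The last line shows why a switch to signed 64-bit sizes removes the limit for practical purposes.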
In 1996, multiple vendors responded by forming an industry initiative to support large files on POSIX. It was known as the Large File Summit, an obvious backronym of "LFS". The summit was tasked with defining a standardized way to switch to 64-bit numbers to represent file sizes.
This switch caused deployment issues and required design modifications, the consequences of which can still be seen:
- The change to 64-bit file sizes frequently required incompatible changes to file system layout, which meant that large-file support sometimes necessitated a file system change. For example, Microsoft Windows' FAT32 file system does not support files larger than 4 GiB−1; one has to use NTFS or exFAT instead.
- To support binary compatibility with old applications, operating system interfaces had to retain their use of 32-bit file sizes and new interfaces had to be designed specifically for large-file support.
- To support writing portable code that makes use of LFS where possible, C standard library authors devised mechanisms that, depending on preprocessor constants, transparently redefined the functions to the 64-bit large-file aware ones.
- Many old interfaces, especially C-based ones, explicitly specified argument types in a way that did not allow straightforward or transparent transition to 64-bit types. For example, the C functions fseek and ftell operate on file positions of type long int, which is typically 32 bits wide on 32-bit platforms, and cannot be made larger without sacrificing backward compatibility.
Adoption
The problem disappeared slowly as PCs and workstations moved completely to 64-bit computing. Microsoft Windows Server 2008 was the last server version to be shipped in a 32-bit edition. Red Hat Enterprise Linux 7 was published in 2014 only as a 64-bit operating system. Ubuntu Linux stopped delivering a 32-bit variant in 2019. Nvidia stopped developing 32-bit drivers in 2018 and stopped delivering updates after January 2019. Apple stopped developing 32-bit Mac OS versions in 2018, delivering macOS Mojave only as a 64-bit operating system. No end-of-life date is known for 32-bit Windows 10 on the desktop. This is related to the last upgrades from older systems such as Windows 7 and Windows 8 in January 2020, as some of those systems ran on old computers built on the i386 architecture.
A similar development can be seen in the mobile area. Google required apps in its app store to support 64-bit versions by August 2019, which allows 32-bit support for Android to be discontinued later. The shift towards 64-bit started in 2014, when all new processors were designed for a 64-bit architecture and Android 5 was published with a matching 64-bit variant of the operating system. Apple had made the shift the year before, when it started producing the 64-bit Apple A7 in 2013. Google started to deliver the development environment for Linux only in 64-bit by 2015. By May 2019 the share of Android versions below 5 had fallen to ten percent. As app developers concentrate on a single compilation variant, many manufacturers, for example Niantic, started to require Android 5 as the minimum version by mid-2019. Subsequently the 32-bit versions became hard to get.
Except for embedded systems with their specialized programs, program code written after 2020 no longer needs to account for varying large-file support.
Related problems
The year 2038 problem is well known as another case where a 32-bit "long" on 32-bit platforms will lead to problems. Like the large-file limitation, it will become obsolete when systems move to 64-bit only. In the meantime, a 64-bit timestamp was introduced. In the Win32 API it is visible in functions having a "64" suffix alongside the earlier "32" suffix. When large-file support was added to the Win32 API, it led to functions having an additional "i64" suffix, which sometimes makes for four combinations. By comparison, the UNIX98 API introduces functions with a "64" suffix when "_LARGEFILE64_SOURCE" is used.
Related to the large-file API is a limitation of block numbers for mass storage media. With a common size of 512 bytes per data block, the barrier resulting from 32-bit numbers was reached later. When hard disk drives reached a size of 2 terabytes, the master boot record had to be replaced by the GUID Partition Table, which uses 64 bits for the LBA numbers. On Unix-like operating systems this also required enlarging the inode numbers, which are used in some functions. The Linux kernel introduced that in 2001, leading to version 2.4, and glibc picked it up in the same year. As large-file support and large-disk support were introduced at the same time, the GNU C Library exports 64-bit inode structures on 32-bit architectures whenever the Unix LFS API is activated in program code.
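The year 2038 limit mentioned above can be derived in the same way as the file-size limit. The Python sketch below assumes a timestamp stored as a signed 32-bit count of seconds since 1 January 1970 (UTC); the names INT32_MAX and last_32bit_timestamp are illustrative only.

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

# Assumption: time is counted in whole seconds from the Unix epoch.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_32bit_timestamp() -> datetime:
    """Latest moment a signed 32-bit seconds counter can represent."""
    return UNIX_EPOCH + timedelta(seconds=INT32_MAX)


print(last_32bit_timestamp().isoformat())  # 2038-01-19T03:14:07+00:00
```

One second later the counter would overflow, which is why 64-bit timestamps were introduced.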
When the kernel moved to 64-bit inodes, the ext3 file system used them internally in the driver by 2001. However, the inode format on the storage media itself was stuck at 32-bit numbers. As mass storage devices moved to the Advanced Format of 4 kilobytes per block, the actual limit of that file system format is 8 or 16 terabytes. Handling larger disk partitions requires a different file system such as XFS, which was designed with 64-bit inodes from the start, allowing for exabyte files and partitions. The first 16-terabyte magnetic disk drives were delivered by mid-2019. Solid-state drives with 32 TiB for data centers were available as early as 2016, with some manufacturers forecasting 100 TiB SSDs by 2020.
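The capacity barriers quoted in this section (2 terabytes with 512-byte blocks, 8 or 16 terabytes with 4-kilobyte blocks) can be reproduced with a short calculation. The Python sketch below assumes that each limit is simply the number of distinct 32-bit block numbers multiplied by the block size, and that the lower 8-terabyte figure corresponds to only 31 usable bits (a signed number). The text does not spell out either assumption, and the helper name addressable_bytes is hypothetical.

```python
TIB = 2**40  # bytes per tebibyte


def addressable_bytes(block_size: int, number_bits: int = 32) -> int:
    """Capacity reachable when blocks are indexed by a number of the given width."""
    return 2**number_bits * block_size


print(addressable_bytes(512) // TIB)       # 2  (512-byte blocks, 32-bit numbers)
print(addressable_bytes(4096) // TIB)      # 16 (4 KiB blocks, 32-bit numbers)
print(addressable_bytes(4096, 31) // TIB)  # 8  (4 KiB blocks, 31 usable bits)
print(addressable_bytes(512, 64) > addressable_bytes(4096))  # True: 64-bit LBA lifts the barrier
```

The results are in binary units (TiB), so they are slightly larger than the decimal terabyte figures that drive manufacturers quote.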