Linux Filesystem
The Linux Filesystem, also referred to as the Linux file hierarchy, effectively organizes data storage on a Linux system. It follows a hierarchical structure with directories (folders) containing files and subdirectories.
A file system is a structure used to organize and manage how data is stored and retrieved on a partition or drive. It defines the way files are named, stored, and organized on the storage medium. Linux supports various file systems like ext4, XFS, Btrfs, and more. File systems are created on partitions, providing users with the ability to read, write and access files stored on the disk.
Here are some important terms:
- Drive: In Linux, a drive refers to a physical storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). Linux presents SATA, SCSI, and USB drives as device files with the prefix “sd” in the /dev directory. For instance, /dev/sda is the first such drive detected on a system and /dev/sdb is the second. NVMe drives use a different naming scheme, such as /dev/nvme0n1.
- Partition: Drives are divided into partitions, and each partition will have a type of file system. A partition is a logical segment on a physical drive that divides the storage space into distinct sections, each functioning independently as if it were a separate disk. These partitions are recognized by numbers added to the drive name, such as /dev/sda1, /dev/sda2, and so on.
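As a rough illustration of the naming scheme above, the following Python sketch splits an sd-style device path into its drive and partition number. The function name `split_device_name` is hypothetical, and the pattern only covers the “sd” prefix described here; other naming schemes are not handled.

```python
import re

# Matches the "sd" naming scheme described above: /dev/sd<letters>[<number>].
# Other schemes (for example NVMe device names) are deliberately not handled.
_SD_NAME = re.compile(r"/dev/(sd[a-z]+)(\d+)?")


def split_device_name(path):
    """Return (drive, partition_number) for an sd-style device path.

    partition_number is None when the path names a whole drive.
    Raises ValueError for paths that do not follow the sd scheme.
    """
    match = _SD_NAME.fullmatch(path)
    if match is None:
        raise ValueError(f"not an sd-style device path: {path!r}")
    drive, number = match.groups()
    return "/dev/" + drive, (int(number) if number else None)


print(split_device_name("/dev/sda"))   # ('/dev/sda', None)
print(split_device_name("/dev/sdb2"))  # ('/dev/sdb', 2)
```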
Common Filesystems:
ext3 (Third Extended Filesystem)
A journaled file system used in Linux that helps ensure the integrity of the filesystem in case of unexpected shutdowns or crashes. The journal records the changes that are about to be made to the filesystem, allowing for quicker recovery and reduced chances of data corruption. Ext3 is backward-compatible with its predecessor, Ext2. In addition, Ext3 supports features such as file permissions, ownership, symbolic links, and standard Unix file types.
ext4 (Fourth Extended Filesystem)
This is a widely used and popular journaling file system in Linux. Like Ext3, Ext4 uses journaling to improve reliability and recovery in the event of system crashes. Ext4 supports larger file systems and file sizes than Ext3. It also supports online defragmentation, so the file layout on disk can be optimized while the filesystem is mounted.
hpfs (High-Performance File System)
This is associated with the OS/2 operating system developed by IBM and Microsoft. It is not a Linux filesystem.
ntfs (New Technology File System)
This filesystem was developed by Microsoft for Windows operating systems. It is not a Linux-native filesystem, but Linux has support for mounting and accessing NTFS partitions.
vfat (Virtual File Allocation Table)
It is a filesystem format used for compatibility with Microsoft Windows. Linux includes support for VFAT, allowing users to mount and interact with VFAT-formatted partitions. This enables the sharing of data between Linux and Windows systems on removable storage devices like USB drives.
XFS (X File System)
It is a high-performance, journaling filesystem designed for Unix-like operating systems, including Linux. Known for its scalability, it is particularly suitable for large storage systems. XFS supports advanced features such as delayed allocation for improved write performance, online growing (enlarging the filesystem while it’s mounted and active; XFS has traditionally not supported shrinking), and efficient handling of parallel I/O operations. Often chosen for workloads that demand high-performance storage, XFS is suitable for both personal and enterprise-level use.
btrfs (B-Tree File System)
It is a modern, copy-on-write filesystem designed for Linux. Copy-on-write means that when data is modified, the changes are written to a new location rather than overwriting the existing data. This helps in maintaining data integrity and enables features like snapshots. Btrfs also includes features like built-in RAID configurations, self-healing (detecting errors with checksums and correcting them when a redundant copy is available), and online filesystem checks that run without unmounting it.
ReiserFS
This is a journaled filesystem designed for Linux. ReiserFS uses a balanced tree structure to organize and store data efficiently, a design intended to provide faster access and better performance. ReiserFS has a feature called tail packing, where small files and the final partial blocks (“tails”) of larger files are stored directly in the tree alongside metadata instead of each occupying a full block, reducing storage overhead and improving efficiency for small files.
Many Linux distributions now favor other filesystems like Ext4, XFS, or Btrfs. The choice of filesystem often depends on specific use cases, requirements, and user preferences.
What is inode?
An inode, short for “index node,” is a data structure used in Unix-like file systems to store information about a file or a directory. Each inode is identified by an inode number that is unique within its filesystem, and it contains metadata such as the object’s type, size, ownership, permissions, timestamps, and the location of its data blocks. The file name itself is not stored in the inode; directory entries map names to inode numbers.
So rather than storing all file information in a central database, each file has its dedicated inode, allowing for quick and direct access to file metadata.
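To make this concrete, here is a small Python sketch that reads a few inode fields through the standard `os.stat` call. The helper name `inode_summary` and the file names are hypothetical. The demonstration also creates a hard link, on the assumption that the temporary directory's filesystem supports them, to show that two names can refer to one inode.

```python
import os
import tempfile


def inode_summary(path):
    """Return a few metadata fields that the filesystem keeps in the inode."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,       # inode number, unique within one filesystem
        "links": info.st_nlink,     # how many directory entries point at it
        "size": info.st_size,       # file size in bytes
        "mode": oct(info.st_mode),  # file type and permission bits
    }


# Demonstration in a throwaway directory: a hard link is a second name
# for the same inode, so both names report the same inode number.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.txt")
    alias = os.path.join(workdir, "alias.txt")
    with open(original, "w") as handle:
        handle.write("hello")
    os.link(original, alias)  # create a hard link (needs filesystem support)

    first, second = inode_summary(original), inode_summary(alias)
    print(first["inode"] == second["inode"])  # True
    print(first["links"])                     # 2
```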
Journaling vs Copy-on-write Filesystem
Journaling Filesystem
In a journaling filesystem, changes to the filesystem are first recorded in a journal or log before being committed to the actual filesystem structure. The journal acts as a record of planned changes, and in the event of a system crash or unexpected shutdown, the filesystem can use the journal to quickly recover and complete or roll back the pending transactions.
Ext4 and Ext3 are examples of filesystems that use journaling.
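The idea can be sketched with a toy in-memory model in Python. This is only an illustration of write-ahead journaling under simplified assumptions, not how Ext3 or Ext4 are implemented; the class `ToyJournaledStore` and its method names are hypothetical.

```python
class ToyJournaledStore:
    """In-memory toy model of write-ahead journaling (not a real filesystem)."""

    def __init__(self):
        self.data = {}     # stands in for the on-disk filesystem structures
        self.journal = []  # stands in for the on-disk journal

    def begin(self, changes):
        """Record the planned changes in the journal before touching data."""
        entry = {"changes": dict(changes), "committed": False}
        self.journal.append(entry)
        return entry

    def commit(self, entry):
        """Mark the journal entry as complete; it is now safe to replay."""
        entry["committed"] = True

    def apply(self, entry):
        """Write the journaled changes into the main data area."""
        self.data.update(entry["changes"])

    def recover(self):
        """After a crash: replay committed entries, discard the rest."""
        for entry in self.journal:
            if entry["committed"]:
                self.apply(entry)
        self.journal.clear()


store = ToyJournaledStore()

# Transaction 1 is journaled and committed, but the simulated crash
# happens before it is applied to the main data area.
first = store.begin({"a.txt": "version 1"})
store.commit(first)

# Transaction 2 is only partly journaled when the crash happens.
store.begin({"b.txt": "half written"})

store.recover()
print(store.data)  # {'a.txt': 'version 1'}
```

The committed transaction is completed during recovery, while the uncommitted one is rolled back by simply being dropped.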
Copy-on-Write Filesystem
In a copy-on-write (CoW) filesystem, modified data is never written over the existing data. Instead, the new version is written to a free location, and only once that write is complete is the filesystem’s metadata updated to point to it. The original data therefore remains intact until the change is finalized, which enhances data integrity and supports features like efficient snapshots.
Btrfs and ZFS are examples of filesystems that use copy-on-write.
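A toy in-memory model in Python can show why snapshots are cheap under copy-on-write. This is a simplified sketch, not the on-disk design of Btrfs or ZFS; the class `ToyCowVolume` and its methods are hypothetical.

```python
class ToyCowVolume:
    """In-memory toy model of copy-on-write blocks with snapshots."""

    def __init__(self):
        self.blocks = {}     # block id -> contents; written once, never edited
        self.live = {}       # file name -> block id for the current state
        self.snapshots = {}  # snapshot label -> frozen copy of the name table
        self._next_id = 0

    def write(self, name, contents):
        """Store contents in a fresh block, then repoint the name to it."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = contents
        self.live[name] = block_id  # the old block is left untouched

    def snapshot(self, label):
        """Copy only the small name table; data blocks are shared."""
        self.snapshots[label] = dict(self.live)

    def read(self, name, snapshot=None):
        """Read a file from the live state or from a named snapshot."""
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]


volume = ToyCowVolume()
volume.write("notes.txt", "draft")
volume.snapshot("before-edit")
volume.write("notes.txt", "final")

print(volume.read("notes.txt"))                          # final
print(volume.read("notes.txt", snapshot="before-edit"))  # draft
```

Because a write never touches an existing block, the snapshot only needs to remember which blocks were current at the time it was taken.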
Which one should we choose?
Journaling aims to provide fast recovery from crashes by logging planned changes, while copy-on-write focuses on maintaining data integrity by creating copies of modified data until changes are finalized. The choice between them often depends on specific use cases, performance considerations, and the desired features of the filesystem.
Linux Directory Structure
Unlike the Microsoft Windows operating system, Linux does not use drive letters in pathnames. Instead, it attaches (mounts) all drives and partitions into a single directory tree with the root directory ( / ) at the top. All other directories and subdirectories are located within this root directory.
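As an illustration of how separate partitions appear inside one tree, the Python sketch below parses text in the layout used by /proc/mounts and maps each mount point to its device. The function name `parse_mount_table` and the sample entries are hypothetical, and the parser assumes mount points without escaped spaces.

```python
def parse_mount_table(text):
    """Map each mount point to (device, filesystem type).

    Expects lines in the layout used by /proc/mounts:
    device, mount point, filesystem type, options, and two numeric fields.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        table[mount_point] = (device, fs_type)
    return table


# Hypothetical sample; on a real system read /proc/mounts instead.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sda1 /boot vfat rw,relatime 0 0
/dev/sdb1 /home xfs rw,relatime 0 0
"""

for mount_point, (device, fs_type) in parse_mount_table(SAMPLE_MOUNTS).items():
    print(f"{mount_point:<6} {device} ({fs_type})")
```

In this sample, a path such as /home/alice would live on the second drive even though nothing in the path itself says so.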
/ (Root Directory)
This is the top-level directory. It contains essential system files, configuration files, and subdirectories. The root directory is denoted by a forward slash (/).
/boot
This directory has essential files related to the boot process. It typically includes the Linux kernel, along with other necessary files such as initial ramdisk (initramfs) and boot loader configuration files.
/bin
This folder contains binary executable files that are fundamental for the system’s basic functionality. These binaries include common commands and utilities required for system maintenance and recovery, accessible to all users.
/dev
This directory holds device files, which represent physical and virtual devices connected to the system. These device files allow applications and users to interact with hardware components and peripherals.
/etc
There are various configuration files in this directory that govern the behavior of the operating system and installed applications. Administrators use the /etc directory to customize and manage system configurations, user accounts, network settings, etc.
/lib
This directory contains essential shared library files required for the functioning of system programs and binaries. These shared libraries provide common functions and routines that multiple programs can use, promoting code reuse and efficiency.
/home
This directory serves as the default location for user home directories. Each user on the system typically has a subdirectory within /home that houses their personal files, settings, and configuration data.
/media
This directory serves as a mount point for removable media devices such as USB drives, optical discs, and external hard drives. When a removable storage device is connected to the system, it is often automatically mounted under the /media directory.
/mnt
This directory is often used for temporary mounts, allowing users to access external or remote file systems without disrupting the standard directory structure. It provides a location where administrators can manually mount additional storage devices or network shares.
/sbin
This directory contains essential system binaries that are crucial for system administration and maintenance tasks. The binaries in /sbin are used for system recovery, troubleshooting, and ensuring the proper functioning of the operating system. Unlike the /bin directory, which holds binaries for general users, /sbin binaries are typically intended for the root user and system administrators.
/opt
This directory is designated for optional software or add-on packages that are not part of the default system installation. Software installed in the /opt directory often has its own subdirectories to organize libraries, executables, and other associated files.
/srv
This directory is intended for storing data that is served by the system. It is commonly used to host data for services like FTP, HTTP, or other network services.
/tmp
This is for temporary files. It allows applications and users to create and store temporary data that doesn’t need to persist across reboots. The contents of /tmp are typically cleared during system startup.
/usr
This directory contains the bulk of the installed user-space programs and their read-only resources, including binaries, libraries, documentation, and more. It is a significant part of the file system hierarchy, housing files that are not essential for the system’s basic functionality but are crucial for user applications and utilities. The /usr directory is divided into subdirectories such as /usr/bin for user binaries, /usr/lib for libraries, and /usr/share for data shared among applications.
/proc
This directory provides information about running processes and system status. It contains a dynamic set of entries, each representing a process or system attribute, accessible as files. Administrators and applications can read and manipulate these virtual files to gather real-time information about the system’s state and processes.
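Because these entries behave like ordinary text files, they can be read with normal file operations. The Python sketch below parses text in the layout of /proc/meminfo; the function name `parse_meminfo` is hypothetical and the sample numbers are made up for illustration.

```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integer values.

    Values are kept in the unit shown in the file (kB for most entries).
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


# Hypothetical sample in the /proc/meminfo layout; the numbers are made up.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         4000000 kB
HugePages_Total:       0
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(info["MemTotal"] - info["MemFree"])  # 12000000

# On a Linux system the same parser can be fed the live file:
# with open("/proc/meminfo") as handle:
#     info = parse_meminfo(handle.read())
```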
/var
This directory is used to store variable data that may change during the course of system operation. It includes files such as logs, spool files, and temporary files generated by applications and processes. The contents of /var are often dynamic and can vary, reflecting the state and activities of the system over time.
/root
This directory is the home for the superuser, also known as the root user. It serves as the default working directory for the root user, containing configuration files and settings specific to the root account. Unlike regular user home directories, which are typically located under /home, the /root directory provides the root user with a dedicated space for managing system-related files and configurations.
The structure of Linux directories follows the Filesystem Hierarchy Standard (FHS), which provides a standardized framework for organizing files and directories. A majority of Linux distributions adhere to the FHS, ensuring a uniform directory layout. As a result, users can expect to find files in the same places regardless of which FHS-compliant Linux system they are using. See the FHS specification to keep up to date on the standard.
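As a quick check of this layout, the following Python sketch reports which of the top-level directories covered in this article are absent under a given root. The function name `missing_directories` and the list `EXPECTED_DIRS` are hypothetical; the list simply mirrors the directories described above and is not an official FHS conformance test.

```python
import os

# Top-level directories discussed in this article.
EXPECTED_DIRS = [
    "bin", "boot", "dev", "etc", "home", "lib", "media", "mnt",
    "opt", "proc", "root", "sbin", "srv", "tmp", "usr", "var",
]


def missing_directories(root="/", expected=EXPECTED_DIRS):
    """Return the expected top-level names that are not directories under root."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(root, name))
    ]


# Prints an empty list on a system that has every directory listed above.
print(missing_directories("/"))
```

Symbolic links to directories count as present, which matters on distributions where /bin or /sbin are links into /usr.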
Conclusion
In summary, Linux offers a diverse range of filesystems, from the traditional ext3 and ext4 to advanced choices like XFS and Btrfs. Understanding concepts like inodes, journaling, and copy-on-write helps when comparing them. When selecting a filesystem, consider factors like performance and data integrity, and choose the one that best fits your needs. Lastly, the Linux directory structure provides an organized, standardized framework for locating files on any Linux system.