2.1.1. Storage2

File Systems

Introduction to File systems

In Linux (and all UNIX-like operating systems) it is often said “Everything is a file”, or at least it is treated as such.
This means whether you are dealing with normal data files and documents, or with devices such as sound cards and printers, you interact with them through the same kind of Input/Output (I/O) operations.
This simplifies things: you open a “file” and perform normal operations like reading from it and writing to it.
On many systems (including Linux), the file system is structured like a tree.
The tree is usually portrayed as inverted, and starts at what is most often called the root directory, which marks the beginning of the hierarchical file system and is also sometimes referred to as the trunk, or simply denoted by **/**.
The root directory is not the same as the root user.
The hierarchical file system also contains other elements in the path (directory names), which are separated by forward slashes (**/**), as in **/usr/bin/emacs**, where the last element is the actual file name.
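
As a minimal sketch of the "everything is a file" idea, the commands below use the same redirection and read tools on a regular file and on two device nodes, then split the example path into its directory part and file name. The scratch file is an illustrative assumption; **/dev/null** and **/dev/urandom** are standard device nodes on Linux.

```shell
#!/bin/sh
# Create a scratch regular file (hypothetical, for illustration only).
scratch=$(mktemp)

# Writing: the same redirection works for a regular file and a device node.
printf 'hello\n' > "$scratch"
printf 'hello\n' > /dev/null

# Reading: the same tools read from a regular file and a device node.
file_bytes=$(wc -c < "$scratch" | tr -d ' ')
device_bytes=$(head -c 8 /dev/urandom | wc -c | tr -d ' ')

# test -f is true for regular files, test -c for character devices.
[ -f "$scratch" ] && echo "$scratch is a regular file ($file_bytes bytes read)"
[ -c /dev/null ] && echo "/dev/null is a character device"
echo "read $device_bytes bytes from /dev/urandom"

# A path is a list of directory names plus a final file name.
dir_part=$(dirname /usr/bin/emacs)
name_part=$(basename /usr/bin/emacs)
echo "directory: $dir_part, file name: $name_part"

rm -f "$scratch"
```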

File system Varieties

Linux supports a number of native file system types, expressly created by Linux developers, such as:
- ext3
- ext4
- squashfs
- btrfs

It also offers implementations of file systems used on other operating systems, such as those from:
- Windows (ntfs, vfat)
- SGI (xfs)
- IBM (jfs)
- macOS (hfs, hfs+)

Many older, legacy file systems, such as FAT, are also supported.
It is often the case that more than one file system type is used on a machine, based on considerations such as the size of files, how often they are modified, what kind of hardware they sit on and what kind of access speed is needed, etc.
The most advanced file system types in common use are the journaling varieties (ext4, xfs, and jfs) and the copy-on-write btrfs. These have many state-of-the-art features and high performance, and are very hard to corrupt accidentally.

Linux Partitions

Each file system on a Linux system occupies a disk partition.
Partitions help to organize the contents of disks according to the kind and use of the data contained.
- For example, important programs required to run the system are often kept on a different partition (known as **root** or **/**) from the one that contains files owned by regular users of that system (**/home**).
In addition, temporary files created and destroyed during the normal operation of Linux may be located on dedicated partitions.
One advantage of this kind of isolation by type and variability is that when all available space on a particular partition is exhausted, the system may still operate normally.

Mount Points

![[/Untitled 1 46.png|Untitled 1 46.png]]

Before you can start using a file system, you need to mount it on the file system tree at a mount point. This is simply a directory (which may or may not be empty) where the file system is to be grafted on. Sometimes, you may need to create the directory if it does not already exist.
WARNING: If you mount a file system on a non-empty directory, the former contents of that directory are hidden and not accessible until the file system is unmounted. Thus, mount points are usually empty directories.

Mounting and Un-mounting

The **mount** command is used to attach a file system (which can be local to the computer or on a network) somewhere within the file system tree. The basic arguments are the **device node** and mount point.
- For example, **sudo mount /dev/sda5 /home** will attach the file system contained in the disk partition associated with the **/dev/sda5** device node into the file system tree at the **/home** mount point. There are ways to specify the partition other than the device node, such as using the disk label or UUID.
To unmount the partition, the command would be: **sudo umount /home**
Note the command is **umount**, not unmount!
Only a root user (logged in as root, or using **sudo**) has the privilege to run these commands, unless the system has been otherwise configured.
If you want it to be automatically available every time the system starts up, you need to edit **/etc/fstab** accordingly (the name is short for file system table).
Executing **mount** without any arguments will show all presently mounted file systems.
The command **df -Th** (disk-free) will display information about mounted file systems, including the file system type, and usage statistics about currently used and available space.
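
The inspection commands above need no special privileges, so they are easy to try. The sketch below reads the type of the file system mounted at **/**. It assumes a Linux system whose **df** accepts **-P** and **-T**. The label and UUID in the comments are placeholders, not values from a real system.

```shell
#!/bin/sh
# Show type, size, and usage of the file system mounted at / (no root needed).
# -P keeps each file system on one line so the output is easy to parse.
df -PTh /

# Extract just the type column (second field of the second line).
root_fstype=$(df -PT / | awk 'NR == 2 { print $2 }')
echo "The root file system type is: $root_fstype"

# Mounting needs root. The device can also be named by label or UUID;
# the values below are placeholders:
#   sudo mount LABEL=home /home
#   sudo mount UUID=0a1b2c3d-0000-0000-0000-000000000000 /home
#   sudo umount /home
#
# A matching /etc/fstab line has six fields
# (device, mount point, type, options, dump, fsck order):
#   LABEL=home  /home  ext4  defaults  0  2
```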

NFS (Network File systems)

![[/Untitled 2 35.png|Untitled 2 35.png]]

The Client-Server Architecture of NFS

Other network filesystems include AFS (Andrew File System), and SMB (Server Message Block), also termed CIFS (Common Internet File System).

It is often necessary to share data across physical systems which may be either in the same location or anywhere that can be reached by the Internet.
A network (also sometimes called distributed) file system may have all its data on one machine or have it spread out on more than one network node.
A variety of different file systems can be used locally on the individual machines; a network file system can be thought of as a grouping of lower level file systems of varying types.
Many system administrators mount remote users’ home directories on a server in order to give them access to the same files and configuration files across multiple client systems. This allows the users to log in to different computers, yet still have access to the same files and resources.
The most common such filesystem is named simply NFS (the Network File System). It has a very long history and was first developed by Sun Microsystems.
Another common implementation is CIFS (provided on Linux by Samba), which has Microsoft roots.

NFS on the Server

On the server machine, NFS uses daemons (built-in networking and service processes in Linux) and other system servers, which are started at the command line by typing:

**$ sudo systemctl start nfs**

NOTE: On RHEL/CentOS 8, the service is called **nfs-server**, not **nfs**.
The text file **/etc/exports** contains the directories and permissions that a host is willing to share with other systems over NFS.
- A very simple entry in this file may look like the following:
  
  **/projects *.example.com(rw)**
- This entry allows the directory **/projects** to be mounted using NFS with read and write (**rw**) permissions and shared with other hosts in the example.com domain.
Every file in Linux has three possible permissions: read (r), write (w) and execute (x).
After modifying the **/etc/exports** file, you can type

**exportfs -av**
to notify Linux about the directories you are allowing to be remotely mounted using NFS.
You can also restart NFS with

**sudo systemctl restart nfs**
but this is heavier, as it halts NFS for a short while before starting it up again.
To make sure the NFS service starts whenever the system is booted, issue

**sudo systemctl enable nfs**
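
The layout of **/etc/exports** entries can be sketched as follows. The script writes example entries to a scratch file rather than to the real **/etc/exports**, then splits each entry into the exported directory and its client specification. The second entry, its subnet, and the **ro** and **sync** options are illustrative assumptions, not taken from the text above.

```shell
#!/bin/sh
# Draft the entries in a scratch file first (hypothetical content).
draft=$(mktemp)
cat > "$draft" <<'EOF'
# directory   client(options)
/projects     *.example.com(rw)
/docs         192.168.1.0/24(ro,sync)
EOF

# Print each exported directory with its client specifications,
# skipping comments and blank lines.
summary=$(awk '!/^#/ && NF >= 2 {
    for (i = 2; i <= NF; i++) print $1 " -> " $i
}' "$draft")
echo "$summary"

# After review, the entries would be added to /etc/exports as root,
# followed by:
#   sudo exportfs -av
rm -f "$draft"
```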

NFS on the Client

On the client machine, if it is desired to have the remote filesystem mounted automatically upon system boot, **/etc/fstab** is modified to accomplish this.
- For example, an entry in the client’s **/etc/fstab** might look like the following:
  
  **servername:/projects /mnt/nfs/projects nfs defaults 0 0**
You can also mount the remote filesystem without a reboot or as a one-time mount by directly using the mount command:

**$ sudo mount servername:/projects /mnt/nfs/projects**
Remember, if **/etc/fstab** is not modified, this remote mount will not be present the next time the system is restarted.
Furthermore, you may want to use the **nofail** option in **fstab** in case the NFS server is not live at boot.
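
A minimal sketch of how the **nofail** option fits into the entry shown earlier: the script assembles the **/etc/fstab** line from the example names used above and checks that it has the six fields an fstab entry needs. Nothing is written to **/etc/fstab**; the root-only steps appear as comments.

```shell
#!/bin/sh
# Names from the example above.
server=servername
export_dir=/projects
mount_point=/mnt/nfs/projects

# nofail lets the boot continue if the NFS server cannot be reached.
fstab_line="$server:$export_dir $mount_point nfs defaults,nofail 0 0"
echo "$fstab_line"

# An fstab entry has six whitespace-separated fields.
field_count=$(echo "$fstab_line" | awk '{ print NF }')
echo "fields: $field_count"

# As root, the mount point would be created and the entry appended:
#   sudo mkdir -p /mnt/nfs/projects
#   echo "$fstab_line" | sudo tee -a /etc/fstab
#   sudo mount /mnt/nfs/projects     # mount now, using the new fstab entry
```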

Inodes

![[/Untitled 3 23.png|Untitled 3 23.png]]

Data storage in an inode vs data storage in a directory file

Filenames are not stored in inodes; they are stored in directory files.

A file name is just a label that a directory entry attaches to an inode, which is the more fundamental object.
An inode is a data structure on disk that describes and stores file attributes, including its location.
Every file which is contained in a Linux filesystem is associated with its own inode. All metadata about a file, apart from its name, is contained within its inode.
The inode is used by the operating system to keep track of properties such as location, file attributes (permissions, ownership, etc.), access times and other items. Because of this, all I/O activity concerning a file usually also involves the file’s inode.

Inodes describe and store information about a file, including:

- Permissions
- User and group ownership
- Size
- Timestamps (nanosecond resolution)
  - Access time - The last time the file was accessed for any purpose
  - Modification time - The last time the file’s contents were modified
  - Change time - The last time the file’s inode was changed, by a change in permissions, ownership, filename, hard links, etc.
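
The attributes listed above can be read back from a file's inode. The sketch below assumes the GNU coreutils version of **stat** (the **-c** format option) and uses a scratch file created only for illustration.

```shell
#!/bin/sh
scratch=$(mktemp)
printf 'some data\n' > "$scratch"

# ls -i prints the inode number in front of the name.
ls -i "$scratch"

# GNU stat format sequences: %i inode, %h hard link count, %A permissions,
# %U owner, %G group, %s size in bytes.
stat -c 'inode=%i links=%h perms=%A owner=%U group=%G size=%s' "$scratch"

# %x, %y, and %z are the access, modification, and change times.
stat -c 'atime=%x | mtime=%y | ctime=%z' "$scratch"

# Both tools report the same inode number for the same file.
inode_from_ls=$(ls -i "$scratch" | awk '{ print $1 }')
inode_from_stat=$(stat -c '%i' "$scratch")
size_bytes=$(stat -c '%s' "$scratch")
rm -f "$scratch"
```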

Create and manage Hard and Soft links

A directory file is a particular type of file that is used to associate file names and inodes. There are two ways to associate (or link) a file name with an inode :

Hard links point to an inode.
- They are made by using **ln** without an option.
- Two or more files can point to the same inode (hard link).
- All hard linked files have to be on the same filesystem.
- Changing the content through one name changes it for every other name, because all of the names refer to the same inode.
Soft (or symbolic) links point to a file name which has an associated inode.
- They are made by using **ln** with the **-s** option.
- Soft linked files may be on different filesystems.
- If the target does not yet exist or is not yet mounted, it can be dangling.
Each association between a directory entry and an inode is known as a link.
Because it is possible (and quite common) for two or more directory entries to point to the same inode (hard links), a file can be known by multiple names, each of which has its own place in the directory structure. However, it can have only one inode no matter which name is being used.
When a process refers to a path name, the kernel searches directories to find the corresponding inode number. After the name has been converted to an inode number, the inode is loaded into memory and is used by subsequent requests.
Normally, when you modify a file it does not break the hard links that reference the same inode. However, there are (badly written) applications that can copy a file and change it and then replace it, or delete a file and replace it, and in the process create a new file that is not linked any more. So keep your eye out for this behavior if it is not intended.
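
The difference between modifying a file in place and replacing it can be sketched as follows. The file names are hypothetical, and the **mv** step only imitates what a replace-on-save application might do. It assumes GNU **stat**.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'version 1\n' > "$workdir/report"
ln "$workdir/report" "$workdir/report-link"
inode_before=$(stat -c '%i' "$workdir/report")

# In-place modification: the data of the shared inode changes,
# so both names see the new content.
printf 'version 2\n' >> "$workdir/report"
inode_after_append=$(stat -c '%i' "$workdir/report")
lines_via_link=$(wc -l < "$workdir/report-link" | tr -d ' ')

# Replace-by-rename: a new file (new inode) takes over the name "report",
# while report-link still refers to the old inode.
printf 'version 3\n' > "$workdir/report.tmp"
mv "$workdir/report.tmp" "$workdir/report"
inode_after_replace=$(stat -c '%i' "$workdir/report")
link_inode=$(stat -c '%i' "$workdir/report-link")

echo "before=$inode_before append=$inode_after_append"
echo "replace=$inode_after_replace link=$link_inode"
rm -rf "$workdir"
```

After the rename, the two names no longer share an inode, so later edits to one are invisible through the other.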

| Basis for comparison | Hard link | Soft link |
| --- | --- | --- |
| Basic | A file can be accessed through several different names, known as hard links. | A file can be accessed through separate references that point to its name, known as soft links. |
| Link validity when the original file is deleted | Still valid; the file can still be accessed. | Invalid (dangling) |
| Command used for creation | `ln` | `ln -s` |
| inode number | Same | Different |
| Can be linked | Only within its own partition. | To any other file system, even a networked one. |
| Memory consumption | Less | More |
| Relative path | Not applicable | Allowed |

ln /path/to/file /path/to/hardlink
ln -s /path/to/file /path/to/softlink

![[/Untitled 83.png|Untitled 83.png]]

Hard Links

The **ln** utility is used to create hard links and (with the **-s** option) soft links, also known as symbolic links or symlinks. These two kinds of links are very useful in UNIX-based operating systems.
Suppose that file1 already exists. A hard link, called file2, is created with the command:

**$ ln file1 file2**

Note that two files now appear to exist. However, a closer inspection of the file listing shows that this is not quite true.

**$ ls -li file1 file2**

The **-i** option to **ls** prints out in the first column the inode number, which is a unique quantity for each file object. This field is the same for both of these files; what is really going on here is that it is only one file, but it has more than one name associated with it, as is indicated by the link count of 2 that appears in the ls output: one link for file1 and one for file2.
Hard links are very useful and they save space, but they need some care. For one thing, if you remove either file1 or file2 in the example, the inode object (and the remaining file name) will remain, which might be undesirable, as it may lead to subtle errors later if you recreate a file of that name. If you edit one of the files, exactly what happens depends on your editor; most editors, including vi and gedit, will retain the link by default, but it is possible that modifying one of the names may break the link and result in the creation of two objects.
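
The behavior described above can be checked with a short sketch. It works in a scratch directory (an assumption for illustration) and uses GNU **stat** to read the inode number and link count.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'original\n' > "$workdir/file1"
ln "$workdir/file1" "$workdir/file2"

# Both names show the same inode number and a link count of 2.
ls -li "$workdir/file1" "$workdir/file2"
inode1=$(stat -c '%i' "$workdir/file1")
inode2=$(stat -c '%i' "$workdir/file2")
links_before=$(stat -c '%h' "$workdir/file1")

# Removing one name leaves the inode and the other name intact.
rm "$workdir/file1"
links_after=$(stat -c '%h' "$workdir/file2")
content_after=$(cat "$workdir/file2")
echo "links: $links_before -> $links_after"
echo "file2 still contains: $content_after"
rm -rf "$workdir"
```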

https://www.youtube.com/watch?v=lW_V8oFxQgA

Soft (Symbolic) Links

Soft (or Symbolic) links are created with the **-s** option, as in:

**$ ln -s file1 file3**

**$ ls -li file1 file3**

Notice file3 no longer appears to be a regular file, and it clearly points to file1 and has a different inode number.
Symbolic links take no extra space on the file system (unless their names are very long). They are extremely convenient, as they can easily be modified to point to different places. An easy way to create a shortcut from your home directory to long path names is to create a symbolic link.
Unlike hard links, soft links can point to objects even on different file systems, partitions, and/or disks and other media, which may or may not be currently available or even exist. In the case where the link does not point to a currently available or existing object, you obtain a dangling link.
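
A minimal sketch of these properties, again in a scratch directory chosen for illustration: the link has its own inode, stores only the name of its target, and becomes dangling once the target is removed. It assumes GNU **stat**, which does not follow symbolic links unless asked to.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'original\n' > "$workdir/file1"
# The target name is stored as given and resolved from the link's directory.
ln -s file1 "$workdir/file3"

ls -li "$workdir/file1" "$workdir/file3"
target=$(readlink "$workdir/file3")          # the stored target name
inode1=$(stat -c '%i' "$workdir/file1")
inode3=$(stat -c '%i' "$workdir/file3")      # inode of the link itself
via_link=$(cat "$workdir/file3")             # reading follows the link

# Removing the target leaves a dangling link: the link still exists (-L),
# but it no longer resolves to anything (-e follows the link and fails).
rm "$workdir/file1"
if [ -L "$workdir/file3" ] && [ ! -e "$workdir/file3" ]; then
    dangling=yes
else
    dangling=no
fi
echo "target=$target via_link=$via_link dangling=$dangling"
rm -rf "$workdir"
```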

Comparing Files and File Types

Comparing Files with diff

**diff** is used to compare files and directories. This often-used utility program has many useful options (see: man diff) including:

| `diff` option | Usage |
| --- | --- |
| `-c` | Provides a listing of differences that includes three lines of context before and after the lines differing in content |
| `-r` | Used to recursively compare subdirectories, as well as the current directory |
| `-i` | Ignore the case of letters |
| `-w` | Ignore differences in spaces and tabs (white space) |
| `-q` | Be quiet: only report if files are different without listing the differences |

To compare two files, at the command prompt, type

**diff [options] <filename1> <filename2>**.
**diff** is meant to be used for text files; for binary files, one can use **cmp**.
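
A short sketch of a few of these options, using two scratch files whose names and contents are made up for illustration. Both **diff** and **cmp** signal the result through their exit status: 0 when the files match, 1 when they differ.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new.txt"

# Default output lists the differing lines; exit status is 1 when files differ.
status_default=0
diff "$workdir/old.txt" "$workdir/new.txt" || status_default=$?

# -i ignores case, so these two files now compare as equal (exit status 0).
status_ignore_case=0
diff -i "$workdir/old.txt" "$workdir/new.txt" || status_ignore_case=$?

# -q only reports whether the files differ.
quiet_output=$(diff -q "$workdir/old.txt" "$workdir/new.txt" || true)
echo "$quiet_output"

# cmp -s compares byte by byte and prints nothing; it suits binary files.
status_cmp=0
cmp -s "$workdir/old.txt" "$workdir/new.txt" || status_cmp=$?

echo "diff=$status_default diff-i=$status_ignore_case cmp=$status_cmp"
rm -rf "$workdir"
```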

Using `diff3` and `patch`

**diff3**

Compare 3 files at once using **diff3**, which uses one file as the reference basis for the other two.
- For example, suppose you and a co-worker have independently modified the same file at the same time. **diff3** can show the differences based on the common file you both started with.
The syntax for **diff3** is as follows:

**$ diff3 MY-FILE COMMON-FILE YOUR-FILE**
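
The following sketch runs **diff3** on three small scratch files, with contents invented for illustration. The second command goes one step beyond the text above: the **-m** option of GNU **diff3** merges the two sets of changes, which works cleanly here because the edits touch different lines.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/common"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$workdir/mine"     # my change: line 1
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$workdir/yours"    # your change: line 5

# Show how each file differs, using the common file as the reference.
diff3 "$workdir/mine" "$workdir/common" "$workdir/yours"

# Merge both sets of changes into one result (GNU diff3 -m).
merged=$(diff3 -m "$workdir/mine" "$workdir/common" "$workdir/yours")
echo "$merged"
rm -rf "$workdir"
```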

**patch**

Many modifications to source code and configuration files are distributed utilizing patches, which are applied with the **patch** program.
A patch file contains the deltas (changes) required to update an older version of a file to the new one.
The patch files are actually produced by running **diff** with the correct options, as in:

**$ diff -Nur originalfile newfile > patchfile**
Distributing just the patch is more concise and efficient than distributing the entire file.
- For example, if only one line needs to change in a file that contains 1000 lines, the patch file will be just a few lines long.
To apply a patch, you can use either of the two methods below:

**$ patch -p1 < patchfile**

**$ patch originalfile patchfile**
- The first usage is more common, as it is often used to apply changes to an entire directory tree, rather than just one file, as in the second example.
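
The full round trip can be sketched with two small scratch files, using the same file names as the commands above. The file contents are invented for illustration, and the sketch assumes the **patch** program is installed.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'line one\nline two\nline three\n' > "$workdir/originalfile"
printf 'line one\nline 2\nline three\n'   > "$workdir/newfile"

# Produce the patch. diff exits with status 1 when the files differ,
# which is the expected case here, so that status is ignored.
(cd "$workdir" && diff -Nur originalfile newfile > patchfile) || true
cat "$workdir/patchfile"
changed_lines=$(grep -c '^+line 2$' "$workdir/patchfile")

# Apply the patch to the original (the second form shown above).
(cd "$workdir" && patch originalfile patchfile)

# After patching, the original should be identical to the new file.
if cmp -s "$workdir/originalfile" "$workdir/newfile"; then
    patched=yes
else
    patched=no
fi
echo "original now matches new file: $patched"
rm -rf "$workdir"
```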

Using the **file** Utility

In Linux, a file’s extension often does not categorize it the way it might in other operating systems. One cannot assume that a file named file.txt is a text file and not an executable program.
In Linux, a filename is generally more meaningful to the user of the system than to the system itself.
In fact, most applications directly examine a file’s contents to see what kind of object it is rather than relying on an extension. This is very different from the way Windows handles filenames, where a filename ending with .exe, for example, represents an executable binary file.
The real nature of a file can be ascertained by using the **file** utility.
For the file names given as arguments, it examines the contents and certain characteristics to determine whether the files are plain text, shared libraries, executable programs, scripts, or something else.
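
As a small sketch, the script below gives two scratch files deliberately misleading extensions and asks **file** what they really are. The file names and contents are made up for illustration, and the exact wording of the output can vary between versions of **file**.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# A shell script hiding behind a .txt name.
printf '#!/bin/sh\necho hello\n' > "$workdir/notes.txt"
# Plain text hiding behind a .exe name.
printf 'just some words\n' > "$workdir/program.exe"

# file looks at the contents, not at the extension.
file "$workdir/notes.txt" "$workdir/program.exe"

# -b (brief) prints only the description, without the file name.
script_type=$(file -b "$workdir/notes.txt")
text_type=$(file -b "$workdir/program.exe")
echo "notes.txt:   $script_type"
echo "program.exe: $text_type"
rm -rf "$workdir"
```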
