To run fully virtualized guests, host CPU support is needed. This is typically referred to as Intel VT, or AMD-V. To check for Intel VT support look for the ‘vmx’ flag, or for AMD-V support check for ‘svm’ flag:


# grep vmx /proc/cpuinfo 
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
AMD # grep svm /proc/cpuinfo
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy

If you have the ‘svm’ or ‘vmx’ flags, then your CPU is capable of fully-virt.

Linux kernel 2.6 has improved memory subsystem, with which administrators now have a simple interface to finetune the swapping behavior of the kernel.  The parameter stored in /proc/sys/vm/swappiness can be used to define how aggressively memory pages are swapped to disk. Linux moves memory pages that have not been accessed for some time to the swap space even if there is enough free memory available. By changing the percentage in /proc/sys/vm/swappiness you can control the swapping behavior, depending on the system configuration. If swapping is not desired, /proc/sys/vm/swappiness should have low values. Systems with memory constraints that run batch jobs (processes that sleeps for long time) might benefit from an aggressive swapping behavior.

To change swapping behavior, use either echo or sysctl

# sysctl -w vm.swappiness=90

Tuning the Linux memory subsystem is a tough task that requires constant monitoring to ensure that changes do not negatively affect other components in the server. If you do choose to modify the virtual memory parameters (in /proc/sys/vm), change only one parameter at a time and monitor how the server performs.

What is Direct I/O

Tháng Bảy 10, 2008

A file is simply a collection of data stored on media. When a process wants to access data from a file, the operating system brings the data into main memory, the process reads it, alters it and stores to the disk . The operating system could read and write data directly to and from the disk for each request, but the response time and throughput would be poor due to slow disk access times. The operating system therefore attempts to minimize the frequency of disk accesses by buffering data in main memory, within a structure called the file buffer cache.


Certain applications derive no benefit from the file buffer cache. Databases normally manage data caching at the application level, so they do not need the file system to implement this service for them. The use of a file buffer cache results in undesirable overheads in such cases, since data is first moved from the disk to the file buffer cache and from there to the application buffer. This “doublecopying” of data results in more CPU consumption and adds overhead to the memory too.

For applications that wish to bypass the buffering of memory within the file system cache, Direct I/O is provided. When Direct I/O is used for a file, data is transferred directly from the disk to the application buffer, without the use of the file buffer cache. Direct I/O can be used for a file either by mounting the corresponding file system with the direct i/o option (options differs for each OS), or by opening the file with the O_DIRECT flag specified in the open() system call.  Direct I/O benefits applications by reducing CPU consumption and eliminating the overhead of copying data twice – first between the disk and the file buffer cache, and then from the file.However, there are also few performance impacts when direct i/o is used. Direct I/O bypasses filesystem read-ahead – so there will be a performance impact.

The steps for creating network boding in Linux is available in RHEL bonding supports 7 possible “modes” for bonded interfaces. These modes determine the way in which traffic sent out of the bonded interface is actually dispersed over the real interfaces. Modes 0, 1, and 2 are by far the most commonly used among them. 

  • Mode 0 (balance-rr)
    This mode transmits packets in a sequential order from the first available slave through the last. If two real interfaces are slaves in the bond and two packets arrive destined out of the bonded interface the first will be transmitted on the first slave and the second frame will be transmitted on the second slave. The third packet will be sent on the first and so on. This provides load balancing and fault tolerance. 

  • Mode 1 (active-backup)
    This mode places one of the interfaces into a backup state and will only make it active if the link is lost by the active interface. Only one slave in the bond is active at an instance of time. A different slave becomes active only when the active slave fails. This mode provides fault tolerance. 

  • Mode 2 (balance-xor)
    Transmits based on XOR formula. (Source MAC address is XOR’d with destination MAC address) modula slave count. This selects the same slave for each destination MAC address and provides load balancing and fault tolerance. 

  • Mode 3 (broadcast)
    This mode transmits everything on all slave interfaces. This mode is least used (only for specific purpose) and provides only fault tolerance. 

  • Mode 4 (802.3ad)
    This mode is known as Dynamic Link Aggregation mode. It creates aggregation groups that share the same speed and duplex settings. This mode requires a switch that supports IEEE 802.3ad Dynamic link. 

  • Mode 5 (balance-tlb)
    This is called as Adaptive transmit load balancing. The outgoing traffic is distributed according to the current load and queue on each slave interface. Incoming traffic is received by the current slave. 

  • Mode 6 (balance-alb)
    This is Adaptive load balancing mode. This includes balance-tlb + receive load balancing (rlb) for IPV4 traffic. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the server on their way out and overwrites the src hw address with the unique hw address of one of the slaves in the bond such that different clients use different hw addresses for the server.

Network bonding

Tháng Bảy 10, 2008

Bonding is creation of a single bonded interface by combining 2 or more ethernet interfaces. This helps in high availability and performance improvement.

Steps for bonding in Fedora Core and Redhat Linux

Step 1.

Create the file ifcfg-bond0 with the IP address, netmask and gateway. Shown below is my test bonding config file.

$ cat /etc/sysconfig/network-scripts/ifcfg-bond0

IPADDR=192.168. 1.12
NETMASK=255. 255.255.0
GATEWAY=192. 168.1.1

Step 2.

Modify eth0, eth1 and eth2 configuration as shown below. Comment out, or remove the ip address, netmask, gateway and hardware address from each one of these files, since settings should only come from the ifcfg-bond0 file above.

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0

# Settings for Bond

$ cat /etc/sysconfig/network-scripts/ifcfg-eth1

# Settings for bonding
$ cat /etc/sysconfig/network-scripts/ifcfg-eth2


Step 3.

Set the parameters for bond0 bonding kernel module. Add the following lines to /etc/modprobe. conf

# bonding commands
alias bond0 bonding
options bond0 mode=balance- alb miimon=100

Step 4.

Load the bond driver module from the command prompt.

$ modprobe bonding

Step 5.

Restart the network, or restart the computer.

$ service network restart # Or restart computer

When the machine boots up check the proc settings.

$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.0.2 (March 23, 2006)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:72:80: 62:f0

Look at ifconfig -a and check that your bond0 interface is active. You are done!


Tháng Bảy 9, 2008

File system divided into two categories:

User data – stores actual data contained in files
Metadata – stores file system structural information such as superblock, inodes, directories

Let us take an example of 20 GB hard disk. The entire disk space subdivided into multiple file system blocks. And blocks used for what?

The blocks used for two different purpose:

Most blocks stores user data aka files (user data).
Some blocks in every file system store the file system’s metadata. So what the hell is a metadata?
In simple words Metadata describes the structure of the file system. Most common metadata structure are superblock, inode and directories. Following paragraphs describes each of them.

Superblock – Each file system is different and they have type like ext2, ext3 etc. Further each file system has size like 5 GB, 10 GB and status such as mount status. In short each file system has a superblock, which contains information about file system such as:

File system type
Information about other metadata structures
If this information lost, you are in trouble (data loss) so Linux maintains multiple redundant copies of the superblock in every file system. This is very important in many emergency situation, for example you can use backup copies to restore damaged primary super block. Following command displays primary and backup superblock location on /dev/sda3:

# dumpe2fs /dev/hda3 | grep -i superblock

Primary superblock at 0, Group descriptors at 1-1
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
Backup superblock at 163840, Group descriptors at 163841-163841
Backup superblock at 229376, Group descriptors at 229377-229377
Backup superblock at 294912, Group descriptors at 294913-294913

Increasing swap performance

Tháng Bảy 9, 2008

If your Linux machine is configured with several swap partitions, the default trend is to use of one partition at a time. Once one swap partition has been full the next will be used. This is not always the best method for performance, since when a new process needs to be swapped to disk it may be forced to wait until another process is swapped out.

For instance, if there are two swap partitions specified in /etc/fstab it will look something like this:

/dev/sda2    swap     swap    defaults           0 0
/dev/sda3    swap     swap    defaults           0 0

Change the mount options section from ”defaults” to ”pri=0” :

/dev/sda2    swap      swap    pri=0           0 0
/dev/sda3 swap swap pri=0 0 0
If you want to do this in a live system, then swapoff and swapon with “-p 0” option for each swap device – one by one. Once this has been done the system will be able to access any of the designated swap partitions independently of the others. This can increase the swap performance of a machine which is regularly swapping memory to disk. However it is important to bear in mind that in most situations a machine should not make heavy use of swap partitions.

Linux RPM database recovery

Tháng Bảy 9, 2008

Sometimes you may get the below error while running “rpm -qa” or any other rpm listing command. This error suggests that the RPM db is corrupted and you need to clean it up.

rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbenv->open: DB_RUNRECOVERY

Here are the steps to do it.

  1. Remove the rpm db lock information files

    rm -f /var/lib/rpm/__db*

  2. Run the rpm rebuild database command . The rpm –rebuilddb option will rebuild the database indices from the installed RPM package headers. This process will take a while to complete. Once it is completed, the rpm -qa and other rpm listing commands will work.

    rpm –rebuilddb

To illustrate the capabilities of the Linux file system layer (and the use of mount), create a file system in a file within the current file system. This is accomplished first by creating a file of a given size using dd (copy a file using /dev/zero as the source) — in other words, a file initialized with zeros, as shown in Listing 1.
Listing 1. Creating an initialized file

$ dd if=/dev/zero of=file.img bs=1k count=10000
10000+0 records in
10000+0 records out


You now have a file called file.img that’s 10MB. Use the losetup command to associate a loop device with the file (making it look like a block device instead of just a regular file within the file system):

$ losetup /dev/loop0 file.img


With the file now appearing as a block device (represented by /dev/loop0), create a file system on the device with mke2fs. This command creates a new second ext2 file system of the defined size, as shown in Listing 2.
Listing 2. Creating an ext2 file system with the loop device

$ mke2fs -c /dev/loop0 10000
mke2fs 1.35 (28-Feb-2004)
max_blocks 1024000, rsv_groups = 1250, rsv_gdb = 39
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
2512 inodes, 10000 blocks
500 blocks (5.00%) reserved for the super user


The file.img file, represented by the loop device (/dev/loop0), is now mounted to the mount point /mnt/point1 using the mount command. Note the specification of the file system as ext2. When mounted, you can treat this mount point as a new file system by doing using an ls command, as shown in Listing 3.
Listing 3. Creating a mount point and mounting the file system through the loop device

$ mkdir /mnt/point1
$ mount -t ext2 /dev/loop0 /mnt/point1
$ ls /mnt/point1


As shown in Listing 4, you can continue this process by creating a new file within the new mounted file system, associating it with a loop device, and creating another file system on it.
Listing 4. Creating a new loop file system within a loop file system

$ dd if=/dev/zero of=/mnt/point1/file.img bs=1k count=1000
1000+0 records in
1000+0 records out
$ losetup /dev/loop1 /mnt/point1/file.img
$ mke2fs -c /dev/loop1 1000
mke2fs 1.35 (28-Feb-2004)
max_blocks 1024000, rsv_groups = 125, rsv_gdb = 3
Filesystem label=
$ mkdir /mnt/point2
$ mount -t ext2 /dev/loop1 /mnt/point2
$ ls /mnt/point2
$ ls /mnt/point1
file.img lost+found


From this simple demonstration, it’s easy to see how powerful the Linux file system (and the loop device) can be. You can use this same approach to create encrypted file systems with the loop device on a file. This is useful to protect your data by transiently mounting your file using the loop device when needed.

Nguồn :

Sometimes you may need additional local partition or local mountable drives, but you dont have free blocks in the parition table. In that case virtual filesystem helps. You can create a virtual filesystem and mount them as a loopback device. Here are the steps to do it.

  1. Create a empty file with the mount of disk space you need. Here I have created a 1G file.

    [root@unixfoo23 ~]# dd if=/dev/zero of=/root/myfs1 bs=1024 count=1048576
    1048576+0 records in
    1048576+0 records out

    [root@unixfoo23 ~]# ls -l /root/myfs1
    -rw-r–r–  1 root root 1073741824 Jun 25 08:32 /root/myfs1

    [root@unixfoo23 ~]# du -sh /root/myfs1
    1.1G    /root/myfs1
    [root@unixfoo23 ~]#

  2. Create a filesystem on the virtual device (/root/myfs1). I have selected ext3 , whereas you can create the filesystem of your choice.

    [root@unixfoo23 ~]# mkfs.ext3  /root/myfs1
    mke2fs 1.35 (28-Feb-2004)
    /root/myfs1 is not a block special device.
    Proceed anyway? (y,n) y
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    131072 inodes, 262144 blocks
    13107 blocks (5.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=268435456
    8 block groups
    32768 blocks per group, 32768 fragments per group
    16384 inodes per group
    Superblock backups stored on blocks:
            32768, 98304, 163840, 229376

    Writing inode tables: done
    Creating journal (8192 blocks): done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 28 mounts or
    180 days, whichever comes first.  Use tune2fs -c or -i to override.

    [root@unixfoo23 ~]# file /root/myfs1
    /root/myfs1: Linux rev 1.0 ext3 filesystem data (large files)
    [root@unixfoo23 ~]#

  3. Mount the filesystem as a loopback device.

    [root@unixfoo23 ~]# mount -o loop /root/myfs1 /mnt

    [root@unixfoo23 ~]# df /mnt
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /root/myfs1            1032088     34092    945568   4% /mnt
    [root@unixfoo23 ~]#

  4. If you need this permanantly , you can add this to /etc/fstab.

You can even create this virtual filesytem on a nfs mounted directory and again loopback-mount it on the machine.
Nguồn :