Summary: This release adds support for offline deduplication in Btrfs, automatic GPU switching in laptops with dual GPUs, a performance boost for AMD Radeon graphics, better RAID-5 multicore performance, improved handling of out-of-memory situations, improved VFS path name resolution scalability, improvements to the timerless multitasking mode, separate modesetting and rendering device nodes in the graphics DRM layer, improved locking performance for virtualized guests, XFS directory recursion scalability improvements, IPC scalability improvements, tty layer locking improvements, new drivers and many small improvements.
- Prominent features
- Offline data deduplication support in Btrfs
- Graphic performance boost for AMD Radeon hardware
- Automatic GPU switching in laptops with dual GPUs
- Separate device nodes for graphics mode setting and rendering
- Improved timerless multitasking: allow timekeeping CPU to go idle
- RAID5 multithreading
- Improved locking performance for virtualized guests
- New lockref locking scheme, VFS locking improvements
- Better Out-Of-Memory handling
- XFS directory recursion scalability, namespace support
- Improved tty layer locking
- IPC locking improvements
- Drivers and architectures
- Core
- Memory management
- Block layer
- File systems
- Networking
- Crypto
- Virtualization
- Security
- Tracing/perf
- Other news sites that track the changes of this release
1. Prominent features
1.1. Offline data deduplication support in Btrfs
The Btrfs filesystem has gained support for offline data deduplication. Deduplication consists of removing redundant copies of data in the filesystem: when the data is identical, only one copy is necessary. In some workloads, such as virtualization (VM images often contain similar copies of operating systems) or backups, the gains can be enormous. "Offline" here means that deduplication is done while the file system is mounted and running, but not automatically and transparently as processes write data; instead, it is triggered by userspace software at a time controlled by the system administrator. Online deduplication will be added in future releases.
The bedup deduplication tool has a branch that works against this support. The branch can be found here.
The author of the deduplication support has also written a sample deduplication tool, duperemove, which can be found here.
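To make the idea concrete, here is a minimal sketch of the first step a userspace deduplication tool has to perform: finding candidate duplicate blocks. The fixed 4 KiB block size, the use of SHA-256 and the function name are assumptions chosen for illustration; this is not how bedup or duperemove are implemented, and the sketch does not call into Btrfs at all.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def find_duplicate_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map each block digest to the offsets where that block occurs,
    keeping only digests that were seen more than once."""
    offsets_by_digest = defaultdict(list)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        offsets_by_digest[digest].append(offset)
    return {d: offs for d, offs in offsets_by_digest.items() if len(offs) > 1}
```

A real tool would then ask the filesystem to make the duplicate ranges share a single copy; that step is deliberately left out here.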
1.2. Graphic performance boost for AMD Radeon hardware
The website Phoronix.com found that graphics performance on modern AMD Radeon GPUs had improved a lot in Linux 3.12. However, there hasn't been any important modification in the Radeon driver that could cause such massive gains. After some investigation, Phoronix found out that the change responsible for this boost wasn't in the Radeon driver itself, but was a change to the algorithms in the cpufreq ondemand governor. Apparently, the ondemand governor was oscillating too much between frequencies, and this oscillation harmed graphics performance for Radeon GPUs. The new frequency algorithm eliminates this problem.
1.3. Automatic GPU switching in laptops with dual GPUs
Some laptop hardware, like Nvidia Optimus, has two GPUs, one optimized for performance and the other for power saving. Until now, some hacks have been needed to switch between these GPUs. In this release, the driver handles the switch automatically.
1.4. Separate device nodes for graphics mode setting and rendering
Recent hardware development (especially on ARM) shows that rendering (via GPU) and mode-setting (via display-controller) are not necessarily bound to the same graphics device. This release adds support in the graphics layer for separate device nodes for mode setting and rendering. The main use is to allow different access modes for graphics compositors (which require the modeset API) and for client-side rendering or GPGPU users (which both require the rendering API).
1.5. Improved timerless multitasking: allow timekeeping CPU to go idle
Linux 3.10 added support for timerless multitasking, that is, the ability to run processes without needing to fire up the timer interrupt that is traditionally used to implement multitasking. This support, however, had a caveat: it could turn off the timer interrupt on all CPUs except one, which is used to track timer information for the other CPUs. That CPU kept its timer turned on even when all the CPUs were idle, which was wasteful. This release allows the timer to be disabled on the timekeeping CPU when all CPUs are idle.
1.6. RAID5 multithreading
This release attempts to spread the work needed to handle RAID 5 stripes across multiple CPUs in the MD ("multiple devices") layer, which allows more IO/sec on fast (SSD) devices.
1.7. Improved locking performance for virtualized guests
The operating system that runs in each virtualized guest also takes its own locks. With some locks, such as spinlocks, this causes problems when many guests are present: guests keep spinning, wasting host CPU time, among other problems. This release replaces paravirtualized spinlocks with paravirtualized ticket spinlocks, which have better performance properties for virtualized guests and bring speedups on various benchmarks.
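As background, the general idea of a ticket lock can be sketched as follows: each waiter takes a numbered ticket and may enter only when its number is being served, so waiters are admitted in arrival order. This is a single-threaded model written for illustration, with hypothetical class and method names; it is not the kernel implementation, it does not model the paravirtualized part, and a real lock would hand out tickets with an atomic instruction.

```python
class TicketLock:
    """Single-threaded model of a ticket lock (not a real synchronization primitive)."""

    def __init__(self) -> None:
        self.next_ticket = 0  # ticket handed to the next arriving waiter
        self.now_serving = 0  # ticket currently allowed to hold the lock

    def take_ticket(self) -> int:
        """Register as a waiter and return this waiter's ticket number."""
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def may_enter(self, ticket: int) -> bool:
        """A waiter spins until this returns True for its ticket."""
        return ticket == self.now_serving

    def release(self) -> None:
        """Hand the lock over to the holder of the next ticket."""
        self.now_serving += 1
```

For example, if three waiters take tickets 0, 1 and 2, only ticket 0 may enter at first; after each `release()` the next ticket in line is admitted.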
1.8. New lockref locking scheme, VFS locking improvements
This release adds a new locking scheme, called "lockref". The "lockref" structure is a combined spinlock and reference count that allows optimized reference count accesses. In particular, it guarantees that the reference count will be updated as if the spinlock were held but, by using atomic accesses that cover both the reference count and the spinlock words, it can often do the update without actually having to take the lock. This avoids the nastiest cases of spinlock contention on large machines. When updating the reference counts on a large system, the cache line will still end up bouncing around, but that is much less noticeable than actually having to spin waiting for the lock. This release already uses lockref to improve the scalability of heavy pathname lookup on large systems.
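The fast path described above can be modeled roughly like this. The sketch is a toy, single-threaded model: the bit layout (lock in the low 32 bits, count in the high 32 bits), the class name and the `_cmpxchg` helper are assumptions made for illustration and do not reproduce the kernel's actual structure or API.

```python
class LockRef:
    """Toy model: a lock word and a reference count packed into one 64-bit word."""

    LOCK_MASK = 0xFFFF_FFFF  # assumed layout: low 32 bits hold the lock (0 = unlocked)
    COUNT_SHIFT = 32         # assumed layout: high 32 bits hold the reference count

    def __init__(self, count: int = 0) -> None:
        self.word = count << self.COUNT_SHIFT

    def _cmpxchg(self, expected: int, new: int) -> bool:
        """Stand-in for an atomic 64-bit compare-and-swap."""
        if self.word == expected:
            self.word = new
            return True
        return False

    @property
    def count(self) -> int:
        return self.word >> self.COUNT_SHIFT

    def try_get(self) -> bool:
        """Lockless fast path: bump the count only if the lock is not held."""
        old = self.word
        if old & self.LOCK_MASK:
            return False  # lock is held: the caller must fall back to taking it
        return self._cmpxchg(old, old + (1 << self.COUNT_SHIFT))
```

Because the compare-and-swap covers both halves of the word, the increment can only succeed while the lock half still reads as unlocked, which is what makes the update behave as if the lock had been held.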
1.9. Better Out-Of-Memory handling
The Out-Of-Memory state happens when the computer runs out of RAM and swap memory. When Linux gets into this state, it kills a process in order to free memory. This release includes important changes to how Out-Of-Memory states are handled, to the number of out-of-memory errors sent to userspace, and to reliability. For more details, see the link below.
1.10. XFS directory recursion scalability, namespace support
XFS has added support for a directory entry file type. The purpose is to let readdir return to userspace the type of the inode a dirent points to, without first having to read the inode from disk. Performance of directory recursion is much improved: a parallel walk of ~50 million directory entries across hundreds of directories goes from roughly 500 getdents() calls per second and 250,000 inode lookups per second (needed to determine the inode type) at roughly 17,000 read IOPS, to 3,500 getdents() calls per second at 16,000 IOPS with no inode lookups at all.
This release has also added XFS support for namespaces, and has reincorporated defragmentation support for the new CRC filesystem format.
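From userspace, the benefit of a file type stored in the directory entry shows up in any code that recurses through directories. The sketch below uses Python's `os.scandir`, which can usually answer `is_dir()` and `is_file()` from the type reported by the directory listing, without a separate stat call per entry, when the filesystem supplies that type. The function name and the three categories are illustrative choices.

```python
import os


def count_entries_by_type(root: str) -> dict:
    """Walk a directory tree and count entries by type.

    No explicit stat() call is made here; os.scandir relies on the
    entry type from the directory listing when it is available.
    """
    counts = {"dir": 0, "file": 0, "other": 0}
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    counts["dir"] += 1
                    stack.append(entry.path)  # recurse into subdirectories
                elif entry.is_file(follow_symlinks=False):
                    counts["file"] += 1
                else:
                    counts["other"] += 1
    return counts
```

On a filesystem that does not report the type in the directory entry, the same code still works, but each type check may cost an extra inode lookup.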
1.11. Improved tty layer locking
The tty layer locking got cleaned up and in the process a lot of locking became per-tty, which actually shows up on some odd loads.
1.12. IPC locking improvements
This release reduces the amount of contention imposed on the ipc lock (kern_ipc_perm.lock). These changes mostly deal with shared memory; previous work has already been done for semaphores in 3.10 and message queues in 3.11.
With these changes, a custom shm microbenchmark stressing shmctl doing IPC_STAT with 4 threads a million times reduces its execution time by 50%. A similar run, this time with IPC_SET, reduces the execution time from 3 minutes 35 seconds to 27 seconds.
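To put the two results on the same scale, the quoted times can be converted into speedup factors. This is only arithmetic on the figures given above; the helper name is illustrative.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the 'after' run is."""
    return before_seconds / after_seconds


# IPC_SET run: 3 minutes 35 seconds down to 27 seconds.
ipc_set_before = 3 * 60 + 35  # 215 seconds
ipc_set_after = 27
ipc_set_speedup = speedup(ipc_set_before, ipc_set_after)
ipc_set_reduction = 1 - ipc_set_after / ipc_set_before

# IPC_STAT run: a 50% reduction in time is a 2x speedup.
ipc_stat_speedup = speedup(1.0, 0.5)

print(f"IPC_SET: {ipc_set_speedup:.1f}x faster, {ipc_set_reduction:.0%} less time")
print(f"IPC_STAT: {ipc_stat_speedup:.1f}x faster")
```

In other words, the IPC_SET case improves by roughly a factor of eight, considerably more than the IPC_STAT case.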
2. Drivers and architectures
3. Core
task scheduler: Implement smarter wake-affine logic
commit
seqlock: Add a new locking reader type
commit
-
initmpfs: use initramfs if rootfstype= or root= specified
commit
Lock in place mounts from more privileged users
commit
sysfs: Restrict mounting sysfs
commit
-
modules: add support for soft module dependencies
commit,
commit
Add support to aio ring pages migration
commit
Implement generic deferred AIO completions
commit
4. Memory management
-
Data writeback: add strictlimit feature. The feature prevents untrusted filesystems (i.e. FUSE mounts created by unprivileged users) from accumulating a large number of dirty pages before throttling.
commit
Page allocator: fair zone allocator policy
commit
-
Account anon transparent huge pages into NR_ANON_PAGES
commit
swap: change block allocation algorithm for SSD
commit
swap: make cluster allocation per-cpu
commit
swap: make swap discard async
commit
5. Block layer
-
dm cache: add data block size limits. Inform users that the data block size can't be an arbitrary number: its value must be between 32KB and 1GB, and it should be a multiple of 32KB.
commit
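The constraint on the dm cache data block size can be expressed as a small check. This is a sketch of the rule as stated above, written in bytes for readability; the function and constant names are hypothetical and this is not the kernel's validation code.

```python
KIB = 1024
MIN_BLOCK_BYTES = 32 * KIB          # 32KB lower bound, also the required granularity
MAX_BLOCK_BYTES = 1024 * 1024 * KIB  # 1GB upper bound


def is_valid_cache_block_size(size_bytes: int) -> bool:
    """Check that a size is within 32KB..1GB and a multiple of 32KB."""
    in_range = MIN_BLOCK_BYTES <= size_bytes <= MAX_BLOCK_BYTES
    return in_range and size_bytes % MIN_BLOCK_BYTES == 0
```

For instance, 64KB passes the check, while 48KB (not a multiple of 32KB) and 16KB (below the minimum) do not.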
6. File systems
-
Limit the size of delayed allocation ranges, which will limit extent sizes to 128 MB
commit
Allow compressed extents to be merged during defragment
commit
Add mount option to force UUID tree checking
commit
Check UUID tree during mount if required
commit
Create UUID tree if required
commit
Fill UUID tree initially
commit
-
Add support for extent pre-caching through a new fiemap flag. This is critically important when using AIO to a preallocated file
commit,
commit
Allow specifying external journal by pathname mount option
commit
Mark block group as corrupt on block bitmap error
commit
Mark group corrupt on group descriptor checksum
commit
-
Allow specifying external journal by pathname mount option
commit
-
Add support for the Q_XGETQSTATV quotactl command
commit
Introduce object readahead to log recovery
commit
-
-
Add support for controlling the garbage collection policy
commit,
commit
-
Add compression support to pstore
commit
Add decompression support to pstore
commit
Add file extension to pstore file if compressed
commit
-
-
Implement POSIX ACLs support
commit
-
Refuse mount attempts with proto=udp
commit
-
Refuse RW mount of the filesystem instead of making it RO
commit
-
Refuse RW mount of the filesystem instead of making it RO
commit
7. Networking
tcp: TCP_NOTSENT_LOWAT socket option
commit
tcp: TSO packets automatic sizing
commit
tcp: add tcp_syncookies mode to allow unconditional generation of syncookies
commit
tcp: increase throughput when reordering is high
commit
tcp: prefer packet timing to TS-ECR for RTT
commit
tcp: use RTT from SACK for RTO
commit
ipv6: Add generic UDP Tunnel segmentation
commit
ipv6: drop fragmented ndisc packets by default (RFC 6980)
commit
ipv6: mld: implement RFC3810 MLDv2 mode only
commit
bridge: apply multicast snooping to IPv6 link-local, too
commit
macvlan fdb replace support
commit
Devices: export physical port id via sysfs
commit
igmp: Allow user-space configuration of igmp unsolicited report interval
commit
tcp_probe: add IPv6 support
commit
tcp_probe: allow more advanced ingress filtering by mark
commit
netfilter: add IPv6 SYNPROXY target
commit
-
Use reduced txpower for 5 and 10 MHz
commit
Add packet coalesce support
commit
Allow scanning for 5/10 MHz channels in IBSS
commit
-
-
Add vxlan tunneling support.
commit
Mega flow implementation
commit
pkt_sched: fq: Fair Queue packet scheduler
commit
pktgen: Add UDPCSUM flag to support UDP checksums
commit
qdisc: allow setting default queuing discipline
commit
tun: Add ability to create tun device with given index
commit
tun: Allow to skip filter on attach
commit
tun: Support software transmit time stamping.
commit
tuntap: hardware vlan tx support
commit
vxlan: Add tx-vlan offload support.
commit
vxlan: add ipv6 support
commit
NFC: Add a GET_SE netlink API, which dumps a list of discovered secure elements in an NFC controller
commit
Infiniband: Add receive flow steering support
commit
USBNET: Improving USB3 throughput
commit
8. Crypto
omap-sham - Add OMAP5/AM43XX SHAM Support
commit
omap-sham - Add SHA384 and SHA512 Support
commit
Add NEON accelerated XOR implementation
commit
Add OMAP4 random generator support
commit
9. Virtualization
Adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2
commit
vmware: Add support for virtual IOMMU
commit, Add support for virtual IOMMU in VMXNET3
commit
vfio-pci: PCI hot reset interface
commit
vfio: add external user support
commit
xen: Support 64-bit PV guest receiving NMIs
commit
Add xen tpmfront interface
commit
10. Security
-
Add an optional profile attachment string for profiles
commit
Add interface files for profiles and namespaces
commit
Add the ability to report a sha1 hash of loaded policy
commit
Add the profile introspection file to interface
commit
Allow setting any profile into the unconfined state
commit
Enable users to query whether apparmor is enabled
commit
11. Tracing/perf
-
Add option to limit stack depth in callchain dumps
commit
Add option to print stack trace on single line
commit
diff: Add generic order option for compute sorting
commit
diff: Update perf diff documentation for multiple data comparison
commit
gtk/hists: Display callchain overhead also
commit
kvm stat report: Add option to analyze specific VM
commit
-
list: Skip unsupported events
commit
perf report/top: Add option to collapse undesired parts of call graph
commit
stat: Add support for --initial-delay option
commit
symbols: Add support for reading from /proc/kcore
commit
tools: Add 'S' event/group modifier to read sample value
commit
tools: Default to cpu// for events v5
commit
tools: Make it possible to read object code from kernel modules
commit
tools: Make it possible to read object code from vmlinux
commit
top: Add --objdump option
commit
trace: Allow specifying which syscalls to trace
commit
trace: Implement -o/--output filename
commit
trace: Support ! in -e expressions
commit
12. Other news sites that track the changes of this release